BitMagic C++ Library: APIs

Oct, 2019 (last updated for BitMagic v.5.2.0)

BitMagic C++ is a header only library

Header Description Link(s)
bm.h / bm64.h Root of BitMagic APIs, implements bm::bvector<>. Iterators, methods and functions implementing container and main set algebraic operations. Very often, this is just enough to start working with bit sets. Main building block for inverted indexes, matrices of relationships, bloom filters, high dimentional binary classifications. Even when compiled on 64-bit system BitMagic by default uses 32-bit element address space (unsigned int). "bm64.h" allows you to turn on 64-bit mode (you have Big Data problem!). bm.h
bmserial.h Classes and methods to transform bvector<> into a compact BLOB, ready for disk I/O, network transfer or database storage. Utilities to restore/deserialize bit-vector. Methods to perform logical operations between bit-vector and its compact, serialized BLOB. bmserial.h
bmalgo.h bmalgo_impl.h Algorithms with bit vectors to compute similarity/distance metrics and components of complex metrics efficiently, without producing intermediate temporary bit vectors. Logical operations to efficiently combine bit vectors and C arrays (or STL iterators). Template algorithms to visit bvector<> set elements. bmalgo.h bmalgo_impl.h
bmaggregator.h Fast logical operations on groups of bit-vectors. OR, AND, SUB and SHIFT-AND aggregators. Works faster than separate logical operations as it uses cache line optimizations and SIMD. bmaggregator.h
bmintervals.h Algorithms for intervals and ranges in bit-vector. Bit-vectors can be interpreted as sets of contiguous 1s (flanked by 0s like '011110'), this header offers tools for that. bmintervals.h
bmrandom.h bm::random_subset<> - template utility to do random sampling of bit-vectors. bmrandom.h
bmsparsevec.h bm::sparse_vector<> - container for bit-transposed (integer) arrays. Each bit plain represented as compressed bit-vector, implements a form of in-memory compression for scalar integer types. Sparse vector supports NULL (unassigned) values semantics. Can be used for memory efficient time series vectors, events, histograms, pre-calculated relationship operations, elements of matrix storage, graph connect lists, bloom filters, b-bit indexing. bmsparsevec.h
bmstrsparsevec.h bm::str_sparse_vector<> - container for bit-transposed array of strings. This container is both memory compact and allows efficient search via BitMagic sparse vector scanner. bmstrsparsevec.h
bmsparsevec_compr.h bm::rsc_sparse_vector<> - rank-select succinct container for bit-transposed (integer) arrays. Just like bm::sparse_vector<> - each bit plain represented as compressed bit-vector, implements a form of in-memory compression for scalar integer types. Sparse vector supports NULL (unassigned) values semantics. Ranl-select succinct vector use this (not) NULL vector to drop missing values (bit columns in bit-transposed format) to improve memory efficiency. bmsparsevec_compr.h
bmsparsevec_algo.h Algorithms based on bit-transposed sparse vector properties. Dynamic range clipping, set-algebra on plains, efficient searches in unsorted vectors. bmsparsevec.h
bmsparsevec_serial.h Serialization and de-serialization of sparse vector into compact and portable BLOB. bmsparsevec_serial.h

API examples and technical notes

Bit-vector

Bit-vector container, iterators, serialization and algorithms.

Algebra of Sets

Name Description Links Examples
Algebra of Sets Algorithms implementing various operations of Algebra of Sets. Tutorial bvsetalgebra.cpp

bm::bvector<>

Name Description Links Examples
bm::bvector<> container Main container for inverted lists representation. Depending on settings it can store lists of integers or bit-vectors. More sample1.cpp sample2.cpp sample3.cpp
bm::bvector<>::reference Accessor to use regular C++/STL operator conventions to assign container content More sample1.cpp sample9.cpp
bm::combine_*() operations Set Alebra algorithms for interoprability between arrays and bit vectors More sample7.cpp
bm::bvector<>::enumerator Constant iterator to traverse inverted list without explicit conversion into unpacked list of integers More sample5.cpp sample8.cpp
bm::bvector<>::counted_enumerator Constant iterator to traverse inverted list without explicit conversion into unpacked list of integers. It offers extra functinality of keeping a running sum of all ON bits it traversed from position 0 to current. More sample11.cpp
bm::bvector<>::insert_iterator insert iterator to feed inverted list into bit-vector More sample8.cpp
Serialization Utilities and algorithms to compress and serialize bit-vector for database storage or network transfer More sample4.cpp
Allocator API to re-define standard bvector<> alocator type to create special pool allocation strategy, thread local mamory pools, collect bit-vector related memory activity statistics, etc. More sample6.cpp
Bit counting Various methods and techniques to do bit counting for the whole bit-vector, ranges, via accelerated structures or iterations. Performance benchmarking for different methods. Details sample11.cpp
Setting bits Various methods and techniques to set bits in bit-vectors, ranges of bits, bits flipping/swapping, load bits from a set of ints, extraction of bits. Performance benchmarking for different methods. sample12.cpp
Logical operations on serialized bit-vectors Example on how to perform logical operations on bit-vecor BLOBs without necessarily materializing them into a random-access bit-vector. sample14.cpp
Finding first and last elements of bit-vectors Example on how to determine bit-vector start and end bits (dynamic range). sample15.cpp
bm::aggregator<> - algorithms for fast AND, OR aggregation of groups of vectors. Example on how to initialize, setup and use bm::aggregator<> Tech.notes sample16.cpp
bm::bvector<> - methods for rank and select Example on how to build and use rank-select index Tech.notes sample17.cpp
bm::bvector<>::bulk_insert_iterator Example on how to set bits fast using STL compatible bulk insert iterator. sample18.cpp
bm::bvector<>::merge() Merge operation is an equivalent of logical OR, but faster because it moves memory blocks from the agrument vector. sample19.cpp
bm::bvector<>::shift_right(), bm::bvector<>::insert() Bit-shift right and insert operations. sample20.cpp
bm::bvector<>::shift_left(), bm::bvector<>::erase() Bit-shift left and erase operations. sample21.cpp
bm::bvector<>::is_all_one_range(..), bm::bvector<>::any_range(..), bm::bvector<>::keep_range(..), bm::bvector<>::clear_range(..), bm::bvector<>::set_range(..), bm::is_interval(..), bm::find_interval_end(..), bm::find_interval_start(..), bm::deserialize_range(..) RLE/dGAP compressed bit-vectors can be used for efficient operations on ranges and intervals. This example illustartes algorithms for bit-intervals. sample22.cpp
bm::interval_enumerator<> Forward iterator-like class for interpreting of bit-vector as a sequence of 1-set intervals ('01110'). sample23.cpp
bm::rank_range_split(..) Split bit-vector into inetrvals of equal rank. Use case: parallel algorithms guided by bit-vector. New algorithm helps to break bit-vector into ranges (or sub-vectors) with equal population count. Each sub-range becomes a processing job then. sample24.cpp

64-bit bvector

BitMagic supports 64-bit address space for bit-vectors and succinct containers. Current limitation is that BitMagic supports "only" 48-bit address space.

Name Description Links Examples
bm::bvector<> container How to turn on 64-bit address mode. bvsample01_64.cpp

Algorithms for bm::bvector<>

Name Description Links Examples
Distances Algorithms optimized for computing binary distances for clusterization of multi-dimentional binary spaces. More sample9.cpp
Random sub-set Algorithm binary subset randomization. More sample10.cpp
Miscelleneous Algorithms for combining(OR, XOR, SUB, AND) bvector<> with STL-compatible iterators. Count number of intervals in bit-vector, etc. More

Bit-transposed sparse vector bm::sparse_vector<>

Name Description Links Examples
bm::sparse_vector<> container Store array of integers using compact bit-transposed implementation and the arsenal of compression methods provided by bm::bvector<> More svsample01.cpp svsample02.cpp svsample03.cpp
Serialization Algorithms to compress and serialize bm::sparse_vector<> for storage or network transfer More svsample02.cpp
Algorithms. Dynamic range clipping. Algorithms on bit-transposed sparse vector. Dynamic range clipping. More
sparse vector with NULL support How to configure and use sparse vector with unassigned (NULL) elements svsample04.cpp
Algorithms. Set to set image function. Theory of groups image function using sparse vector as an association container defining set-to-set translation. Tech.notes svsample05.cpp
Algorithms. Search in unordered set. Search in sparse vector, using scanner class. How to bm::sparse_vector<> iterators. Benchmarks. Tech.notes svsample06.cpp
Algorithms. Lower Bound search in sorted container. Search in sparse vector, with scanner class and lower_bound(). This example shows how to implement insertion sort without unpacking succinct vector. svsample07.cpp
Selective deserialization How to extract range or list of elements from a compressed sparse vector. Tech.notes svsample08.cpp
Mismatch search Fast mismatch search in sparse vectors. Tech.notes svsample09.cpp

Rank-Select succinct bit-transposed sparse vector. bm::rsc_sparse_vector<>

Name Description Links Examples
bm::rsc_sparse_vector<> container Introduction to rsc_sparse_vector<>. Example shows how to load data, serialize and deserialize it. rscsample01.cpp
Selective deserialization How to extract range or list of elements from a compressed sparse vector. Tech.notes rscsample02.cpp
bm::rsc_sparse_vector<>::const_iterator How to use const_iterator to access compressed container. rscsample03.cpp
bm::rsc_sparse_vector<> known elements access Rank-Select container can be used for terms frequency counting (TF-IDF) in compact memory. This example illustartes how to construct a container with known elements and set or increment elements using Rank-Select index which is a lot faster than the default mode. Tech.notes rscsample04.cpp

String bit-transposed sparse vector. bm::str_sparse_vector<>

Name Description Links Examples
bm::str_sparse_vector<> container Introduction to str_sparse_vector<>. Example shows how to add values, optimize (memory compression), iterate elements. strsvsample01.cpp
bm::str_sparse_vector<> container, bm::sparse_vector_scanner<> insertion sort on bit-transposed collection of strings using BitMagic sparse_vector_scanner<> strsvsample02.cpp
str_sparse_vector<>::back_insert_iterator str_sparse_vector<>::back_insert_iterator is used to load the data, then sparse vector gets remaped, compressed and saved to disk using serialization API. This techniques help to create index good for fast search. strsvsample03.cpp
Vector with NULL (unassigned) values Operations with NULL(unassigned) values in bit-transposed sparse vectors. strsvsample04.cpp
Selective deserialization How to extract range or list of elements from a compressed sparse vector. Tech.notes strsvsample05.cpp

Advanced Examples (mixed techniques)

Name Description Links Examples
Memory consumption / sparse sets Comparison of different methods of handling of sparse sets. Memory and performance benchmarks. Described techniques can be used for in-memory acceleration of ER JOIN operations. Tech.notes xsample01.cpp
Counting Sort, construction of histograms Example, shows a how to make fast and memory efficient integer counting sort / histogram construction using BitMagic sparse_vector<>. Speed benchmark of various techniques based on QuickSort, R-B tree (stl::map<>), parallel version of counting sort. Tech.notes xsample02.cpp
SNP search with succinct sparse vector Example, shows a how to build and search memory compact vector of mutations (SNPs), using real data from NCBI. Memory and performance benchmark comparisons. Tech.notes xsample03.cpp
Substring search with fingerprints and SHIFT-AND Example, shows a how to build DNA fingerprints and use different algorithms to search for a k-mer subsequence. Benchmarking comparisons with more straight-forward approaches. Tech.notes xsample04.cpp
Fast bit-vector index construction Example, shows a how to build DNA fingerprints. This is related to xsample04.cpp but focuses of fast index construction, using SIMD and multi-threading. Tech.notes xsample04a.cpp
Memory efficient dictionary Example, shows a how to use bm::str_sparse_vector<>. Sample uses data from NASA NED extragalactic database to build an efficient dictionary using bit-transposed sparse matrix and characted remappings. Sample illustrates binary search and scan search techniques. Tech.notes xsample05.cpp
DNA compression and mismatch search Example, shows a how to use bm::str_sparse_vector<>. Sample uses DNA code tyo show 2-bit per char compression and shows how to construct comparison function without decompression. Benchmark comparisons with STL and C strcmp(). Tech.notes xsample06.cpp
Genomic Viewer layout calculation Example, shows a how to use bit-interval algorithms and succinct data structures to create a toy genomics viewer (ASCII graphics is used for illustration). Tech.notes xsample08.cpp