BitMagic C++ Library: APIs
Oct, 2019 (last updated for BitMagic v.5.2.0)
BitMagic C++ is a header only library
Header | Description | Link(s) |
---|---|---|
bm.h / bm64.h |
Root of BitMagic APIs, implements bm::bvector<> .
Iterators, methods and functions implementing container and main set algebraic operations.
Very often, this is just enough to start working with bit sets. Main building block for
inverted indexes, matrices of relationships, bloom filters, high dimentional
binary classifications. Even when compiled on 64-bit system BitMagic by
default uses 32-bit element address space (unsigned int). "bm64.h" allows you to
turn on 64-bit mode (you have Big Data problem!).
|
bm.h |
bmserial.h | Classes and methods to transform bvector<> into a compact BLOB, ready for disk I/O, network transfer or database storage. Utilities to restore/deserialize bit-vector. Methods to perform logical operations between bit-vector and its compact, serialized BLOB. | bmserial.h |
bmalgo.h bmalgo_impl.h | Algorithms with bit vectors to compute similarity/distance metrics and components of complex metrics efficiently, without producing intermediate temporary bit vectors. Logical operations to efficiently combine bit vectors and C arrays (or STL iterators). Template algorithms to visit bvector<> set elements. | bmalgo.h bmalgo_impl.h |
bmaggregator.h | Fast logical operations on groups of bit-vectors. OR, AND, SUB and SHIFT-AND aggregators. Works faster than separate logical operations as it uses cache line optimizations and SIMD. | bmaggregator.h |
bmintervals.h |
Algorithms for intervals and ranges in bit-vector.
Bit-vectors can be interpreted as sets of contiguous 1s
(flanked by 0s like '011110' ), this header offers tools for that.
|
bmintervals.h |
bmrandom.h |
bm::random_subset<> - template utility to do random sampling of bit-vectors.
|
bmrandom.h |
bm3vl.h | Collection of utility functions for 3-valued logic (Kleene). | bm3vl.h |
bmsparsevec.h |
bm::sparse_vector<> - container for bit-transposed (integer) arrays.
Each bit plain represented as compressed bit-vector, implements a form of in-memory
compression for scalar integer types. Sparse vector supports NULL (unassigned) values
semantics. Can be used for memory efficient time series vectors, events, histograms,
pre-calculated relationship operations, elements of matrix storage, graph connect lists,
bloom filters, b-bit indexing.
|
bmsparsevec.h |
bmstrsparsevec.h |
bm::str_sparse_vector<> - container for bit-transposed array of strings.
This container is both memory compact and allows efficient search via BitMagic
sparse vector scanner.
|
bmstrsparsevec.h |
bmsparsevec_compr.h |
bm::rsc_sparse_vector<> - rank-select succinct container for
bit-transposed (integer) arrays.
Just like bm::sparse_vector<> - each bit plain represented as compressed bit-vector, implements a form of in-memory
compression for scalar integer types. Sparse vector supports NULL (unassigned) values
semantics. Ranl-select succinct vector use this (not) NULL vector to drop missing values
(bit columns in bit-transposed format) to improve memory efficiency.
|
bmsparsevec_compr.h |
bmsparsevec_algo.h | Algorithms based on bit-transposed sparse vector properties. Dynamic range clipping, set-algebra on plains, efficient searches in unsorted vectors. | bmsparsevec.h |
bmsparsevec_serial.h | Serialization and de-serialization of sparse vector into compact and portable BLOB. | bmsparsevec_serial.h |
API examples and technical notes
Bit-vector
Bit-vector container, iterators, serialization and algorithms.
Algebra of Sets
Name | Description | Links | Examples |
---|---|---|---|
Algebra of Sets | Algorithms implementing various operations of Algebra of Sets. | Tutorial | bvsetalgebra.cpp |
Three-valued logic (Kleene) | Algorithms of 3-valued logic on bit-vectors. | bv3vlogic.cpp |
bm::bvector<>
Name | Description | Links | Examples |
---|---|---|---|
bm::bvector<> container | Main container for inverted lists representation. Depending on settings it can store lists of integers or bit-vectors. | More | sample1.cpp sample2.cpp sample3.cpp |
bm::bvector<>::reference | Accessor to use regular C++/STL operator conventions to assign container content | More | sample1.cpp sample9.cpp |
bm::combine_*() operations | Set Alebra algorithms for interoprability between arrays and bit vectors | More | sample7.cpp |
bm::bvector<>::enumerator | Constant iterator to traverse inverted list without explicit conversion into unpacked list of integers | More | sample5.cpp sample8.cpp |
bm::bvector<>::counted_enumerator | Constant iterator to traverse inverted list without explicit conversion into unpacked list of integers. It offers extra functinality of keeping a running sum of all ON bits it traversed from position 0 to current. | More | sample11.cpp |
bm::bvector<>::insert_iterator | insert iterator to feed inverted list into bit-vector | More | sample8.cpp |
Serialization | Utilities and algorithms to compress and serialize bit-vector for database storage or network transfer | More | sample4.cpp |
Allocator | API to re-define standard bvector<> alocator type to create special pool allocation strategy, thread local mamory pools, collect bit-vector related memory activity statistics, etc. | More | sample6.cpp |
Bit counting | Various methods and techniques to do bit counting for the whole bit-vector, ranges, via accelerated structures or iterations. Performance benchmarking for different methods. | Details | sample11.cpp |
Setting bits | Various methods and techniques to set bits in bit-vectors, ranges of bits, bits flipping/swapping, load bits from a set of ints, extraction of bits. Performance benchmarking for different methods. | sample12.cpp | |
Logical operations on serialized bit-vectors | Example on how to perform logical operations on bit-vecor BLOBs without necessarily materializing them into a random-access bit-vector. | sample14.cpp | |
Finding first and last elements of bit-vectors | Example on how to determine bit-vector start and end bits (dynamic range). | sample15.cpp | |
bm::aggregator<> - algorithms for fast AND, OR aggregation of groups of vectors. | Example on how to initialize, setup and use bm::aggregator<> | Tech.notes | sample16.cpp |
bm::bvector<> - methods for rank and select | Example on how to build and use rank-select index | Tech.notes | sample17.cpp |
bm::bvector<>::bulk_insert_iterator | Example on how to set bits fast using STL compatible bulk insert iterator. | sample18.cpp | |
bm::bvector<>::merge() | Merge operation is an equivalent of logical OR, but faster because it moves memory blocks from the agrument vector. | sample19.cpp | |
bm::bvector<>::shift_right(), bm::bvector<>::insert() | Bit-shift right and insert operations. | sample20.cpp | |
bm::bvector<>::shift_left(), bm::bvector<>::erase() | Bit-shift left and erase operations. | sample21.cpp | |
bm::bvector<>::is_all_one_range(..), bm::bvector<>::any_range(..), bm::bvector<>::keep_range(..), bm::bvector<>::clear_range(..), bm::bvector<>::set_range(..), bm::is_interval(..), bm::find_interval_end(..), bm::find_interval_start(..), bm::deserialize_range(..) | RLE/dGAP compressed bit-vectors can be used for efficient operations on ranges and intervals. This example illustartes algorithms for bit-intervals. | sample22.cpp | |
bm::interval_enumerator<> | Forward iterator-like class for interpreting of bit-vector as a sequence of 1-set intervals ('01110'). | sample23.cpp | |
bm::rank_range_split(..) | Split bit-vector into inetrvals of equal rank. Use case: parallel algorithms guided by bit-vector. New algorithm helps to break bit-vector into ranges (or sub-vectors) with equal population count. Each sub-range becomes a processing job then. | sample24.cpp |
64-bit bvector
BitMagic supports 64-bit address space for bit-vectors and succinct containers. Current limitation is that BitMagic supports "only" 48-bit address space.
Name | Description | Links | Examples |
---|---|---|---|
bm::bvector<> container | How to turn on 64-bit address mode. | bvsample01_64.cpp |
Algorithms for bm::bvector<>
Name | Description | Links | Examples |
---|---|---|---|
Distances | Algorithms optimized for computing binary distances for clusterization of multi-dimentional binary spaces. | More | sample9.cpp |
Random sub-set | Algorithm binary subset randomization. | More | sample10.cpp |
Miscelleneous | Algorithms for combining(OR, XOR, SUB, AND) bvector<> with STL-compatible iterators. Count number of intervals in bit-vector, etc. | More |
Bit-transposed sparse vector bm::sparse_vector<>
Name | Description | Links | Examples |
---|---|---|---|
bm::sparse_vector<> container | Store array of integers using compact bit-transposed implementation and the arsenal of compression methods provided by bm::bvector<> | More | svsample01.cpp svsample02.cpp svsample03.cpp |
Serialization | Algorithms to compress and serialize bm::sparse_vector<> for storage or network transfer | More | svsample02.cpp |
Algorithms. Dynamic range clipping. | Algorithms on bit-transposed sparse vector. Dynamic range clipping. | More | |
sparse vector with NULL support | How to configure and use sparse vector with unassigned (NULL) elements | svsample04.cpp | |
Algorithms. Set to set image function. | Theory of groups image function using sparse vector as an association container defining set-to-set translation. | Tech.notes | svsample05.cpp |
Algorithms. Search in unordered set. | Search in sparse vector, using scanner class. How to bm::sparse_vector<> iterators. Benchmarks. | Tech.notes | svsample06.cpp |
Algorithms. Lower Bound search in sorted container. | Search in sparse vector, with scanner class and lower_bound(). This example shows how to implement insertion sort without unpacking succinct vector. | svsample07.cpp | |
Selective deserialization | How to extract range or list of elements from a compressed sparse vector. | Tech.notes | svsample08.cpp |
Mismatch search | Fast mismatch search in sparse vectors. | Tech.notes | svsample09.cpp |
Rank-Select succinct bit-transposed sparse vector. bm::rsc_sparse_vector<>
Name | Description | Links | Examples |
---|---|---|---|
bm::rsc_sparse_vector<> container |
Introduction to rsc_sparse_vector<> .
Example shows how to load data, serialize and deserialize it.
|
rscsample01.cpp | |
Selective deserialization | How to extract range or list of elements from a compressed sparse vector. | Tech.notes | rscsample02.cpp |
bm::rsc_sparse_vector<>::const_iterator | How to use const_iterator to access compressed container. | rscsample03.cpp | |
bm::rsc_sparse_vector<> known elements access | Rank-Select container can be used for terms frequency counting (TF-IDF) in compact memory. This example illustartes how to construct a container with known elements and set or increment elements using Rank-Select index which is a lot faster than the default mode. | Tech.notes | rscsample04.cpp |
String bit-transposed sparse vector. bm::str_sparse_vector<>
Name | Description | Links | Examples |
---|---|---|---|
bm::str_sparse_vector<> container |
Introduction to str_sparse_vector<> .
Example shows how to add values, optimize (memory compression),
iterate elements.
|
strsvsample01.cpp | |
bm::str_sparse_vector<> container, bm::sparse_vector_scanner<> |
insertion sort on bit-transposed collection of strings using BitMagic
sparse_vector_scanner<>
|
strsvsample02.cpp | |
str_sparse_vector<>::back_insert_iterator |
str_sparse_vector<>::back_insert_iterator is used to load
the data, then sparse vector gets remaped, compressed and saved to disk using
serialization API. This techniques help to create index good for fast search.
|
strsvsample03.cpp | |
Vector with NULL (unassigned) values | Operations with NULL(unassigned) values in bit-transposed sparse vectors. | strsvsample04.cpp | |
Selective deserialization | How to extract range or list of elements from a compressed sparse vector. | Tech.notes | strsvsample05.cpp |
Advanced Examples (mixed techniques)
Name | Description | Links | Examples |
---|---|---|---|
Memory consumption / sparse sets | Comparison of different methods of handling of sparse sets. Memory and performance benchmarks. Described techniques can be used for in-memory acceleration of ER JOIN operations. | Tech.notes | xsample01.cpp |
Counting Sort, construction of histograms | Example, shows a how to make fast and memory efficient integer counting sort / histogram construction using BitMagic sparse_vector<>. Speed benchmark of various techniques based on QuickSort, R-B tree (stl::map<>), parallel version of counting sort. | Tech.notes | xsample02.cpp |
SNP search with succinct sparse vector | Example, shows a how to build and search memory compact vector of mutations (SNPs), using real data from NCBI. Memory and performance benchmark comparisons. | Tech.notes | xsample03.cpp |
Substring search with fingerprints and SHIFT-AND | Example, shows a how to build DNA fingerprints and use different algorithms to search for a k-mer subsequence. Benchmarking comparisons with more straight-forward approaches. | Tech.notes | xsample04.cpp |
Fast bit-vector index construction | Example, shows a how to build DNA fingerprints. This is related to xsample04.cpp but focuses of fast index construction, using SIMD and multi-threading. | Tech.notes | xsample04a.cpp |
Memory efficient dictionary |
Example, shows a how to use bm::str_sparse_vector<> .
Sample uses data from NASA NED extragalactic database to build an efficient
dictionary using bit-transposed sparse matrix and characted remappings.
Sample illustrates binary search and scan search techniques.
|
Tech.notes | xsample05.cpp |
DNA compression and mismatch search |
Example, shows a how to use bm::str_sparse_vector<> .
Sample uses DNA code tyo show 2-bit per char compression and
shows how to construct comparison function without decompression.
Benchmark comparisons with STL and C strcmp().
|
Tech.notes | xsample06.cpp |
Genomic Viewer layout calculation | Example, shows a how to use bit-interval algorithms and succinct data structures to create a toy genomics viewer (ASCII graphics is used for illustration). | Tech.notes | xsample08.cpp |