Use Cases & Design Patterns

Compressed dictionary

Search for terms from NASA NED Extragalactic Database using low memory footprint dictionaries. Tech.note describes hybrid algorithm, which combines binary search with SIMD accelerated final logical search with search space pruning.

DNA Search with bit-vector fingerprints

Search for substring within a DNA sequence using bit-vector fingerprints. This use case demonstrates and explains fingerprinting, vectorization, search space pruning, cache optimizations using readily available benchmarking application searching withing human chromosome 1.

SNP Search with succinct data structures

Bioinformatics is one of the data intensive areas. Situations when searching a needle in a hay stack happen every day, data sets are large and random as a nature itself. This use-case illustrates search for human mutations (SNPs) mapped to chromosome 1 without building a formal reverse index, just using traditional (STL) and bit-transposed (BitMagic) data structures. Use case comes with a code example and benchmark numbers, showing performance and memory footprint comparisons. Use case illustrates good use for memory compact bm::rsc_sparse_vector<>.

Histogram construction - counting sort

Historgams of integer values is a great way to compress redundant data, tool for statistical analysis, a way to store database index statistics and build optimal query execution plans. Contruction and storage of histograms needs to be memory efficient and performant. Class of tasks related to building histogram of integer set can be solved using bm::sparse_vector<>. This uses case provides sample code how to build a histogram using memory efficient sparse_vector and compares it to a few other methods, based on QuickSort and R-B tree.

Topological Algorithms for Trees with Bit Vector Indexes

Trees are very generic algorithmic structures with a wide range of algorithmic problems related to topological algorithms. In this use case we describe how problem of finding common ancestors can be solved with set algebra operations.

Job Queue Scheduling with Bit Vector Indexes

Large scale web applications often face the problem of optimal, near real-time job scheduling and messaging. In this use case we describe design principles of a system and use of compressed bit vectors to optimize workload distribution and solve monitoring and operational challenges.