Use Cases & Design Patterns
Search for terms from NASA NED Extragalactic Database using low memory footprint dictionaries. Tech.note describes hybrid algorithm, which combines binary search with SIMD accelerated final logical search with search space pruning.
DNA Search with bit-vector fingerprints
Search for substring within a DNA sequence using bit-vector fingerprints. This use case demonstrates and explains fingerprinting, vectorization, search space pruning, cache optimizations using readily available benchmarking application searching withing human chromosome 1.
SNP Search with succinct data structures
Bioinformatics is one of the data intensive areas. Situations when searching a needle
in a hay stack happen every day, data sets are large and random as a nature itself.
This use-case illustrates search for human mutations (SNPs) mapped to chromosome 1
without building a formal reverse index, just using traditional (STL) and
bit-transposed (BitMagic) data structures. Use case comes with a code example and
benchmark numbers, showing performance and memory footprint comparisons.
Use case illustrates good use for memory compact
Histogram construction - counting sort
Historgams of integer values is a great way to compress redundant data, tool for
statistical analysis, a way to store database index statistics and build optimal query execution plans.
Contruction and storage of histograms needs to be memory efficient and performant. Class of tasks related to
building histogram of integer set can be solved using
This uses case provides sample code how to build a histogram using memory efficient sparse_vector
and compares it to a few other methods, based on QuickSort and R-B tree.
Topological Algorithms for Trees with Bit Vector Indexes
Trees are very generic algorithmic structures with a wide range of algorithmic problems related to topological algorithms. In this use case we describe how problem of finding common ancestors can be solved with set algebra operations.
Job Queue Scheduling with Bit Vector Indexes
Large scale web applications often face the problem of optimal, near real-time job scheduling and messaging. In this use case we describe design principles of a system and use of compressed bit vectors to optimize workload distribution and solve monitoring and operational challenges.