Powerful Tool for Big Data Problems

The BitMagic Library helps developers build high-throughput search systems. It combines hardware-level optimizations with on-the-fly compression to fit inverted indexes and binary fingerprints into memory and to minimize disk and network footprint.

Functions

  • compressed bit-vector container with random access, a range of set-algebraic functions, rank, find and traversal methods, and STL-style iterators.
  • set-algebraic operations (AND, OR, XOR, MINUS) on bit-vectors and integer sets, interoperable with low-level C arrays and STL-compatible containers (via iterators).
  • serialization/hibernation of containers into compressed BLOBs for database persistence or in-memory compression.
  • memory management focused on avoiding allocations/deallocations and minimizing heap fragmentation, with support for custom allocators.
  • set-algebraic operations on compressed bit-vector BLOBs.
  • statistical engine to efficiently compute binary similarity and distance metrics (Tversky, Hamming, Tanimoto or your own).
  • containers for sparse vectors and collections of native integer types, built on bit-transposition and separate compression of each bit-plane, with support for NULL semantics. Suitable for memory-compressed vector/columnar search systems.
  • algorithms on sparse vectors: dynamic range clipping (work in progress!)
  • functional operations on integer sets (theory of groups): translations between sets, mathematical images (work in progress!).
  • binary compressed matrices for ER operations, one-to-many and many-to-many relationships, materialized RDBMS joins, graphs, etc. (work in progress!)
  • portable C-library layer as a bridge to Python, Java, .NET (work in progress!)

C and C++

The BitMagic C++ template library offers STL-friendly containers and iterators that stay portable while investing in low-level optimizations. The templates are header-only, designed for easy integration into large projects. We also provide a lean mapping into the C language (no RTTI, no STL, no exceptions), with JNI bindings for Java and Scala as work in progress.

Storage and communications

BitMagic provides efficient serialization algorithms for saving containers. Serialization tools are available for all containers, and the resulting BLOBs can be stored in embedded databases (such as Berkeley DB), large-scale RDBMS systems (Oracle, MS SQL, MySQL) or NoSQL stores (memcached).

Cross-platform

Bit-vectors can be serialized and sent over a network for cross-platform data exchange and streaming, which makes them useful building blocks for network middleware, appliances and microservices.

Know-how

The mission of our project is to share tools, expertise, use cases and know-how about search systems, bit-vectors, inverted lists, compression techniques, libraries, programming language bindings, etc.

Getting started

The BitMagic C++ Library follows a simple, header-only programming model: include the headers in your project, with no separate library to build or link.

Public code repository

BitMagic Library is hosted on GitHub and SourceForge.

Use cases

Use cases and design patterns for applications of compressed bit-vectors.

Design principles

Articles about design and performance optimizations.