BitMagic-C++
bm::serializer< BV > Class Template Reference

Bit-vector serialization class.

#include <bmserial.h>

Data Structures

struct  bookmark_state
 Bookmark state structure.
 

Public Types

typedef BV bvector_type
 
typedef bvector_type::allocator_type allocator_type
 
typedef bvector_type::blocks_manager_type blocks_manager_type
 
typedef bvector_type::statistics statistics_type
 
typedef bvector_type::block_idx_type block_idx_type
 
typedef bvector_type::size_type size_type
 
typedef byte_buffer< allocator_type > buffer
 
typedef bm::bv_ref_vector< BV > bv_ref_vector_type
 

Public Member Functions

 serializer (const allocator_type &alloc=allocator_type(), bm::word_t *temp_block=0)
 Constructor.
 
 serializer (bm::word_t *temp_block)
 
 ~serializer ()
 
Compression level settings


void set_compression_level (unsigned clevel) BMNOEXCEPT
 Set compression level.
 
unsigned get_compression_level () const BMNOEXCEPT
 Get current compression level.
 

Serialization Methods


size_type serialize (const BV &bv, unsigned char *buf, size_t buf_size)
 Bitvector serialization into memory block.
 
void serialize (const BV &bv, typename serializer< BV >::buffer &buf, const statistics_type *bv_stat=0)
 Bitvector serialization into buffer object (resized automatically).
 
void optimize_serialize_destroy (BV &bv, typename serializer< BV >::buffer &buf)
 Bitvector serialization into buffer object (resized automatically). The input bit-vector gets optimized and then destroyed; its content is NOT guaranteed after this operation.
 
const size_type * get_compression_stat () const BMNOEXCEPT
 Return serialization counter vector.
 
void gap_length_serialization (bool value) BMNOEXCEPT
 Set GAP length serialization (serializes GAP levels of the original vector).
 
void byte_order_serialization (bool value) BMNOEXCEPT
 Set byte-order serialization (for cross platform compatibility).
 
void set_bookmarks (bool enable, unsigned bm_interval=256) BMNOEXCEPT
 Add skip-markers to serialization BLOB for faster range decode at the expense of some BLOB size increase.
 
void set_sparse_cutoff (unsigned cutoff) BMNOEXCEPT
 Fine tuning for Binary Interpolative Compression (levels 5+). The parameter sets the average population count per block (64Kbits) below which a block is considered very sparse.
 
void set_ref_vectors (const bv_ref_vector_type *ref_vect)
 Attach collection of reference vectors for XOR serialization (no transfer of ownership for the pointer).
 
void set_curr_ref_idx (size_type ref_idx) BMNOEXCEPT
 Set current index in the reference vector collection (not a row idx or plain idx).
 
void encode_header (const BV &bv, bm::encoder &enc) BMNOEXCEPT
 Encode serialization header information.
 
void encode_gap_block (const bm::gap_word_t *gap_block, bm::encoder &enc)
 
void gamma_gap_block (const bm::gap_word_t *gap_block, bm::encoder &enc) BMNOEXCEPT
 
void gamma_gap_array (const bm::gap_word_t *gap_block, unsigned arr_len, bm::encoder &enc, bool inverted=false) BMNOEXCEPT
 Encode GAP block as delta-array with Elias Gamma coder.
 
void encode_bit_array (const bm::word_t *block, bm::encoder &enc, bool inverted) BMNOEXCEPT
 Encode bit-block as an array of bits.
 
void gamma_gap_bit_block (const bm::word_t *block, bm::encoder &enc) BMNOEXCEPT
 
void gamma_arr_bit_block (const bm::word_t *block, bm::encoder &enc, bool inverted) BMNOEXCEPT
 
void bienc_arr_bit_block (const bm::word_t *block, bm::encoder &enc, bool inverted) BMNOEXCEPT
 
void bienc_arr_sblock (const BV &bv, unsigned sb, bm::encoder &enc) BMNOEXCEPT
 
void bienc_gap_bit_block (const bm::word_t *block, bm::encoder &enc) BMNOEXCEPT
 Encode bit-block as interpolated bit block of gaps.
 
void interpolated_arr_bit_block (const bm::word_t *block, bm::encoder &enc, bool inverted) BMNOEXCEPT
 
void interpolated_gap_bit_block (const bm::word_t *block, bm::encoder &enc) BMNOEXCEPT
 Encode bit-block as interpolated gap block.
 
void interpolated_gap_array (const bm::gap_word_t *gap_block, unsigned arr_len, bm::encoder &enc, bool inverted) BMNOEXCEPT
 Encode GAP block as an array with binary interpolated coder.
 
void interpolated_gap_array_v0 (const bm::gap_word_t *gap_block, unsigned arr_len, bm::encoder &enc, bool inverted) BMNOEXCEPT
 
void interpolated_encode_gap_block (const bm::gap_word_t *gap_block, bm::encoder &enc) BMNOEXCEPT
 
void encode_bit_interval (const bm::word_t *blk, bm::encoder &enc, unsigned size_control) BMNOEXCEPT
 Encode BIT block with repeatable runs of zeroes.
 
void encode_bit_digest (const bm::word_t *blk, bm::encoder &enc, bm::id64_t d0) BMNOEXCEPT
 Encode bit-block using digest (hierarchical compression).
 
unsigned char find_gap_best_encoding (const bm::gap_word_t *gap_block) BMNOEXCEPT
 Determine best representation for GAP block based on current set compression level.
 
unsigned char find_bit_best_encoding (const bm::word_t *block) BMNOEXCEPT
 Determine best representation for a bit-block.
 
unsigned char find_bit_best_encoding_l5 (const bm::word_t *block) BMNOEXCEPT
 Determine best representation for a bit-block (level 5).
 
void reset_compression_stats () BMNOEXCEPT
 Reset all accumulated compression statistics.
 
void reset_models () BMNOEXCEPT
 
void add_model (unsigned char mod, unsigned score) BMNOEXCEPT
 
static void process_bookmark (block_idx_type nb, bookmark_state &bookm, bm::encoder &enc) BMNOEXCEPT
 Check if bookmark needs to be placed and if so, encode it into serialization BLOB.
 

Detailed Description

template<class BV>
class bm::serializer< BV >

Bit-vector serialization class.

The class is designed to convert sparse bit-vectors into a single block of memory ready for file or database storage or network transfer.

An instance can be reused for multiple serializations (but not across threads). Reuse offers some performance advantage, because it avoids repeated allocation of temporary memory.

Examples
bvsetalgebra.cpp, inv_list.cpp, sample14.cpp, sample22.cpp, sample4.cpp, xsample01.cpp, and xsample07a.cpp.

Definition at line 75 of file bmserial.h.

Member Typedef Documentation

◆ allocator_type

template<class BV >
typedef bvector_type::allocator_type bm::serializer< BV >::allocator_type

Definition at line 79 of file bmserial.h.

◆ block_idx_type

template<class BV >
typedef bvector_type::block_idx_type bm::serializer< BV >::block_idx_type

Definition at line 82 of file bmserial.h.

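◆ blocks_manager_type

template<class BV >
typedef bvector_type::blocks_manager_type bm::serializer< BV >::blocks_manager_type

Definition at line 80 of file bmserial.h.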

◆ buffer

template<class BV >
typedef byte_buffer<allocator_type> bm::serializer< BV >::buffer

Definition at line 85 of file bmserial.h.

◆ bv_ref_vector_type

template<class BV >
typedef bm::bv_ref_vector<BV> bm::serializer< BV >::bv_ref_vector_type

Definition at line 86 of file bmserial.h.

◆ bvector_type

template<class BV >
typedef BV bm::serializer< BV >::bvector_type

Definition at line 78 of file bmserial.h.

◆ size_type

template<class BV >
typedef bvector_type::size_type bm::serializer< BV >::size_type

Definition at line 83 of file bmserial.h.

◆ statistics_type

template<class BV >
typedef bvector_type::statistics bm::serializer< BV >::statistics_type

Definition at line 81 of file bmserial.h.

Constructor & Destructor Documentation

◆ serializer() [1/2]

template<class BV >
bm::serializer< BV >::serializer ( const allocator_type & alloc = allocator_type(),
bm::word_t * temp_block = 0 
)

Constructor.

Parameters
alloc- memory allocator
temp_block- temporary block used as scratch memory during serialization (if NULL, it is allocated and managed by the serializer class). Supplying an external temp block avoids unnecessary re-allocations.

An attached temp block is not owned by the class and is NOT deallocated on destruction.

Definition at line 1104 of file bmserial.h.

◆ serializer() [2/2]

template<class BV >
bm::serializer< BV >::serializer ( bm::word_t * temp_block)

Definition at line 1135 of file bmserial.h.

◆ ~serializer()

template<class BV >
bm::serializer< BV >::~serializer

Definition at line 1165 of file bmserial.h.

Member Function Documentation

◆ add_model()

template<class BV >
void bm::serializer< BV >::add_model ( unsigned char  mod,
unsigned  score 
)
protected

Definition at line 1566 of file bmserial.h.

◆ bienc_arr_bit_block()

template<class BV >
void bm::serializer< BV >::bienc_arr_bit_block ( const bm::word_t * block,
bm::encoder & enc,
bool  inverted 
)
protected

Definition at line 2064 of file bmserial.h.

◆ bienc_arr_sblock()

template<class BV >
void bm::serializer< BV >::bienc_arr_sblock ( const BV &  bv,
unsigned  sb,
bm::encoder & enc 
)
protected

Definition at line 2148 of file bmserial.h.

◆ bienc_gap_bit_block()

template<class BV >
void bm::serializer< BV >::bienc_gap_bit_block ( const bm::word_t * block,
bm::encoder & enc 
)
protected

Encode bit-block as interpolated bit block of gaps.

Definition at line 2089 of file bmserial.h.

◆ byte_order_serialization()

template<class BV >
void bm::serializer< BV >::byte_order_serialization ( bool  value)

Set byte-order serialization (for cross platform compatibility)

Parameters
value- when TRUE, the serialization format includes a byte-order marker

Definition at line 1211 of file bmserial.h.

Referenced by compress_inv_dump_file(), convert_bv2bvs(), main(), and bm::serialize().
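
What a byte-order marker makes possible on the reading side can be sketched generically. The following is only an illustration of the technique, not BitMagic's BLOB layout or code: the names `ByteOrder`, `host_byte_order`, `byte_swap32` and `to_host32`, as well as the marker values, are assumptions made for this sketch.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical marker values; the real format may use different ones.
enum class ByteOrder : unsigned char { Big = 0, Little = 1 };

// Detect the byte order of the machine running this code.
inline ByteOrder host_byte_order()
{
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // look at the lowest-addressed byte
    return first == 1 ? ByteOrder::Little : ByteOrder::Big;
}

// Reverse the four bytes of a 32-bit word.
inline std::uint32_t byte_swap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Convert a word read from a BLOB into host order, using the order
// recorded in the BLOB's marker. Swap only when the orders differ.
inline std::uint32_t to_host32(std::uint32_t raw, ByteOrder blob_order)
{
    return blob_order == host_byte_order() ? raw : byte_swap32(raw);
}
```

A reader that finds the marker equal to its own byte order can use words as they are; otherwise every multi-byte word has to be swapped, which is why the marker matters for cross-platform exchange.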

◆ encode_bit_array()

template<class BV >
void bm::serializer< BV >::encode_bit_array ( const bm::word_t * block,
bm::encoder & enc,
bool  inverted 
)
protected

Encode bit-block as an array of bits.

Definition at line 2020 of file bmserial.h.

◆ encode_bit_digest()

template<class BV >
void bm::serializer< BV >::encode_bit_digest ( const bm::word_t * blk,
bm::encoder & enc,
bm::id64_t  d0 
)
protected

Encode bit-block using digest (hierarchical compression)

Definition at line 1925 of file bmserial.h.

◆ encode_bit_interval()

template<class BV >
void bm::serializer< BV >::encode_bit_interval ( const bm::word_t * blk,
bm::encoder & enc,
unsigned  size_control 
)
protected

Encode BIT block with repeatable runs of zeroes.

Definition at line 1873 of file bmserial.h.

◆ encode_gap_block()

template<class BV >
void bm::serializer< BV >::encode_gap_block ( const bm::gap_word_t * gap_block,
bm::encoder & enc 
)
protected

Encode GAP block

Definition at line 1813 of file bmserial.h.

◆ encode_header()

template<class BV >
void bm::serializer< BV >::encode_header ( const BV &  bv,
bm::encoder & enc 
)
protected

Encode serialization header information.

Definition at line 1247 of file bmserial.h.

◆ find_bit_best_encoding()

template<class BV >
unsigned char bm::serializer< BV >::find_bit_best_encoding ( const bm::word_t * block)
protected

Determine best representation for a bit-block.

Definition at line 1656 of file bmserial.h.

◆ find_bit_best_encoding_l5()

template<class BV >
unsigned char bm::serializer< BV >::find_bit_best_encoding_l5 ( const bm::word_t * block)
protected

Determine best representation for a bit-block (level 5)

Definition at line 1575 of file bmserial.h.

◆ find_gap_best_encoding()

template<class BV >
unsigned char bm::serializer< BV >::find_gap_best_encoding ( const bm::gap_word_t * gap_block)
protected

Determine best representation for GAP block based on current set compression level.

Returns
set_block_gap, set_block_bit_1bit, set_block_arrgap, set_block_arrgap_egamma, set_block_arrgap_bienc, set_block_arrgap_inv, set_block_arrgap_egamma_inv, set_block_arrgap_bienc_inv, set_block_gap_egamma, set_block_gap_bienc

Definition at line 1762 of file bmserial.h.

◆ gamma_arr_bit_block()

template<class BV >
void bm::serializer< BV >::gamma_arr_bit_block ( const bm::word_t * block,
bm::encoder & enc,
bool  inverted 
)
protected

Definition at line 2048 of file bmserial.h.

◆ gamma_gap_array()

template<class BV >
void bm::serializer< BV >::gamma_gap_array ( const bm::gap_word_t * gap_block,
unsigned  arr_len,
bm::encoder & enc,
bool  inverted = false 
)
protected

Encode GAP block as delta-array with Elias Gamma coder.

Definition at line 1398 of file bmserial.h.
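
The idea of coding a sorted array as deltas with an Elias Gamma coder can be illustrated with a minimal, self-contained sketch. It is not BitMagic's implementation: the function names `elias_gamma` and `gamma_encode_sorted` are hypothetical, the output is a string of '0'/'1' characters rather than a packed bit stream, and adding 1 to the first element (so that a leading 0 stays encodable) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Elias Gamma code of v >= 1: n zero bits followed by the (n+1)-bit
// binary form of v, where n = floor(log2(v)).
inline std::string elias_gamma(std::uint32_t v)
{
    assert(v >= 1);  // the gamma code is undefined for 0
    const std::uint64_t x = v;  // widen so the shift below never reaches 32 bits
    unsigned n = 0;
    while ((x >> (n + 1)) != 0) ++n;
    std::string bits(n, '0');
    for (int i = int(n); i >= 0; --i)
        bits.push_back(((x >> i) & 1u) ? '1' : '0');
    return bits;
}

// Encode a strictly increasing array: the first element is stored as
// value + 1, every following element as the distance to its predecessor.
inline std::string gamma_encode_sorted(const std::vector<std::uint16_t>& arr)
{
    std::string out;
    std::uint32_t prev = 0;
    for (std::size_t i = 0; i < arr.size(); ++i)
    {
        assert(i == 0 || arr[i] > arr[i - 1]);  // input must be strictly increasing
        const std::uint32_t delta =
            (i == 0) ? std::uint32_t(arr[0]) + 1u : std::uint32_t(arr[i]) - prev;
        out += elias_gamma(delta);
        prev = arr[i];
    }
    return out;
}
```

For the array {0, 3, 4, 12} the deltas are 1, 3, 1, 8, which the sketch encodes as `1`, `011`, `1`, `0001000`. Small deltas get short codes, which is why this scheme suits arrays of closely spaced values.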

◆ gamma_gap_bit_block()

template<class BV >
void bm::serializer< BV >::gamma_gap_bit_block ( const bm::word_t * block,
bm::encoder & enc 
)
protected

Definition at line 2039 of file bmserial.h.

◆ gamma_gap_block()

template<class BV >
void bm::serializer< BV >::gamma_gap_block ( const bm::gap_word_t * gap_block,
bm::encoder & enc 
)
protected

Encode GAP block with Elias Gamma coder

Definition at line 1359 of file bmserial.h.

◆ gap_length_serialization()

template<class BV >
void bm::serializer< BV >::gap_length_serialization ( bool  value)

Set GAP length serialization (serializes GAP levels of the original vector)

Parameters
value- when TRUE, the serialized vector includes GAP level parameters

Definition at line 1205 of file bmserial.h.

Referenced by compress_inv_dump_file(), convert_bv2bvs(), main(), bm::compressed_collection_serializer< CBC >::serialize(), and bm::serialize().

◆ get_compression_level()

template<class BV >
unsigned bm::serializer< BV >::get_compression_level ( ) const
inline

Get current compression level.

Definition at line 128 of file bmserial.h.

◆ get_compression_stat()

template<class BV >
const size_type* bm::serializer< BV >::get_compression_stat ( ) const
inline

Return serialization counter vector.

Definition at line 191 of file bmserial.h.

◆ interpolated_arr_bit_block()

template<class BV >
void bm::serializer< BV >::interpolated_arr_bit_block ( const bm::word_t * block,
bm::encoder & enc,
bool  inverted 
)
protected

Definition at line 2239 of file bmserial.h.

◆ interpolated_encode_gap_block()

template<class BV >
void bm::serializer< BV >::interpolated_encode_gap_block ( const bm::gap_word_t * gap_block,
bm::encoder & enc 
)
protected

Encode GAP block using binary interpolated encoder

Definition at line 1299 of file bmserial.h.

◆ interpolated_gap_array()

template<class BV >
void bm::serializer< BV >::interpolated_gap_array ( const bm::gap_word_t * gap_block,
unsigned  arr_len,
bm::encoder & enc,
bool  inverted 
)
protected

Encode GAP block as an array with binary interpolated coder.

Definition at line 1491 of file bmserial.h.
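
Binary interpolative coding of a sorted array can be sketched as a recursive encode/decode pair. This is a simplified illustration under stated assumptions, not the library's code: `bits_for`, `bic_encode` and `bic_decode` are hypothetical names, bits are kept in a `std::vector<bool>`, values are assumed to fit a small universe (so `max_v - min_v + 1` does not overflow), and plain fixed-width codes are used where a production coder would typically use a tighter minimal binary code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bits = std::vector<bool>;

// Bits needed to distinguish `range` values (0 when only one value is possible).
inline unsigned bits_for(std::uint32_t range)
{
    unsigned n = 0;
    while ((std::uint64_t(1) << n) < range) ++n;
    return n;
}

// Encode strictly increasing a[lo..hi], all values known to lie in [lo_val, hi_val].
// The middle element is coded first, inside the narrowest range its neighbours allow.
inline void bic_encode(const std::vector<std::uint32_t>& a,
                       std::ptrdiff_t lo, std::ptrdiff_t hi,
                       std::uint32_t lo_val, std::uint32_t hi_val, Bits& out)
{
    if (lo > hi) return;
    const std::ptrdiff_t mid = lo + (hi - lo) / 2;
    const std::uint32_t v = a[std::size_t(mid)];
    const std::uint32_t min_v = lo_val + std::uint32_t(mid - lo);  // room for left part
    const std::uint32_t max_v = hi_val - std::uint32_t(hi - mid);  // room for right part
    assert(min_v <= v && v <= max_v);
    const unsigned nbits = bits_for(max_v - min_v + 1);
    const std::uint32_t code = v - min_v;
    for (unsigned i = nbits; i-- > 0; )
        out.push_back(((code >> i) & 1u) != 0);
    bic_encode(a, lo, mid - 1, lo_val, v - 1, out);
    bic_encode(a, mid + 1, hi, v + 1, hi_val, out);
}

// Mirror of bic_encode: reads the same ranges in the same order.
inline void bic_decode(const Bits& in, std::size_t& pos, std::vector<std::uint32_t>& a,
                       std::ptrdiff_t lo, std::ptrdiff_t hi,
                       std::uint32_t lo_val, std::uint32_t hi_val)
{
    if (lo > hi) return;
    const std::ptrdiff_t mid = lo + (hi - lo) / 2;
    const std::uint32_t min_v = lo_val + std::uint32_t(mid - lo);
    const std::uint32_t max_v = hi_val - std::uint32_t(hi - mid);
    const unsigned nbits = bits_for(max_v - min_v + 1);
    std::uint32_t code = 0;
    for (unsigned i = 0; i < nbits; ++i)
        code = (code << 1) | (in[pos++] ? 1u : 0u);
    const std::uint32_t v = min_v + code;
    a[std::size_t(mid)] = v;
    bic_decode(in, pos, a, lo, mid - 1, lo_val, v - 1);
    bic_decode(in, pos, a, mid + 1, hi, v + 1, hi_val);
}
```

The decoder must know the element count and the value bounds in advance. A dense run such as {5, 6, 7} within bounds [5, 7] costs zero bits in this scheme, because every element is fully determined by its neighbours.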

◆ interpolated_gap_array_v0()

template<class BV >
void bm::serializer< BV >::interpolated_gap_array_v0 ( const bm::gap_word_t * gap_block,
unsigned  arr_len,
bm::encoder & enc,
bool  inverted 
)
protected

Definition at line 1443 of file bmserial.h.

◆ interpolated_gap_bit_block()

template<class BV >
void bm::serializer< BV >::interpolated_gap_bit_block ( const bm::word_t * block,
bm::encoder & enc 
)
protected

Encode bit-block as interpolated gap block.

Definition at line 2079 of file bmserial.h.

◆ optimize_serialize_destroy()

template<class BV >
void bm::serializer< BV >::optimize_serialize_destroy ( BV &  bv,
typename serializer< BV >::buffer & buf 
)

Bitvector serialization into buffer object (resized automatically). The input bit-vector gets optimized and then destroyed; its content is NOT guaranteed after this operation.

Effectively it moves data into the buffer.

This operation exists because it is faster to do all three steps (optimize, serialize, destroy) in a single pass. This is a destructive serialization!

Parameters
bv- input/output bitvector
buf- output buffer object

Definition at line 2002 of file bmserial.h.

Referenced by main().

◆ process_bookmark()

template<class BV >
void bm::serializer< BV >::process_bookmark ( block_idx_type  nb,
bookmark_state & bookm,
bm::encoder & enc 
)
static protected

Check if bookmark needs to be placed and if so, encode it into serialization BLOB.

Parameters
nb- block idx
bookm- bookmark state structure
enc- BLOB encoder

Definition at line 2330 of file bmserial.h.

◆ reset_compression_stats()

template<class BV >
void bm::serializer< BV >::reset_compression_stats
protected

Reset all accumulated compression statistics.

Definition at line 1177 of file bmserial.h.

◆ reset_models()

template<class BV >
void bm::serializer< BV >::reset_models ( )
inline protected

Definition at line 345 of file bmserial.h.

◆ serialize() [1/2]

template<class BV >
void bm::serializer< BV >::serialize ( const BV &  bv,
typename serializer< BV >::buffer & buf,
const statistics_type * bv_stat = 0 
)

Bitvector serialization into buffer object (resized automatically)

Parameters
bv- input bitvector
buf- output buffer object
bv_stat- input (optional) bit-vector statistics object; if NULL, serialize will compute the statistics

Definition at line 1980 of file bmserial.h.

◆ serialize() [2/2]

template<class BV >
serializer< BV >::size_type bm::serializer< BV >::serialize ( const BV &  bv,
unsigned char *  buf,
size_t  buf_size 
)

Bitvector serialization into memory block.

Parameters
bv- input bitvector
buf- output buffer (pre-allocated). No range checking is done in this method. It is the responsibility of the caller to allocate a sufficient amount of memory using information from the calc_stat() function.
buf_size- size of the output buffer
Returns
Size of serialization block.
See also
calc_stat
Examples
bvsetalgebra.cpp, and sample14.cpp.

Definition at line 2440 of file bmserial.h.

Referenced by convert_bv2bvs(), generate_k_mers(), main(), make_BLOB(), bm::compressed_collection_serializer< CBC >::serialize(), and bm::serialize().

◆ set_bookmarks()

template<class BV >
void bm::serializer< BV >::set_bookmarks ( bool  enable,
unsigned  bm_interval = 256 
)

Add skip-markers to serialization BLOB for faster range decode at the expense of some BLOB size increase.

Parameters
enable- when TRUE, serialization will add bookmark codes
bm_interval- bookmark interval (in number of blocks); suggested values are between 4 and 512 (block size is 64K bits). A smaller interval adds more bookmarks to the skip list, which allows faster range deserialization at the expense of somewhat increased BLOB size.

Definition at line 1217 of file bmserial.h.

Referenced by generate_k_mers(), main(), and bm::sparse_vector_serializer< SV >::set_bookmarks().

◆ set_compression_level()

template<class BV >
void bm::serializer< BV >::set_compression_level ( unsigned  clevel)

Set compression level.

Higher compression takes more time to process.

Parameters
clevel- compression level (0-5):
0 - take as is
1, 2 - apply lightweight RLE/GAP encodings, limited depth hierarchical compression, intervals encoding
3 - variant of 2 with different cut-offs
4 - delta transforms plus Elias Gamma encoding where possible (legacy)
5 - Binary Interpolative Coding (Moffat, et al)
See also
get_compression_level

Definition at line 1184 of file bmserial.h.

Referenced by compress_inv_dump_file(), convert_bv2bvs(), main(), and make_BLOB().

◆ set_curr_ref_idx()

template<class BV >
void bm::serializer< BV >::set_curr_ref_idx ( size_type  ref_idx)

Set current index in the reference vector collection (not a row idx or plain idx)

Definition at line 1241 of file bmserial.h.

◆ set_ref_vectors()

template<class BV >
void bm::serializer< BV >::set_ref_vectors ( const bv_ref_vector_type * ref_vect)

Attach collection of reference vectors for XOR serialization (no transfer of ownership for the pointer)

Definition at line 1232 of file bmserial.h.

◆ set_sparse_cutoff()

template<class BV >
void bm::serializer< BV >::set_sparse_cutoff ( unsigned  cutoff)

Fine tuning for Binary Interpolative Compression (levels 5+). The parameter sets the average population count per block (64Kbits) below which a block is considered very sparse.

If a super block (group of 256 blocks) is very sparse, the serializer applies block size expansion (for compression purposes) to improve compression rates.

Definition at line 1195 of file bmserial.h.


The documentation for this class was generated from the following file: