BitMagic-C++
strsvsample05.cpp

Example of how to use bm::str_sparse_vector<>, a succinct container for bit-transposed string collections, to deserialize only selected elements from a serialized BLOB

See also
bm::str_sparse_vector
bm::sparse_vector_deserializer
bm::sparse_vector_serializer
/*
Copyright(c) 2002-2017 Anatoliy Kuznetsov(anatoliy_kuznetsov at yahoo.com)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
For more information please visit: http://bitmagic.io
*/
/** \example strsvsample05.cpp
Example of how to use bm::str_sparse_vector<>, a succinct container for
bit-transposed string collections, to deserialize only selected elements
from a serialized BLOB
\sa bm::str_sparse_vector
\sa bm::sparse_vector_deserializer
\sa bm::sparse_vector_serializer
*/
/*! \file strsvsample05.cpp
\brief Example: str_sparse_vector<> gather deserialization
This example loads a range of a sparse vector from a serialized BLOB to save
memory and improve deserialization performance
*/
#include <iostream>
#include <string>
#include <vector>
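#include <assert.h>
#include <string.h>
#include "bm.h"
#include "bmstrsparsevec.h"
#include "bmsparsevec_serial.h"
#include "bmundef.h" /* clear the pre-proc defines from BM */
using namespace std;
typedef bm::bvector<> bvector_type;
typedef bm::str_sparse_vector<char, bvector_type, 5> str_sv_type;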
int main(void)
{
try
{
str_sv_type str_sv1;
str_sv_type str_sv2;
str_sv_type str_sv3;
{
str_sv_type str_sv0;
// here we generate collection of k-mer (4-mer) strings
// imitating a DNA sequence
{
auto bi = str_sv0.get_back_inserter();
for (unsigned i = 0; i < 100000; ++i)
{
bi = "ATGC";
bi = "GCTA";
bi = "GCAA";
bi = "TATA";
} // for
}
str_sv1.remap_from(str_sv0); // SV1 now contains a remapped(smaller) copy of SV0
}
BM_DECLARE_TEMP_BLOCK(tb)
str_sv1.optimize(tb);
// calculate memory footprint
//
str_sv_type::statistics st;
str_sv1.calc_stat(&st);
cout << "Used memory: " << st.memory_used << std::endl;
// construct a serializer utility class, set up serialization parameters
//
// please note the use of "set_bookmarks()" to enable fast range
// deserialization. Bookmarks somewhat increase the BLOB size but allow
// the deserializer to skip parts we do not need (paging) more efficiently
// and avoid decompressing blocks we would never need
//
// This example sets "128" as the bookmarks parameter, but you have to
// experiment with what works for you, between 4 and 512
//
// Each block corresponds to 64K vector elements,
// so making bookmarks after each block does not make much sense
// because decoding is reasonably fast and some residual throw-away
// is usually OK.
//
bm::sparse_vector_serializer<str_sv_type> sv_serializer;
sv_serializer.set_bookmarks(true, 128);
// serialization target
bm::sparse_vector_serial_layout<str_sv_type> sv_lay;
// run str-vector serialization with compression
//
sv_serializer.serialize(str_sv1, sv_lay);
const unsigned char* buf = sv_lay.buf();
cout << "Serialized size = " << sv_lay.size() << endl;
// instantiate deserializer utility class
//
bm::sparse_vector_deserializer<str_sv_type> sv_deserial;
bvector_type::size_type from = 100000;
bvector_type::size_type to = from + 65536;
{
// 1.
// one way to deserialize is to provide a mask vector
// specifying which sparse vector elements need to be
// decompressed from the BLOB
// the mask vector does not necessarily have to be just one range
//
bvector_type bv_mask;
bv_mask.set_range(from, to);
sv_deserial.deserialize(str_sv2, buf, bv_mask);
// 2.
// If it is just one range (a common use case for paging)
// it is faster and cleaner to use deserialize_range().
// It produces the same result as (1), just faster.
//
sv_deserial.deserialize_range(str_sv3, buf, from, to);
// run a quick comparison: values in the selected range must match
// across the containers str_sv1, str_sv2 and str_sv3
//
char s1[16]; char s2[16]; char s3[16];
for (bvector_type::size_type j = from; j < to; ++j)
{
str_sv1.get(j, s1, sizeof(s1));
str_sv2.get(j, s2, sizeof(s2));
str_sv3.get(j, s3, sizeof(s3));
int cmp;
cmp = ::strcmp(s1, s2);
assert(cmp==0);
cmp = ::strcmp(s1, s3);
assert(cmp==0); (void)cmp;
} // for j
cout << "Gather deserialization check OK" << endl;
}
}
catch(std::exception& ex)
{
std::cerr << ex.what() << std::endl;
return 1;
}
return 0;
}
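The comments in the example say that each block corresponds to 64K vector elements and that some residual throw-away is usually OK. The following is a minimal standalone sketch of that arithmetic for the range used above (from = 100000, to = from + 65536). It does not use BitMagic, and the names kBlockSize, block_of, blocks_touched and residual_elements are hypothetical helpers introduced only for illustration. It assumes whole blocks are decoded for a closed range [from, to]; how the library actually skips data between bookmarks is not modelled here.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Assumption taken from the example's comment: one block covers 64K elements.
// This constant is illustrative, not a BitMagic identifier.
const std::uint64_t kBlockSize = 65536;

// Index of the block that holds element idx (hypothetical helper).
inline std::uint64_t block_of(std::uint64_t idx)
{
    return idx / kBlockSize;
}

// First and last block touched by the closed range [from, to].
inline std::pair<std::uint64_t, std::uint64_t>
blocks_touched(std::uint64_t from, std::uint64_t to)
{
    assert(from <= to);
    return std::make_pair(block_of(from), block_of(to));
}

// Number of elements that lie in the touched blocks but outside [from, to],
// i.e. the "residual" that would be decoded and thrown away if whole blocks
// were decoded (an assumption of this sketch).
inline std::uint64_t residual_elements(std::uint64_t from, std::uint64_t to)
{
    assert(from <= to);
    const std::uint64_t first_elem = block_of(from) * kBlockSize;
    const std::uint64_t last_elem  = (block_of(to) + 1) * kBlockSize - 1;
    return (from - first_elem) + (last_elem - to);
}
```

For the range in the example, blocks_touched(100000, 165536) yields blocks 1 and 2, and residual_elements(100000, 165536) is 65535: two blocks hold 131072 elements, of which 65537 fall inside the closed range. This is one way to reason about why a bookmark after every block buys little, since the residual is bounded by the block size regardless of the bookmark interval.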