1. Added AVX2 optimization option (
The AVX2 code is based on a modified version of the libpopcnt library by Kim Walisch.
Some code fragments are based on the paper "Faster Population Counts Using AVX2 Instructions" by Daniel Lemire, Nathan Kurz and
Wojciech Muła (23 Nov 2016).
2. Improved SIMD vectorization algorithms for SSE2 and SSE4.2
3. Optimization: faster testing and counting of bits in compressed bit-vectors. 30% better performance in some test cases.
4. Bit-vector enumerator (bvector<>::enumerator) now supports random positioning: it can be set to an arbitrary start point and traverse bits from there
5. Added bvector<>::count_to() for efficient range bit counting and bit-vector based prefix sums (see example)
6. Improved performance of sparse_vector<> extraction algorithms
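
The idea behind item 5 (range bit counting through a prefix-sum index) can be sketched in standalone C++. This is only an illustration, not BitMagic code and not the actual count_to() implementation or signature: the RankIndex type, its members, the 64-bit word granularity and the inclusive [0, pos] convention are all assumptions made for the example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical illustration type; NOT part of BitMagic.
// Stores raw bits in 64-bit words plus a running total of set bits
// before each word, so a count over a prefix needs one table lookup
// and one masked popcount instead of a scan over all preceding words.
struct RankIndex {
    std::vector<std::uint64_t> words;   // raw bits, 64 per word
    std::vector<std::size_t>   prefix;  // prefix[i] = set bits in words[0..i)

    explicit RankIndex(std::vector<std::uint64_t> w)
        : words(std::move(w)), prefix(words.size() + 1, 0) {
        for (std::size_t i = 0; i < words.size(); ++i)
            prefix[i + 1] = prefix[i] + std::bitset<64>(words[i]).count();
    }

    // Number of set bits in positions [0, pos], both ends inclusive
    // (the inclusive convention is an assumption of this sketch).
    std::size_t count_to(std::size_t pos) const {
        assert(pos < words.size() * 64);
        const std::size_t word = pos / 64;
        const unsigned    bit  = static_cast<unsigned>(pos % 64);
        // Keep bits 0..bit of the last word; avoid shifting by 64.
        const std::uint64_t mask =
            (bit == 63) ? ~std::uint64_t{0}
                        : ((std::uint64_t{1} << (bit + 1)) - 1);
        return prefix[word] + std::bitset<64>(words[word] & mask).count();
    }

    // Number of set bits in [from, to], both ends inclusive,
    // computed as the difference of two prefix counts.
    std::size_t count_range(std::size_t from, std::size_t to) const {
        assert(from <= to);
        return count_to(to) - (from ? count_to(from - 1) : 0);
    }
};
```

For example, with words {0x5, all ones}, count_to(2) covers bits 0 and 2 of the first word, while count_range(2, 64) is obtained from two prefix counts without visiting the bits in between. A production index would typically work on larger blocks and use hardware popcount instructions; the sketch uses std::bitset only to stay portable.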