Counting bits
published at 21.02.2015 23:25 by Jens Weller
Save to Instapaper Pocket
I did a bit of fun coding. I'm currently thinking on how to generate random bytes. The mersenne twister RNG is known to give very good randomness, so it would be a possible an easy source. But first, I wanted to know, how random is the mersenne twister really? So, when counting the bits in the result of a few thousand calls to a rng, the distribution should be even. So, today I wrote code that counts bits, and tested it on the mersenne twister.
Counting bits
Each bits represents a power of two as a numerical value. So, first thing is to generate an array of exact those power of twos. This array servers as a bitmask, and as its a very easy and basic calculation, I wanted to achieve this with constexpr. I've never used it before, and my first attempt with a loop failed, simply because that is only allowed from C++14 on. So I went with recursion, as other examples also show this path. Forgetting that a simple shift operation would do the same:
constexpr std::uint_fast64_t power_of_2(unsigned int pow) { return 1ull << pow; //return pow == 0 ? 1ull : 2ull * power_of_2(pow-1); }
Next, the class is needed, which does the actual bit counting, as the underlying type can be different (32bit vs. 64bit e.g.), I implemented it as a template, which holds an array of power of 2 values:
template class bitstats<class int_type> { static_assert(std::numeric_limits<int_type>::is_integer,"int_type must meet numeric_limits::is_integer"); std::vector bitcount{sizeof(int_type)*CHAR_BIT,0}; static constexpr std::uint_fast64_t bitvalue[64]={1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192 ,power_of_2(14),power_of_2(15),power_of_2(16),...
First, a static_assert checks if the type is an integer with std::numeric_limits<T>::is_integer. Maybe the support for operator & would be enough, but for now I think its good to only let the class compile for integers. Next I need to know how many bits the type has, so sizeof * CHAR_BIT should give me the correct result. A vector is created, which contains an entry for every bit. Next is the array containing the power of 2 values, maybe I should factor this out, as its independent from the stats class. The only public function is the actual counting:
void count(int_type n) { for(size_t i =0; i < bitcount.size(); ++i) { if(n & bitvalue[i]) bitcount[i]++; } }
And this is already the code which does the bit counting. The if is where the test is happening, if that bit is set. As I said, this class is just a fun side project, I decided to test the distribution of the bits with the 32 and 64 bit versions of mersenne twister. Whose returntype is std::uint64_fast_t, the bitvalue arrays type.
As I want to test RNGs, I need a small template function, which fills a vector with random numbers:
template<class RNG, class uint_type = typename RNG::result_type> std::vector<uint_type> fillRandom(RNG& rng,size_t num) { std::vector<uint_type> vec(num); std::generate(vec.begin(),vec.end(),rng); return vec; }
Testing the 32 and 64 bit versions, and maybe later also other RNGs, it makes sense to also setup the test class as a template:
template<class RNG> class random_test { bitstats<typename RNG::result_type> stats; RNG rng; public: random_test(typename RNG::result_type rd = std::time(0)):rng(rd){} void run() { auto vec = fillRandom(rng,5000000); for(auto& i: vec ) stats.count(i); } const bitstats<typename RNG::result_type>& getStats() const{return stats;} };
The class instantiates the RNG with a seed, and the run method does the work. All thats left is to put things together in the main function:
int main() { random_test<std::mt19937> mersenne32; random_test<std::mt19937_64> mersenne64; std::thread t64([&mersenne64](){mersenne64.run();}); mersenne32.run();
print_bitcount(mersenne32.getStats().getBitcount()); t64.join(); print_bitcount(mersenne64.getStats().getBitcount()); }
So, I run the actual code in parallel, the 64bit code in a std::thread, and the other in the main thread. The print_bitcount method simply prints the result to stdout via cout.
Results
As expected the distribution is for 32 and 64 bit quite even. I learned though, that std::random_device isn't working properly on MinGW. That's why the randomness currently is based on std::time(0), std::random_device would be a bit better though. One thing I want to measure now is the actual time it takes to generate 16 random bytes for the 32 and 64 bit versions.
Also, I do not write a lot of generic code, as often Qt already is all I need to write the programs that are currently running Meeting C++ (which is my main real-world programming task). It was once again nice to see, how powerful generic code is, and how it allows you to easily reuse code for different types.
Join the Meeting C++ patreon community!
This and other posts on Meeting C++ are enabled by my supporters on patreon!