Counting bits

published at 21.02.2015 23:25 by Jens Weller

I did a bit of fun coding. I'm currently thinking on how to generate random bytes. The mersenne twister RNG is known to give very good randomness, so it would be a possible an easy source. But first, I wanted to know, how random is the mersenne twister really? So, when counting the bits in the result of a few thousand calls to a rng, the distribution should be even. So, today I wrote code that counts bits, and tested it on the mersenne twister.

Counting bits

Each bits represents a power of two as a numerical value. So, first thing is to generate an array of exact those power of twos. This array servers as a bitmask, and as its a very easy and basic calculation, I wanted to achieve this with constexpr. I've never used it before, and my first attempt with a loop failed, simply because that is only allowed from C++14 on. So I went with recursion, as other examples also show this path. Forgetting that a simple shift operation would do the same:

constexpr std::uint_fast64_t power_of_2(unsigned int pow)
{
    return 1ull << pow; //return pow == 0 ? 1ull : 2ull * power_of_2(pow-1);
}

Next, the class is needed, which does the actual bit counting, as the underlying type can be different (32bit vs. 64bit e.g.), I implemented it as a template, which holds an array of power of 2 values:

template
class bitstats<class int_type>
{
    static_assert(std::numeric_limits<int_type>::is_integer,"int_type must meet numeric_limits::is_integer");
    std::vector bitcount{sizeof(int_type)*CHAR_BIT,0};
    static constexpr std::uint_fast64_t bitvalue[64]={1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192
                                                      ,power_of_2(14),power_of_2(15),power_of_2(16),...

First, a static_assert checks if the type is an integer with std::numeric_limits<T>::is_integer. Maybe the support for operator & would be enough, but for now I think its good to only let the class compile for integers. Next I need to know how many bits the type has, so sizeof * CHAR_BIT should give me the correct result. A vector is created, which contains an entry for every bit. Next is the array containing the power of 2 values, maybe I should factor this out, as its independent from the stats class. The only public function is the actual counting:

void count(int_type n)
{
  for(size_t i =0; i < bitcount.size(); ++i)
  {
    if(n & bitvalue[i])
      bitcount[i]++;
  }
}

And this is already the code which does the bit counting. The if is where the test is happening, if that bit is set. As I said, this class is just a fun side project, I decided to test the distribution of the bits with the 32 and 64 bit versions of mersenne twister. Whose returntype is std::uint64_fast_t, the bitvalue arrays type.

As I want to test RNGs, I need a small template function, which fills a vector with random numbers:

template<class RNG, class uint_type = typename RNG::result_type>
std::vector<uint_type> fillRandom(RNG& rng,size_t num)
{
    std::vector<uint_type> vec(num);
    std::generate(vec.begin(),vec.end(),rng);
    return vec;
}

Testing the 32 and 64 bit versions, and maybe later also other RNGs, it makes sense to also setup the test class as a template:

template<class RNG>
class random_test
{
    bitstats<typename RNG::result_type> stats;
    RNG rng;
public:
    random_test(typename RNG::result_type rd = std::time(0)):rng(rd){}
    void run()
    {
        auto vec = fillRandom(rng,5000000);
        for(auto& i: vec )
            stats.count(i);
    }
    const bitstats<typename RNG::result_type>& getStats() const{return stats;}
};

The class instantiates the RNG with a seed, and the run method does the work. All thats left is to put things together in the main function:

int main()
{
    random_test<std::mt19937> mersenne32;
    random_test<std::mt19937_64> mersenne64;
    std::thread t64([&mersenne64](){mersenne64.run();});
    mersenne32.run();
print_bitcount(mersenne32.getStats().getBitcount()); t64.join(); print_bitcount(mersenne64.getStats().getBitcount()); }

So, I run the actual code in parallel, the 64bit code in a std::thread, and the other in the main thread. The print_bitcount method simply prints the result to stdout via cout.

Results

As expected the distribution is for 32 and 64 bit quite even. I learned though, that std::random_device isn't working properly on MinGW. That's why the randomness currently is based on std::time(0), std::random_device would be a bit better though. One thing I want to measure now is the actual time it takes to generate 16 random bytes for the 32 and 64 bit versions.

Also, I do not write a lot of generic code, as often Qt already is all I need to write the programs that are currently running Meeting C++ (which is my main real-world programming task). It was once again nice to see, how powerful generic code is, and how it allows you to easily reuse code for different types.

Download the full code.