Word counting in C++11 - lessons learned

published at 31.01.2013 11:50 by Jens Weller

This is a follow-up to my first blog post, where I showed a small word counting program in C++11. It started as a small challenge, but turned out to be a good way to learn a lot about C++11, and I'd like to share those lessons with you now. Some of them improved the code, others revealed bugs in C++11 implementations. There has been a lot of feedback, and a very interesting discussion on Facebook.

I also did some work on an InMemoryString class, which can make a copy of the string it points to and then take ownership of that copy. That way the allocation is done only once for each word in each thread. This class is still experimental, so you can switch its use on and off by defining USE_INMEMORY.
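The class itself is not shown here, so what follows is only a minimal sketch of the idea. It assumes a design where the object first refers to characters owned by someone else (a pointer plus a length) and copies them into its own buffer on request. The member names takeOwnership(), ownsData() and str(), as well as the WordType alias selected by USE_INMEMORY, are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical sketch: starts as a non-owning view into an existing buffer
// and can later copy the characters into a buffer it owns.
class InMemoryString
{
public:
    InMemoryString(const char* text, std::size_t length)
        : m_data(text), m_size(length) {}

    // One allocation: copy the viewed characters and point at the copy.
    void takeOwnership()
    {
        if (m_owned)
            return; // already owning, nothing to do
        m_owned.reset(new char[m_size]);
        if (m_size != 0)
            std::memcpy(m_owned.get(), m_data, m_size);
        m_data = m_owned.get();
    }

    bool ownsData() const { return m_owned != nullptr; }
    const char* data() const { return m_data; }
    std::size_t size() const { return m_size; }
    std::string str() const { return std::string(m_data, m_size); }

private:
    const char* m_data;              // points either to the external buffer or to m_owned
    std::size_t m_size;
    std::unique_ptr<char[]> m_owned; // empty while the object is only a view
};

// Hypothetical compile-time switch between the experimental class and std::string.
#ifdef USE_INMEMORY
typedef InMemoryString WordType;
#else
typedef std::string WordType;
#endif
```

Because the owned buffer lives behind a std::unique_ptr, the object is move-only, which fits the "allocate once, then hand it on" use.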

First, the changes and fixes I made thanks to the feedback. My biggest mistake was forgetting to make the mutex mutable. The code only compiled because I had also forgotten to make isRunning() const. This also raises the question whether the mutex should be part of the class, or whether it is better to keep it outside. In our use case I think it's clear that it has to be part of the class: no threading is exposed to the user, so the class itself should take care of the locking.
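A minimal sketch of the fix, assuming a hypothetical WordCounter class that only tracks a running flag (the real class also does the counting). Since std::mutex::lock() is non-const, locking inside a const member function only compiles if the mutex is declared mutable.

```cpp
#include <mutex>

// Hypothetical class; only the locking pattern matters here.
class WordCounter
{
public:
    WordCounter() : m_running(false) {}

    void start()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_running = true;
    }

    void stop()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_running = false;
    }

    // A const query still has to lock, which requires a mutable mutex.
    bool isRunning() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_running;
    }

private:
    mutable std::mutex m_mutex; // synchronization, not observable state
    bool m_running;
};
```

The mutex stays a private member, so callers never see any locking, matching the argument above that the class should take care of it.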

So, any further errors? Yes, there is one spot which turns out to be the perfect use case for auto in C++11, and which was a potentially nasty bug in my first implementation:


in.seekg(0,std::ios::end);
unsigned long size = std::streamoff(in.tellg());
in.seekg(0,std::ios::beg);

std::unique_ptr<char[]> data(new char[size]);
in.read(data.get(),size);

Do you spot the error? Here is a little hint: tellg() returns -1 on failure, so std::streamoff(in.tellg()) can be -1. Surprise, surprise. Stored in an unsigned long, that -1 becomes a huge value, which is then passed to new and read. So, would long size actually fix this? Yes, maybe. std::streamoff is a signed integer type, but which one is not defined by the standard. So auto is the perfect fit: it always makes sure that the right type is used. You also need to test for size == -1, and return in this case:


in.seekg(0,std::ios::end);
auto size = std::streamoff(in.tellg());
if(size < 1)
    return;
in.seekg(0,std::ios::beg);

I decided to also return if the file is empty. And you should include <cctype> if you use std::isalnum.
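Putting these pieces together, here is a minimal sketch of reading a whole file and counting words. The function name countWordsInFile and the definition of a word as a run of characters for which std::isalnum is true are assumptions for illustration, not necessarily what the original program does. Note the cast to unsigned char: passing a negative char value (other than EOF) to std::isalnum is undefined behavior.

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <ios>
#include <memory>
#include <string>

// Hypothetical helper: returns 0 if the file cannot be opened,
// its size cannot be determined, or it is empty.
std::size_t countWordsInFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return 0;

    in.seekg(0, std::ios::end);
    auto size = std::streamoff(in.tellg()); // signed, -1 signals failure
    if (size < 1)
        return 0;
    in.seekg(0, std::ios::beg);

    std::unique_ptr<char[]> data(new char[size]);
    in.read(data.get(), size);
    const std::streamsize got = in.gcount(); // bytes actually read

    std::size_t words = 0;
    bool inWord = false;
    for (std::streamsize i = 0; i < got; ++i)
    {
        // isalnum expects a value representable as unsigned char (or EOF)
        const bool alnum = std::isalnum(static_cast<unsigned char>(data[i])) != 0;
        if (alnum && !inWord)
            ++words; // a new word starts here
        inWord = alnum;
    }
    return words;
}
```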

So, is that it? Almost :) Let's compile under VS12. As you may remember, I could not compare against any other implementations, as they were either Windows-only, or, like GCC, lacked support for the C++11 regex features. I still don't have VS12, so someone else compiled my code. I don't have all of his changes, so my new code is not 100% fixed for VS12. There were some problems with the compilation, which you can see in detail in the Facebook comments.

GCC performs some moves that VC can't do yet, as VC does not implement move semantics the same way: it does not generate move constructors and move assignment operators implicitly. So the compiler falls back to copies where moves are expected, and requires the copy constructor and operator= to be implemented. The autogenerated copy constructor then runs into a love affair with std::unique_ptr, which cannot be copied: compiler error! That's why I added definitions for both the copy constructor and operator=. Also, non-static data member initialization (aka bool m_foo = false) is not yet supported by VS, and neither is =delete.
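The exact definitions are not shown in the post, so this is a sketch assuming a hypothetical Buffer class that owns a std::unique_ptr<char[]>. It avoids the features mentioned above as unsupported: members are initialized in the constructor's initializer list instead of in-class, and the copy operations are written by hand (deep copy) instead of relying on the compiler or on =delete.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical buffer class written for compilers without implicit move
// members, in-class member initializers or "= delete".
class Buffer
{
public:
    explicit Buffer(std::size_t size = 0)
        : m_size(size), m_data(size ? new char[size]() : nullptr) {}

    // Deep copy: std::unique_ptr itself cannot be copied.
    Buffer(const Buffer& other)
        : m_size(other.m_size),
          m_data(other.m_size ? new char[other.m_size] : nullptr)
    {
        std::copy(other.m_data.get(), other.m_data.get() + m_size, m_data.get());
    }

    // Copy-and-swap: 'other' was already copied by the copy constructor.
    Buffer& operator=(Buffer other)
    {
        swap(other);
        return *this;
    }

    void swap(Buffer& other)
    {
        std::swap(m_size, other.m_size);
        m_data.swap(other.m_data);
    }

    std::size_t size() const { return m_size; }
    char* data() { return m_data.get(); }
    const char* data() const { return m_data.get(); }

private:
    std::size_t m_size;
    std::unique_ptr<char[]> m_data;
};
```

Taking the operator= parameter by value lets one copy constructor serve both operations, and the swap cannot throw, so a failed allocation leaves the target object unchanged.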

So, lesson learned: C++11 still depends heavily on the platform you write for. This will improve in the future, but newer compiler versions with more and better C++11 support are needed. And counting words can be quite fun :D

And, here is the code.
