Want fast C++? Know your hardware!
As C++ evolves, it provides us with better and more powerful tools for optimal performance. But often, knowing the language very well is not enough. It is just as important to know your hardware. Modern computer architectures have many properties that can impact the performance of C++ code, such as cache locality, cache associativity, true and false sharing between cores, memory alignment, the branch predictor, the instruction pipeline, denormals, and SIMD.
In this talk, I will give an overview of these properties, using C++ code. I will present a series of code examples that highlight different effects, and benchmark their performance on different machines with different compilers, sometimes with surprising results.
The talk will draw a picture of what every C++ developer needs to know about hardware architecture, provide guidelines on how to write modern C++ code that is cache-friendly, pipeline-friendly, and well-vectorisable, and highlight what to look for when profiling it.
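To give a flavour of what "cache-friendly" can mean in practice, here is a minimal sketch that is not taken from the talk itself. It assumes a matrix stored contiguously in row-major order inside a std::vector<int>; the function names sum_row_major and sum_column_major are hypothetical and chosen only for illustration. Both functions compute the same sum, but the first walks memory sequentially while the second jumps by a full row on every step, which on typical hardware tends to use the cache less effectively for large matrices. Whether and how much this matters depends on the machine, the compiler, and the matrix size, so it should be measured rather than assumed.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: the matrix is assumed to be stored contiguously in
// row-major order, i.e. element (r, c) lives at index r * cols + c.

// Visits elements in the order they are laid out in memory (stride of 1).
long long sum_row_major(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Visits the same elements column by column (stride of `cols` elements),
// so consecutive accesses are far apart in memory when `cols` is large.
long long sum_column_major(const std::vector<int>& m,
                           std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Timing both functions on a matrix much larger than the last-level cache, with optimisations enabled, is one simple way to observe the effect of traversal order on a given machine.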
Speaker: Timur Doumler