Performance engineering - being friendly to your hardware

Speaker: Ignas Bagdonas

Audience level: [ Advanced ]

Practical software does not run in an abstract vacuum; it runs on underlying hardware platforms. Practical software engineering does not exist in an abstract vacuum either. The software layer sits between the domain-specific requirements on top and the underlying runtime platforms below. Many interesting developments have happened on all three of those layers over the years, and while contemporary hardware has come a long way, it often suffers from an attention deficit caused by the overshadowing flood of advancements and “advancements” in the software part of the universe. This new shiny programming language is safe, performant, and solves a backlog of problems that have been dragging on for years; that new shiny programming paradigm automagically relieves you of dealing with low-level details, and the toolchain is plain amazing. The hardware side brings to this fistfight a set of new architectures, ISAs, and hardware abstractions, just to stay on par with the software side. Looks perfect? What else could an engineer dream of?

Not really. Let’s take a look at the contemporary commodity hardware platforms of today, and at the trendy software engineering waves of today, and try to understand how and why the latter can (and frequently do) cancel out the potential benefits of hardware advancements, what could be done to actually be friendly to your underlying hardware, and at what cost.

This is a set of somewhat separate topics, bound together by the common thread of performance engineering.

How language constructs such as references, lambdas, inheritance, object representation, runtime checks, and selected STL facilities map to the actual runnable platform-level code, and at what cost.
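
For a taste of what this mapping looks like, a minimal sketch with illustrative names, assuming a typical optimizing compiler on a mainstream x86-64 target: the dynamic call normally becomes an indirect call through a vtable, while the statically typed call can be devirtualized and inlined down to a single multiply. Exact code generation depends on compiler, flags, and target.

    // Sketch: how a virtual call and a direct call map to machine code
    #include <cstdio>

    struct Shape {
        virtual ~Shape() = default;
        virtual int area() const = 0;   // dispatched through the vtable
    };

    struct Square final : Shape {
        int side;
        explicit Square(int s) : side(s) {}
        int area() const override { return side * side; }
    };

    // The compiler only sees a Shape&, so it typically emits an indirect call:
    // load the vtable pointer, load the function pointer, call it.
    int area_dynamic(const Shape& s) { return s.area(); }

    // With the concrete (final) type visible, the call can be devirtualized
    // and inlined down to a single multiply.
    int area_static(const Square& s) { return s.area(); }

    int main() {
        Square sq{4};
        std::printf("%d %d\n", area_dynamic(sq), area_static(sq));
    }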

The notion of out-of-order execution, and the claims that OoO is superior and that there is therefore no need to look at the level of instruction selection. What specifically is out of order in contemporary x86 platforms (surprisingly, it is not the instructions), and what impact that has on overall performance. A brief look into where the complexity lies inside contemporary high-performance execution cores and sockets, and why aspects such as variable-length instruction encoding are trivial to resolve.

Memory hierarchy operation; logical, physical, and geometrical address spaces, their relationships and translations; memory performance in virtual address spaces; and clever ways to hide the latency of the underlying capacitor arrays.
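
A minimal sketch of why the hierarchy matters, assuming a conventional cache organization with 64-byte lines: the two functions below perform the same additions, yet the strided traversal touches a new cache line on almost every access and is typically several times slower on a large matrix.

    // Sketch: the same additions, very different cache behavior
    #include <cstddef>
    #include <vector>

    constexpr std::size_t N = 4096;

    // Walks memory sequentially: every cache line fetched from DRAM is fully used.
    long long sum_row_major(const std::vector<int>& m) {
        long long s = 0;
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t j = 0; j < N; ++j)
                s += m[i * N + j];
        return s;
    }

    // Strides N ints per access: touches a new cache line almost every time,
    // so the run time is dominated by misses rather than by the additions.
    long long sum_column_major(const std::vector<int>& m) {
        long long s = 0;
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t i = 0; i < N; ++i)
                s += m[i * N + j];
        return s;
    }

    int main() {
        std::vector<int> m(N * N, 1);
        return sum_row_major(m) == sum_column_major(m) ? 0 : 1;
    }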

Vectorization, why it is still not applied universally everywhere, and why it likely never will be.
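
As a rough sketch of the gap, with illustrative function names and assuming an auto-vectorizing compiler at a reasonable optimization level: the element-wise loop below is a straightforward SIMD candidate, while the loop-carried dependency in the second one is not.

    // Sketch: what does and what does not auto-vectorize
    #include <cstddef>

    // Independent element-wise work: compilers commonly emit SIMD code for this
    // loop at higher optimization levels (subject to target flags and cost models).
    void scale(float* out, const float* in, float k, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = in[i] * k;
    }

    // Loop-carried dependency: each iteration needs the previous result, so the
    // loop cannot be vectorized in the same straightforward way.
    void prefix_sum(float* data, std::size_t n) {
        for (std::size_t i = 1; i < n; ++i)
            data[i] += data[i - 1];
    }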

Data dependencies and what could be done about them.
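
One classic remedy, sketched below under the assumption of an out-of-order core with more than one floating-point adder: splitting a reduction into independent accumulators shortens the critical dependency chain, at the cost of changing the floating-point summation order, which is exactly why the compiler will not do it for you by default.

    // Sketch: breaking a serial dependency chain in a reduction
    #include <cstddef>

    // One accumulator: each addition waits for the previous one, so throughput is
    // bounded by the latency of the add chain, not by how many adders the core has.
    double sum_serial(const double* x, std::size_t n) {
        double s = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            s += x[i];
        return s;
    }

    // Four independent accumulators: the chains can run in parallel on an
    // out-of-order core. This changes the rounding order, so the compiler only
    // does it for you when told that reassociation is acceptable.
    double sum_parallel_chains(const double* x, std::size_t n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += x[i];
            s1 += x[i + 1];
            s2 += x[i + 2];
            s3 += x[i + 3];
        }
        for (; i < n; ++i)
            s0 += x[i];
        return (s0 + s1) + (s2 + s3);
    }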

How one can help the compiler do the right thing, and, more importantly, how one can avoid making it harder for the compiler to do the right thing.
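
A small sketch of the “help, don’t hinder” idea, using the widespread __restrict extension as one example (spelled restrict in standard C; the exact effect depends on the compiler): stating that two pointers do not alias removes a proof obligation the compiler cannot discharge on its own.

    // Sketch: telling the compiler what it cannot prove on its own
    #include <cstddef>

    // The compiler must assume dst and src may overlap, which can block
    // vectorization or force a runtime overlap check.
    void add_into(float* dst, const float* src, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] += src[i];
    }

    // __restrict promises that the pointers do not alias. The promise is on the
    // caller: break it and the behavior is undefined.
    void add_into_noalias(float* __restrict dst, const float* __restrict src,
                          std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] += src[i];
    }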

Branching control.
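
A minimal sketch of the trade-off, assuming unpredictable input data: the branchy reduction below is at the mercy of the branch predictor, while the branchless form can usually be compiled to a conditional move or a masked vector operation; which one wins depends on the data and the target.

    // Sketch: a data-dependent branch versus branchless selection
    #include <cstddef>

    // On unpredictable data this branch is a coin toss for the predictor, and
    // each misprediction flushes in-flight work.
    long long sum_above_branchy(const int* x, std::size_t n, int threshold) {
        long long s = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (x[i] > threshold)
                s += x[i];
        }
        return s;
    }

    // The same selection without a control-flow branch; compilers can usually
    // turn this into a conditional move or a masked/vectorized form.
    long long sum_above_branchless(const int* x, std::size_t n, int threshold) {
        long long s = 0;
        for (std::size_t i = 0; i < n; ++i)
            s += (x[i] > threshold) ? x[i] : 0;
        return s;
    }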

ABI aspects, parameter passing, compilation unit scope and its impact.
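
A sketch of the parameter-passing angle, assuming the common x86-64 System V calling convention (other ABIs differ in the details): small aggregates travel in registers, large ones are copied through memory, and references trade the copy for a pointer plus weaker aliasing guarantees.

    // Sketch: how the calling convention shapes argument passing
    #include <cstdint>

    struct Small { std::int32_t a, b; };   //  8 bytes
    struct Large { std::int64_t v[8]; };   // 64 bytes

    // Under the x86-64 System V ABI an aggregate this small travels in a
    // register, so passing it by value costs essentially nothing.
    std::int64_t use_small(Small s) { return std::int64_t{s.a} + s.b; }

    // An aggregate this large is passed in memory, so by-value here means
    // copying all 64 bytes on every call.
    std::int64_t use_large(Large l) { return l.v[0] + l.v[7]; }

    // By reference only a pointer crosses the call boundary, but the callee now
    // has to assume the memory may change elsewhere unless it can see the caller.
    std::int64_t use_large_ref(const Large& l) { return l.v[0] + l.v[7]; }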

The claims of the imminent obsolescence of x86, and how the new wave of ARM and RISC-V upstarts will, once again, overtake it after the previous not-so-successful attempts.