Speed for free - current state of auto-vectorizing compilers

Stefan Fuhrmann
On Day 2 at 17:15 (CET/Berlin) in Track A [Saphir Room and online]
For about a decade now, the locus of compute power in modern CPUs has shifted.
The scalar execution units have fallen far behind the vector units (SIMD) within the same core.
Failure to utilize SVE2 or AVX-512 may leave a whole order of magnitude of performance on the table.
C++ has no built-in primitives to directly express SIMD operations. You may either use target-specific
libraries and extensions or let the compiler figure it out automatically. If the latter works out, you get
a speed-up for free without touching your source code. The latest GCC and Clang releases have extended the
capabilities of their auto-vectorizers and enable them by default at higher optimization levels.
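As a rough illustration of the kind of code the auto-vectorizer targets, consider a simple element-wise loop (the function name and constants here are illustrative, not from the talk):

```cpp
#include <cstddef>

// A straight-line, dependency-free loop: the classic candidate that
// GCC and Clang can turn into SIMD code automatically (e.g. at
// -O3 -march=native), with no intrinsics in the source.
void scale_add(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * 2.0f + b[i];
    }
}
```

Whether the compiler actually vectorizes such a loop depends on the target architecture, the optimization level, and whether it can prove the pointers do not alias.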
In this talk, we will take a look at the current state of these auto-vectorizers, examine which code they
handle well and where they still struggle. We'll compare the size and performance of the generated binary
to that of hand-vectorized code using intrinsics. An estimate of the theoretical peak performance
of the target hardware will serve as a baseline.