Catching ⬆️: Unicode for C++ in Greater Detail - 2 of 5

Speaker: JeanHeyd Meneide

Audience level: Intermediate | Advanced

It's 2019 and Unicode is still barely supported in both the C and C++ standards. Between the POSIX standard requiring a single-byte encoding by default, the heavy limitations placed on codecvt facets in C++, and the utter lack of UTF-8/16/32 multi-unit conversion functions in the standard, the programming languages that have shaped the face of development for operating systems, embedded devices and mobile applications have pushed forward a world that is incredibly unfriendly to text beyond ASCII.

Yet we exist in a world where people have already rolled their own solutions. Even with char8_t coming, char has already been chosen in many codebases as the canonical UTF-8 code unit type, and std::string has made the rounds on many Linux environments as a UTF-8 encoded string. There is a lot to be backwards compatible with; how do we keep current investments relevant while guiding people to the new age?

This talk dives into the details of a new text library intended for potential inclusion in C++23. It discusses what Study Group 16 -- the Unicode arm of C++ -- is doing about the problems with text handling, and demonstrates some of the interface and flexibility goals behind the new additions to text in the Standard Library. Finally, the talk explores the fundamental reasons behind the library's choice of container adaptors over new containers, as well as the core power behind Encoding Objects and their ability to convey compile-time information to library maintainers.
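To make the last point concrete, here is a minimal sketch of what an encoding object paired with a container adaptor could look like. It is not the interface of the library discussed in the talk: utf8_encoding, max_code_units, is_unicode, decode_one and text_view_sketch are all hypothetical names chosen for illustration, and the decoder is deliberately simplified (it does not reject overlong forms or surrogate code points).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical encoding object. Every name here is an assumption made for
// illustration; none of it is taken from the proposed library.
struct utf8_encoding {
    using code_unit = char;
    using code_point = char32_t;

    // Compile-time facts that generic code could inspect or branch on.
    static constexpr std::size_t max_code_units = 4;
    static constexpr bool is_unicode = true;

    // Decode one code point starting at `pos` and advance `pos` past it.
    // Malformed or truncated input yields U+FFFD and consumes one code unit.
    // Simplified: overlong forms and surrogates are not rejected.
    static code_point decode_one(std::string_view in, std::size_t& pos) {
        assert(pos < in.size());
        const auto lead = static_cast<unsigned char>(in[pos]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else { ++pos; return 0xFFFD; }
        if (pos + len > in.size()) { ++pos; return 0xFFFD; }
        for (std::size_t i = 1; i < len; ++i) {
            const auto trail = static_cast<unsigned char>(in[pos + i]);
            if ((trail & 0xC0) != 0x80) { ++pos; return 0xFFFD; }
            cp = (cp << 6) | (trail & 0x3F);
        }
        pos += len;
        return cp;
    }
};

// Hypothetical adaptor: it wraps an existing std::string instead of
// introducing a new container, so existing UTF-8 std::string data is reused.
template <typename Encoding>
class text_view_sketch {
public:
    explicit text_view_sketch(const std::string& storage) : storage_(storage) {}

    // Decode the whole underlying string into code points.
    std::vector<typename Encoding::code_point> code_points() const {
        std::vector<typename Encoding::code_point> out;
        const std::string_view in(storage_);
        std::size_t pos = 0;
        while (pos < in.size()) {
            out.push_back(Encoding::decode_one(in, pos));
        }
        return out;
    }

private:
    const std::string& storage_;  // not owned; must outlive the view
};
```

Under these assumptions, an existing std::string s holding UTF-8 can be read as code points with text_view_sketch<utf8_encoding>(s).code_points(), while constants such as utf8_encoding::max_code_units stay visible to generic code at compile time.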

This talk will build off of a previous talk (https://www.youtube.com/watch?v=BdUipluIf1E).