This talk examines the C++ standardization proposals for parallelism (P2300, P2500, P3179) and their effect on oneDPL (the evolution of Intel's Parallel STL). Parallelism and asynchronous task execution are predominant in most hardware accelerators (GPUs, FPGAs, etc.), and while the standard library provides mutexes, semaphores, threads, and atomics for concurrent programming, these primitives are difficult to map onto accelerator programming models. P2300 introduces senders and receivers for asynchronous task scheduling across a thread pool, and the same concepts can serve as accelerator interfaces as well. With a user-defined execution policy targeting a specific accelerator, senders and receivers can be used to schedule parallel tasks, and multiple senders can be composed into a dependency graph — the execution model used by most accelerators — with existing policies. The talk shows how to express accelerator parallelism with std::execution and policies, for both the classic standard algorithms and their ranges counterparts.
The talk also covers the adaptation of parallel algorithms, execution contexts, and execution policies for accelerator backends such as oneDPL and Thrust. Platform-specific execution policies build on the std::execution principles to extend an existing execution context, following the existing practice of using a single argument to specify both "where" and "how" an algorithm executes.
This requires binding a policy to a context before the algorithm is invoked, which allows mismatches between the two to be caught early — for instance, when the execution context cannot support the semantics of the policy — and allows the resulting policy-aware scheduler instance to be reused. In addition, ranges-based parallel algorithms with execution policies can fuse computation calls and reduce overhead, with minimal changes in semantics relative to std::execution. Serial and parallel execution policies for ranges will be covered as well.
Finally, the talk showcases practical oneDPL examples that apply these principles to solve computationally intensive problems on GPUs and other accelerators.