Parallel Algorithm Execution
Multithreading in C++ standard library algorithms using execution policies
So far, our programs have been constructed as a single stream of instructions that run sequentially. This form of execution is the easiest to understand, but rarely the most performant.
In most environments where our code will run, the hardware is capable of executing multiple streams of instructions concurrently. For example:
- a modern CPU can have dozens of cores
- a modern GPU has thousands of cores
- our code may even be executed in environments with multiple CPUs or GPUs, such as in a high-end workstation or across multiple machines in a data center
This capability is sometimes referred to as concurrency or parallelism, and our programs can be orders of magnitude faster if we design them to take advantage of it.
Threads
In programming, a stream of execution is often called a thread. So far, our programs have been single-threaded - the thread starts with our main function, and instructions are executed in order, one at a time, until our program ends.
To use concurrency, we can look for opportunities to break our program into multiple threads. For example, we might decide to create a separate thread to handle audio.
The user's operating system can then run that thread on a separate core from our main thread. And, now that the audio logic has been removed from the main thread, it has less work to do, so it can update and respond to user input faster.
We have a dedicated chapter on multi-threading later in the course, which covers creating and managing threads directly. This lesson is focused on multi-threading within existing standard library algorithms. Many of these algorithms can implement multi-threading for us, using execution policies.
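As a quick preview of what this looks like in practice, here is a minimal sketch - the container and its contents are purely illustrative - that passes the parallel policy to std::sort(), an algorithm that can benefit substantially from running across multiple cores:

#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

int main(){
  // A large collection, purely for illustration: 1,000,000 descending values
  std::vector<int> Values(1'000'000);
  std::iota(Values.rbegin(), Values.rend(), 0);

  // Passing std::execution::par as the first argument allows the
  // implementation to sort using multiple threads
  std::sort(std::execution::par, Values.begin(), Values.end());
}

We'll explore this pattern in detail below using for_each().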
Sequential Execution Policy
By default, the algorithms we use execute sequentially. We demonstrate this below using the for_each() algorithm as we've done before.
Note, we're not using the range-based variation of for_each() - we're using the iterator version. As of C++23, range-based algorithms do not yet support execution policies, but that is likely to change in later language versions.
We've simulated a slow-running function by including a one-second pause within the function body:
#include <algorithm>
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

void Log(int Number){
  using namespace std::chrono_literals;
  std::this_thread::sleep_for(1s);
  std::cout << "Number: " << Number << '\n';
}

int main(){
  std::vector Numbers{1, 2, 3};
  std::for_each(Numbers.begin(), Numbers.end(), Log);
}
Running our program, we'll see that it runs in sequence. Our program takes approximately 3 seconds to complete, with each function call finishing about a second apart:
Number: 1
Number: 2
Number: 3
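For completeness, this default can also be requested explicitly. The sketch below repeats the previous example, but passes std::execution::seq from the <execution> header as the first argument - the calls to Log() still run one at a time, in order:

#include <algorithm>
#include <chrono>
#include <execution>
#include <iostream>
#include <thread>
#include <vector>

void Log(int Number){
  using namespace std::chrono_literals;
  std::this_thread::sleep_for(1s);
  std::cout << "Number: " << Number << '\n';
}

int main(){
  std::vector Numbers{1, 2, 3};
  // std::execution::seq makes the default sequential behaviour explicit
  std::for_each(std::execution::seq,
    Numbers.begin(), Numbers.end(), Log);
}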
Parallel Execution Policy
To enable our program to run in parallel, we provide an execution policy as the first argument to the algorithm.
Execution policies are available within the <execution> header, and the parallel policy is available as std::execution::par:
#include <algorithm>
#include <chrono>
#include <execution>
#include <iostream>
#include <thread>
#include <vector>

void Log(int Number){
  using namespace std::chrono_literals;
  std::this_thread::sleep_for(1s);
  // There is now a bug here - see explanation below
  std::cout << "Number: " << Number << '\n';
}

int main(){
  std::vector Numbers{1, 2, 3};
  std::for_each(std::execution::par,
    Numbers.begin(), Numbers.end(), Log);
}
Now, all our function calls should run in parallel. In most environments, this means our program should complete in around one second, with all our functions logging to the console at around the same time:
Number: 3
Number: 1
Number: 2
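If we want to verify the speedup rather than relying on the console output, we can time the call ourselves. The following sketch wraps the parallel for_each() in a simple std::chrono timer; the timing code is our own addition here, not part of the algorithm:

#include <algorithm>
#include <chrono>
#include <execution>
#include <iostream>
#include <thread>
#include <vector>

void Log(int Number){
  using namespace std::chrono_literals;
  std::this_thread::sleep_for(1s);
  std::cout << "Number: " << Number << '\n';
}

int main(){
  std::vector Numbers{1, 2, 3};

  auto Start{std::chrono::steady_clock::now()};
  std::for_each(std::execution::par,
    Numbers.begin(), Numbers.end(), Log);
  auto End{std::chrono::steady_clock::now()};

  // With the parallel policy this typically reports around 1,000ms;
  // removing the policy takes it back to around 3,000ms
  std::cout << "Took: "
    << std::chrono::duration_cast<std::chrono::milliseconds>(
         End - Start).count() << "ms\n";
}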
Implications of Parallelism
When we're designing our programs to be multi-threaded, there are several important implications we have to keep in mind.
The first is perhaps obvious from the output of the previous program - we don't know the order in which threads will complete. This means that every time we run our program, we could get a different output.
We might get:
Number: 1
Number: 3
Number: 2
Or:
Number: 2
Number: 3
Number: 1
Or any other combination.
Another implication is that when concurrently running threads access or operate on the same resources, we don't know the order in which those operations will happen.
In the previous code example, we highlighted a seemingly innocuous expression as having a bug:
std::cout << "Number: " << Number << '\n';
Specifically, if we run our program enough times, we'll eventually get an output that looks something like this:
Number: Number: 1
Number: 3
2
This is because we have multiple threads manipulating a shared resource - std::cout. Each thread is performing a sequence of three operations on std::cout - note the three << operators.
When writing sequential programs, this is fine. We know each function call will complete its three operations before the next function call starts.
But that is no longer guaranteed once our code can run in parallel. Code that behaves correctly in this situation is described as "thread-safe".
Our code is not currently thread-safe, because any other thread can start writing to std::cout in the middle of our sequence of three << operations.
To fix this, we need to make our write to std::cout atomic - that is, a single, indivisible operation.
The previous simple example can be made atomic by formatting the desired output within the local thread, and then sending it to std::cout all at once:
#include <algorithm>
#include <chrono>
#include <execution>
#include <format>
#include <iostream>
#include <thread>
#include <vector>

void Log(int Number){
  using namespace std::chrono_literals;
  std::this_thread::sleep_for(1s);
  std::cout << std::format("Number: {}\n", Number);
}

int main(){
  std::vector Numbers{1, 2, 3};
  std::for_each(std::execution::par,
    Numbers.begin(), Numbers.end(), Log);
}
Number: 3
Number: 2
Number: 1
This example is using std::format(), which we covered in a dedicated lesson here:
String Interpolation - A detailed guide to string formatting using C++20's std::format() and C++23's std::print().
We'll cover the full suite of tools to manage shared resources in multi-threaded contexts in a dedicated chapter later in the course.
Multithreaded Debugging and std::thread::id
The main way we can understand how our threads are being scheduled is through our debugger. The exact process for doing this depends on what IDE we're using.
A guide to multithreaded debugging in Visual Studio is available on the Microsoft site here.
Alternatively, to determine which thread some code is running on, we can call std::this_thread::get_id().
This returns a std::thread::id - a lightweight identifier which we can use to distinguish threads:
#include <algorithm>
#include <chrono>
#include <execution>
#include <format>
#include <iostream>
#include <thread>
#include <vector>

void Log(int Number){
  using namespace std::chrono_literals;
  std::this_thread::sleep_for(1s);
  // Note: formatting a std::thread::id with std::format() requires C++23
  std::cout << std::format(
    "Thread: {}, Number: {}\n",
    std::this_thread::get_id(), Number);
}

int main(){
  std::vector Numbers{1, 2, 3};
  std::cout << "Main Thread: "
    << std::this_thread::get_id() << '\n';
  std::for_each(std::execution::par,
    Numbers.begin(), Numbers.end(), Log);
}
Running this code, we see that the first execution of Log was run on the same thread as our main function, whilst the other two executions each got their own thread:
Main Thread: 19668
Thread: 19668, Number: 1
Thread: 57160, Number: 3
Thread: 51188, Number: 2
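Note that formatting a std::thread::id with std::format() relies on a formatter that was added in C++23. If your toolchain doesn't support that yet, one possible workaround - a sketch, not the lesson's approach - is to build the line with a std::ostringstream and then send it to std::cout in a single operation, mirroring the std::format() technique above:

#include <algorithm>
#include <chrono>
#include <execution>
#include <iostream>
#include <sstream>
#include <thread>
#include <vector>

void Log(int Number){
  using namespace std::chrono_literals;
  std::this_thread::sleep_for(1s);

  // Build the complete line locally within this thread
  std::ostringstream Stream;
  Stream << "Thread: " << std::this_thread::get_id()
    << ", Number: " << Number << '\n';

  // Sending the whole line with a single << mirrors the std::format() approach above
  std::cout << Stream.str();
}

int main(){
  std::vector Numbers{1, 2, 3};
  std::for_each(std::execution::par,
    Numbers.begin(), Numbers.end(), Log);
}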
Summary
In this lesson, we explored how to harness the power of parallelism in C++ by utilizing execution policies with standard library algorithms. This can enhance program performance through concurrent execution.
We delved into the subtleties of multithreaded programming, demonstrating how to write thread-safe code and examining the implications of parallel execution on program behavior.
Key Takeaways
- The difference between sequential and parallel algorithms, and how modern hardware supports these concepts.
- The use of execution policies to enable parallel algorithm execution.
- The concept of threads in programming and the distinction between threads and cores.
- The default sequential execution policy and how to explicitly switch to parallel execution using std::execution::par.
- The implications of parallelism, including non-deterministic execution order and the need for thread-safe programming practices.
- The significance of atomic operations and how to avoid race conditions by ensuring operations on shared resources are atomic.
- The introduction of std::format for safer string formatting in a multithreaded context.
- Debugging multithreaded applications, with a focus on identifying which thread a given piece of code is executing on.
- Understanding that specifying a parallel execution policy does not guarantee parallel execution, as the decision is up to the implementation and runtime environment.