Parallel Algorithm Execution

Multithreading in C++ standard library algorithms using execution policies
This lesson is part of the course:

Professional C++

Comprehensive course covering advanced concepts, and how to use them on large-scale projects.

Ryan McCombe

So far, our programs have been constructed as a single stream of instructions that run sequentially. This form of execution is the easiest to understand, but rarely the most performant.

Most environments where our code will be executed are capable of running multiple processes concurrently. For example:

  • a modern CPU can have dozens of cores
  • a modern GPU has thousands of cores
  • our code may even be executed in environments with multiple CPUs or GPUs, such as in a high-end workstation, or across multiple machines in a data center

This capability is sometimes referred to as concurrency or parallelism, and our programs can be orders of magnitude faster if we design them to take advantage of it.

Threads

Within programming, a stream of execution is often called a thread. So far, our programs have been single-threaded - the thread starts with our main function, and instructions are executed in order, one at a time, until our program ends.

To use concurrency, we can look for opportunities to break our program into multiple threads. For example, we might decide to create a separate thread to handle audio.

The user’s operating system can then run that thread on a separate core from our main thread. And, now that the audio logic has been removed from the main thread, it has less work to do, so it can update and respond to user input faster.

Threads vs Cores

It’s worth noting that a thread of execution doesn’t necessarily correspond to a core on the user’s system. After all, our programs will generally be run across a range of hardware with different core counts, so we can’t design our code on that basis anyway.

By breaking our program into threads, we’re simply providing the operating system with more flexibility, so that it can run our program in a way that makes better use of the hardware available to it.

We have a dedicated chapter on multi-threading later in the course, where we create and manage our threads directly. This lesson is focused on multi-threading within existing standard library algorithms. Many of these algorithms can implement multi-threading for us, using execution policies.

Sequential Execution Policy

By default, the algorithms we use execute sequentially. We demonstrate this below using the for_each() algorithm as we’ve done before.

Note, we’re not using the range-based variation of for_each() - we’re using the iterator variation. As of C++23, the range-based algorithms do not yet support execution policies, though that may change in a later standard.

We’ve simulated a slow-running function simply by including a one-second pause within the function body:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

void Log(int Number){
  using namespace std::chrono_literals;
  std::this_thread::sleep_for(1s);
  std::cout << "Number: " << Number << '\n';
}

int main(){
  std::vector Numbers{1, 2, 3};

  std::for_each(Numbers.begin(), Numbers.end(),
                Log);
}

Running our program we’ll see that it runs in sequence. Our program takes approximately 3 seconds to run, with each function call completing approximately a second apart:

Number: 1
Number: 2
Number: 3

Parallel Execution Policy

To change our execution policy to enable our program to run in parallel, we provide an execution policy as the first argument.

Execution policies are available within the <execution> header, and the parallel policy is available as std::execution::par:

#include <algorithm>
#include <chrono>
#include <execution>
#include <iostream>
#include <thread>
#include <vector>

void Log(int Number){
  using namespace std::chrono_literals;
  std::this_thread::sleep_for(1s);

  // There is now a bug here - see explanation below
  std::cout << "Number: " << Number << '\n';
}

int main(){
  std::vector Numbers{1, 2, 3};

  std::for_each(std::execution::par,
                Numbers.begin(), Numbers.end(),
                Log);
}

Now, all our function calls should run in parallel. In most environments, this means our program should complete in around one second, with all our functions logging to the console at around the same time:

Number: 3
Number: 1
Number: 2

Implications of Parallelism

When we’re designing our programs to be multi-threaded, there are several important implications we have to keep in mind.

The first is perhaps obvious from the output of the previous program - we don’t know the order in which threads will complete. This means, every time we run our program, we could get a different output.

We might get:

Number: 1
Number: 3
Number: 2

Or:

Number: 2
Number: 3
Number: 1

Or any other combination

Why does the same code take a variable amount of time to execute?

It may be confusing why the same instructions running on the same hardware have a non-deterministic execution time.

However, typically our program is not the only process that is running on a machine. Operating systems manage many programs and background tasks running at the same time.

Hardware resources (such as CPU cores) need to be allocated to our threads so they can be executed.

Those resources are not always allocated immediately, and our threads are not always allocated resources at the same time.

This process of allocating resources is typically called scheduling, and the component of the operating system that performs it is called the scheduler.

Another implication is that, when threads running concurrently are accessing or operating on the same resources, we don’t know the order that this will happen.

In the previous code example, we highlighted a seemingly innocuous expression as having a bug:

std::cout << "Number: " << Number << '\n';

Specifically, if we run our program enough times, we’ll eventually get an output that looks something like this:

Number: Number: 1
Number: 3
2

This is because we have multiple threads manipulating a shared resource - std::cout. Each thread is performing a sequence of three operations on std::cout - note the three << operators.

When we’re writing sequential programs, this is fine. We know each function call is going to complete its three operations before the next function call starts.

But that is no longer guaranteed when our code runs in parallel. Code that behaves correctly even when run from multiple threads at once is called "thread-safe".

Our code is not currently thread-safe, because any other thread can start writing to std::cout in the middle of our sequence of three << operations.

To fix this, we need to make our process atomic.

Atomic Processes

"Atomic" is based on the Greek word "atomos", meaning "indivisible" or "uncuttable".

In the context of multi-threaded programming, atomic specifically means that code running in other threads cannot "divide" or "cut into" our code.

Our previous attempt was not atomic, because it was divided into three operations, and other threads could cut in between them.

On the other hand, an atomic process appears to happen instantaneously.

As a result, other threads cannot cut into it, because there is no opportunity to do so. From their perspective, at any given point in time, an atomic process has either not started yet or has fully completed.

The previous simple example can be made atomic by formatting the desired output within the local thread, and then sending it to std::cout in a single operation:

#include <algorithm>
#include <chrono>
#include <execution>
#include <format>
#include <iostream>
#include <thread>
#include <vector>

void Log(int Number){
  using namespace std::chrono_literals;
  std::this_thread::sleep_for(1s);

  std::cout << std::format("Number: {}\n",
                           Number);
}

int main(){
  std::vector Numbers{1, 2, 3};

  std::for_each(std::execution::par,
                Numbers.begin(), Numbers.end(),
                Log);
}

Number: 3
Number: 2
Number: 1

This example uses std::format(), which we covered in a dedicated lesson.

We’ll cover the full suite of tools to manage shared resources in multi-threaded contexts in a dedicated chapter later in the course.

Race Conditions

A bug where our program can have undesirable behavior based on the order in which tasks are executed is commonly called a race condition.

For example, if our program runs Task A and Task B concurrently, and our code is written based on the assumption that Task A always finishes first, we have introduced a race condition.

These types of bugs can be particularly insidious, as by their nature, they don’t happen all the time. For example, our assumption might be correct for 99% of executions, meaning the bug only happens 1% of the time.

This can make it very difficult to diagnose, or be missed entirely until our software is released and running at scale.

Multithreaded Debugging and std::thread::id

The main way we can understand how our threads are being scheduled is through our debugger. The exact process for doing this depends on what IDE we’re using.

A guide to multithreaded debugging in Visual Studio is available on the Microsoft site here.

Alternatively, to determine what thread a process is running in, we can also call std::this_thread::get_id().

This returns a std::thread::id - an implementation-defined identifier we can use to distinguish threads:

#include <algorithm>
#include <chrono>
#include <execution>
#include <format>
#include <iostream>
#include <thread>
#include <vector>

void Log(int Number){
  using namespace std::chrono_literals;
  std::this_thread::sleep_for(1s);
  std::cout << std::format(
    "Thread: {}, Number: {}\n",
    std::this_thread::get_id(), Number);
}

int main(){
  std::vector Numbers{1, 2, 3};

  std::cout << "Main Thread: "
    << std::this_thread::get_id() << '\n';

  std::for_each(std::execution::par,
                Numbers.begin(), Numbers.end(),
                Log);
}

Running this code, we see that the first execution of Log was run on the same thread as our main function, whilst the other two executions each got their own thread:

Main Thread: 19668
Thread: 19668, Number: 1
Thread: 57160, Number: 3
Thread: 51188, Number: 2

Parallel Execution Policy Doesn't Guarantee Parallel Execution

By specifying a parallel execution policy, we are stating that our code is written such that it can be parallelized.

It is left to the standard library implementation and the operating system to decide what to do with that information. It doesn’t necessarily mean our code always will be parallelized.

For example, if we make our function run faster, by removing the one-second sleep, we’re likely to see fewer threads, or possibly even a sequential execution:

#include <algorithm>
#include <execution>
#include <format>
#include <iostream>
#include <thread>
#include <vector>

void Log(int Number){
  // using namespace std::chrono_literals;
  // std::this_thread::sleep_for(1s);
  std::cout << std::format(
    "Thread: {}, Number: {}\n",
    std::this_thread::get_id(), Number);
}

int main(){/*...*/}

Main Thread: 31108
Thread: 31108, Number: 1
Thread: 31108, Number: 2
Thread: 31108, Number: 3

Even though we stated our algorithm could be run in parallel, the platform decided it would be more performant to just run it sequentially.

Multithreading requires additional operations - creating threads and communicating between them has an overhead. If our function calls are fast, the platform may decide that just running our algorithm sequentially is more performant.

Summary

In this lesson, we explored how to harness the power of parallelism in C++ by utilizing execution policies with standard library algorithms. This can enhance program performance through concurrent execution.

We delved into the subtleties of multithreaded programming, demonstrating how to write thread-safe code and the implications of parallel execution on program behavior.

Key Takeaways:

  • The difference between sequential and parallel algorithms, and how modern hardware supports these concepts.
  • The use of execution policies to enable parallel algorithm execution.
  • The concept of threads in programming and the distinction between threads and cores.
  • The default sequential execution policy and how to explicitly switch to parallel execution using std::execution::par.
  • The implications of parallelism, including non-deterministic execution order and the need for thread-safe programming practices.
  • The significance of atomic operations and how to avoid race conditions by ensuring operations on shared resources are atomic.
  • The introduction of std::format for safer string formatting in a multithreaded context.
  • Debugging multithreaded applications, with a focus on identifying which thread a given piece of code is executing on.
  • Understanding that specifying a parallel execution policy does not guarantee parallel execution, as the decision is up to the implementation and runtime environment.


Next Lesson

Search Algorithms

An introduction to the 8 main searching algorithms in the C++ standard library, including find, find_if, find_if_not, find_first_of, adjacent_find, search_n, search, and find_end.