Asynchronous Tasks with std::async
Offload heavy work to other CPU cores. Learn how std::async and std::future enable concurrent execution.
So far, we've generally considered basic functions as the unit of work in our designs:
void ProcessData() {
  DoWork();
}
This line of code transfers control of our execution over to the DoWork() function body, and we get control back once DoWork() returns.
There is another way we can set this up. The <future> header lets us declare asynchronous units of work using std::async, and we can then wait for them to complete using the wait() method.
Our previous DoWork() example could be written like this:
#include <future> // for std::async

void ProcessData() {
  auto task = std::async(DoWork);
  task.wait();
}
Why would we prefer this approach over the simple DoWork() expression? The key benefit is that we can do other work whilst we wait for our task to complete.
Below, our Sequential() function performs TaskA() and then TaskB(), whilst our Concurrent() function can do both tasks at the same time:
void Sequential() {
  TaskA();
  TaskB();
  // Both tasks are now complete
  // ...
}

void Concurrent() {
  auto task = std::async(TaskA);
  TaskB();
  task.wait();
  // Both tasks are now complete
  // ...
}
If each of our tasks requires 200ms of CPU time, Sequential() will take 400ms to complete, whilst Concurrent() will do the same work in approximately 200ms.
As you may have guessed, the magic that makes this happen is multithreading. Our CPU has multiple cores. Our Sequential() example is using only one of them, whilst Concurrent() is using two.
We'll talk more about multithreading soon, but let's briefly cover what std::async() returns.
The std::future Type
The std::async() function doesn't return the result of the asynchronous operation directly. Instead, it returns a std::future object. Think of std::future as a promise or a handle to a result that will be available at some point in the future.
It's a placeholder object that will eventually hold the return value (or an exception) from the task we provided to std::async():
// DoHeavyWork() returns nothing / void
// So std::async(DoHeavyWork) returns a std::future<void>
std::future<void> task1 = std::async(DoHeavyWork);

// GetHeavyString() returns a std::string
std::future<std::string> task2 = std::async(GetHeavyString);
This separation allows the calling thread to continue doing other work immediately after launching the asynchronous task, rather than waiting for the result.
The std::future object gives the calling function a way to check on the status of the asynchronous task, wait for it to complete, and retrieve its result when it's ready.
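For example, the wait_for() method blocks for at most a given duration and then reports the task's status, letting us poll for completion without committing to a full wait. A minimal sketch, assuming a hypothetical long-running SlowTask() function:

#include <chrono>
#include <future>
#include <iostream>

void SlowTask() { /* ...heavy work... */ }

void CheckProgress() {
  std::future<void> task = std::async(std::launch::async, SlowTask);

  // Block for at most 10ms, then report the task's status
  auto status = task.wait_for(std::chrono::milliseconds(10));
  if (status == std::future_status::ready) {
    std::cout << "Task finished\n";
  } else {
    std::cout << "Task still running\n";
  }

  task.wait(); // Block until it truly completes
}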
Blocking
When a function is waiting on another function to complete, the waiting function is sometimes described as being blocked. With a regular function call, the caller is blocked until the function it called completes:
void FunctionA() {
  // A blocking invocation - execution of FunctionA
  // is blocked until FunctionB returns
  FunctionB();

  // FunctionB is complete and I can now proceed
}
If FunctionB is instead launched as an asynchronous task, FunctionA is only blocked from the moment it calls wait() until the task is complete:
void FunctionA() {
  // A non-blocking invocation
  std::future<void> task = std::async(FunctionB);

  // I am not blocked by FunctionB. I can do more work here, but
  // I should be aware that FunctionB is probably still running
  DoMoreWork();

  // If I want to ensure that FunctionB is complete before I
  // proceed, I should call wait()
  // Calling wait() will block me until FunctionB completes
  task.wait();

  // FunctionB is complete and I can now proceed
}
Excessive blocking can sometimes represent an inefficiency; however, some blocking is also a necessary part of multithreaded design.
Blocking execution using wait() is how we ensure a task is finished before we proceed to a part of the program that depends on its completion.
It is safe to call wait() on a std::future that is already complete. In that situation, wait() will immediately return control and let us proceed.
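As a quick illustration, a sketch reusing our hypothetical DoWork() function:

void WaitTwice() {
  auto task = std::async(DoWork);
  task.wait(); // Blocks until DoWork() completes
  task.wait(); // Already complete - returns immediately
}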
Getting the Result
If the asynchronous task returns a value, you can retrieve it using the get() method. Calling get() on a std::future object will:
- Block the calling thread if the asynchronous task has not yet completed, just like wait() does.
- Return the result of the task once it has completed.
- Throw any exception that the asynchronous task might have thrown.
It's important to note that wait() can be called multiple times on a std::future, but get() can only be called once. Subsequent get() calls will result in undefined behavior. If you need to access the result multiple times, you should store it in a local variable after the first call.
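Here's a minimal sketch of both behaviors, reusing the GetHeavyString() function from earlier (its body here is just a stand-in):

#include <future>
#include <iostream>
#include <string>

std::string GetHeavyString() { return "heavy string"; }

void RetrieveOnce() {
  auto task = std::async(GetHeavyString);
  try {
    // Store the result locally - get() can only be called once
    std::string result = task.get();
    std::cout << result << '\n';
    std::cout << result << '\n'; // Reuse the local copy freely
  } catch (const std::exception& e) {
    // Any exception thrown inside GetHeavyString() resurfaces here
    std::cout << "Task failed: " << e.what() << '\n';
  }
}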
Below, our Sort() function starts an asynchronous task to sort the array provided as an argument. When the sort is complete, the task returns a std::string with a success message:
#include <vector>
#include <future>
#include <thread>
#include <chrono>
#include <algorithm>
#include <string>

struct Player {
  int score;
};

std::future<std::string> Sort(std::vector<Player>& players) {
  return std::async([&players](){
    std::ranges::sort(players, {}, &Player::score);
    return std::string("Sort complete");
  });
}
void ProcessAndGetResult() {
  std::vector<Player> players(1000);

  // Launch the task
  std::future<std::string> asyncResult = Sort(players);

  // Simulating other work...
  std::this_thread::sleep_for(std::chrono::milliseconds(50));

  // Wait until the task is finished, and get the result
  // If the task isn't done yet, this line blocks until it is.
  std::string status = asyncResult.get();
}
Multithreading
The key concept behind using std::async and std::future is multithreading. Multithreading is a programming model where multiple sequences of instructions, called threads, can run concurrently within the same program.
In our Sort() example, we use std::async() to identify a discrete block of work that can be performed independently of the work we're doing in the "main thread". These secondary threads are typically called "worker threads".
CPU Cores
Modern consumer CPUs tend to have 8-24 cores, and each core can physically work on 1-2 threads at the same time.
However, across all the programs running on a system, there might be thousands of threads looking to be processed. The operating system juggles threads in and out of cores to keep everyone happy.
Most threads aren't very demanding, so there's usually a lot of capacity left over. The operating system will happily fill this spare capacity with our program's work to help it complete faster.
But, when all of our work is structured as a single thread, the best it can do is put that thread in a single core and keep it there. To use more capacity, we need to use more threads.
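We can ask the standard library how many concurrent threads the hardware supports using std::thread::hardware_concurrency(). A quick sketch:

#include <iostream>
#include <thread>

int main() {
  // Returns a hint for the number of hardware threads;
  // may return 0 if the value cannot be determined
  unsigned int n = std::thread::hardware_concurrency();
  std::cout << "Hardware threads: " << n << '\n';
}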
The Risks
Memory management is currently the reigning champion of C++ bugs, but only because the true heavyweight contender - concurrency - is a beast that most developers dare not wake. Concurrency is very difficult.
It introduces whole new classes of problems and levels of complexity. When there are 20 different parts of our program running at the same time, interacting with the same resources, how can we possibly understand the overall state?
We can read a variable, read the exact same variable again, and it might have changed:
std::cout << x; // logs 32
std::cout << x; // logs 7298
One thread can be iterating through an array of players; a second wants to sort the array by score; a third wants to update some scores. Things get chaotic fast.
Highly concurrent projects invest a lot of engineering effort into building the systems to manage all of this complexity.
The std::async Abstraction
We're using the simple std::async helper in this introductory course, which lets us weave concurrency patterns into our code incrementally in a relatively safe and controlled way.
Every time we have a function that is doing some heavy work, we can ask "is there an opportunity to offload some of this to another thread?"
Using concurrency in the small, controlled environment of a single function body makes it less likely we'll make a mistake. We trigger the async behavior - creating the std::async - and we return to the safe, synchronous world - calling .wait() - all within the same block of code.
In that context, it's usually pretty clear when our threads might be in conflict. Consider this attempt to sort a list of players in the background whilst also reading and updating that same data.
Because it is all happening within the same function body, it's easier to spot that we might be doing something problematic:
#include <vector>
#include <algorithm>
#include <future>
#include <string>
#include <iostream>

struct Player {
  int score;
};

void Sort(std::vector<Player>& players) {
  auto task = std::async([&players](){
    std::ranges::sort(players, {}, &Player::score);
    return std::string("Sort complete");
  });

  // Wait, my async task is working on this data!
  Player PlayerOne = players[0];

  // Wait, my async task is working on this data!
  players[0].score += 10;

  task.wait();
}
This is the classic data race problem. We have no idea what PlayerOne is; we have no idea which player had their score increased; and we have no idea if our container even sorted properly.
However, because we can see the async launch and the data access in the same scope, these bugs are easier to spot.
It also means the async behavior doesn't "leak" into other parts of the program. When our function waits for its async tasks to complete before returning control, that function just becomes a regular, synchronous function as far as the outside world is concerned.
Someone calls it, and when they get control back, they know the work is done. From their perspective, it's just a normal function:
void Sort(std::vector<Player>& players) {
  auto task = std::async([&players](){
    std::ranges::sort(players, {}, &Player::score);
    return std::string("Sort complete");
  });

  // Do other work if needed...

  // Block here - don't return control until
  // the sort is complete
  task.wait();
}

void SafeProcess(std::vector<Player>& players) {
  // This function spawns threads, but it waits
  // for them to finish
  Sort(players);

  // I now know players have been sorted and can
  // continue my work safely
  int lowest_score = players[0].score;
}
Increasing the Capability (and Danger)
Let's look at the next step up in capability, with an associated increase in design complexity and risk. The function that creates the task doesn't have to wait for it to complete before returning control. Instead, it can return the std::future.
This allows the caller to decide when to wait.
// Returns a future, doesn't block
std::future<std::string> Sort(std::vector<Player>& players) {
  return std::async([&players](){
    std::ranges::sort(players, {}, &Player::score);
    return std::string("Sort complete");
  });
}
This is an important, fundamental change in how the function works, and the callers need to be aware. The return type gives them a hint, but it's common to also reflect the async nature within the function name.
This is usually done by adding a word like async to the name:
// Renamed from Sort() to signal the async behavior
std::future<std::string> SortAsync(std::vector<Player>& players) {
  return std::async([&players](){
    std::ranges::sort(players, {}, &Player::score);
    return std::string("Sort complete");
  });
}
At the call site, the implications of using this function should now be clear:
void Caller(std::vector<Player>& players) {
  // This might return faster than Sort()
  SortAsync(players);

  // ...but we don't know if the work has been completed yet
  // We have no idea what players[0] is because
  // the data may still be getting sorted
  int lowest_score = players[0].score;
}
The key benefit of our change is that it now gives the caller flexibility. The original Sort() forced them to wait, but SortAsync() lets them decide:
#include <vector>
#include <future>
#include <string>

void ProcessPlayers(std::vector<Player>& players) {
  // Option 1: I still want to wait before continuing
  SortAsync(players).wait();

  // This is now safe
  int lowest_score = players[0].score;

  // Option 2: I want to wait, and I also want the result
  std::string result = SortAsync(players).get();

  // Option 3: I want to do other stuff while I wait
  std::future<std::string> task = SortAsync(players);
  DoOtherStuff();

  // Sync up after using option 3
  task.wait();
}
Don't underestimate the risk of concurrency-related issues. As the asynchronous behavior begins to span across logical boundaries like function bodies, the complexity gets increasingly difficult to manage.
The nature of the resulting bugs makes them particularly insidious, too. It's very easy to write code that assumes task A will complete before task B without even realizing we've created a race. Even worse, that assumption might be correct 99.99% of the time, so we can test our code a thousand times and never encounter the bug.
But when we ship it and it's now running billions of times, those 0.01% outcomes will be happening very frequently.
More Examples
Let's look at some more advanced uses and considerations for std::async.
Launch Policies
By default, std::async doesn't necessarily use multithreading. It's just a way to package up some work that could be multithreaded. It tells the underlying standard library implementation: "here's a discrete task that can safely be run on another thread, if you want".
This default flexibility allows std::async implementations to be conservative with thread creation if system resources are scarce. However, it can lead to unpredictable behavior, especially if you expect the task to start on a new thread immediately.
If you want to force the task to run on a new thread, you can specify the std::launch::async policy as the first argument:
std::future<std::string> SortAsync(std::vector<Player>& players) {
  // Explicitly request a new thread
  return std::async(std::launch::async, [&players](){
    std::ranges::sort(players, {}, &Player::score);
    return std::string("Sort complete");
  });
}
This is typically more useful than the default behavior. When we explicitly set up an asynchronous task, we're usually doing it because we know it's going to take a while to complete. In such cases, std::launch::async is preferred because it guarantees immediate (or near-immediate) execution on a new thread, making the concurrency behavior explicit and predictable.
Relying on the default makes the behavior of our program less deterministic. This is especially true if we want our code to be portable, as different implementations have different criteria for deciding when to create new threads. One might be more conservative than another, and we don't want the performance profile of our program to significantly change depending on which compiler built it.
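For completeness, the other policy is std::launch::deferred, which creates no thread at all: the task runs lazily on the waiting thread the first time wait() or get() is called. The default is the combination std::launch::async | std::launch::deferred, which is what gives the implementation its freedom to choose. A small sketch:

#include <future>
#include <iostream>

void DeferredExample() {
  auto task = std::async(std::launch::deferred, [](){
    std::cout << "Running now, on the waiting thread\n";
  });

  // Nothing has run yet - the task is deferred

  task.wait(); // The lambda executes here, during the wait
}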
Multiple Async Tasks
Our examples have been based around a single function creating a single async task, but we have as much flexibility as we want. We can spawn many async tasks, and those async tasks can also create their own.
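For example, a worker can offload part of its own work; a minimal sketch with a hypothetical SubTask() helper:

#include <future>

void SubTask() { /* ... */ }

void ParentTask() {
  // A worker thread can spawn its own worker
  auto inner = std::async(std::launch::async, SubTask);
  // ...do this worker's own share of the work...
  inner.wait();
}

void Launch() {
  auto outer = std::async(std::launch::async, ParentTask);
  outer.wait();
}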
A common pattern is to launch several independent tasks and then wait for all of them to complete. This is ideal for scenarios where you can split a larger problem into smaller, parallelizable chunks:
void ProcessFourThings() {
  auto f1 = std::async(std::launch::async, []{ DoWork(); });
  auto f2 = std::async(std::launch::async, []{ DoWork(); });
  auto f3 = std::async(std::launch::async, []{ DoWork(); });
  auto f4 = std::async(std::launch::async, []{ DoWork(); });

  // Wait for all
  f1.wait();
  f2.wait();
  f3.wait();
  f4.wait();
}
We can do this dynamically, perhaps using a std::vector to manage the std::future objects:
#include <vector>
#include <future>

void ProcessBatch(int count) {
  std::vector<std::future<void>> tasks;
  tasks.reserve(count);

  // 1. Launch all tasks
  for (int i = 0; i < count; ++i) {
    tasks.push_back(std::async(std::launch::async, []{
      DoWork();
    }));
  }

  // 2. Wait for all tasks
  for (auto& task : tasks) {
    task.wait();
  }
}
However, the creation of these tasks and the associated thread management has an overhead. If the work we need to do is relatively small, the cost of packaging it as a std::future can be greater than just doing the work directly.
As always, if something is performance-critical, we should benchmark our approach.
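Before reaching for a full benchmarking library, a rough sanity check can be as simple as timing both versions with std::chrono::steady_clock. A sketch (the TimeIt() helper is hypothetical):

#include <chrono>
#include <iostream>

// Times a callable and logs the elapsed wall-clock duration
void TimeIt(auto&& fn, const char* label) {
  auto start = std::chrono::steady_clock::now();
  fn();
  auto elapsed = std::chrono::steady_clock::now() - start;
  std::cout << label << ": "
            << std::chrono::duration_cast<
                 std::chrono::milliseconds>(elapsed).count()
            << "ms\n";
}

// Usage: TimeIt([]{ ProcessBatch(100); }, "ProcessBatch");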
Benchmarking Multithreaded Code
When using multithreaded code in benchmarks, there is some nuance we should be aware of in the results.
Let's imagine we have an algorithm that takes 10 milliseconds to run. We update it so 9 milliseconds of that CPU time is now done on a different thread:
#include <benchmark/benchmark.h>
#include <chrono>
#include <thread>
#include <future>

using namespace std::chrono;

// Busy-waits for the requested duration
void spin_for(milliseconds d) {/*...*/}

// Terrible code
void SingleThreaded() {
  spin_for(10ms);
}

// I can fix it
void MultiThreaded() {
  auto task = std::async(std::launch::async, [](){
    spin_for(9ms);
  });
  spin_for(1ms);
  task.wait();
}
The benchmarks are going to be very impressed with our work. They will report we've made our code 10x more CPU efficient:
-------------------------------------------------
Benchmark                     Time           CPU
-------------------------------------------------
BM_SingleThreaded          10.0 ms       10.0 ms
BM_MultiThreaded           9.03 ms       1.05 ms
This is because the library, by default, measures CPU time on the main thread only, not across all threads. In most real-world scenarios, main thread performance is exactly what we care about, so this reported 10x improvement is often still valid.
This is because, when our performance becomes bottlenecked by the CPU, it is usually the main thread that is saturated. There are other cores available with plenty of capacity - our program is just not structured in a way that can make use of it.
If we find a way to restructure the program so that 90% of the work that was previously on the main thread can be moved out, that's pretty similar to 90% of the work just disappearing. We really have improved the real-world, observable performance.
Measuring Performance Across All Threads
If main thread performance isn't the priority, we can tell the library that we care about CPU usage across all threads using MeasureProcessCPUTime():
benchmarks/main.cpp
// ...
BENCHMARK(BM_SingleThreaded)->Unit(benchmark::kMillisecond);
BENCHMARK(BM_MultiThreaded)->Unit(benchmark::kMillisecond);
BENCHMARK(BM_MultiThreaded)
  ->Unit(benchmark::kMillisecond)
  ->MeasureProcessCPUTime();
Now, the library reports that 10ms of CPU time was consumed across all cores - 1ms on the main thread and 9ms on the worker thread. The wall-clock time is still only about 9ms, because the main thread's 1ms of work runs concurrently with the worker's 9ms:
-------------------------------------------------
Benchmark                          Time      CPU
-------------------------------------------------
BM_SingleThreaded               10.0 ms   10.0 ms
BM_MultiThreaded                9.03 ms   1.05 ms
BM_MultiThreaded/process_time   9.03 ms   10.0 ms
Summary
In this lesson, we stepped into the world of concurrency, using multiple CPU cores to improve throughput.
- We used std::async as our entry point to asynchronous tasks. It allows us to launch operations that can run on a separate thread, offloading heavy work from the main execution path.
- std::future acts as a proxy for the result of an asynchronous task. It provides methods like wait() to block until completion and get() to retrieve the return value (or propagate exceptions).
- We explored the concept of multithreading, understanding its benefits while also acknowledging the major risks.
- std::launch::async can be used to explicitly force task execution on a new thread for predictable parallel behavior.
- To benchmark multithreaded code, we should distinguish between main thread CPU time and total process CPU time.
We have now seen how to use std::async for ad-hoc parallel tasks. But the C++ Standard Library offers even more specialized tools for parallelizing common algorithms.
In the next lesson, we will introduce execution policies, allowing us to simply tag standard algorithms like std::sort() with std::execution::par to automatically distribute their work across multiple CPU cores, without managing threads manually.
Parallel Algorithms and Execution Policies
Combine the elegance of C++20 Ranges with the power of parallel execution policies.