The Four Algorithm Families

The Producer-Consumer model of C++20 Ranges and the hardware implications of eager vs lazy evaluation.

Ryan McCombe

The C++ Standard Library contains over 100 algorithms. Looking at the <algorithm> header documentation can be intimidating. You see names like mismatch, adjacent_find, transform_exclusive_scan, and lexicographical_compare.

Memorizing all of them is unnecessary. In reality, almost all of these algorithms are just variations of four fundamental patterns. If you understand these four shapes, you understand the entire library.

  1. Mapping: Changing the shape or content of data.
  2. Terminating: Producing a result or side effect.
  3. Searching: Finding specific elements within the data.
  4. Rule-Checking: Answering boolean questions about the data.

In this lesson, we will organize the standard library into this taxonomy. We will explore the mechanics of the new C++23 fold algorithms, and we will look at how modern compilers perform kernel fusion - merging these patterns into single, ruthlessly efficient loops.

Producers and Consumers

In the C++ Ranges world, we can conceptually categorize data operations based on their role in a pipeline.

A Producer (or Lazy Adapter) does not actually execute code. It describes intent. When you call std::views::transform(), you are building a blueprint. You are telling the compiler: "When the data flows through here, transform it using this lambda."

A Consumer (or Eager Algorithm) starts the execution. It pulls the trigger. It demands data from the pipeline, forcing the producers to wake up, fetch memory, and perform their calculations.

We covered the mechanics of these pipelines in the previous course. In this chapter, we focus on the algorithms that actually do the work.

Mappers

The first family consists of operations that map an input range to an output range.

  • Transform: N → N. Every input produces exactly one output.
  • Filter: N → M. Inputs are conditionally passed through.

Eager variations of the transformation algorithm are available in the form of std::transform() and std::ranges::transform(). They copy data from one container, transform it, and place the result in a new container.

The eager version of filtering is std::copy_if() or std::ranges::copy_if(). They look at each element in a container, check whether it meets the requirements, and copy it to another container if so.

By default, we should prefer to use the view-based versions of these algorithms: std::views::transform() and std::views::filter().

Kernel Fusion

Why do we prefer the view versions? It isn't just about syntax - it is about kernel fusion. If we chain three eager algorithms together, we are reading and writing to RAM three times.

// Eager (Bad)
std::vector<int> temp1;

// READ + WRITE
std::transform(
  src.begin(), src.end(),
  std::back_inserter(temp1), op1
);

std::vector<int> temp2;

// READ + WRITE
std::transform(
  temp1.begin(), temp1.end(),
  std::back_inserter(temp2), op2
);

If we use views, the logic is merged. The compiler sees the entire chain and fuses the operations into a single loop behind the scenes.

// Lazy (Good)
auto pipeline = src
  | std::views::transform(op1)
  | std::views::transform(op2);

// The consumer (e.g., a loop or fold) triggers execution
for (int i : pipeline) { ... }

In the lazy version, the CPU loads a piece of data into a register. It applies op1 to that register. Then it applies op2 to that same register. The data never leaves the CPU core. We perform multiple logical steps for the price of one memory access.

This is the "zero-cost abstraction" in action: high-level, composable code compiling down to the optimal assembly instructions.

Terminators

Eventually, the pipeline must end. We need to turn our stream of data into a result. This is the domain of the terminators.

Using std::ranges::for_each()

The simplest terminator is for_each(). It runs a function for every element in the range. This is strictly for side effects - printing to the console, writing to a socket, or updating a UI.

// Pipeline: Filter -> Transform -> Consumer
auto pipeline = players
  | std::views::filter(is_alive)
  | std::views::transform(get_name);

// Trigger: Print every name
std::ranges::for_each(pipeline, [](const std::string& name) {
  std::cout << name << "\n";
});

The std::ranges::for_each() function tends to have similar use-cases to a basic range-based for loop, but at a higher level of abstraction and with more compositional flexibility:

// Using a named function instead of an ad-hoc lambda:
std::ranges::for_each(pipeline, log_name);

// Using an ad-hoc pipeline:
std::ranges::for_each(
  players | std::views::filter(is_alive)
          | std::views::transform(get_name),
  log_name
);

The for_each() function's return value doesn't tend to be particularly useful. If we want something that iterates our range and boils it down to a single summary value (like a total, a report, or a hash), we need a fold.

Using std::ranges::fold_left()

For decades, C++ developers used std::accumulate() to combine collections of data into single values. C++23 introduces std::ranges::fold_left(). This is the modern, range-friendly way to reduce a sequence to a single value.

It takes three arguments:

  1. The input range.
  2. The initial value (the seed).
  3. The binary operation (the combiner).

#include <vector>
#include <algorithm>  // for std::ranges::fold_left()
#include <functional> // for std::plus and std::multiplies

int main() {
  std::vector<int> nums{1, 2, 3, 4, 5};

  // Summing numbers
  // Seed: 0
  // Op:   (a, b) -> a + b
  int sum = std::ranges::fold_left(nums, 0, std::plus{});

  // Multiplying numbers
  // Seed: 1
  // Op:   (a, b) -> a * b
  int product = std::ranges::fold_left(nums, 1, std::multiplies{});
}

Advanced Folds and State Machines

Thinking of fold as just "sum" or "product" is limiting. Mechanically, a fold is a state machine. The "initial value" is your starting state. The "binary operation" is your transition function.

You can use fold_left() to parse strings, compute hashes, or merge complicated data structures. For example, let's create a sentence from a list of words:

#include <vector>
#include <string>
#include <algorithm>
#include <iostream>

int main() {
  std::vector<std::string> words{"Just", "use", "auto"};

  // We want to join these words into a string
  std::string sentence = std::ranges::fold_left(
    words,
    std::string{}, // Start with empty string
    [](std::string a, const std::string& b) {
      if (a.empty()) return b;
      // Combine words with a space between them
      return a + " " + b;
    }
  );

  std::cout << sentence; // "Just use auto"
}
Just use auto

Why fold_left()?

Why the "left" suffix? Because the order of operations matters - fold_left() is left-associative:

(((1 + 2) + 3) + 4) + 5

There is also a fold_right() algorithm, which is right-associative:

1 + (2 + (3 + (4 + 5)))

For integer addition, the result is the same. But for operations like subtraction or string concatenation, the direction is important. Left folding is almost always the direction we want:

// Input range: {"Just", "use", "auto"}
// Initial value: std::cout
// Operator: <<
// Left fold:
((std::cout << "Just") << "use") << "auto";

// Right fold: (compilation error)
"Just" << ("use" << ("auto" << std::cout));

Searching for Elements

Sometimes, you don't need to process the entire pipeline. You just need to find one specific thing.

The search family of algorithms acts like a consumer with an escape hatch. It pulls data from the pipeline, but the moment it finds a match, it stops.

This "early exit" property is important for performance. If you have a vector of 1,000,000 items and the target is at index 0, a search algorithm performs 1 operation. A fold or transform performs 1,000,000.

Using std::ranges::find() and find_if()

The std::ranges::find() algorithm looks for a specific value, whilst std::ranges::find_if() looks for the first element satisfying a predicate (such as a lambda returning true).

They return an iterator to the first matching element. If no match is found, they return the sentinel - usually end().

std::vector<int> ids{100, 205, 12, 999};

// Finds 12
auto it = std::ranges::find(ids, 12);

// Finds the first ID over 500
auto it2 = std::ranges::find_if(ids, [](int i){ return i > 500; });

Speculative Execution

Mechanically, a search is just a loop containing an if statement:

for (int i : data) {
  if (i == target) return i; // the branch
}

This mechanism interacts with the CPU's branch predictor. As this branch continues to return false on every iteration of the loop, the CPU can recognize the pattern and predict it will continue. It speculatively executes the next iteration of the loop before the current check is even finished. This allows the CPU to scan memory incredibly fast.

If a prediction is wrong, the CPU needs to perform a pipeline flush, discarding that speculative work and rolling back any state changes. This has a performance cost, typically called the branch misprediction penalty.

Branch prediction works particularly well with search algorithms. If the thing we're looking for is common, our algorithm will find it quickly. If it's rare, the predictor will soon realize our branch is consistently false and accelerate things accordingly. If we ever need to pay the branch misprediction penalty, that just means we've found what we were looking for, so there is no bad outcome.

Rule-Checking

The final family is really just a semantic wrapper around the search family.

The contains() Algorithm

Often, we don't care where the item is. We just want to know if it exists. This is equivalent to running find() or find_if() and comparing the return value to the sentinel:

if (std::ranges::find(ids, 999) != ids.end()) {
  // It exists!
}

C++23 gives us a more idiomatic way in the form of std::ranges::contains():

if (std::ranges::contains(ids, 999)) {
  // It exists!
}

The any_of() Algorithm

We also have the any_of() algorithm, which is similar to contains() except we pass a predicate function (like a lambda) instead of a value (like 999). It stops and returns true as soon as it finds a match, or returns false if no match is found:

bool contains_even = std::ranges::any_of(values, [](int val) {
  return val % 2 == 0;
});

The none_of() and all_of() Algorithms

Finally, we have none_of(), which stops and returns false as soon as it finds a match, or returns true if no match is found.

And we have all_of(), which stops and returns false as soon as it finds something that does not match, or returns true if all match.

Here are some examples:

#include <vector>
#include <algorithm>
#include <ranges>

bool IsSafe(const std::vector<int>& sensor_readings) {
  // Requirement: All sensors must be below critical threshold
  bool safe = std::ranges::all_of(sensor_readings, [](int val) {
    return val < 1000;
  });

  return safe;
}

bool HasError(const std::vector<int>& error_codes) {
  // Requirement: Check if any error code is non-zero
  return std::ranges::any_of(error_codes, [](int code) {
    return code != 0;
  });
}

Summary

We have distilled the massive standard library into four shapes. Here are the rules of thumb:

  1. Mappers: Use transform() and filter() - preferably as views - to reshape the data. They are lazy blueprints that allow the compiler to fuse logic into a single pass (kernel fusion).
  2. Terminators (Iterating and Folding): Use a range-based for loop or the for_each() algorithm to iterate over a range to produce a side-effect. Use fold_left() (C++23) to boil data down and produce a summary.
  3. Searching: Use find(). This stops processing as soon as it finds a result, saving bandwidth and CPU cycles.
  4. Rule-Checking: Use contains() (C++23), all_of(), any_of(), and none_of() to check boolean conditions. They are just searches in disguise.

We'll expand this toolkit through the rest of the course but, by sticking to these patterns, you make your code easier to read for humans and easier to optimize for compilers.

Any time you feel like you want to make a raw loop, ask yourself if your problem can instead be solved by some composition of these techniques. Most small problems can be solved entirely in this way and, even in larger algorithms, these components can solve parts of it.

The first time you use them, they'll be slower and more frustrating than just writing the imperative loop you had in mind. But, over time, you'll build pattern recognition and be throwing pipelines together very quickly, and you'll find you barely write loops any more.

You'll also benefit from a huge range-based ecosystem. That's not just restricted to the standard library - there are many third-party libraries providing range-based algorithms and views, and you'll also likely have an internal collection dedicated to solving the specific needs of your projects.

Next Lesson
Lesson 2 of 5

Sorting and Materialization

Integrate sorting into C++20 pipelines, the difference between stable and unstable sorts, and how to use partial sorting.
