C++20 Ranges

Learn how C++20 Ranges and Views allow us to compose safe, lazy-evaluated data pipelines without sacrificing performance.

Ryan McCombe

In the previous lesson, we explored the iterator pattern. We learned how it decouples containers from algorithms, allowing us to write a single std::sort() function that works on vectors, arrays, and even custom data structures.

However, the iterator pattern has a usability problem. To perform even the simplest operation, we have to manage pairs of iterators.

std::sort(myVector.begin(), myVector.end());

This verbosity isn't just annoying; it's dangerous. By requiring the programmer to manually provide the start and the end, we open the door to mismatch errors. If you accidentally pass vec1.begin() and vec2.end(), the compiler will happily generate code that walks off the end of the first vector, marching through unrelated memory - undefined behavior that typically ends in a segmentation fault.
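
To make the hazard concrete, here's a small sketch - the offending call is commented out so the snippet still compiles:

#include <algorithm>
#include <vector>

int main() {
  std::vector<int> vec1{3, 1, 2};
  std::vector<int> vec2{6, 5, 4};

  // Compiles cleanly, but the iterators belong to two
  // different containers - undefined behavior at runtime
  // std::sort(vec1.begin(), vec2.end());
}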

The C++20 Ranges library is the evolution of this pattern. It wraps the low-level mechanics of iterators into safer, higher-level abstractions.

As a quick win, we can just pass myVector into std::ranges::sort(), instead of an iterator pair into std::sort():

#include <algorithm>

// Before:
std::sort(myVector.begin(), myVector.end());

// After:
std::ranges::sort(myVector);

But ranges are more than just syntactic sugar. They introduce a fundamental shift in how we process data. They allow us to build lazy evaluation pipelines - complex chains of filters and transformations that don't execute until the moment we need the result.

In this lesson, we will peel back the abstraction to understand how ranges work mechanically. We will look at how sentinels allow for tighter loops, how views let us reference data without copying it, and how lazy evaluation defers work until we actually need the results.

Ranges and Sentinels

Let's start with a definition. In C++20, a Range is simply any object that provides a begin() iterator and an end() sentinel. Notice I said "sentinel," not "iterator." This is an important distinction.

Prior to ranges, the begin() and end() iterators had to be the exact same type. As a result, every loop used the same termination check: comparing the current iterator against a precomputed end iterator of that same type.

// C++17 Loop
for (auto it = begin; it != end; ++it) {
  // ...
}

This works fine when we know where the end is in advance. But what about a C-style string? To find the end of a string like "Hello", we first have to walk the entire string to find the null terminator \0.

To use an iterator-based algorithm here, we have to traverse the data twice. The first traversal is needed to find where the end of the string is, and only then can we set up our begin/end iterator pair to do the actual work we care about.

#include <cstring>

void ProcessString(const char* str) {
  // We must calculate the end pointer first
  // This requires an O(n) scan of the string via strlen()
  const char* end = str + std::strlen(str);

  // Now that we know where end is, we can do our work
  // This is another O(n) traversal
  for (const char* it = str; it != end; ++it) {
    // Do work...
  }
}

C++20 relaxes this rule. The end() object can be a different type, called a sentinel.

The Sentinel Optimization

A sentinel is a lightweight type that encodes a termination condition. When the compiler compares an iterator to a sentinel (it != sentinel), it doesn't necessarily compare two memory addresses. Instead, it checks a condition.

Let's look at how this changes our null-terminated string loop. When we can compare our iterator to an object of a different type, we can improve our approach:

struct StringSentinel {};

// Custom comparison operator
bool operator!=(const char* it, StringSentinel) {
  return *it != '\0'; // Stop if we hit null
}

void ProcessString(const char* str) {
  // The loop condition is now (*it != '\0')
  // No initial scan required
  for (const char* it = str; it != StringSentinel{}; ++it) {
    // Do work...
  }
}

In this new version, we don't need to know where the string ends before we start. The sentinel allows the loop to simply ask: "Is the current character a null terminator?"

Of course, we could just embed that logic directly into a traditional loop, perhaps with a break statement, but we're trying to move away from raw loops. We're using this loop only to illustrate the concept of a sentinel. It is the foundation of a C++ range, and we'll see the payoff soon.

A sentinel is an object that represents the end of a range. It does this by being comparable to an iterator traversing through that range. When iterator == sentinel returns true, the range has ended.

Iterators are Sentinels

Note that, in this definition, an iterator can still be a sentinel. The object returned from SomeVector.end() is both an iterator and a sentinel.

  • It is an iterator because it allows us to traverse the vector using expressions like SomeVector.end() - 5.
  • It is a sentinel because it supports expressions like SomeIterator == SomeVector.end() and, if this expression returns true, that is a signal that we've reached the end of the range.

But a sentinel doesn't need to be an iterator. Our StringSentinel example above is a sentinel, but it is not an iterator.
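
To see how a custom sentinel plugs into the library itself, here's a minimal sketch using a hypothetical NullTerminator type. The std::ranges algorithms compare the iterator and sentinel with operator==, exactly as described above, and C++20's operator rewriting derives != automatically:

#include <algorithm>
#include <iostream>

// A sentinel encoding the condition "stop at the null terminator"
struct NullTerminator {
  bool operator==(const char* it) const {
    return *it == '\0';
  }
};

int main() {
  const char* greeting = "Hello";

  // The iterator/sentinel overloads of the std::ranges
  // algorithms accept different types for begin and end,
  // so no strlen() scan is needed up front
  std::ranges::for_each(greeting, NullTerminator{},
    [](char c) { std::cout << c << '-'; });
}
H-e-l-l-o-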

Views: Non-Owning References

The second pillar of the Ranges library is the view.

To understand views, we must distinguish between owning and viewing memory.

  • Containers (like std::vector) own their data. When you copy a vector, it allocates new memory and copies every element. This is an O(n) operation and puts significant pressure on the memory allocator.
  • Views (like std::span or std::string_view) borrow data. They simply hold a pointer to someone else's memory.

In the Ranges library, a view is a range that is cheap to copy and destroy. We cover std::span more later, but it's a basic view that works with a contiguous block of data:

#include <span>
#include <vector>

std::vector<int> data{1, 2, 3, 4, 5};

// Heavy: Allocates new memory, copies elements
std::vector<int> copy = data;

// Light: Copies a pointer and a size - no allocation
std::span<int> view = data;

Because views are so lightweight (typically just a pointer and a length - 16 bytes on a 64-bit system), we can pass them into functions, return them, and compose them without worrying about performance overhead.
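
For example, a function that accepts a std::span can serve any contiguous source without copying it - a minimal sketch, where PrintAll() is just a hypothetical helper:

#include <iostream>
#include <span>
#include <vector>

// Accepts any contiguous block of ints without copying it
void PrintAll(std::span<const int> values) {
  for (int v : values) {
    std::cout << v << ' ';
  }
}

int main() {
  std::vector<int> data{1, 2, 3, 4, 5};
  int raw[]{6, 7, 8};

  PrintAll(data); // views the vector's elements
  PrintAll(raw);  // views a plain array - same function
}
1 2 3 4 5 6 7 8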

Lazy Evaluation

The combination of sentinels and views unlocks a superpower: lazy evaluation.

When we chain operations together in traditional C++, we usually perform eager evaluation. We do the work immediately.

#include <algorithm>
#include <iterator>
#include <vector>

std::vector<int> nums{1, 2, 3, 4, 5};
std::vector<int> results;

// Eager: Loops immediately, allocates memory, writes results
std::transform(
  nums.begin(), nums.end(),
  std::back_inserter(results),
  [](int i){ return i * i; }
);

In the Ranges library, transformation views are lazy. When you create a view, no work is done. No loops run. No memory is allocated.

Instead, the view creates a small state machine. It holds the source range and the transformation logic (the lambda), and waits. The code only executes if and when we attempt to access an element through that view.

Example: std::views::transform()

Let's see a practical example using std::views::transform():

#include <vector>
#include <ranges>
#include <iostream>

int main() {
  std::vector<int> nums{1, 2, 3, 4, 5};

  // 1. Create the View
  // 'squares' is NOT a std::vector. It is a lightweight object
  // holding a pointer to 'nums' and the lambda function
  auto squares = std::views::transform(nums, [](int i) {
    return i * i;
  });
  
  // The view is lazily evaluated - no multiplication
  // has happened yet

  // 2. Iterate the View
  // Views are ranges, so we can use a range-based loop
  for (int i : squares) {
    // We need our data, so the multiplication is performed now
    std::cout << i << " ";
  }
}
1 4 9 16 25

Mechanically, the view returned by std::views::transform() hands out iterators that act as proxies. When you ask one of these iterators for a value, it:

  1. Fetches the value from the underlying source iterator.
  2. Passes it through the transformation function.
  3. Returns the result to you.

It computes each value just in time, with no intermediate container involved. It's important to remember that the view does not own the data. In our original example using std::transform(), results was a fully fledged std::vector owning a transformed copy of the data.
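
We can watch this happen by stepping through the view's iterator by hand - a minimal sketch reusing the squares view from above:

#include <iostream>
#include <ranges>
#include <vector>

int main() {
  std::vector<int> nums{1, 2, 3, 4, 5};

  auto squares = std::views::transform(nums, [](int i) {
    return i * i;
  });

  // Grab the view's iterator directly
  auto it = squares.begin();

  // Dereferencing is the moment the work happens: the proxy
  // reads nums[0], applies the lambda, and returns the result
  std::cout << *it << '\n';

  ++it; // advances the underlying vector iterator
  std::cout << *it << '\n';
}
1
4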

In this example, our view is letting us see the contents of nums through the lens that std::views::transform() creates. If the original data changes and we look through the same lens again, we see those changes:

#include <vector>
#include <ranges>
#include <iostream>

int main() {
  std::vector<int> nums{1, 2, 3, 4, 5};
  
  auto squares = std::views::transform(nums, [](int i) {
    return i * i;
  });

  for (int i : squares) {
    std::cout << i << " ";
  }
  
  // Change the underlying data
  nums[0] = 10; 
  
  // Iterate the SAME view object again
  // It automatically sees the change
  std::cout << '\n';
  for (int i : squares) {
    std::cout << i << " ";
  }
}
1 4 9 16 25
100 4 9 16 25

If we want to save what we're currently seeing through our view to a regular, data-owning container like a std::vector, we can use std::ranges::to().

Why auto?

Modern C++ algorithm design makes heavy use of lambdas, which often makes explicit typing impossible. In our previous example, we don't actually know the type of squares - it depends entirely on what the compiler decided to call the struct it generated to implement our lambda.

More importantly, even if we were using a named function instead of a lambda, we'd still just use auto. This is because the explicit type would be extremely verbose:

std::ranges::transform_view<
  std::ranges::ref_view<std::vector<int>>,
  MySquareFunc
>

The <ranges> library in particular is designed with auto in mind. It is heavily inspired by functional programming styles, which generally solve problems using compositions of functions that are both flexible and small.

Without automatic type inference, these two traits are in conflict. The flexibility requires heavy use of templating, which results in extremely long and complex type names. Meanwhile, the small amount of logic within each function causes the meaningful behaviour of our code to be hidden among all the type noise.
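
To make that concrete, here's a sketch assuming MySquareFunc is a small function object - even with a named callable, spelling out the view's full type buys us nothing:

#include <ranges>
#include <vector>

// A named function object, standing in for the lambda
struct MySquareFunc {
  int operator()(int i) const { return i * i; }
};

int main() {
  std::vector<int> nums{1, 2, 3, 4, 5};

  // The deduced type is roughly the verbose name shown above;
  // auto lets us skip writing it
  auto squares = std::views::transform(nums, MySquareFunc{});
}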

Materializing Views using std::ranges::to()

Lazy views are fantastic for processing, but sometimes, we need to physically store the result. We might need to return the data from a function, pass it to an API that expects a std::vector, or simply freeze the state so that changes to the original source no longer affect our results.

We call this materialization.

In C++20, this was slightly awkward - we had to write a manual loop. C++23 introduced std::ranges::to(), a dedicated function to flush a pipeline into a container:

#include <vector>
#include <ranges>

int main() {
  std::vector<int> nums{1, 2, 3, 4, 5};

  auto squares = std::views::transform(nums, [](int i) {
    return i * i;
  });

  // Materialize the view into a real vector.
  // This is the point where memory is allocated
  // and the multiplications are performed
  auto results = std::ranges::to<std::vector>(squares); 
  
  // 'results' is now a std::vector<int> containing
  // {1, 4, 9, 16, 25}.  It is completely independent
  // of 'nums'.
}

Think of std::ranges::to() as the "sink" at the end of your pipeline. It pulls data from the lazy views one by one and pushes it into the destination container.
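
If you're limited to C++20 and don't have std::ranges::to() available, a hand-written version of the same sink might look like this - a minimal sketch:

#include <ranges>
#include <vector>

int main() {
  std::vector<int> nums{1, 2, 3, 4, 5};

  auto squares = std::views::transform(nums, [](int i) {
    return i * i;
  });

  // C++20: materialize by hand - reserve space, then pull
  // each element through the lazy view
  std::vector<int> results;
  results.reserve(nums.size());
  for (int i : squares) {
    results.push_back(i);
  }
}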

Summary

In this lesson, we modernized our approach to data processing.

  1. Ranges: We replaced error-prone iterator pairs with single, self-contained objects.
  2. Sentinels: We learned how decoupling the start and end types allows for more efficient loops, especially for data with termination conditions (like null terminators).
  3. Views: We explored non-owning, lightweight references that allow us to pass data around without copying it.
  4. Lazy Evaluation: We saw how views defer work until the last possible moment, saving memory and CPU cycles.
  5. Materializing Views: We learned to use std::ranges::to() to "flush" a lazy pipeline into a physical container when we need to store the results permanently.

The iterator pattern gave us a common language for algorithms. Ranges give us a grammar to compose sentences in that language.

However, the syntax we used above - std::views::transform(nums, ...) - is still a bit clunky, especially if we want to chain multiple operations together. If we wanted to filter and then transform, we would end up nesting function calls inside function calls, creating a "bracket hell" that is hard to read.

In the next lesson, we will introduce view composition and pipes. We will see how C++ borrowed the pipe syntax | from Unix to create data processing pipelines that read naturally from left to right. An algorithm that takes an input, filters it, and then transforms it, could be expressed as input | filter | transform.
