Composition, Zipping, and Indicies
Iterate multiple containers simultaneously with zip, handle indices with enumerate, and skip elements with drop and stride.
In the previous lessons, we have criticized the raw for loop because working with indicies introduces opportunities to make errors, and raw loops are opaque.
A loop starting with for (size_t i = 0; ...) tells us nothing about the intent of the code. We have to read the body line-by-line to deduce whether it's calculating a sum, finding a maximum, or shifting memory.
We replaced manual searching loops with std::find(). We replaced filtering loops with std::views::filter(). We replaced transformation loops with std::views::transform().
However, what if we needed to iterate two vectors simultaneously, such as the parallel arrays of the PlayerStorage in the previous lesson? What if we need the current iteration number (the famous i) within our loop body? What if we needed needed to process every 2nd element - i += 2 instead of ++i?
In this lesson, we will adopt the C++23 additions to the ranges library - zip, enumerate, and stride, to solve each of these problems.
Updating to C++23
The features in this lesson (zip, enumerate, stride) were officially standardized in C++23. If you're following along with the code samples, make sure you're using at least that standard.
If you're using the CMakeLists.txt file we provided in an earlier course, this was already set via the CMAKE_CXX_STANDARD variable:
CMakeLists.txt
cmake_minimum_required(VERSION 3.20)
project(DsaCourse VERSION 1.0 LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 23)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
# Add our subdirectories
add_subdirectory(dsa_core)
add_subdirectory(dsa_app)
add_subdirectory(benchmarks)Support for these features is available in GCC 13+, Clang 16+, and MSVC 19.35+.
The Reverse Iteration Problem
We don't always iterate through our containers in the same direction. Some problems are best solved by reverse iteration.
However, implementing reverse iteration in a raw loop is a surprisingly common source of bugs. The first element we access is based on the container size(), but not exactly: container[container.size()] is out-of bounds. The last index is container.size() - 1.
But what if the size() is 0? When decrementing the index, we must remember that size_t is unsigned. If we try to decrement it below 0, it will wrap around to a huge positive number, which is also out of bounds.
A working implementation that avoids the usual mistakes around size() and unsigned indicies might look something like this:
for (size_t i = container.size(); i-- > 0; ) {
// ...
}Instead, we can use std::views::reverse:
#include <iostream>
#include <vector>
#include <ranges>
int main() {
std::vector<int> v{1, 2, 3, 4, 5};
// Before:
for (size_t i = v.size(); i-- > 0; ) {
std::cout << v[i] << ' ';
}
// After:
for (int x : std::views::reverse(v)) {
std::cout << x << ' ';
}
}The Index Problem
Sometimes, the position of the data is just as important as the data itself.
- "Log this error with the line number."
- "Reset every 3rd element."
- "Draw this sprite at
x = i * width."
With most view-based techniques, we no longer have this i variable, and we don't even get it in a range-based for loop - for (auto& item : container) - either.
In Python or Rust, this is solved with enumerate. C++23 brings this exact same utility to the standard library. The std::views::enumerate helper wraps a range and produces a view where every element is {index, value}.
With a range-based loop, we tend to immediately destructure these right in the loop heading using the [] syntax:
#include <vector>
#include <iostream>
#include <ranges>
void LogWithLineNumbers(const std::vector<std::string>& lines) {
// Before:
for (size_t i = 0; i < lines.size(); ++i) {
std::cout << i << ": " << lines[i] << "\n";
}
// After:
for (auto [i, line] : std::views::enumerate(lines)) {
std::cout << i << ": " << line << "\n";
}
}How it Works: iota + zip
Under the hood, enumerate constructs a view that combines our input range (the std::vector<strings> in this example) with an iota view.
We've already seen std::views::iota(0) - it creates an infinite sequence of numbers ($0, 1, 2, \dots$). Because iota generates numbers mathematically (it just increments a register), it creates zero memory traffic. The "array of indices" exists only as a concept in the CPU, not as data in RAM. This makes it effectively free.
To combine the iota with our original data, it uses std::views::zip() to create the {index, value} tuple. We cover zip views in the next section, and tuples in a future chapter but, for now, all we need to know is that we can "unzip" our tuple using structured binding as in our auto [index, line] syntax above.
The Stride Problem
Sometimes data is interleaved. An audio buffer might store samples as L, R, L, R, L, R.
To process only the left channel, we need to skip every other element. In a raw loop, we'd perhap use i += 2 instead if ++i.
To process just the left channel, we need to skip every second element. To process just the right channel, we need to offset by 1, then skip every second element.
We can express this declaratively using std::views::stride and std::views::drop:
#include <vector>
#include <ranges>
void ProcessStereo(std::vector<float>& audio) {
// View 1: Left Channel (Starts at 0, Step 2)
auto left_channel = audio | std::views::stride(2);
// View 2: Right Channel (Starts at 1, Step 2)
auto right_channel = audio
| std::views::drop(1)
| std::views::stride(2);
// We can now iterate left_channel and right_channel independently
for (float sample : left_channel) {
// ...
}
}The compiler sees through all of this. It knows that stride(2) on a contiguous vector just means "add 2 * sizeof(float) to the pointer." The resulting assembly is a tight, optimized loop that advances two pointers through memory.
The Parallel Array Problem
In the previous lesson we established that splitting large objects into parallel arrays (structure of arrays) can drastically improve performance.
Instead of vector<Player>, we have vector<int> ids, vector<int> scores, and so on.
struct Player {
int id;
int score;
float health;
};
// Array of structures
std::vector<Player> Players;
// Structure of arrays
struct PlayerStorage {
std::vector<int> ids;
std::vector<int> scores;
std::vector<float> health;
};This performance optimization creates many usability problems, which we'll address in future chapters.
For this lesson, the immediate issue is that we need to keep track of an index to work with these parallel arrays. Below, we update all players whose score is over 100 using a raw loop:
// The misery of SoA
for (size_t i = 0; i < ids.size(); ++i) {
// One manual index managing three independent memory blocks
if (scores[i] > 100) {
UpdateHealth(health[i]);
Log(ids[i]);
}
}That's a lot of manual iteration management and index-wrangling that we can easily mess up. C++23's std::views::zip is the cure. It takes multiple ranges and creates a single view:
auto players = std::views::zip(ids, scores, health);When we then need to consume that data, we use structured bindings to pull each of the components out:
// Get data for one player:
auto [id0, score0, hp0] = players[0]; // By reference
const auto [id1, score1, hp1] = players[1]; // By const reference
// Iterate all players:
for (auto [id, score, hp] : players) {
// ...
}Here's a complete example:
#include <vector>
#include <ranges>
#include <iostream>
void ProcessPlayers(
std::vector<int>& ids,
std::vector<int>& scores,
std::vector<float>& health
) {
// Create a unified view
auto players = std::views::zip(ids, scores, health);
// Iterate as if it were a single object
for (auto [id, score, hp] : players) {
if (score > 100) {
// 'hp' is a reference (float&) to the value in the vector
// modifying it modifies the vector
hp += 10.0f;
std::cout << "Player " << id << " healed!\n";
}
}
}Conceptually, we can think of std::views::zip as creating a virtual struct.
To the hardware, the data is still stored in three separate arrays (SoA), guaranteeing perfect cache locality for each stream.
To the programmer, the data looks like a single object {id, score, hp} (AoS), guaranteeing ease of use.
The "struct" only exists for the brief moment the data is loaded into CPU registers inside the loop. It is transient. We get the performance of SoA with the ergonomics of AoS. We'll benchmark this and confirm the performance later in the lesson.
Safety: The Shortest Range Rule
One of the more dangerous aspects of manual loops over parallel arrays is mismatching sizes. If ids has 100 items (ids[0] to ids[99]) but scores only has 99 (scores[0] to scores[98]), a loop iterating 100 times will try to access scores[99] and segfault.
This is handled gracefully by std::views::zip. The length of a zipped view is determined by the shortest underlying range. If one vector is shorter than the others, iteration simply stops.
There might still be a bug elsewhere causing those arrays to have different lengths, but having our iteration logic handle bad data without crashing or corrupting memory is at least an improvement.
Using Zip Views in Pipelines
There is one friction point you will encounter when using zip and enumerate. Standard algorithms like std::ranges::filter() or std::views::filter() expect a predicate that takes a single argument.
When you zip ranges together, that "single argument" is a std::tuple:
const std::vector<int>& ids;
const std::vector<int>& scores;
auto pipeline = std::views::zip(ids, scores)
| std::views::filter([](const auto& tuple) {
// ...
});We can again use structured bindings to access the data we need - this time, inside the lambda body. The following example creates a view on top of SoA data, providing the IDs of players that have a high score:
#include <vector>
#include <ranges>
#include <algorithm>
void HighScores(
const std::vector<int>& ids,
const std::vector<int>& scores
) {
auto pipeline = std::views::zip(ids, scores)
// Only view tuples that have a high score
| std::views::filter([](const auto& tuple) {
// Unpack the tuple
auto [id, score] = tuple;
return score > 1000;
})
// Transform the tuples to just the ids (ints)
| std::views::transform([](const auto& tuple) {
auto [id, score] = tuple;
return id;
});
for (int id : pipeline) {
// ...
}
}Adding Indices to Zip Views
We can incorporate indices into our zip views, effectively replicating the behaviour of enumerate, by including an iota in our view:
#include <vector>
#include <ranges>
#include <algorithm>
void HighScores(
const std::vector<int>& ids,
const std::vector<int>& scores
) {
using std::views::zip, std::views::iota;
for (auto [i, id, score] : zip(iota(0), ids, scores)) {
// ...
}
}Using Zip Views for Interleaved Data
Previously, we saw how stride and drop can be used to iterate over interleaved data such as audio channels. We can combine this with zip views to reconsitute that interleaved data back into logical sets:
#include <vector>
#include <ranges>
void ProcessStereo(std::vector<float>& audio) {
// View 1: Left Channel (Starts at 0, Step 2)
auto left_channel = audio | std::views::stride(2);
// View 2: Right Channel (Starts at 1, Step 2)
auto right_channel = audio
| std::views::drop(1)
| std::views::stride(2);
// We can now zip them back together, but now as logical pairs
auto stereo_pairs = std::views::zip(left_channel, right_channel);
for (auto [l, r] : stereo_pairs) {
// l refers to audio[0], audio[2], audio[4]...
// r refers to audio[1], audio[3], audio[5]...
}
}Benchmarking Zip Views
Do these high-level abstractions actually compile down to efficient machine code, or are we paying a hidden tax for all this syntax? Let's verify in our lab.
We will bring back our Player type which has two data points we care about in our algorithm (id and score), along with a load of additional arbitrary data (payload).
The task we need to perform requires us to retrieve every player's id and score. We test three different algorithms:
AoS: the simple array-of-structures approach, where our data is stored as astd::vector<Player>.SoA_Loop: our data is stored using the structure-of-arrays approach. We'll iterate the parallel arrays using a raw, index-based loop.SoA_Zip: the same structure as above, but we'll create astd::views::zipto hide the parallel arrays and remove the index management.
benchmarks/main.cpp
#include <benchmark/benchmark.h>
#include <vector>
#include <string>
#include <ranges>
#include <numeric>
struct Player {
int id;
int score;
char payload[1024]; // Simulate heavy object
};
// 1. Array of Structs (AoS)
static void BM_AoS(benchmark::State& state) {
int n = state.range(0);
std::vector<Player> people(n);
// Initialize with some data
for (int i = 0; i < n; ++i) {
people[i].id = i;
people[i].score = i * 2;
}
long long sum = 0;
for (auto _ : state) {
for (const auto& p : people) {
sum += p.id + p.score;
}
}
benchmark::DoNotOptimize(sum);
}
// 2. Structure of Arrays (Manual Loop)
static void BM_SoA_Loop(benchmark::State& state) {
int n = state.range(0);
std::vector<int> ids(n);
std::vector<int> scores(n);
// Initialize with some data
for (int i = 0; i < n; ++i) {
ids[i] = i;
scores[i] = i * 2;
}
long long sum = 0;
for (auto _ : state) {
for (size_t i = 0; i < n; ++i) {
sum += ids[i] + scores[i];
}
}
benchmark::DoNotOptimize(sum);
}
// 3. Structure of Arrays (Zip View)
static void BM_SoA_Zip(benchmark::State& state) {
int n = state.range(0);
std::vector<int> ids(n);
std::vector<int> scores(n);
// Initialize with some data
for (int i = 0; i < n; ++i) {
ids[i] = i;
scores[i] = i * 2;
}
long long sum = 0;
for (auto _ : state) {
for (const auto [id, score] : std::views::zip(ids, scores)) {
sum += id + score;
}
}
benchmark::DoNotOptimize(sum);
}
BENCHMARK(BM_AoS)->Range(1024, 65536);
BENCHMARK(BM_SoA_Loop)->Range(1024, 65536);
BENCHMARK(BM_SoA_Zip)->Range(1024, 65536);
BENCHMARK_MAIN();-------------------------------
Benchmark CPU
-------------------------------
BM_AoS/1024 610 ns
BM_AoS/4096 2849 ns
BM_AoS/32768 34528 ns
BM_AoS/65536 87887 ns
BM_SoA_Loop/1024 547 ns
BM_SoA_Loop/4096 2148 ns
BM_SoA_Loop/32768 17264 ns
BM_SoA_Loop/65536 35296 ns
BM_SoA_Zip/1024 586 ns
BM_SoA_Zip/4096 2354 ns
BM_SoA_Zip/32768 18415 ns
BM_SoA_Zip/65536 39447 nsWe have significant improvements over the traditional array-of-structures (AoS) approach. The results between the raw BM_SoA_Loop and the BM_SoA_Zip abstraction are close, but the "shortest range" safety check implemented by std::views::zip does have a performance cost.
Compiler Hints using std::unreachable()
If we want to remove this safety check, we can give the compiler a hint that it is unnecessary. To tell the compiler something is impossible - that is, a boolean condition cannot be true - we mark the associated block of code as being unreachable:
benchmarks/main.cpp
#include <utility> // for std::unreachable
// ...
static void BM_SoA_Zip(benchmark::State& state) {
int n = state.range(0);
std::vector<int> ids(n);
std::vector<int> scores(n);
// Our arrays cannot have different sizes
if (ids.size() != scores.size()) {
std::unreachable(); // C++23
}
// Initialize with some data
for (int i = 0; i < n; ++i) {
ids[i] = i;
scores[i] = i * 2;
}
long long sum = 0;
for (auto _ : state) {
for (const auto [id, score] : std::views::zip(ids, scores)) {
sum += id + score;
}
}
benchmark::DoNotOptimize(sum);
}The std::unreachable() utility is also a C++23 addition. Prior to C++23, it can be implemented using compiler intrinsics like __builtin_unreachable() in GCC/Clang and __assume(0) in MSVC.
With that change, the performance gap disappears - std::views::zip is a zero-cost abstraction:
-------------------------------
Benchmark CPU
-------------------------------
BM_SoA_Loop/1024 544 ns
BM_SoA_Loop/4096 2222 ns
BM_SoA_Loop/32768 17648 ns
BM_SoA_Loop/65536 34319 ns
BM_SoA_Zip/1024 544 ns
BM_SoA_Zip/4096 2051 ns
BM_SoA_Zip/32768 16044 ns
BM_SoA_Zip/65536 32227 nsSummary
After thie lesson this lesson, we now have a named, safer, and composable tool for every iteration pattern:
- Reverse: Use
std::views::reversewhen you need to iterate through a range in reverse order. - Enumerate: Use
std::views::enumeratewhen you need an index. It combines the container with a mathematical sequence usingiota. - Stride: Use
std::views::strideandstd::views::dropto process interleaved data without index math. - Zip: Use
std::views::zipto iterate multiple containers at once. This solves the usability nightmare of "structure of arrays," allowing us to keep our data fast (split in memory) but our code clean (unified in logic). - Virtual Structs: We proved that
zipcreates a transient view that acts like an object but performs like raw arrays.