Composition, Zipping, and Indicies

Iterate multiple containers simultaneously with zip, handle indices with enumerate, and skip elements with drop and stride.

Ryan McCombe
Published

In the previous lessons, we have criticized the raw for loop because working with indicies introduces opportunities to make errors, and raw loops are opaque.

A loop starting with for (size_t i = 0; ...) tells us nothing about the intent of the code. We have to read the body line-by-line to deduce whether it's calculating a sum, finding a maximum, or shifting memory.

We replaced manual searching loops with std::find(). We replaced filtering loops with std::views::filter(). We replaced transformation loops with std::views::transform().

However, what if we needed to iterate two vectors simultaneously, such as the parallel arrays of the PlayerStorage in the previous lesson? What if we need the current iteration number (the famous i) within our loop body? What if we needed needed to process every 2nd element - i += 2 instead of ++i?

In this lesson, we will adopt the C++23 additions to the ranges library - zip, enumerate, and stride, to solve each of these problems.

Updating to C++23

The features in this lesson (zip, enumerate, stride) were officially standardized in C++23. If you're following along with the code samples, make sure you're using at least that standard.

If you're using the CMakeLists.txt file we provided in an earlier course, this was already set via the CMAKE_CXX_STANDARD variable:

CMakeLists.txt

cmake_minimum_required(VERSION 3.20)

project(DsaCourse VERSION 1.0 LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 23) 
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

# Add our subdirectories
add_subdirectory(dsa_core)
add_subdirectory(dsa_app)
add_subdirectory(benchmarks)

Support for these features is available in GCC 13+, Clang 16+, and MSVC 19.35+.

The Reverse Iteration Problem

We don't always iterate through our containers in the same direction. Some problems are best solved by reverse iteration.

However, implementing reverse iteration in a raw loop is a surprisingly common source of bugs. The first element we access is based on the container size(), but not exactly: container[container.size()] is out-of bounds. The last index is container.size() - 1.

But what if the size() is 0? When decrementing the index, we must remember that size_t is unsigned. If we try to decrement it below 0, it will wrap around to a huge positive number, which is also out of bounds.

A working implementation that avoids the usual mistakes around size() and unsigned indicies might look something like this:

for (size_t i = container.size(); i-- > 0; ) {
  // ...
}

Instead, we can use std::views::reverse:

#include <iostream>
#include <vector>
#include <ranges>

int main() {
  std::vector<int> v{1, 2, 3, 4, 5};
  
  // Before:
  for (size_t i = v.size(); i-- > 0; ) {
    std::cout << v[i] << ' ';
  }
  
  // After:
  for (int x : std::views::reverse(v)) {
    std::cout << x << ' ';
  }
}

The Index Problem

Sometimes, the position of the data is just as important as the data itself.

  • "Log this error with the line number."
  • "Reset every 3rd element."
  • "Draw this sprite at x = i * width."

With most view-based techniques, we no longer have this i variable, and we don't even get it in a range-based for loop - for (auto& item : container) - either.

In Python or Rust, this is solved with enumerate. C++23 brings this exact same utility to the standard library. The std::views::enumerate helper wraps a range and produces a view where every element is {index, value}.

With a range-based loop, we tend to immediately destructure these right in the loop heading using the [] syntax:

#include <vector>
#include <iostream>
#include <ranges>

void LogWithLineNumbers(const std::vector<std::string>& lines) {
  // Before:
  for (size_t i = 0; i < lines.size(); ++i) {
    std::cout << i << ": " << lines[i] << "\n";
  }

  // After:
  for (auto [i, line] : std::views::enumerate(lines)) {
    std::cout << i << ": " << line << "\n";
  }
}

How it Works: iota + zip

Under the hood, enumerate constructs a view that combines our input range (the std::vector<strings> in this example) with an iota view.

We've already seen std::views::iota(0) - it creates an infinite sequence of numbers ($0, 1, 2, \dots$). Because iota generates numbers mathematically (it just increments a register), it creates zero memory traffic. The "array of indices" exists only as a concept in the CPU, not as data in RAM. This makes it effectively free.

To combine the iota with our original data, it uses std::views::zip() to create the {index, value} tuple. We cover zip views in the next section, and tuples in a future chapter but, for now, all we need to know is that we can "unzip" our tuple using structured binding as in our auto [index, line] syntax above.

The Stride Problem

Sometimes data is interleaved. An audio buffer might store samples as L, R, L, R, L, R.

To process only the left channel, we need to skip every other element. In a raw loop, we'd perhap use i += 2 instead if ++i.

To process just the left channel, we need to skip every second element. To process just the right channel, we need to offset by 1, then skip every second element.

We can express this declaratively using std::views::stride and std::views::drop:

#include <vector>
#include <ranges>

void ProcessStereo(std::vector<float>& audio) {
  // View 1: Left Channel (Starts at 0, Step 2)
  auto left_channel = audio | std::views::stride(2);

  // View 2: Right Channel (Starts at 1, Step 2)
  auto right_channel = audio
                     | std::views::drop(1)
                     | std::views::stride(2);
  
  // We can now iterate left_channel and right_channel independently
  for (float sample : left_channel) {
    // ...
  }
}

The compiler sees through all of this. It knows that stride(2) on a contiguous vector just means "add 2 * sizeof(float) to the pointer." The resulting assembly is a tight, optimized loop that advances two pointers through memory.

The Parallel Array Problem

In the previous lesson we established that splitting large objects into parallel arrays (structure of arrays) can drastically improve performance.

Instead of vector<Player>, we have vector<int> ids, vector<int> scores, and so on.

struct Player {
  int id;
  int score;
  float health;
};

// Array of structures
std::vector<Player> Players;

// Structure of arrays
struct PlayerStorage {
  std::vector<int> ids;
  std::vector<int> scores;
  std::vector<float> health;
};

This performance optimization creates many usability problems, which we'll address in future chapters.

For this lesson, the immediate issue is that we need to keep track of an index to work with these parallel arrays. Below, we update all players whose score is over 100 using a raw loop:

// The misery of SoA
for (size_t i = 0; i < ids.size(); ++i) {
  // One manual index managing three independent memory blocks
  if (scores[i] > 100) {
    UpdateHealth(health[i]);
    Log(ids[i]);
  }
}

That's a lot of manual iteration management and index-wrangling that we can easily mess up. C++23's std::views::zip is the cure. It takes multiple ranges and creates a single view:

auto players = std::views::zip(ids, scores, health);

When we then need to consume that data, we use structured bindings to pull each of the components out:

// Get data for one player:
auto [id0, score0, hp0] = players[0]; // By reference
const auto [id1, score1, hp1] = players[1]; // By const reference

// Iterate all players:
for (auto [id, score, hp] : players) {
  // ...
}

Here's a complete example:

#include <vector>
#include <ranges>
#include <iostream>

void ProcessPlayers(
  std::vector<int>& ids,
  std::vector<int>& scores,
  std::vector<float>& health
) {

  // Create a unified view
  auto players = std::views::zip(ids, scores, health);

  // Iterate as if it were a single object
  for (auto [id, score, hp] : players) {
    if (score > 100) {
      // 'hp' is a reference (float&) to the value in the vector
      // modifying it modifies the vector
      hp += 10.0f;
      std::cout << "Player " << id << " healed!\n";
    }
  }
}

Conceptually, we can think of std::views::zip as creating a virtual struct.

To the hardware, the data is still stored in three separate arrays (SoA), guaranteeing perfect cache locality for each stream.

To the programmer, the data looks like a single object {id, score, hp} (AoS), guaranteeing ease of use.

The "struct" only exists for the brief moment the data is loaded into CPU registers inside the loop. It is transient. We get the performance of SoA with the ergonomics of AoS. We'll benchmark this and confirm the performance later in the lesson.

Safety: The Shortest Range Rule

One of the more dangerous aspects of manual loops over parallel arrays is mismatching sizes. If ids has 100 items (ids[0] to ids[99]) but scores only has 99 (scores[0] to scores[98]), a loop iterating 100 times will try to access scores[99] and segfault.

This is handled gracefully by std::views::zip. The length of a zipped view is determined by the shortest underlying range. If one vector is shorter than the others, iteration simply stops.

There might still be a bug elsewhere causing those arrays to have different lengths, but having our iteration logic handle bad data without crashing or corrupting memory is at least an improvement.

Using Zip Views in Pipelines

There is one friction point you will encounter when using zip and enumerate. Standard algorithms like std::ranges::filter() or std::views::filter() expect a predicate that takes a single argument.

When you zip ranges together, that "single argument" is a std::tuple:

const std::vector<int>& ids;
const std::vector<int>& scores;

auto pipeline = std::views::zip(ids, scores)
  | std::views::filter([](const auto& tuple) {
      // ...
    });

We can again use structured bindings to access the data we need - this time, inside the lambda body. The following example creates a view on top of SoA data, providing the IDs of players that have a high score:

#include <vector>
#include <ranges>
#include <algorithm>

void HighScores(
  const std::vector<int>& ids,
  const std::vector<int>& scores
) {
  auto pipeline = std::views::zip(ids, scores)
    // Only view tuples that have a high score
    | std::views::filter([](const auto& tuple) {
        // Unpack the tuple
        auto [id, score] = tuple;
        return score > 1000;
      })
    // Transform the tuples to just the ids (ints)
    | std::views::transform([](const auto& tuple) {
        auto [id, score] = tuple;
        return id;
      });
  
  for (int id : pipeline) {
    // ...
  }
}

Adding Indices to Zip Views

We can incorporate indices into our zip views, effectively replicating the behaviour of enumerate, by including an iota in our view:

#include <vector>
#include <ranges>
#include <algorithm>

void HighScores(
  const std::vector<int>& ids,
  const std::vector<int>& scores
) {
  using std::views::zip, std::views::iota;
  for (auto [i, id, score] : zip(iota(0), ids, scores)) {
    // ...
  }
}

Using Zip Views for Interleaved Data

Previously, we saw how stride and drop can be used to iterate over interleaved data such as audio channels. We can combine this with zip views to reconsitute that interleaved data back into logical sets:

#include <vector>
#include <ranges>

void ProcessStereo(std::vector<float>& audio) {
  // View 1: Left Channel (Starts at 0, Step 2)
  auto left_channel = audio | std::views::stride(2);

  // View 2: Right Channel (Starts at 1, Step 2)
  auto right_channel = audio
                     | std::views::drop(1)
                     | std::views::stride(2);

  // We can now zip them back together, but now as logical pairs
  auto stereo_pairs = std::views::zip(left_channel, right_channel);

  for (auto [l, r] : stereo_pairs) {
    // l refers to audio[0], audio[2], audio[4]...
    // r refers to audio[1], audio[3], audio[5]...
  }
}

Benchmarking Zip Views

Do these high-level abstractions actually compile down to efficient machine code, or are we paying a hidden tax for all this syntax? Let's verify in our lab.

We will bring back our Player type which has two data points we care about in our algorithm (id and score), along with a load of additional arbitrary data (payload).

The task we need to perform requires us to retrieve every player's id and score. We test three different algorithms:

  1. AoS: the simple array-of-structures approach, where our data is stored as a std::vector<Player>.
  2. SoA_Loop: our data is stored using the structure-of-arrays approach. We'll iterate the parallel arrays using a raw, index-based loop.
  3. SoA_Zip: the same structure as above, but we'll create a std::views::zip to hide the parallel arrays and remove the index management.

benchmarks/main.cpp

#include <benchmark/benchmark.h>
#include <vector>
#include <string>
#include <ranges>
#include <numeric>

struct Player {
  int id;
  int score;
  char payload[1024]; // Simulate heavy object
};

// 1. Array of Structs (AoS)
static void BM_AoS(benchmark::State& state) {
  int n = state.range(0);
  std::vector<Player> people(n);

  // Initialize with some data
  for (int i = 0; i < n; ++i) {
    people[i].id = i;
    people[i].score = i * 2;
  }

  long long sum = 0;
  for (auto _ : state) {
    for (const auto& p : people) {
      sum += p.id + p.score;
    }
  }
  benchmark::DoNotOptimize(sum);
}

// 2. Structure of Arrays (Manual Loop)
static void BM_SoA_Loop(benchmark::State& state) {
  int n = state.range(0);
  std::vector<int> ids(n);
  std::vector<int> scores(n);

  // Initialize with some data
  for (int i = 0; i < n; ++i) {
    ids[i] = i;
    scores[i] = i * 2;
  }

  long long sum = 0;
  for (auto _ : state) {
    for (size_t i = 0; i < n; ++i) {
      sum += ids[i] + scores[i];
    }
  }
  benchmark::DoNotOptimize(sum);
}

// 3. Structure of Arrays (Zip View)
static void BM_SoA_Zip(benchmark::State& state) {
  int n = state.range(0);
  std::vector<int> ids(n);
  std::vector<int> scores(n);

  // Initialize with some data
  for (int i = 0; i < n; ++i) {
    ids[i] = i;
    scores[i] = i * 2;
  }

  long long sum = 0;
  for (auto _ : state) {
    for (const auto [id, score] : std::views::zip(ids, scores)) {
      sum += id + score;
    }
  }
  benchmark::DoNotOptimize(sum);
}

BENCHMARK(BM_AoS)->Range(1024, 65536);
BENCHMARK(BM_SoA_Loop)->Range(1024, 65536);
BENCHMARK(BM_SoA_Zip)->Range(1024, 65536);

BENCHMARK_MAIN();
-------------------------------
Benchmark                   CPU
-------------------------------
BM_AoS/1024              610 ns
BM_AoS/4096             2849 ns
BM_AoS/32768           34528 ns
BM_AoS/65536           87887 ns
BM_SoA_Loop/1024         547 ns
BM_SoA_Loop/4096        2148 ns
BM_SoA_Loop/32768      17264 ns
BM_SoA_Loop/65536      35296 ns
BM_SoA_Zip/1024          586 ns
BM_SoA_Zip/4096         2354 ns
BM_SoA_Zip/32768       18415 ns
BM_SoA_Zip/65536       39447 ns

We have significant improvements over the traditional array-of-structures (AoS) approach. The results between the raw BM_SoA_Loop and the BM_SoA_Zip abstraction are close, but the "shortest range" safety check implemented by std::views::zip does have a performance cost.

Compiler Hints using std::unreachable()

If we want to remove this safety check, we can give the compiler a hint that it is unnecessary. To tell the compiler something is impossible - that is, a boolean condition cannot be true - we mark the associated block of code as being unreachable:

benchmarks/main.cpp

#include <utility> // for std::unreachable 

// ...

static void BM_SoA_Zip(benchmark::State& state) {
  int n = state.range(0);
  std::vector<int> ids(n);
  std::vector<int> scores(n);
  
  // Our arrays cannot have different sizes
  if (ids.size() != scores.size()) {
    std::unreachable(); // C++23
  }

  // Initialize with some data
  for (int i = 0; i < n; ++i) {
    ids[i] = i;
    scores[i] = i * 2;
  }
    
  long long sum = 0;
  for (auto _ : state) {
    for (const auto [id, score] : std::views::zip(ids, scores)) {
      sum += id + score;
    }
  }
  benchmark::DoNotOptimize(sum);
}

The std::unreachable() utility is also a C++23 addition. Prior to C++23, it can be implemented using compiler intrinsics like __builtin_unreachable() in GCC/Clang and __assume(0) in MSVC.

With that change, the performance gap disappears - std::views::zip is a zero-cost abstraction:

-------------------------------
Benchmark                   CPU
-------------------------------
BM_SoA_Loop/1024         544 ns
BM_SoA_Loop/4096        2222 ns
BM_SoA_Loop/32768      17648 ns
BM_SoA_Loop/65536      34319 ns
BM_SoA_Zip/1024          544 ns
BM_SoA_Zip/4096         2051 ns
BM_SoA_Zip/32768       16044 ns
BM_SoA_Zip/65536       32227 ns

Summary

After thie lesson this lesson, we now have a named, safer, and composable tool for every iteration pattern:

  1. Reverse: Use std::views::reverse when you need to iterate through a range in reverse order.
  2. Enumerate: Use std::views::enumerate when you need an index. It combines the container with a mathematical sequence using iota.
  3. Stride: Use std::views::stride and std::views::drop to process interleaved data without index math.
  4. Zip: Use std::views::zip to iterate multiple containers at once. This solves the usability nightmare of "structure of arrays," allowing us to keep our data fast (split in memory) but our code clean (unified in logic).
  5. Virtual Structs: We proved that zip creates a transient view that acts like an object but performs like raw arrays.
Have a question about this lesson?
Answers are generated by AI models and may not be accurate