Domain-Specific Containers
Build a custom container that combines the performance of data-oriented design with the ergonomics of object-oriented programming.
In previous chapters, we focused on taking existing containers - like std::vector - and manipulating them using algorithms, views, and pipelines.
In the rest of the course, we'll focus on creating our own data structure. In this lesson, we start by marrying the performance and cache-efficiency of the structure of arrays (SoA) pattern with the clean, familiar ergonomics of working with objects.
We will build a container that looks like a collection of objects to consumers, but acts like a parallel stream of primitives to the CPU.
The Ergonomics Problem
We have established that splitting data into parallel arrays (structure of arrays) can drastically improve performance over the more familiar array of structures pattern:
// Array of Structures (AoS)
struct Player {
int id;
int score;
float health;
std::string name;
};

std::vector<Player> Players;

// Structure of Arrays (SoA)
struct PlayerStorage {
std::vector<int> ids;
std::vector<int> scores;
std::vector<float> health;
std::vector<std::string> names;
};

Whilst SoA is more performant, it immediately presents a usability problem. We prefer to think of our design in terms of cohesive entities, such as Player objects, and our SoA data layout no longer works that way.
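For example, reading or updating what we think of as "one player" now means indexing several vectors with the same position and keeping those indices in sync by hand. Here's a minimal sketch using a hypothetical PrintPlayer helper, assuming the PlayerStorage struct above:

#include <cstddef>
#include <iostream>

// Assumes the PlayerStorage struct defined above is available
void PrintPlayer(const PlayerStorage& storage, std::size_t i) {
    // "One player" is now four separate lookups that must share an index
    std::cout << storage.names[i] << " (id " << storage.ids[i] << "): "
              << storage.scores[i] << " points, "
              << storage.health[i] << " hp\n";
}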
Zip Views
In an earlier lesson, we saw how std::views::zip allows us to iterate over a collection of parallel arrays simultaneously without managing raw indices.
This would allow us to iterate over our PlayerStorage using code that looks something like this:
// std::views::zip requires C++23
auto players = std::views::zip(ids, scores, health, names);
for (auto [id, score, hp, name] : players) {
if (score > 100) {
hp += 10;
}
}

While this works, it introduces some friction. Compare that to the object-oriented syntax we are used to:
// The Logical Ideal (AoS Syntax)
for (auto& player : players) {
if (player.score > 100) {
player.health += 10;
}
}

This is readable and intuitive. Our goal is to build a container that stores data in the efficient, parallel SoA format, but exposes it via the clean, intuitive OOP syntax. And we want to do this without sacrificing performance.
The Proxy Reference
There are many ways we can do this, but the simplest involves the concept of a proxy object.
In a standard array-of-structures like std::vector<Player>, when we dereference an iterator, we get a Player&. This is a reference to a real block of memory where a Player struct exists.
In our SoA container, that Player struct does not exist. The ID is in one array; the Score in another. We cannot return a Player& because there is no Player to point to.
Instead, we can return a transient proxy. This is a lightweight, temporary object that looks and acts like a player, but is actually just a bundle of references pointing to the scattered data.
Let's define this proxy. Note that it uses reference members (&), meaning it must be bound to existing data upon construction:
PlayerStorage.h
#include <vector>
#include <string>
// The Virtual Object
// This structure essentially reassembles the user's mental model
// of a Player from the scattered physical memory
struct PlayerRef {
int& id;
int& score;
float& health;
std::string& name;
// We can still have methods and all the other OOP tricks
void Heal(float amount) {
health += amount;
}
};

The Storage
Let's set up our structure-of-arrays layout, as well as a basic function demonstrating how we can add a player to our collection.
We're not focusing too much on API design here - our first goal is to build the foundations of our system and ensure it is performant. We could add things like encapsulation later, but we'll keep everything public for now to make it easier to test and experiment with:
PlayerStorage.h
#include <vector>
#include <string>
#include <ranges>
#include <tuple>
struct PlayerRef {/*...*/};
class PlayerStorage {
public:
// The Physical Storage (SoA)
std::vector<int> m_ids;
std::vector<int> m_scores;
std::vector<float> m_health;
std::vector<std::string> m_names;
// Add a player by splitting it into components immediately
void AddPlayer(int id, int score, float hp, std::string name) {
m_ids.push_back(id);
m_scores.push_back(score);
m_health.push_back(hp);
m_names.push_back(std::move(name));
}
};

The Transformer Pipeline
Now we have our storage (the parallel vectors) and our interface (the PlayerRef). We need a bridge to connect them.
We can build this bridge using the composable views we learned about earlier.
- Zip: Combine the parallel vectors into a single view of tuples.
- Transform: Convert each tuple into a PlayerRef object.
We will define a helper method, GetView(), to construct the pipeline:
// ...
class PlayerStorage {
public:
// ...
auto GetView() {
return std::views::zip(
m_ids, m_scores, m_health, m_names
) | std::views::transform([](auto&& tuple) {
// Unpack the tuple...
auto& [id, score, hp, name] = tuple;
// ...and reassemble into our Proxy
return PlayerRef{id, score, hp, name};
});
}
};

The return type of this function is incredibly complex. Thankfully, auto handles this for us.
With these additions, the range returned by GetView() is compatible with range-based for loops and the standard range algorithms.
Consuming the Container
Our system is very basic, but just through implementing the range interface, we've already unlocked a lot of capabilities:
main.cpp
#include <dsa/PlayerStorage.h>
#include <iostream>
#include <algorithm>
#include <ranges>
int main() {
PlayerStorage party;
party.AddPlayer(1, 500, 100.0f, "Hero");
party.AddPlayer(2, 50, 25.0f, "Sidekick");
party.AddPlayer(3, 9000, 150.0f, "Boss");
// GetView() returns a range, so we can use a range-based for
for (PlayerRef p : party.GetView()) {
// This looks like standard OOP patterns
if (p.score > 100) {
p.Heal(10.0f);
}
}
// The proxy works seamlessly with range-based algorithms
// We can find the max element by projecting the 'health' member
// Keep the view in a named variable so the algorithm can return a usable
// iterator (a temporary, non-borrowed range would yield std::ranges::dangling)
auto view = party.GetView();
auto toughest = std::ranges::max_element(view, {},
[](const PlayerRef& p) {
return p.health;
}
);
if (toughest != view.end()) {
std::cout << "Toughest: " << (*toughest).name << "\n";
}
// We can pipe our container into filters and transforms
// This pipeline views the names of high-scoring players
auto mvps = party.GetView()
| std::views::filter([](const PlayerRef& p) {
return p.score > 100;
})
| std::views::transform([](const PlayerRef& p) {
return p.name;
});
std::cout << "MVPs: ";
std::ranges::for_each(mvps, [](const std::string& name) {
std::cout << name << " ";
});
}

We've completely hidden the ugly SoA layout from consumers. The user writes code that reasons about "Players," but the hardware executes code that streams through parallel arrays of basic primitives.
The code above confirms that we can deliver an AoS-like experience. Let's now check how well it performs.
Benchmarking
When we iterate through our collection, our GetView() function is creating a tuple inside the zip, unpacking it, constructing these PlayerRef objects, and passing them to the loop body. That sounds like a lot of work - let's find out how much we're paying for it.
We'll do this using the benchmarking project we set up in an earlier chapter.
We'll benchmark the same algorithm accessing the same data across three different data structures:
- BM_AoS - the conventional array of structures approach.
- BM_PlayerStorage - our new system.
- BM_RawLoop - accessing the same key data directly through an array, like our PlayerStorage does behind the scenes. This checks how much performance we're losing to provide the friendly PlayerRef API.
benchmarks/main.cpp
#include <benchmark/benchmark.h>
#include <dsa/PlayerStorage.h>
// 1. Original AoS
struct Player {
int id;
int score;
float health;
std::string name;
};
static void BM_AoS(benchmark::State& state) {
int n = state.range(0);
std::vector<Player> players(n, Player{1, 2, 3.f, "Name"});
for (auto _ : state) {
float sum = 0.0f;
for (const Player& player : players) {
sum += player.score;
}
benchmark::DoNotOptimize(sum);
}
}
// 2. PlayerStorage
static void BM_PlayerStorage(benchmark::State& state) {
int n = state.range(0);
PlayerStorage ps;
for(int i=0; i<n; ++i) ps.AddPlayer(1, 2, 3.0f, "Name");
for (auto _ : state) {
float sum = 0.0f;
for (const PlayerRef player : ps.GetView()) {
sum += player.score;
}
benchmark::DoNotOptimize(sum);
}
}
// 3. Raw Manual Loop
static void BM_RawLoop(benchmark::State& state) {
int n = state.range(0);
std::vector<int> scores(n, 2);
for (auto _ : state) {
float sum = 0.0f;
for (int i=0; i<n; ++i) {
sum += scores[i];
}
benchmark::DoNotOptimize(sum);
}
}
#define BENCHMARK_STD(func) \
BENCHMARK(func) \
->RangeMultiplier(10) \
->Range(10 * 1000, 1000 * 1000) \
->Unit(benchmark::kMillisecond)
BENCHMARK_STD(BM_AoS);
BENCHMARK_STD(BM_PlayerStorage);
BENCHMARK_STD(BM_RawLoop);

The original std::vector<Player> approach keeps up with small collections, but the inefficient memory layout causes compounding issues as things scale up: each Player we stride over drags its unused fields - including a comparatively large std::string - through the cache alongside the one score we actually need.
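To get a rough sense of how much extra data the AoS loop has to touch, we can compare the size of a whole Player with the size of the score it actually reads. This is just an illustrative check - exact sizes vary by platform and standard library:

#include <iostream>
#include <string>

struct Player {
    int id;
    int score;
    float health;
    std::string name;
};

int main() {
    // Summing scores over an AoS layout strides across whole Player
    // objects, so each useful int arrives with many unused bytes.
    // The SoA loop reads a tightly packed array of ints instead.
    std::cout << "sizeof(Player): " << sizeof(Player) << " bytes\n";
    std::cout << "sizeof(int):    " << sizeof(int) << " bytes\n";
}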
The good news is that our PlayerStorage is a big improvement over BM_AoS, and it even seems to be equivalent to the abstraction-free BM_RawLoop approach:
-----------------------------------------
Benchmark                            CPU
-----------------------------------------
BM_AoS/10000                     0.007 ms
BM_AoS/100000                    0.075 ms
BM_AoS/1000000                    2.25 ms
BM_PlayerStorage/10000           0.007 ms
BM_PlayerStorage/100000          0.074 ms
BM_PlayerStorage/1000000         0.710 ms
BM_RawLoop/10000                 0.008 ms
BM_RawLoop/100000                0.071 ms
BM_RawLoop/1000000               0.715 ms

But how can this be a zero-cost abstraction over the direct approach? Our view is delivering PlayerRef objects to the consumer on every iteration of the loop - how are we creating these objects for free?
auto GetView() {
return std::views::zip(
m_ids, m_scores, m_health, m_names
) | std::views::transform([](auto&& tuple) {
auto& [id, score, hp, name] = tuple;
return PlayerRef{id, score, hp, name}; // ??
});
}

Scalar Replacement of Aggregates
The secret is, we're not creating the objects. Remember the as-if rule. Our code is not asking the compiler to create a PlayerRef at this point in the code; it's asking the compiler to create a program that behaves as if a PlayerRef were created at this point in the code.
The compiler avoids creating these objects due to a specific category of optimization called SROA (Scalar Replacement of Aggregates).
Firstly, PlayerRef is a simple "aggregate" type - it's just a bag carrying around some data. It doesn't have any constructor logic.
Secondly, let's look at how the consumer is using these conceptual PlayerRef objects:
static void BM_PlayerStorage(benchmark::State& state) {
int n = state.range(0);
PlayerStorage ps;
for(int i=0; i<n; ++i) ps.AddPlayer(1, 2, 3.0f, "Name");
for (auto _ : state) {
float sum = 0.0f;
for (PlayerRef player : ps.GetView()) {
sum += player.score;
}
benchmark::DoNotOptimize(sum);
}
}

Our player never leaves the loop body - it's created at the start of the iteration, and it's destroyed at the end. Between those points, we don't do anything that would require it to be manifested in memory, such as asking for its memory address using &.
The compiler recognizes that it can produce the requested behavior without ever creating the object. It can go directly to the underlying PlayerStorage arrays for the data, just as if we were accessing m_scores ourselves.
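Conceptually, the optimizer is free to reduce the benchmark's inner loop to something like the hand-written version below. This is only an illustrative sketch - the SumScores function is hypothetical, and real compiler output will differ:

#include <cstddef>
#include <dsa/PlayerStorage.h>

// Roughly what the loop boils down to once the PlayerRef proxies are
// scalar-replaced: a direct walk over a single SoA vector.
float SumScores(const PlayerStorage& ps) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < ps.m_scores.size(); ++i) {
        sum += ps.m_scores[i];
    }
    return sum;
}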
The player is a compile-time ghost that exists only to give consumers the intuitive, object-based API they're familiar with.
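The flip side is that anything which forces the proxy to have a real address can stand in the way of this optimization. For example, passing the proxy's address to a function the compiler can't see into is likely to force a real PlayerRef to be materialized. The Inspect function below is hypothetical, purely to illustrate the point:

#include <dsa/PlayerStorage.h>

// Hypothetical function defined in another translation unit; the compiler
// can't inline it, so it needs a pointer to a real PlayerRef object.
void Inspect(const PlayerRef* ref);

void InspectEveryone(PlayerStorage& storage) {
    for (PlayerRef player : storage.GetView()) {
        // Taking the proxy's address forces it to exist in memory,
        // which can prevent scalar replacement for this loop
        Inspect(&player);
    }
}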
Complete Code
Complete versions of the files we created in this lesson are below. We'll continue working on these in the next lesson:
Summary
In this lesson, we built the foundations of a domain-specific container, tailored directly for our program's needs.
- The Facade Pattern: We used a class to hide the complex "physical" memory layout (SoA) behind a simple "logical" API (OOP).
- Proxy Objects: We learned that iterators don't have to point to real data. They can yield transient proxy objects that act as windows into scattered memory.
- Range Adaptation: We used std::views::zip and std::views::transform to implement the iteration logic without writing any raw iterator code.
- Zero-Cost Abstraction: We verified via benchmarking that this high-level abstraction compiles down to machine code on par with a hand-written raw loop.
Sorting and Permuting Containers
Implement proxy sorting, physical memory permutations, and multi-threaded reordering.