In the previous lessons, we proved that splitting our data into relational component pools is excellent for hardware performance. By using contiguous arrays, we keep our CPU caches hot. By using bitmasks, we avoid fetching data for entities that don't need it.

However, the code required to use these systems is currently a disaster. We wrote a ProximityAudioSystem that requires nearly 50 lines of code just to find entities that have both a Proximity component and an Audio component. We had to manually find the smallest set, manually look up signatures, and manually fetch indices.

In this lesson, we are going to fix that by building a view system. This will package our optimized "smallest set" algorithm, allowing us to write systems like this:

// The Dream API
auto noisy_neighbors = world.GetView<Proximity, Audio>();

for (auto [entity, prox, audio] : noisy_neighbors) {
  std::print(
    "Processing audio for {} (distance = {}, volume = {})\n",
    entity.name, prox.distance, audio.volume
  );
}

This single loop will do everything our 50-line function did: it will automatically pick the best iteration strategy, check the bitmasks, and hand us references to the data we need. Best of all, it will do this with no measurable runtime overhead compared to the manual version, which we will confirm with a benchmark at the end of the lesson.

The Proxy Objects

Before we can iterate, we need to solve a data access problem. In our "Dream API" above, the prox variable acts like an object with a .distance member. However, our ProximityStorage is a structure of arrays. It doesn't store objects.

To bridge this gap, we need to bring back the proxy objects we introduced earlier in the course. These are temporary, lightweight structs that hold references to the data inside our vectors.

Let's implement ProximityRef and AudioRef, and add a helper function Get() to our storage classes to create them. The implementation lives in the dsa_core project files.

These proxies are effectively free to create. With optimizations enabled, compilers can typically inline their construction and apply scalar replacement of aggregates, meaning code like prox.distance compiles down to a direct load from the m_distance vector.
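
As a rough illustration of the idea, here is a minimal sketch of a proxy over a structure-of-arrays pool. The names ProximityStorageSketch, Ref, Add(), and m_threshold are assumptions made for this example, and this Get() takes an index into the pool directly, whereas the Get() used later in this lesson takes an entity ID.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays pool: one vector per field.
struct ProximityStorageSketch {
  std::vector<float> m_distance;
  std::vector<float> m_threshold;  // assumed second field, for illustration

  // Lightweight proxy: holds references into the vectors, owns nothing.
  struct Ref {
    float& distance;
    float& threshold;
  };

  // Appends a component and returns its index within the pool.
  std::size_t Add(float distance, float threshold) {
    m_distance.push_back(distance);
    m_threshold.push_back(threshold);
    return m_distance.size() - 1;
  }

  // Builds a proxy for the component stored at the given index.
  Ref Get(std::size_t index) {
    return Ref{m_distance[index], m_threshold[index]};
  }
};
```

Writing through the proxy, for example storage.Get(i).distance = 2.5f, updates m_distance[i] directly. As with any reference into a vector, a proxy should not be kept across operations that might reallocate the underlying storage.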

The View Pipeline

Now we need a container that knows how to find entities matching our component requests. Because we want to support querying any number of component pools, GetView() will need to use advanced compile-time techniques. Familiarity with the template features used below, such as variadic templates and fold expressions, is highly recommended.

To process our components, we will construct a C++20 range pipeline. Let's start by including <ranges>, and adding a variadic GetView template to Registry.h:

dsa_core/include/dsa/Registry.h

#pragma once
#include <tuple>
#include <ranges>
#include "EntityStorage.h"
#include "AudioStorage.h"
#include "ProximityStorage.h"

// ...

class Registry {
public:
  // ...

  template <typename... ComponentTypes>
  auto GetView() {
    // View logic coming next...
  }
};

Calculating the Required Signature

To execute our join algorithm, the view needs to calculate the combined signature bitmask of the requested components. We can do this by using the bitwise | operator as before.

For example, if our view was asking for entities that have two types of components, our mask would look like this:

uint8_t mask = (
  (1 << GetBitIndex<TypeA>()) | (1 << GetBitIndex<TypeB>())
);

In our case, we don't know in advance what types are requested, or even how many types are involved. Those depend on the ComponentTypes that were provided as template arguments.

As such, we need to use a fold expression:

dsa_core/include/dsa/Registry.h

// ...

class Registry {
public:
  // ...

  template <typename... ComponentTypes>
  auto GetView() {
    // Generate the combined mask using a fold expression
    uint8_t mask = ((1 << GetBitIndex<ComponentTypes>()) | ...);

    // ...
  }
};
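
To see the fold expression in isolation, the following sketch computes masks at compile time. The empty tag types, the kBitIndex variable template, and the bit assignments are all assumptions standing in for the course's GetBitIndex(); only the fold itself mirrors the code above.

```cpp
#include <cstdint>

// Hypothetical component tags and bit assignments for this sketch.
struct Proximity {};
struct Audio {};
struct Health {};

template <typename T> inline constexpr int kBitIndex = -1;
template <> inline constexpr int kBitIndex<Proximity> = 0;
template <> inline constexpr int kBitIndex<Audio> = 1;
template <> inline constexpr int kBitIndex<Health> = 2;

// ORs together one bit per requested type using a unary right fold.
template <typename... ComponentTypes>
constexpr std::uint8_t MakeMask() {
  return static_cast<std::uint8_t>(
    ((1u << kBitIndex<ComponentTypes>) | ...));
}

// MakeMask<Proximity, Health>() expands to (1u << 0) | (1u << 2).
static_assert(MakeMask<Proximity, Health>() == 0b0000'0101);
static_assert(MakeMask<Audio>() == 0b0000'0010);
```

A unary fold over | is ill-formed for an empty pack, so a request with no component types would fail to compile rather than silently produce an empty mask.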

The Signature Assumption

This logic assumes that every component we ask for has a corresponding bit in the entity's signature. This will be a valid assumption if we're using automatic signature generation, such as option 2 in the previous lesson.

If we were customizing our approach such that some component types do not contribute to the signature, our GetView as it is currently implemented would not support those types. If GetBitIndex() returns an invalid index, such as -1, our bitwise left-shift 1 << -1 would trigger undefined behavior.

In a robust, production-grade system, we should be mindful of all of the assumptions we're making, what might happen if they're incorrect, and mitigate those risks. Compile-time approaches are particularly useful here, as the compiler will detect a lot of issues without any effort required from us.

It's also relatively easy to expand those capabilities. For example, we can use C++20 concepts to ensure our ComponentTypes meet our requirements, or we can scatter static assertions throughout our templates:

dsa_core/include/dsa/Registry.h

// ...

class Registry {
public:
  // ...

  template <typename... ComponentTypes>
  auto GetView() {
    // Ensure that every requested component actually has a bit
    static_assert(((GetBitIndex<ComponentTypes>() >= 0) && ...),
      "All components must have a valid signature bit.");

    // Ensure that our bits fit inside our uint8_t signature
    static_assert(((GetBitIndex<ComponentTypes>() < 8) && ...),
      "Component bit index exceeds signature capacity.");

    uint8_t mask = ((1 << GetBitIndex<ComponentTypes>()) | ...);
  }
};

Finding the Smallest Set

Next, we need to find the smallest component pool to use as the iteration driver. This involves evaluating our sparse sets based on their dense array sizes. A simple two-step approach might look something like this:

dsa_core/include/dsa/Registry.h

class Registry {
public:
  // ...

  template <typename... ComponentTypes>
  auto GetView() {
    uint8_t mask = ((1 << GetBitIndex<ComponentTypes>()) | ...);

    // Step 1: Gather all sparse sets involved in this query
    SparseSet* sets[] = { &GetStorage<ComponentTypes>().m_map... };

    // Step 2: Find the smallest set to drive iteration
    SparseSet* driver = sets[0];
    for (auto* s : sets) {
      if (s->m_dense.size() < driver->m_dense.size()) {
        driver = s;
      }
    }
  }
};

This selects the driver based on real-time array sizes, but we only do it once when GetView() is called. The actual loop we run later won't contain any branching logic about how to iterate.
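
The selection step can also be written with a standard algorithm. The following is a small standalone sketch rather than the course's implementation: SparseSetSketch is a hypothetical stand-in exposing only m_dense, and PickDriver() is an illustrative name.

```cpp
#include <algorithm>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for the course's SparseSet; only m_dense matters here.
struct SparseSetSketch {
  std::vector<int> m_dense;
};

// Returns the set with the fewest entries. Ties go to the earliest argument.
// Requires at least one argument.
template <typename... Sets>
const SparseSetSketch* PickDriver(const Sets&... sets) {
  return std::min(
    {static_cast<const SparseSetSketch*>(&sets)...},
    [](const SparseSetSketch* a, const SparseSetSketch* b) {
      return a->m_dense.size() < b->m_dense.size();
    });
}
```

Given three sets holding 3, 1, and 2 entities, PickDriver() returns a pointer to the second. The cost is one size comparison per requested pool, paid once per view rather than once per entity.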

Filtering and Transforming

The bulk of the work that our view needs to perform is the familiar filtering and transformation pattern. We filter out the entities that don't have the required components. We then transform those values into a container that provides all of the entity and component data to support our goal API:

for (auto [entity, prox, audio] : GetView<Proximity, Audio>()) {
  // ...
}

The Get() methods of our component pools already provide the component proxies - ProximityRef and AudioRef. We'll also need something to represent our entity. For this example, we'll repeat the pattern, adding a simple EntityRef type to our Registry:

dsa_core/include/dsa/Registry.h

class Registry {
public:
  // ...

  struct EntityRef {
    int id;
    std::string& name;
  };

  // ...
};

For the filtering and transformation step, we've covered many approaches through the earlier parts of this course. A view pipeline comprising std::views::filter and std::views::transform is one of the simplest options.

Our filter step can remove entities that don't have the required signature, and our transform step can package the surviving entities alongside the requested components as a std::tuple, enabling the auto [entity, prox, audio] structured binding syntax we're aiming for.

Again, we don't know which components are required - that is determined by the template parameters - so we need to use a pack expansion with ... when making our tuple:

dsa_core/include/dsa/Registry.h

class Registry {
public:
  // ...

  struct EntityRef {
    int id;
    std::string& name;
  };

  template <typename... ComponentTypes>
  auto GetView() {
    // ... bitmask setup and driver selection ...

    // Return a range pipeline
    // We drive from the smallest dense array
    return driver->m_dense
      | std::views::filter([this, mask](int id) {
          // Filter out entities that lack the required bits
          return (entities.m_signatures[id] & mask) == mask;
        })
      | std::views::transform([this](int id) {
          // Then transform the IDs into tuples of proxies
          return std::make_tuple(
            // The entity proxy
            EntityRef{id, entities.m_names[id]},
            // The proxies for every requested component
            GetStorage<ComponentTypes>().Get(id)...
          );
        });
  }
};

Modifying Template Logic using `if constexpr`

This GetView() provides a solid, highly optimized foundation, but it can be further expanded and optimized to meet our project's needs. The code that is created by the template at compile time can be modified using if constexpr statements.

These modifications will often depend on which component types are requested, or how many. For example, if only one component type is requested in the view, we don't need to find a driver or perform any filtering. That single component pool is the driver, and every component maps to an entity.

Our approach could therefore be simplified to this:

dsa_core/include/dsa/Registry.h

// ...

class Registry {
public:
  // ...

  template <typename T>
  auto GetSingleComponentView() {
    auto& storage = GetStorage<T>();
    // m_dense provides entity indices
    return storage.m_map.m_dense
      // enumerate provides component indices (0, 1, 2, ...)
      | std::views::enumerate // Requires C++23
      | std::views::transform([this, &storage](auto&& indexed) {
          auto& [component_id, entity_id] = indexed;
          return std::make_tuple(
            EntityRef{entity_id, entities.m_names[entity_id]},
            storage.m_components[component_id]
          );
      });
  }

  // ...
};

Our primary GetView template can defer to this implementation using if constexpr when only a single component type is requested:

dsa_core/include/dsa/Registry.h

// ...

class Registry {
public:
  // ...

  template <typename... ComponentTypes>
  auto GetView() {
    if constexpr (sizeof...(ComponentTypes) == 1) {
      return GetSingleComponentView<ComponentTypes...>();
    } else {
      // The else is required: without it, this return would also be
      // instantiated, and the two deduced return types would conflict
      // ... multi-component algorithm unchanged ...
    }
  }
};
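
When a function with a deduced return type branches with if constexpr, each branch may return a different type, provided only one of them survives instantiation. The toy function below is an assumption-laden illustration of that rule (DescribeQuery is a hypothetical name and has nothing to do with the registry):

```cpp
#include <cstddef>
#include <string>

// Toy function whose two branches return unrelated types.
template <typename... Ts>
auto DescribeQuery() {
  if constexpr (sizeof...(Ts) == 1) {
    return sizeof...(Ts);        // this instantiation returns std::size_t
  } else {
    return std::string("join");  // this instantiation returns std::string
  }
}
```

DescribeQuery<int>() yields a std::size_t and DescribeQuery<int, float>() yields a std::string. If the second return were moved out of the else block, it would be instantiated in the single-type case too, and the conflicting deductions would be a compile error.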

There are further opportunities for optimization. For example, Get(entity_id) performs a lookup into each pool's m_sparse array to locate the component ID within m_dense. This is unnecessary for the driver, as our loop is already iterating over that pool's m_dense.

However, these optimizations have increasingly expensive tradeoffs and diminishing returns. A project that needs every drop of performance is likely to invest more effort in intervening in the underlying memory layout, such as grouping entities with the same signature together.
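
To make the cost of that lookup concrete, here is a minimal sparse set sketch. SparseIndexSketch, Insert(), and IndexOf() are hypothetical names, and the course's SparseSet may be organized differently; the point is only that translating an entity ID into a component index is an extra indexed load through m_sparse.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal sparse set; the course's SparseSet may differ.
struct SparseIndexSketch {
  std::vector<int> m_sparse;  // entity ID -> index into m_dense, or -1
  std::vector<int> m_dense;   // index -> entity ID

  // Assumes entity >= 0 and that it is not already present.
  void Insert(int entity) {
    const auto slot = static_cast<std::size_t>(entity);
    if (slot >= m_sparse.size()) {
      m_sparse.resize(slot + 1, -1);
    }
    m_sparse[slot] = static_cast<int>(m_dense.size());
    m_dense.push_back(entity);
  }

  // Translates an entity ID into its index within m_dense.
  int IndexOf(int entity) const {
    if (entity < 0 ||
        static_cast<std::size_t>(entity) >= m_sparse.size()) {
      return -1;
    }
    return m_sparse[static_cast<std::size_t>(entity)];
  }
};
```

When a loop is already walking a pool's own m_dense, the position in that loop is the component index, so calling IndexOf() for the driver pool repeats work the iteration has already done.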

Usage Examples

Our registry now provides a safe, friendly interface for any of our systems to efficiently get the data they need:

dsa_app/main.cpp

#include <dsa/Registry.h>
#include <dsa/EntityHandle.h>
#include <print>

int main() {
  Registry world;

  // Create Alice with both components
  world.CreateEntity("Alice")
    .AddComponent<Proximity>(10.0f, 0.0f, 0.0f)
    .AddComponent<Audio>(0.8f, 1.0f, 0.0f);

  // Create Bob with only Proximity
  world.CreateEntity("Bob")
    .AddComponent<Proximity>(5.0f, 0.0f, 0.0f);

  // Create View
  auto view = world.GetView<Proximity, Audio>();

  // Iterate entities with both proximity and audio
  // Alice will appear, Bob will be skipped:
  for (auto [entity, prox, audio] : view) {
    std::print(
      "Entity: {} | Dist: {} | Vol: {}\n",
      entity.name, prox.distance, audio.volume
    );
  }
}

Entity: Alice | Dist: 10 | Vol: 0.8

Benchmarking the Abstraction

Our ECS is now much easier to use, but have we made it slower? Abstractions usually come with a cost.

Let's measure it with a benchmark comparing two approaches:

Manual: The direct implementation of the smallest set algorithm we had before, joining our Proximity and Audio component pools.
View: Our new GetView() ranges pipeline.

benchmarks/main.cpp

#include <benchmark/benchmark.h>
#include <dsa/Registry.h>
#include <dsa/EntityHandle.h>
#include <random>

// Helper to populate world with a mixture of components
void SetupWorld(Registry& world, int n) {
  std::mt19937 rng(42);

  // ~10% of entities get a Proximity component
  std::uniform_int_distribution<int> dist_prox(0, 9);

  // ~5% of entities get an Audio component
  std::uniform_int_distribution<int> dist_audio(0, 19);

  for (int i = 0; i < n; ++i) {
    auto e = world.CreateEntity("E");

    if (dist_prox(rng) == 0) {
      e.AddComponent<Proximity>(1.0f, 1.0f, 1.0f);
    }

    if (dist_audio(rng) == 0) {
      e.AddComponent<Audio>(1.0f, 1.0f, 1.0f);
    }
  }
}

static void BM_Manual(benchmark::State& state) {
  int n = state.range(0);
  Registry world;
  SetupWorld(world, n);

  // Get references to raw storage
  auto& prox = world.GetStorage<Proximity>();
  auto& audio = world.GetStorage<Audio>();
  auto& entities = world.entities;

  // Calculate combined mask
  uint8_t mask_prox = 1 << Registry::GetBitIndex<Proximity>();
  uint8_t mask_audio = 1 << Registry::GetBitIndex<Audio>();
  uint8_t mask_both = mask_prox | mask_audio;

  for (auto _ : state) {
    int count = 0;

    // Manual Driver Logic: Identify and iterate the smallest set
    if (prox.m_distance.size() < audio.m_volume.size()) {
      // Driver: Proximity
      for (size_t i = 0; i < prox.m_distance.size(); ++i) {
        int id = prox.m_map.m_dense[i];
        if ((entities.m_signatures[id] & mask_both) == mask_both) {
          count++; // Simulating work
        }
      }
    } else {
      // Driver: Audio
      for (size_t i = 0; i < audio.m_volume.size(); ++i) {
        int id = audio.m_map.m_dense[i];
        if ((entities.m_signatures[id] & mask_both) == mask_both) {
          count++; // Simulating work
        }
      }
    }
    benchmark::DoNotOptimize(count);
  }
}

static void BM_View(benchmark::State& state) {
  int n = state.range(0);
  Registry world;
  SetupWorld(world, n);

  for (auto _ : state) {
    int count = 0;

    // The View Abstraction
    auto view = world.GetView<Proximity, Audio>();
    for (auto [entity, prox, audio] : view) {
      count++; // Simulating work
    }
    benchmark::DoNotOptimize(count);
  }
}

#define BENCHMARK_CONFIG(name) \
  BENCHMARK(name) \
    ->RangeMultiplier(10) \
    ->Range(100'000, 10'000'000) \
    ->Unit(benchmark::kMillisecond)

BENCHMARK_CONFIG(BM_Manual);
BENCHMARK_CONFIG(BM_View);

------------------------------
Benchmark                  CPU
------------------------------
BM_Manual/100000      0.004 ms
BM_Manual/1000000     0.059 ms
BM_Manual/10000000    0.680 ms
BM_View/100000        0.003 ms
BM_View/1000000       0.056 ms
BM_View/10000000      0.628 ms

The view performs on par with the manual version at every size tested. The compiler is able to see through the abstraction and optimize our view pipeline, fusing the filter and transform stages into a loop comparable to the hand-written one.

Complete Code

A complete version of the example from this lesson is available in the dsa_app and dsa_core project files.

Summary

In this chapter, we stripped object-oriented design down to the bare metal and rebuilt it. We started by abandoning traditional class hierarchies, which scatter data across the heap and destroy performance through cache misses.

Instead, we proved that relational data - arranged in contiguous component pools connected by sparse sets - is vastly superior for the hardware.

We then took the raw, mechanical complexity of the join algorithm, which manually scanned bitmasks to avoid pipeline stalls, and wrapped it in a zero-overhead API. By using the C++20 range interface, our systems can simply request views that automatically filter and transform the smallest possible dataset, fetching exactly what they need, all while maintaining performance.
