Creating Views
Updating our ECS to support composable, range-based views that handle the smallest-set algorithm automatically using C++20 ranges.
In the previous lessons, we proved that splitting our data into relational component pools is excellent for hardware performance. By using contiguous arrays, we keep our CPU caches hot. By using bitmasks, we avoid fetching data for entities that don't need it.
However, the code required to use these systems is currently a disaster. We wrote a ProximityAudioSystem that requires nearly 50 lines of code just to find entities that have both a Proximity component and an Audio component. We had to manually find the smallest set, manually look up signatures, and manually fetch indices.
In this lesson, we are going to fix that by building a view system. This will package our optimized "smallest set" algorithm, allowing us to write systems like this:
// The Dream API
auto noisy_neighbors = world.GetView<Proximity, Audio>();
for (auto[entity, prox, audio] : noisy_neighbors) {
std::print(
"Processing audio for {} (distance = {}, volume = {})\n",
entity.name, prox.distance, audio.volume
);
}This single loop will do everything our 50-line function did: it will automatically pick the best iteration strategy, check the bitmasks, and hand us references to the data we need. Best of all, it will do this with zero runtime overhead compared to the manual version.
The Proxy Objects
Before we can iterate, we need to solve a data access problem. In our "Dream API" above, the prox variable acts like an object with a .distance member. However, our ProximityStorage is a structure of arrays. It doesn't store objects.
To bridge this gap, we need to bring back the proxy objects we introduced earlier in the course. These are temporary, lightweight structs that hold references to the data inside our vectors.
Let's implement ProximityRef and AudioRef, and add a helper function Get() to our storage classes to create them:
Files
These proxies are free to create. The compiler will inline their construction using the scalar replacement of aggregates optimization, meaning code like prox.distance will compile down to a direct load from the m_distance vector.
The View Pipeline
Now we need a container that knows how to find entities matching our component requests. Because we want to support querying any number of component pools, GetView() will need to use advanced compile-time techniques. Familiarity with and is highly recommended.
To process our components, we will construct a C++20 range pipeline. Let's start by including <ranges>, and adding a variadic GetView template to Registry.h:
dsa_core/include/dsa/Registry.h
#pragma once
#include <tuple>
#include <ranges>
#include "EntityStorage.h"
#include "AudioStorage.h"
#include "ProximityStorage.h"
// ...
class Registry {
public:
// ...
template <typename... ComponentTypes>
auto GetView() {
// View logic coming next...
}
};Calculating the Required Signature
To execute our join algorithm, the view needs to calculate the combined signature bitmask of the requested components. We can do this by using the bitwise | operator as before.
For example, if our view was asking for entities that have two types of components, our mask would look like this:
uint8_t mask = (
(1 << GetBitIndex<TypeA>()) | (1 << GetBitIndex<TypeB>())
);In our case, we don't know in advance what types are requested, or even how many types are involved. Those depend on the ComponentTypes that were provided as template arguments.
As such, we need to use a fold expression:
dsa_core/include/dsa/Registry.h
// ...
class Registry {
public:
// ...
template <typename... ComponentTypes>
auto GetView() {
// Generate the combined mask using a fold expression
uint8_t mask = ((1 << GetBitIndex<ComponentTypes>()) | ...);
// ...
}
};The Signature Assumption
This logic assumes that every component we ask for has a corresponding bit in the entity's signature. This will be a valid assumption if we're using automatic signature generation, such as option 2 in the previous lesson.
If we were customizing our approach such that some component types do not contribute to the signature, our GetView as it is currently implemented would not support those types. If GetBitIndex() returns an invalid index, such as -1, our bitwise left-shift 1 << -1 would trigger undefined behavior.
In a robust, production-grade system, we should be mindful of all of the assumptions we're making, what might happen if they're incorrect, and mitigate those risks. Compile-time approaches are particularly useful here, as the compiler will detect a lot of issues without any effort required from us.
It's also relatively easy to expand those capabilities. For example, we can use to ensure our ComponentTypes meet our requirements, or we can scatter static assertions throughout our templates:
dsa_core/include/dsa/Registry.h
// ...
class Registry {
public:
// ...
template <typename... ComponentTypes>
auto GetView() {
// Ensure that every requested component actually has a bit
static_assert(((GetBitIndex<ComponentTypes>() >= 0) && ...),
"All components must have a valid signature bit.");
// Ensure that our bits fit inside our uint8_t signature
static_assert(((GetBitIndex<ComponentTypes>() < 8) && ...),
"Component bit index exceeds signature capacity.");
uint8_t mask = ((1 << GetBitIndex<ComponentTypes>()) | ...);
}
};Finding the Smallest Set
Next, we need to find the smallest component pool to use as the iteration driver. This involves evaluating our sparse sets based on their dense array sizes. A simple two-step approach might look something like this:
dsa_core/include/dsa/Registry.h
class Registry {
public:
// ...
template <typename... ComponentTypes>
auto GetView() {
uint8_t mask = ((1 << GetBitIndex<ComponentTypes>()) | ...);
// Step 1: Gather all sparse sets involved in this query
SparseSet* sets[] = { &GetStorage<ComponentTypes>().m_map... };
// Step 2: Find the smallest set to drive iteration
SparseSet* driver = sets[0];
for (auto* s : sets) {
if (s->m_dense.size() < driver->m_dense.size()) {
driver = s;
}
}
}
};This selects the driver based on real-time array sizes, but we only do it once when GetView() is called. The actual loop we run later won't contain any branching logic about how to iterate.
Filtering and Transforming
The bulk of the work that our view needs to perform is the familiar filtering and transformation pattern. We filter out the entities that don't have the required components. We then transform those values into a container that provides all of the entity and component data to support our goal API:
for (auto[entity, prox, audio] : GetView<Proximity, Audio>()) {
// ...
}The Get() methods of our component pools already provide the component proxies - ProximityRef and AudioRef. We'll also need something to represent our entity. For this example, we'll repeat the pattern, adding a simple EntityRef type to our Registry:
dsa_core/include/dsa/Registry.h
class Registry {
public:
// ...
struct EntityRef {
int id;
std::string& name;
};
// ...
};For the filtering and transformation step, we've covered many approaches through the earlier parts of this course. A simple view pipeline comprising std::views::filter and std::views::transform is one of the simplest options.
Our filter step can remove entities that don't have the required signature, and our transform step can package the surviving entities alongside the requested components as a std::tuple, enabling the auto [entity, prox, audio] structured binding syntax we're aiming for.
Again, we don't know which components are required - that is determined by the template parameters - so we need to use a fold expression with ... when making our tuple:
dsa_core/include/dsa/Registry.h
class Registry {
public:
// ...
struct EntityRef {
int id;
std::string& name;
};
template <typename... ComponentTypes>
auto GetView() {
// ... bitmask setup and driver selection ...
// Return a range pipeline
// We drive from the smallest dense array
return driver->m_dense
| std::views::filter([this, mask](int id) {
// Filter out entities that lack the required bits
return (entities.m_signatures[id] & mask) == mask;
})
| std::views::transform([this](int id) {
// Then transform the IDs into tuples of proxies
return std::make_tuple(
// The entity proxy
EntityRef{id, entities.m_names[id]},
// The proxies for every requested component
GetStorage<ComponentTypes>().Get(id)...
);
});
}
};Modifying Template Logic using if constexpr
This GetView() provides a solid, highly optimized foundation, but it can be further expanded and optimized to meet our project's needs. The code that is created by the template at compile time can be modified using if constexpr expressions.
These modifications will often depend on which component types are requested, or how many. For example, if only one component type is requested in the view, we don't need to find a driver or perform any filtering. That single component pool is the driver, and every component maps to an entity.
Our approach could therefore be simplified to this:
dsa_core/include/dsa/Registry.h
// ...
class Registry {
public:
// ...
template <typename T>
auto GetSingleComponentView() {
auto& storage = GetStorage<T>();
// m_dense provides entity indices
return storage.m_map.m_dense
// enumerate provides component indicies (0, 1, 2, ...)
| std::views::enumerate // Requires C++23
| std::views::transform([this, &storage](auto&& indexed) {
auto& [component_id, entity_id] = indexed;
return std::make_tuple(
EntityRef{entity_id, entities.m_names[entity_id]},
storage.m_components[component_id]
);
});
}
// ...
};Our primary GetView template can defer to this implementation using if constexpr when only a single component type is requested:
dsa_core/include/dsa/Registry.h
// ...
class Registry {
public:
// ...
template <typename... ComponentTypes>
auto GetView() {
if constexpr (sizeof...(ComponentTypes) == 1) {
return GetSingleComponentView<ComponentTypes...>();
}
// ... multi-component algorithm unchanged ...
}
};There are further opportunities for optimization. For example, Get(entity_id) performs a lookup into each pool's m_sparse array to locate the component ID within m_dense. This is unnecessary for the driver, as our loop is already iterating over that pool's m_dense.
However, these optimizations have increasingly expensive tradeoffs and diminishing returns. A project that needs every drip of performance is likely to invest more effort in intervening in the underlying memory layout, such as grouping entities with the same signature together.
Usage Examples
Our registry now provides a safe, friendly interface for any of our systems to efficiently get the data they need:
dsa_app/main.cpp
#include <dsa/Registry.h>
#include <dsa/EntityHandle.h>
#include <print>
int main() {
Registry world;
// Create Alice with both components
world.CreateEntity("Alice")
.AddComponent<Proximity>(10.0f, 0.0f, 0.0f)
.AddComponent<Audio>(0.8f, 1.0f, 0.0f);
// Create Bob with only Proximity
world.CreateEntity("Bob")
.AddComponent<Proximity>(5.0f, 0.0f, 0.0f);
// Create View
auto view = world.GetView<Proximity, Audio>();
// Iterate entities with both proximity and audio
// Alice will appear, Bob will be skipped:
for (auto [entity, prox, audio] : view) {
std::print(
"Entity: {} | Dist: {} | Vol: {}\n",
entity.name, prox.distance, audio.volume
);
}
}Entity: Alice | Dist: 10 | Vol: 0.8Benchmarking the Abstraction
Our ECS is now much easier to use, but have we made it slower? Abstractions usually come with a cost.
Let's verify this with a benchmark comparing two approaches:
- Manual: The direct implementation of the smallest set algorithm we had before, joining our
ProximityandAudiocomponent pools. - View: Our new
GetView()ranges pipeline.
benchmarks/main.cpp
#include <benchmark/benchmark.h>
#include <dsa/Registry.h>
#include <dsa/EntityHandle.h>
#include <random>
// Helper to populate world with a mixture of components
void SetupWorld(Registry& world, int n) {
std::mt19937 rng(42);
// ~10% of entities get a Proximity component
std::uniform_int_distribution<int> dist_prox(0, 9);
// ~5% of entities get an Audio component
std::uniform_int_distribution<int> dist_audio(0, 19);
for (int i = 0; i < n; ++i) {
auto e = world.CreateEntity("E");
if (dist_prox(rng) == 0) {
e.AddComponent<Proximity>(1.0f, 1.0f, 1.0f);
}
if (dist_audio(rng) == 0) {
e.AddComponent<Audio>(1.0f, 1.0f, 1.0f);
}
}
}
static void BM_Manual(benchmark::State& state) {
int n = state.range(0);
Registry world;
SetupWorld(world, n);
// Get references to raw storage
auto& prox = world.GetStorage<Proximity>();
auto& audio = world.GetStorage<Audio>();
auto& entities = world.entities;
// Calculate combined mask
uint8_t mask_prox = 1 << Registry::GetBitIndex<Proximity>();
uint8_t mask_audio = 1 << Registry::GetBitIndex<Audio>();
uint8_t mask_both = mask_prox | mask_audio;
for (auto _ : state) {
int count = 0;
// Manual Driver Logic: Identify and iterate the smallest set
if (prox.m_distance.size() < audio.m_volume.size()) {
// Driver: Proximity
for (size_t i = 0; i < prox.m_distance.size(); ++i) {
int id = prox.m_map.m_dense[i];
if ((entities.m_signatures[id] & mask_both) == mask_both) {
count++; // Simulating work
}
}
} else {
// Driver: Audio
for (size_t i = 0; i < audio.m_volume.size(); ++i) {
int id = audio.m_map.m_dense[i];
if ((entities.m_signatures[id] & mask_both) == mask_both) {
count++; // Simulating work
}
}
}
benchmark::DoNotOptimize(count);
}
}
static void BM_View(benchmark::State& state) {
int n = state.range(0);
Registry world;
SetupWorld(world, n);
for (auto _ : state) {
int count = 0;
// The View Abstraction
auto view = world.GetView<Proximity, Audio>();
for (auto[entity, prox, audio] : view) {
count++; // Simulating work
}
benchmark::DoNotOptimize(count);
}
}
#define BENCHMARK_CONFIG(name) \\
BENCHMARK(name) \\
->RangeMultiplier(10) \\
->Range(100'000, 10'000'000) \\
->Unit(benchmark::kMillisecond)
BENCHMARK_CONFIG(BM_Manual);
BENCHMARK_CONFIG(BM_View);------------------------------
Benchmark CPU
------------------------------
BM_Manual/100000 0.004 ms
BM_Manual/1000000 0.059 ms
BM_Manual/10000000 0.680 ms
BM_View/100000 0.003 ms
BM_View/1000000 0.056 ms
BM_View/10000000 0.628 msThe compiler is smart enough to see through all of this abstraction and fully optimize our view pipeline, fusing the filter and transform stages down to the bare metal.
Complete Code
A complete version of the example from this lesson is available below:
Files
Summary
In this chapter, we stripped object-oriented design down to the bare metal and rebuilt it. We started by abandoning traditional class hierarchies, which scatter data across the heap and destroy performance through cache misses.
Instead, we proved that relational data - arranged in contiguous component pools connected by sparse sets - is vastly superior for the hardware.
We then took the raw, mechanical complexity of the join algorithm, which manually scanned bitmasks to avoid pipeline stalls, and wrapped it in a zero-overhead API. By using the C++20 range interface, our systems can simply request views that automatically filter and transform the smallest possible dataset, fetching exactly what they need, all while maintaining performance.