The Join Algorithm
Efficiently pulling data from multiple component pools simultaneously using the smallest set driver pattern, and optimizing it using bitmasks to minimize cache misses.
In the previous lesson, we broke our monolithic data structures apart. We moved from a single table (a structure of arrays) to a model that can represent different data types, like audio and proximity components.
They live in dedicated component pools, meaning any systems we build in the future can efficiently work with just the data they need. If a system only needs to act on nearby entities, it can look at our proximity components and, if it needs the entity data, it can traverse the sparse set in constant time to retrieve it.
But what if our system needs data from multiple component pools? In this lesson, we'll imagine we need to build ProximityAudioSystem. This system needs to find every entity that has both an audio component and a proximity component.
Starting Point
We'll continue working with the SoA containers and SparseSet we created previously. Simplified versions of these have been provided below if needed.
A key change is that we've removed the m_audio column from EntityStorage, which was used for benchmarking in the previous lesson. We've added a new AudioStorage SoA, which is functionally identical to ProximityStorage - it just uses different names for the columns:
Files
The Direct Scan
The most obvious approach is to iterate over every entity that exists and check if they have what we need. Our EntityStorage provides entity IDs (basic int values) which we can then look up in the sparse sets of our component pools.
The following code benchmarks this approach in a situation where 10% of entities have an audio component and 10% have a proximity component. This means that approximately 1% of entities are of interest to our hypothetical proximity audio system:
benchmarks/main.cpp
#include <benchmark/benchmark.h>
#include <dsa/EntityStorage.h>
#include <dsa/ProximityStorage.h>
#include <dsa/AudioStorage.h>
#include <random>
static void BM_NaiveScan(benchmark::State& state) {
int n = state.range(0);
EntityStorage entities;
ProximityStorage prox;
AudioStorage audio;
// Setup: 10% of entities have Proximity, 10% have Audio
// The overlap (Intersection) will be roughly 1%
std::mt19937 rng(42);
std::uniform_int_distribution<int> dist(0, 9);
for (int i = 0; i < n; ++i) {
entities.Add("Entity");
if (dist(rng) == 0) prox.Add(i, 0, 0, 0);
if (dist(rng) == 0) audio.Add(i, 0, 0, 0);
}
for (auto _ : state) {
int count = 0;
// Iterate all entities
for (int i = 0; i < n; ++i) {
// Check for existence in both sets
if (prox.m_map.contains(i) && audio.m_map.contains(i)) {
count++;
}
}
benchmark::DoNotOptimize(count);
}
}
#define BENCHMARK_CONFIG(name) \
BENCHMARK(name) \
->RangeMultiplier(10) \
->Range(100'000, 10'000'000) \
->Unit(benchmark::kMillisecond)
BENCHMARK_CONFIG(BM_NaiveScan);
-------------------------------
Benchmark                  CPU
-------------------------------
BM_NaiveScan/100000   0.148 ms
BM_NaiveScan/1000000   1.51 ms
BM_NaiveScan/10000000  14.1 ms
This is a direct, brute-force solution, and we suspect we can do better, but it gives us a baseline for comparison.
The Smallest Set Driver
Finding "Entities with Audio who also have Proximity" will return the exact same result as "Entities with Proximity who also have Audio", but the performance will be different based on which approach we choose.
We should choose the smallest set as our starting point, or "driver". If we have 50 proximity components and 5,000 audio components, we should iterate through the ProximityStorage, and then check if the associated entity also has an audio component.
This reduces our workload from 10,000 checks (using entities as the driver) or 5,000 checks (using audio as the driver) to just 50 checks (using proximity as the driver).
To use the smallest-set approach, there is the additional overhead of finding which set is smallest, but that is just a few quick size() calls on the component pools.
In the following benchmark, we recreate the same situation as the previous, where 10% of entities have audio, and 10% of them have proximity. The key difference is that we implement our algorithm using this "smallest set" approach:
benchmarks/main.cpp
// ...
static void BM_SmallestSet(benchmark::State& state) {
int n = state.range(0);
EntityStorage entities;
ProximityStorage prox;
AudioStorage audio;
std::mt19937 rng(42);
std::uniform_int_distribution<int> dist(0, 9);
for (int i = 0; i < n; ++i) {
entities.Add("Entity");
if (dist(rng) == 0) prox.Add(i, 0, 0, 0);
if (dist(rng) == 0) audio.Add(i, 0, 0, 0);
}
for (auto _ : state) {
// Calculate which set is smaller
// Smaller set becomes the driver
bool drive_audio = audio.m_volume.size() < prox.m_distance.size();
int count = 0;
if (drive_audio) {
// Driver: Audio (Smallest)
// Lookup: Proximity
for (size_t i = 0; i < audio.m_volume.size(); ++i) {
// Note: Assuming m_dense is accessible for this lesson
int entity_id = audio.m_map.m_dense[i];
if (prox.m_map.contains(entity_id)) {
count++;
}
}
} else {
// Driver: Proximity (Smallest)
// Lookup: Audio
for (size_t i = 0; i < prox.m_distance.size(); ++i) {
int entity_id = prox.m_map.m_dense[i];
if (audio.m_map.contains(entity_id)) {
count++;
}
}
}
benchmark::DoNotOptimize(count);
}
}
// ...
BENCHMARK_CONFIG(BM_SmallestSet);
---------------------------------
Benchmark                    CPU
---------------------------------
BM_NaiveScan/100000     0.151 ms
BM_NaiveScan/1000000     1.46 ms
BM_NaiveScan/10000000    14.2 ms
BM_SmallestSet/100000   0.011 ms
BM_SmallestSet/1000000  0.136 ms
BM_SmallestSet/10000000  2.72 ms
The Cache Problem
The smallest set driver is a big improvement, but there is a major cache inefficiency. Assuming ProximityStorage is the smaller set (the "driver"), our algorithm does the following:
1. Iterate over the proximity storage
2. For each proximity component, find the associated entity ID within the dense array
3. To check if that entity also has an audio component, search the sparse array within the audio component pool via its contains() method
Step 3 is problematic:
// ...
class SparseSet {
public:
// ...
bool contains(int entity_id) const {
if (entity_id >= m_sparse.size()) return false;
// Random memory access
return m_sparse[entity_id] != null_id;
}
// ...
};
The m_sparse vector is sized to the maximum number of entities (e.g., 10,000 or 100,000). When we iterate through our proximity list, the entity_ids we encounter are effectively random numbers.
One iteration might ask for m_sparse[5]. The next might ask for m_sparse[9000]. This is pointer chasing in disguise. We are jumping randomly into a large block of memory. In a realistic system, this address is unlikely to be in the cache, so we are going to stall the CPU waiting for main RAM.
If we are joining 3 or 4 component pools, it gets worse. We would have to perform a random lookup into every pool's sparse array.
We'll discuss some heavyweight options to deal with this later in the lesson, but we can mitigate the problem somewhat by adding signatures to our entities.
Bitmasks and Signatures
One way to reduce this problem is to add the "existence" check to the entity itself. This is generally done using a signature.
A signature is a bitmask where each bit represents a specific component type. If an entity has Proximity (Bit 0) and Audio (Bit 1), their signature would be 00000011.
dsa_core/include/dsa/EntityStorage.h
// ...
#include <cstdint> // for uint8_t
class EntityStorage {
public:
std::vector<std::string> m_names;
// Bit 0: Proximity, Bit 1: Audio
std::vector<uint8_t> m_signatures;
int Add(std::string name) {
m_names.push_back(std::move(name));
// Initialize signature to 0 (no components)
m_signatures.push_back(0);
return int(m_names.size() - 1);
}
// Helper to update signature
void RegisterComponent(int entity_id, uint8_t bit) {
if (entity_id >= 0 && size_t(entity_id) < m_signatures.size()) {
m_signatures[entity_id] |= bit;
}
}
};
We can now check for the required components using bitwise techniques:
// Bit 0: Proximity, Bit 1: Audio
uint8_t HAS_PROXIMITY = 1 << 0;
uint8_t HAS_AUDIO = 1 << 1;
uint8_t HAS_BOTH = HAS_PROXIMITY | HAS_AUDIO;
if ((m_signatures[entity_id] & HAS_BOTH) == HAS_BOTH) {
// entity_id has both components
}
It might not be clear why this is an improvement, and, in isolation, it isn't. Previously, we were jumping to a random location within a large array called m_sparse. Now we're jumping to a random location within an equally large array called m_signatures.
How can this possibly be an improvement? There are a few reasons for this.
Physical Size
Even though m_signatures and m_sparse have the same length, they have a different physical size in memory. An index typically requires 32 or 64 bits, whilst a signature tracking the key components we care about rarely needs to be larger than 8 bits. This means more signatures can fit in the CPU caches.
Temporal Locality
When we have lots of systems running, many of them will be performing these "does this entity have this component?" checks.
Previously, the answers to "does this entity have an audio component?" and "does this entity have a proximity component?" were in two completely different memory locations. One was in the sparse set of our audio pool, and the other was in the sparse set of our proximity pool.
Now, they're in the same place. Only one system needs to pull m_signatures[some_entity], and then it's in the cache for any other systems that need it in the near future.
Spatial Locality
When a system pulled information from a sparse array, most of the adjacent entries on the same cache line were the sentinel values. We were filling our memory bandwidth and caches with junk -1s.
Now, when a system pulls m_signatures[some_entity], all adjacent values are signatures of other entities. Our system, or some other system, may need those values soon, so we've eliminated even more cache misses.
Flexibility
Signatures also give us more flexibility. Firstly, when a system needs to join 3 or more components, we've eliminated some of the lookups - it's one check in m_signatures rather than two or more checks in different m_sparse arrays.
Secondly, not all of our data access patterns start by iterating over a component pool. Sometimes, we just have an entity, and we want to know which components it has. That's much easier if the entity has a signature.
Benchmarking Signatures
Let's add a benchmark to compare this new technique to our previous efforts:
benchmarks/main.cpp
// ...
static void BM_Bitmask(benchmark::State& state) {
int n = state.range(0);
EntityStorage entities;
ProximityStorage prox;
AudioStorage audio;
std::mt19937 rng(42);
std::uniform_int_distribution<int> dist(0, 9);
uint8_t BIT_PROX = 1 << 0;
uint8_t BIT_AUDIO = 1 << 1;
uint8_t MASK_BOTH = BIT_PROX | BIT_AUDIO;
for (int i = 0; i < n; ++i) {
entities.Add("Entity");
if (dist(rng) == 0) {
prox.Add(i, 0, 0, 0);
entities.RegisterComponent(i, BIT_PROX);
}
if (dist(rng) == 0) {
audio.Add(i, 0, 0, 0);
entities.RegisterComponent(i, BIT_AUDIO);
}
}
bool drive_audio = audio.m_volume.size() < prox.m_distance.size();
for (auto _ : state) {
int count = 0;
// We still drive with the smallest set
if (drive_audio) {
for (size_t i = 0; i < audio.m_volume.size(); ++i) {
int entity_id = audio.m_map.m_dense[i];
uint8_t signature = entities.m_signatures[entity_id];
if ((signature & MASK_BOTH) == MASK_BOTH) {
count++;
}
}
} else { // Driver: Proximity
for (size_t i = 0; i < prox.m_distance.size(); ++i) {
int entity_id = prox.m_map.m_dense[i];
uint8_t signature = entities.m_signatures[entity_id];
if ((signature & MASK_BOTH) == MASK_BOTH) {
count++;
}
}
}
benchmark::DoNotOptimize(count);
}
}
// ...
BENCHMARK_CONFIG(BM_Bitmask);
In an isolated benchmark testing a simple two-component join in a pre-warmed cache, we won't get much benefit from signatures, but we should at least ensure we haven't made things worse.
Any improvement will be a result of the smaller size of signatures, which allows larger working sets to fit within a cache:
----------------------------------
Benchmark                     CPU
----------------------------------
BM_NaiveScan/100000      0.150 ms
BM_NaiveScan/1000000      1.51 ms
BM_NaiveScan/10000000     14.4 ms
BM_SmallestSet/100000    0.010 ms
BM_SmallestSet/1000000   0.137 ms
BM_SmallestSet/10000000   2.58 ms
BM_Bitmask/100000        0.009 ms
BM_Bitmask/1000000       0.126 ms
BM_Bitmask/10000000       1.32 ms
Implementing the ProximityAudioSystem
Now that we know the benefits of each approach, we can implement our basic ProximityAudioSystem. There's a lot of ugly code here to manage the iteration. We'll improve it in the next lesson, but the important point for now is to understand the algorithm. The key steps are:
1. Identify the smallest set, which will become the driver of the loop. Let's assume it's the AudioStorage.
2. Iterate the AudioStorage, traversing its dense array to find each audio component's associated entity.
3. Inspect that entity's signature to see if it also has a proximity component.
4. If it does, traverse the ProximityStorage's sparse array to get that component.
5. Gather the data we need from both components and process it.
dsa_app/ProximityAudioSystem.h
#pragma once
#include <dsa/EntityStorage.h>
#include <dsa/ProximityStorage.h>
#include <dsa/AudioStorage.h>
#include <print> // C++23
class ProximityAudioSystem {
public:
void Process(std::string_view name, float distance, float volume) {
std::print(
"Processing audio for {} (distance = {}, volume = {})\n",
name, distance, volume
);
}
void Update(
EntityStorage& entities,
ProximityStorage& prox,
AudioStorage& audio
) {
// STEP 1: Identify smallest set
bool drive_audio = audio.m_volume.size() < prox.m_distance.size();
// Required component bits - Bit 0: Proximity, Bit 1: Audio
uint8_t required_mask = (1 << 0) | (1 << 1);
if (drive_audio) {
for (size_t i = 0; i < audio.m_volume.size(); ++i) {
// STEP 2: What entity am I attached to?
int id = audio.m_map.m_dense[i];
// STEP 3: Does the entity also have a proximity component?
uint8_t signature = entities.m_signatures[id];
if ((signature & required_mask) == required_mask) {
// STEP 4: It does, so get the component id
int prox_idx = prox.m_map.m_sparse[id];
// STEP 5: Gather the data we need to process
// Data from EntityStorage
std::string_view name = entities.m_names[id];
// Data from ProximityStorage
float distance = prox.m_distance[prox_idx];
// Data from AudioStorage
float volume = audio.m_volume[i];
Process(name, distance, volume);
}
}
} else {
// Same process, but using proximity as the driver
for (size_t i = 0; i < prox.m_distance.size(); ++i) {
int id = prox.m_map.m_dense[i];
uint8_t signature = entities.m_signatures[id];
if ((signature & required_mask) == required_mask) {
int audio_idx = audio.m_map.m_sparse[id];
std::string_view name = entities.m_names[id];
float distance = prox.m_distance[i];
float volume = audio.m_volume[audio_idx];
Process(name, distance, volume);
}
}
}
}
};
An example usage of our current system is below, but we'll improve this API soon:
dsa_app/main.cpp
#include <dsa/EntityStorage.h>
#include <dsa/ProximityStorage.h>
#include <dsa/AudioStorage.h>
#include "ProximityAudioSystem.h"
int main() {
EntityStorage entities;
ProximityStorage proximity;
AudioStorage audio;
ProximityAudioSystem system;
uint8_t BIT_PROX = 1 << 0;
uint8_t BIT_AUDIO = 1 << 1;
// Create an entity
int alice = entities.Add("Alice");
proximity.Add(alice, 1.0f, 2.0f, 3.0f);
entities.RegisterComponent(alice, BIT_PROX);
audio.Add(alice, 4.0f, 5.0f, 6.0f);
entities.RegisterComponent(alice, BIT_AUDIO);
// Close but no audio
int bob = entities.Add("Bob");
proximity.Add(bob, 1.0f, 2.0f, 3.0f);
entities.RegisterComponent(bob, BIT_PROX);
// Audio but not close
int charlie = entities.Add("Charlie");
audio.Add(charlie, 4.0f, 5.0f, 6.0f);
entities.RegisterComponent(charlie, BIT_AUDIO);
// Tick the system
system.Update(entities, proximity, audio);
}
Processing audio for Alice (distance = 1, volume = 4)
Complete Code
A complete version of the example from this lesson is available below:
Files
Further Optimizations
The combination of the smallest set driver and bitmask signatures creates a very efficient query mechanism. It minimizes loop iterations and reduces cache misses significantly.
However, there are techniques we can use to remove lookups entirely.
Caching
Not every system needs to work with the very latest data, and this means we don't need to calculate our set intersection every time our system runs. If an entity has audio enabled, they're likely to have audio enabled for a very long time.
Their proximity might change more frequently, but it's still not something we need to check every few milliseconds.
In these scenarios, we can cache the query result. Instead of iterating over the AudioStorage and checking bitmasks every frame, our system can just maintain a std::vector<int> that contains the specific list of IDs that possess both Audio and Proximity.
Perhaps it recalculates this list every few seconds instead of every few milliseconds. If the system is designed not to always require the latest data, that also gives us opportunities to move the data processing off the main thread and into a background task.
Events
There is another way we can keep our storage up to date - we can use events or observers. We can design our storage containers to emit events such as OnComponentAdded() or OnComponentRemoved().
For components that rarely get removed, our ProximityAudioSystem can just monitor these events instead of constantly rechecking which nearby entities still have an audio component.
This moves the cost of the search from the "update" loop (which runs every frame) to the add/remove events (which happen rarely).
Data Layout
There is a more heavy-handed version of this where we group our related entities together. If we arrange our data such that all the entities that have both a proximity component and an audio component are grouped contiguously, our ProximityAudioSystem just needs to iterate between those two indices - no lookups required.
The archetype pattern is this approach taken to its logical conclusion. Our entities get split into different collections based on their signatures. For example:
- Archetype A stores all entities that have only Audio and Proximity.
- Archetype B stores all entities that have only Audio and Physics.
- Archetype C stores all entities that have Audio, Proximity, and Physics.
Our ProximityAudioSystem then just iterates over the tables of Archetypes A and C, without needing to check each entity.
This creates perfectly linear iteration, at the expense of some engineering effort to set up, and additional overhead to maintain this layout when adding or removing a component changes a signature.
High-budget projects typically combine these patterns - a layout optimized around performance-critical components that rarely get removed, with lookup-based techniques for components that are less important or get added and removed often.
Summary
We have built a mechanism to join relational data tables.
- Smallest Set Driver: Always iterate the component type with the fewest entities to minimize loop count.
- Bitmask Filtering: Use a compact central signature to check for component existence. This avoids polluting the cache with random accesses to sparse arrays for entities that don't match our criteria.
- Deferred Lookup: Only pay the cost of looking up the secondary data (via the Sparse Set) once you have confirmed the entity is a valid candidate.
However, the "front end" code we're writing to interact with these systems is incredibly messy. We'd like our component registration to look more like this:
// Before:
int alice = entities.Add("Alice");
proximity.Add(alice, 1.0f, 2.0f, 3.0f);
entities.RegisterComponent(alice, BIT_PROX);
audio.Add(alice, 4.0f, 5.0f, 6.0f);
entities.RegisterComponent(alice, BIT_AUDIO);
// After:
entities.Add("Alice")
.AddComponent<Proximity>(1.0f, 2.0f, 3.0f)
.AddComponent<Audio>(4.0f, 5.0f, 6.0f);And we'd like to replace the monstrous 50-line logic in our ProximityAudioSystem with something that looks like this:
auto noisy_neighbors = entities.GetView<Proximity, Audio>();
for (auto [entity, prox, audio] : noisy_neighbors) {
std::print(
"Processing audio for {} (distance = {}, volume = {})\n",
entity.name, prox.distance, audio.volume
);
}
We'll implement this over the next few lessons.
Templatizing Components
Improving our API using templates and a centralized registry using std::tuple.