The Swap-and-Pop Idiom
Comparing the performance cost of deletion techniques, and implementing the Swap-and-Pop approach in a Structure of Arrays container.
Eventually, a player in your game will disconnect. An entity in your simulation will be destroyed. A transaction in your ledger will be cancelled. When this happens, we need to remove the data from our container.
If we were using std::vector in a standard way, we would reach for erase(). But as we are about to discover, erase() is is an extremely expensive operation. It prioritizes politeness over performance, preserving the order of elements at the cost of massive memory bandwidth.
In this lesson, we are going to implement deletions. We will learn the Swap-and-Pop idiom - a technique that trades order for raw speed.
The Cost of Politeness
To understand why we need a special technique for deletion, we must look at how std::vector::erase() works physically.
A std::vector is a contiguous block of memory. This contiguity is the source of its power (cache locality), but it is also its greatest weakness. We cannot simply "delete" a slot in the middle of a block of memory. The memory addresses are fixed hardware locations.
To fix the gap and maintain the contiguous nature of the array, we need to take every object to the right of what we we deleted, and slide them one slot to the left.
The Shift
This sliding is exactly what std::vector::erase() does. If you have 1,000,000 items and you delete item #0, the CPU must copy items #1 through #999,999 to their new positions.
This is an operation. It requires reading and writing the entire tail of the vector. For large vectors or large elements, this saturates the memory bandwidth and stalls the CPU.
The standard library does this to be polite. It assumes that the order of your elements matters. If you had a sorted list, or a prioritized queue, preserving the relative order of the remaining elements is important.
But often, we do not care about the order and, in those situations, we can be more efficient.
Using the Swap-and-Pop Idiom
The Swap-and-Pop idiom uses the fact that std::vector supports one type of deletion that is incredibly fast: pop_back().
Removing the last element of a vector is . No data needs to move - the internal size counter is simply decremented behind the scenes.
So, how do we delete an element in the middle?
- We take the element we want to delete (the victim).
- We take the last element in the vector (the replacement).
- We swap (move) the replacement into the victim's slot.
- We call
pop_back()to remove the now-duplicate element at the end.
This turns an memory copy operation into a single move assignment and an integer decrement.
Previous Code
Let's implement this in our PlayerStorage. If you're following along from previous lessons, feel free to continue with your container, but we've provided a simplified version of it below.
To keep things focused on the new things, we've stripped our example back to just a basic SoA with 4 columns and a method to add new entities:
dsa_core/include/dsa/PlayerStorage.h
#pragma once
#include <vector>
#include <string>
class PlayerStorage {
public:
std::vector<int> m_ids;
std::vector<int> m_scores;
std::vector<float> m_health;
std::vector<std::string> m_names;
void AddPlayer(int id, int score, float hp, std::string name) {
m_ids.push_back(id);
m_scores.push_back(score);
m_health.push_back(hp);
m_names.push_back(std::move(name));
}
};Our goal is to add a method: void EraseAt(size_t index).
Implementation in SoA
Implementing this for a std::vector is trivial. Implementing it for our PlayerStorage requires care.
Because our data is split across 4 different vectors, we cannot just swap one of them. If we swap the ID at index 5 but fail to swap the Name at index 5, we have corrupted our data. The entity at index 5 would have the ID of the last player, but the Name of the deleted player.
We beed to perform the transaction atomically (logically speaking) across all columns.
dsa_core/include/dsa/PlayerStorage.h
// ...
class PlayerStorage {
public:
// ...
// Remove the player at the given index in O(1) time.
// WARNING: This changes the order of elements!
void EraseAt(size_t index) {
// Bounds check (optional, but good for safety)
if (index >= m_ids.size()) return;
// 1. Move the last element into the 'hole'
// We use std::move to avoid deep copying strings/arrays
m_ids[index] = std::move(m_ids.back());
m_scores[index] = std::move(m_scores.back());
m_health[index] = std::move(m_health.back());
m_names[index] = std::move(m_names.back());
// 2. Remove the last element
m_ids.pop_back();
m_scores.pop_back();
m_health.pop_back();
m_names.pop_back();
}
};Notice the use of std::move(). For integers like id and score, this is equivalent to a copy. But for a std::string like name, std::move() is a massive optimization.
If we just copied the string, the std::string assignment operator would likely allocate new heap memory and copy the character buffer. With std::move(), we simply steal the pointer from the source string. The string at the end of the vector is left in the dilapidated "moved-from" state, but we immediately destroy it with pop_back() anyway.
We cover std::move() and move semantics in a .
Benchmarking: Shift vs. Swap
Let's quantify the difference. We will set up a benchmark that repeatedly deletes elements from a container.
However, we always need to be mindful when our benchmark is modifying the source data. Our benchmarks run many times before the library reports a result, and we don't want previous runs influencing the measurement of future runs.
We'll reset our data at the start of every test, but we don't want that replenishment time to be included in the benchmark, as that's not what we're testing. So, we can call state.PauseTiming() to pause the clock, refill the vector, and then resume the clock with state.ResumeTiming():
benchmarks/main.cpp
#include <benchmark/benchmark.h>
#include <vector>
#include <dsa/PlayerStorage.h>
struct Player {
int id;
int score;
float health;
std::string name;
};
static void BM_VectorErase(benchmark::State& state) {
int n = state.range(0);
for (auto _ : state) {
// 1. Setup (Not Timed)
state.PauseTiming();
std::vector<Player> v(n);
state.ResumeTiming();
// 2. The Benchmark: Erase from front until empty
// This is O(n^2) complexity total: the O(n) erase
// method being called n times
while (!v.empty()) {
v.erase(v.begin());
}
}
}
static void BM_PlayerStorageErase(benchmark::State& state) {
int n = state.range(0);
for (auto _ : state) {
// 1. Setup (Not Timed)
state.PauseTiming();
PlayerStorage ps;
for(int i=0; i<n; ++i) ps.AddPlayer(i, i, 100.f, "Name");
state.ResumeTiming();
// 2. The Benchmark: Swap-and-Pop from front until empty
// This is O(n) complexity total: the O(1) pop-and-swap
// implementation being called n times
while (ps.m_ids.size() > 0) {
ps.EraseAt(0);
}
}
}
#define BENCHMARK_STD(func) \
BENCHMARK(func) \
->RangeMultiplier(10) \
->Range(100, 10000) \
->Unit(benchmark::kMillisecond)
BENCHMARK_STD(BM_VectorErase);
BENCHMARK_STD(BM_PlayerStorageErase);When you run this benchmark, you will likely see the run time of BM_VectorErase() explode, whilst the pop-and-swap implementation keeps the scaling under control:
--------------------------------------
Benchmark CPU
--------------------------------------
BM_VectorErase/100 0.018 ms
BM_VectorErase/1000 1.68 ms
BM_VectorErase/10000 184 ms
BM_PlayerStorageErase/100 0.001 ms
BM_PlayerStorageErase/1000 0.006 ms
BM_PlayerStorageErase/10000 0.063 msDeleting is Dangerous
When we delete objects from our collection, there are some consequences we need to be mindful of.
Firstly, we should always remember that the swap-and-pop technique creates an unstable iteration order. After using it, the order of our elements has changed.
More problematically, both swap-and-pop and erase() result in index invalidation.
Unstable Iteration Order
When we use erase(), the element at index 5 is deleted, and the element at index 6 slides down to become the new index 5. While the indices shift, the relative order is preserved.
When we use swap-and-pop, the element at index 5 is deleted, and the element at index 999 (the last one) teleports to index 5.
If you are iterating through the container and you delete an element, the order of future elements is unpredictable. You effectively shuffle the deck every time you remove a card.
This is usually acceptable in many contexts, but something we need to be mindful of.
Index Invalidation
This is the more serious problem. Suppose we have an external system - let's say a scoreboard - that keeps track of players in our collection. This can take several forms - perhaps they have an index into our system, or a pointer/reference to a memory location within one of the arrays our system manages.
When we remove elements from our arrays, the indices or pointers those other systems hold may no longer point where they intended.
For the erase() approach, pointers to every element after the erased victim is invalidated, because they've all shifted one slot to the left to fill the gap.
For the pop-and-swap approach, pointers to both the victim and the last element are invalidated.
This is the index invalidation problem. When we move data to keep arrays dense (packed tight), we break any external references that rely on fixed locations. Our next lesson will be focused on ways to deal with this problem.
Summary
In this lesson, we covered the tricky problem of deleting elements.
- Stable Erasing: We learned that
std::vector::erase()is polite, but slow. It preserves the original order of our collection by shifting memory, but shifting memory is an operation. - Swap-and-Pop: We implemented the standard solution for high-performance deletion. By swapping our victim with the last element , we move the gap to the end, where we can pop it off in .
- SoA Maintenance: We reinforced that maintaining a structure of arrays requires keeping the arrays perfectly synchronized. Every operation - add, remove, sort - must apply to all columns simultaneously.
- The Stability Cost: We acknowledged removals break our addressing system. Pointers and indices are no longer stable identifiers.
So, how do we refer to a player safely if their index keeps changing?
Over the next two lessons, we will solve the index invalidation problem by introducing generational indices (also known as handles). We will upgrade our PlayerStorage to create stable, safe references that persist even as the underlying memory churns and shuffles.
Stable Addressing and Tombstones
Solving the index invalidation problem and the trade-offs between memory density and reference stability.