The Reallocation Spike

The hidden cost of std::vector growth, why it is dangerous for real-time systems, and how to mitigate it.

Ryan McCombe

We still have to deal with one lurking demon. Our SoA is calling push_back() a lot. If we check a reference guide on std::vector::push_back(), we'll see its performance touted as amortized O(1).

This means that push_back() will have a fast, predictable speed on average. The problem is that in many contexts, we don't care that much about the average. We care more about predictability, meaning the worst case is more important. If an algorithm must always run within 10ms, but it sometimes takes longer, we can't use it.

So, whenever we encounter "amortized O(1)" we should take it as a warning sign. It means predictable speed... right up until the moment it isn't.

The Mechanics of Growth

So, why is push_back() only O(1) on average? Under the hood, std::vector manages a contiguous block of memory. It tracks two numbers:

  1. Size: The number of elements currently in the vector.
  2. Capacity: The number of elements the current memory block can hold before it is full.

When Size < Capacity, push_back() is incredibly fast. It is a simple pointer increment and a write. This is the O(1).

But when Size == Capacity, the array is full. To add one more element, it needs to grow. Since it requires contiguous memory, it cannot just grab the next byte of RAM - that byte is likely owned by something else.

Instead, it performs a reallocation. It asks the OS for a new block of memory, typically 1.5x or 2x the size of the old one, depending on the implementation. Then, it copies the existing elements from the old block to the new block. Finally, it inserts the new element and frees the old block of memory.
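
We can watch this happen. In the sketch below, we track the pointer returned by data(); whenever it changes, a reallocation has occurred. The exact sizes and capacities printed will depend on your standard library's growth factor:

#include <iostream>
#include <vector>

int main() {
  std::vector<int> values;
  const int* block = values.data();

  for (int i = 0; i < 1'000; ++i) {
    values.push_back(i);

    // A changed data() pointer means the vector reallocated:
    // every existing element was copied to a new block
    if (values.data() != block) {
      std::cout << "Reallocated at size " << values.size()
                << ", new capacity " << values.capacity() << '\n';
      block = values.data();
    }
  }
}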

The Double Whammy

The bigger our collection, the longer this process takes. It is an O(n) algorithm on paper, but it is a particularly punishing one because it hits the two slowest parts of a computer simultaneously:

  1. Allocation Latency: Asking the OS for a large chunk of memory is a "system call." The CPU has to stop your program, switch to kernel mode, find a free page of physical RAM, map it to your virtual address space, and switch back. This takes thousands of cycles.
  2. Bandwidth Saturation: We have to copy the entire dataset. If you have 10MB of data, you are reading 10MB and writing 10MB. You are flooding the memory bus with traffic, stalling the CPU as it waits for data to move.

If our array's current capacity allows it to store 1,000 objects, and we push_back() the 1,001st, we aren't just paying the cost of creating one object. We are paying the cost of moving the previous 1,000 objects, plus the overhead of the OS allocator.
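
We can measure this directly by timing each individual push_back(). The sketch below compares the average push against the single worst push; exact numbers vary by platform and allocator, but the worst case is typically orders of magnitude slower than the average:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>

int main() {
  std::vector<int> values;
  std::chrono::nanoseconds total{0};
  std::chrono::nanoseconds worst{0};
  constexpr int Count{1'000'000};

  for (int i = 0; i < Count; ++i) {
    const auto start = std::chrono::steady_clock::now();
    values.push_back(i);
    const auto elapsed = std::chrono::duration_cast<
      std::chrono::nanoseconds>(std::chrono::steady_clock::now() - start);

    total += elapsed;
    worst = std::max(worst, elapsed);
  }

  // The average hides the spikes; the worst case reveals them
  std::cout << "Average push: " << total.count() / Count << "ns\n";
  std::cout << "Worst push:   " << worst.count() << "ns\n";
}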

Reserving Capacity

The best and most common way to prevent reallocation is to proactively reserve the space we'll need. Often, we're not sure exactly how much that is, so we overallocate (within reason) to a safe upper bound.

This is done via the reserve() method.

std::vector<Player> players;
players.reserve(3000); // Allocate capacity for 3000 items

// These 2000 pushes will NEVER reallocate
for(int i = 0; i < 2000; ++i) {
  players.push_back(Player{...});
}

By reserving, we pay the heavy allocation cost once so that we can have a smooth experience the rest of the time. We can schedule that heavy allocation for a time when performance isn't as important, such as during initial startup or as part of a loading screen.

Reserving Capacity in a SoA

In a SoA layout, we have multiple arrays. We usually just add a function to our container that forwards the reserve() call to all of the underlying arrays:

dsa_core/include/dsa/PlayerStorage.h

// ...
class PlayerStorage {
public:
  std::vector<int> m_ids;
  std::vector<int> m_scores;
  std::vector<float> m_health;
  std::vector<std::string> m_names;

  void Reserve(size_t capacity) {
    m_ids.reserve(capacity); 
    m_scores.reserve(capacity); 
    m_health.reserve(capacity); 
    m_names.reserve(capacity); 
  }
  // ...
};
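
Then, at a convenient time such as startup or a loading screen, a single call prepares all four arrays:

PlayerStorage storage;
storage.Reserve(3000); // One up-front allocation per array

// Later additions won't reallocate until we
// exceed 3,000 players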

Non-Contiguous Layouts

Sometimes, we cannot predict the future. In a sandbox game or a web server, the number of entities might grow indefinitely. We cannot reserve() effectively because we don't know the upper limit.

If we cannot tolerate the latency spike of a std::vector reallocation, but we need dynamic growth, the only option is to abandon the strict requirement of contiguous memory.

Chunked / Paged Storage

The reallocation spike happens because std::vector must maintain a single, unbroken block of memory.

Paged storage, such as std::deque in the standard library, works by allocating small, fixed-size "pages" or "chunks" of memory. Each chunk is an array, but when the container needs to grow, it allocates additional chunks rather than resizing the existing ones.

This strategy completely eliminates the reallocation spike. Since the chunks never move, we never have to copy old elements.
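
We can see this guarantee in action. Unlike std::vector, std::deque never moves existing elements when we push to either end, so pointers to them remain valid. A minimal sketch:

#include <deque>
#include <iostream>

int main() {
  std::deque<int> values;
  values.push_back(42);
  const int* first = &values.front();

  // Grow far beyond the initial chunk - a std::vector
  // would have reallocated and moved its elements many times
  for (int i = 0; i < 100'000; ++i) {
    values.push_back(i);
  }

  // The element never moved, so our pointer is still valid
  std::cout << *first << '\n'; // Prints 42
}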

Behind the scenes, std::deque does some additional bookkeeping to hide these chunks and simulate an array-like experience. We can iterate through a std::deque just like an array, and we can access random elements using the [] operator.

However, it comes with a performance cost. Calculating which chunk an index maps to makes the [] operator slightly more expensive, and iteration is slower as the hardware prefetcher gets surprised every time we jump between chunks.

As with everything, there is no "correct" answer, only trade-offs.

Summary

In this lesson, we looked at the reallocation risks hiding behind push_back(). Here are the key points:

  1. Amortized vs. Worst Case: Amortized or "average" performance hides the "worst case" spikes that kill real-time experiences.
  2. The Growth Spike: Pushing to a std::vector can trigger an expensive allocate-copy-free process if it requires the array to grow.
  3. Reservation: Using reserve() is the easiest way to prevent this in most scenarios - we pay the allocation once, at a convenient time (usually initialization or during a loading screen).
  4. Paged Alternatives: If reserving isn't possible, breaking memory into chunks eliminates copying at the cost of cache locality.

However, our collection is still just one big blob of data. In the real world, we typically need to quickly retrieve and analyze slices of data. We need to get a count of the number of players who are nearby; the total value of all the transactions that look suspicious; a report on all of the tasks that are incomplete.

Our range-based interface provides an infinitely flexible API for this on the front end:

auto healthy = std::views::filter(
  [](PlayerRef p) { return p.health > 100; }
);

for (PlayerRef p : players | healthy) {
  // ...
}

However, views are designed for iterating through our data one element at a time. For heavy tasks that can span across our entire dataset, we can use approaches that are orders of magnitude more efficient. These approaches can also incorporate the multithreading techniques we covered previously. We'll see how to do that in the next chapter.
