The Mechanics of Linked Lists

An introduction to node-based data structures. The physical reality of std::list, pointer swapping, fast insertions, and why pointer chasing destroys performance.

Ryan McCombe
Published

With our memory foundation secure, we are ready to transition away from flat, contiguous arrays and begin building structural layouts. We are moving into the world of node-based data structures.

With a contiguous array, the physical arrangement dictates the logical order. If we had a collection of sorted characters, A, B, and C, they would sit right next to each other, in that order, in memory.

A linked data structure works differently. The physical location of elements is not related to their logical ordering. Instead, the logical order is dictated by pointers that are stored and managed alongside the key data.

In this lesson, we will dissect the simplest linked structure in existence: the linked list. We will look at its key strengths compared to contiguous structures, how a standard linked list is typically implemented, and some of the problems with that approach. We will then introduce to some of the techniques we'll apply throughout the rest of the chapter to mitigate against those problems.

The Physical Structure of a Node

When we use a std::vector<int>, the standard library allocates a single, contiguous block of memory to hold our integers. Nothing else is required, as the logical ordering of the data can be inferred by physical ordering in memory.

When we use a linked list, the physical location of each object in the collection is relatively meaningless. To express how the data is ordered, we need additional pointers. To set this up, linked list implementations have the concept of a node. Conceptually, a node is an object that contains both the data we're storing for each element in the collection, in addition to the pointers that define the overall structure of the collection.

If we wanted to store a linked list of integers, for example, each node would contain the integer, as well as a pointer to the next node in the collection. The linked list itself can then be as simple as a pointer to the first node in the chain, which is usually called the head of the list:

Singly Linked (std::forward_list)

The previous diagram was an example of a singly linked list. This represents the absolute minimum structural overhead required to maintain a sequence. Every node contains the payload (the primary data we're trying to store) and one pointer, which holds the memory address of the next node in the chain, or a nullptr if there are no further nodes.

The C++ standard library implementation of this is std::forward_list. If we were to strip away the template magic, the physical structure of a singly linked node looks like this:

template <typename T>
struct SinglyNode {
  T Payload;             // Your data
  SinglyNode* Next;      // Memory address of the next node
};

Below, we use the SinglyNode type to create a linked list containing three elements:

// We allocate nodes on the heap and wire their pointers together
SinglyNode<int>* node3 = new SinglyNode<int>{30, nullptr};
SinglyNode<int>* node2 = new SinglyNode<int>{20, node3};
SinglyNode<int>* Head  = new SinglyNode<int>{10, node2};

// In real code, we must also delete these to avoid leaks

Wrappers like std::forward_list hides these intermediate node objects and these mechanical steps. They expose a simple public interface that makes adding, removing, and iterating through the objects in our collection much easier. Some of the key methods are shown below:

#include <forward_list>
#include <iostream>

int main() {
  // Create a list for storing int values
  std::forward_list<int> list;

  // Add elements to the front
  list.push_front(30);
  list.push_front(20);
  
  // Construct an element in-place
  list.emplace_front(10);
  
  // Iterate through the nodes
  for (int value : list) {
    std::cout << value << " ";
  }
  
  // Access the first element
  int value = list.front(); // 10
  
  // Remove the first element
  list.pop_front();
  
  value = list.front(); // 20
  
  // Wipe the entire chain
  list.clear();
}

However, behind the scenes, because each node only has a Next pointer, we can only move forward. If we are currently looking at node number 50, and we realize we need to check something on node 49, we are out of luck. Unless we remembered where node 49 was, the only way to get back to it was to jump all the way back to the Head pointer (the start of the list) and manually walk forward 49 steps.

We can see these limitations if we attempt to iterate through a std::forward_list in reverse order:

#include <forward_list>
#include <ranges>
#include <iostream>

int main() {
  std::forward_list<int> list{10, 20, 30};

  for (int value : std::views::reverse(list)) { 
    std::cout << value << " ";
  }
}
error: std::forward_list does not satisfy bidirectional_range

Doubly Linked (std::list)

To solve the traversal problem, we can add more metadata. A doubly linked list node contains two pointers: one pointing forward, and one pointing backward.

This is the implementation used by the std::list container. Mechanically, it looks like this:

template <typename T>
struct DoublyNode {
  T Payload;
  DoublyNode* Prev; 
  DoublyNode* Next;
};

Now, we can traverse our collection in either direction:

#include <list>
#include <ranges>
#include <iostream>

int main() {
  std::list<int> list{10, 20, 30};

  for (int value : std::views::reverse(list)) {
    std::cout << value << " ";
  }
}
30 20 10

The Advantages of Linked Structures

We previously saw how contiguous structures like std::vector give us the best possible iteration speed, but manipulating their structure is relatively expensive.

  • If we want to add an element to an array, we need to physically shift everything that comes after that element to make space. Even if we add an element to the end, that might take us over our capacity, leading to a massive
  • If we want to remove an element, we need to shift everything that comes after to fill the gap, or perform a operation that changes the logical order of the remaining elements
  • If we want to sort the collection, that's a particularly heavy, physical operation

In addition, all of this physical movement invalidates pointers, adding more engineering overhead or risk of bugs.

The primary benefit of linked data is their algorithmic efficiency and stability when modifying the structure. Elements rarely need to be physically moved - we can change the logical structure of the collection simply by changing where the pointers point to.

The most powerful example of this is splicing, where large sequences of nodes can be taken out of one list and placed into another in constant O(1)O(1) time with no physical movement - all it takes is some pointer updates.

The size() Caveat

Traditionally, determining how many elements are in a linked list was an O(N)O(N) operation - we needed to walk through all the nodes, tracking how many steps it takes until we reach the node with the null Next value. Alternatively, if we want the size of our collection to be available faster, we could proactively keep track of how many elements are in our list, updating a size variable every time elements are added or removed.

However, we sometimes won't know how many elements we're adding or removing. We may just have two pointers or iterators representing the first and last nodes of a chain we want to insert. In that scenario, the container must walk the range to know how the size of our collection will be affected.

When designing our own containers, we can set our own preferences here - we can let size() be an O(N)O(N) algorithm that traverses the list, or we can make insertions and deletions slightly slower to ensure a size variable is always up to date and immediately available.

From C++11 onwards, the specification requires that size() must be O(1)O(1) for all standard containers, so std::list chooses the latter option. This most notably affects its splice() method. The simple pointer updates to move a range of nodes between lists is an O(1)O(1) operation, but updating the size of the lists requires an O(N)O(N) scan over the range to count how many nodes are being transferred.

The Disadvantages of Linked Structures

While it is conceptually elegant that linked lists do not physically move elements when they are added or removed, this design ignores the physical reality of the machine. The CPU does not care about logical elegance; it cares about physical data layout.

Because the elements of a standard linked list are allocated individually on the global heap, they are scattered randomly across the machine's RAM. To understand why this is a massive problem, we need to examine the actual memory footprint of a single node and how the CPU attempts to read that scattered memory.

The Tax of the Pointers

On a modern 64-bit CPU, every pointer consumes exactly 8 bytes of physical RAM. Let's look at the true physical size of a std::list<int>. Our payload is a standard 32-bit integer, which typically consumes 4 bytes. We have a Prev pointer (8 bytes) and a Next pointer (8 bytes).

But we are not done. Remember the strict CPU alignment rules we discussed when building our Because our struct contains 8-byte pointers, the compiler will force the entire struct to align to an 8-byte boundary.

It will inject 4 bytes of dead padding into our node to ensure the subsequent pointers land on safe memory addresses.

struct ListNode {
  int Payload;     // 4 bytes
  // Padding       // 4 bytes injected by compiler 
  ListNode* Prev;  // 8 bytes
  ListNode* Next;  // 8 bytes
};

Each 4 byte integer in our collection now requires 24 bytes, but we're still not done. To help with bookkeeping, the global allocator also allocates the hidden chunk headers we discussed .

These chunk headers are rarely smaller than 16 bytes, so we're now up to 90% waste - allocating 40 bytes to store a 4 byte integer.

The Hardware Prefetcher

As we covered in the initial chapter, the CPU is orders of magnitude faster than the main RAM. To keep the CPU fed, the hardware relies on the L1 cache and the hardware prefetcher.

When we iterate through a std::vector, the memory accesses are perfectly linear (Buffer + 0, Buffer + 4, Buffer + 8). The prefetcher recognizes this pattern instantly. It runs ahead of our code, pulling 64-byte chunks of data from main RAM into the L1 cache before the CPU even asks for them.

When we iterate through a linked structure, we execute a completely different mechanical pattern.

void Traverse(DoublyNode* current) {
  while (current != nullptr) {
    Process(current->Payload);

    // The CPU stall happens here
    current = current->Next; 
  }
}

Think about how the CPU actually executes current = current->Next. The CPU cannot ask for the next node until it has finished reading the current node to find out what the Next memory address actually is.

Because each node was likely allocated at a random time by the global heap allocator, they are physically scattered across gigabytes of virtual memory. The hardware prefetcher is completely blind - it cannot predict where the next read will occur.

When the CPU executes current->Next, it looks in the L1 cache. It misses. It looks in L2. It misses. It is forced to halt the execution pipeline entirely, reach all the way out to main RAM, suffer a massive latency penalty, and fetch the cache line containing the new node.

It resumes, processes the payload, and then immediately hits the next pointer. It misses again. This is called pointer chasing, and we are forcing the CPU to wait on the physical latency of RAM chips over and over again.

Optimizing Linked Structures

So, do we throw linked lists in the trash? Absolutely not. There are complex systems, such as operating system task schedulers and game engine scene graphs, where physical shifting is either structurally impossible or fundamentally too expensive.

The solution is not to abandon the linked list - the solution is to rescue it from the global allocator. In this lesson, we assumed that each node was randomly allocated by new. But we just spent an entire chapter learning how to build our own memory allocators.

In the rest of this chapter, we'll apply and build on these techniques to drastically reduce the cost of using linked structures whilst maintaining their flexibility.

Summary

In this lesson, we introduced the basics of node-based structures. Here are the key points:

  1. A node acts as a structural envelope, carrying our payload alongside pointer metadata.
  2. Singly linked lists like std::forward_list uses a single Next pointer, minimizing size but restricting us to forward traversal.
  3. Doubly linked lists like std::list uses Prev and Next pointers, increasing size but enabling bidirectional traversal.
  4. Linked lists achieve O(1)O(1) insertions and deletions because modifying the sequence only requires rewiring local pointers, not physically shifting surrounding data.
  5. Because data never physically moves, pointers to list elements are stable, remaining valid until explicitly deleted.
  6. Standard linked lists are often catastrophically slow due to the increased memory footprint of the pointers and the latency of chasing those pointers. Random heap allocations destroy spatial locality, causing continuous CPU cache misses.

We have established that the architecture of a linked list is useful, but the global heap implementation is fatal to performance.

In the next lesson, we will combine our knowledge of custom allocators and node mechanics. We will ditch the 64-bit heap pointers entirely, replace them with tiny integer indices, and build a fast, cache-friendly index-based linked list.

Next Lesson
Lesson 48 of 49

Index-Based and Pool-Allocated Lists

Replace 64-bit pointers with lightweight indices and back our linked list with a contiguous memory pool to maximize performance.

Have a question about this lesson?
Answers are generated by AI models and may not be accurate