The Iterator Pattern

The bridge between data layout and algorithms. Learn how iterators abstract memory traversal and the hierarchy of capabilities.

Ryan McCombe
Published

To understand why iterators exist, we have to look at the problem they solve. In software engineering, we deal with two primary categories of components:

Containers such as std::vector, std::list, and std::array define how data is stored.

Algorithms such as such as std::sort(), std::find(), and std::count() define how data is processed.

Let's imagine we wanted to write a new algorithm for our program. We would need to know exactly how to traverse every container the algorithm might want to support.

For a std::vector, we would need to use an integer index [i]. For a std::list, we would follow next pointers. For more complex containers, the process would look totally different again.

We might even want to write our own container in the future, but we'd have the immediate problem where all of the existing algorithms aren't compatible with it.

The iterator pattern solves this problem. Iterators act as a universal adapter: the container's job is to provide an iterator; the algorithm's job is to consume that iterator.

Under this pattern, we can extend our functionality and storage in a cross-compatible way:

  • If we implement new functionality as an iterator-based algorithm, it's immediately compatible with all containers that provide an iterator.
  • If create a new container and include functions to provide an iterator, our container is immediately compatible with all iterator-based algorithms.

Pointer Arithmetic

Before we jump into iterators, let's first cover how we can move around a contiguous container like a std::vector. In C++, pointer types like int* support arithmetic operators like ++, +, and -. Using these operators on pointers is called pointer arithmetic.

Stepping (++)

Physical RAM is essentially a massive, linear array of bytes. A pointer is just an index into that array.

When you have a pointer to an integer (int* ptr) and you increment it (ptr++), the CPU doesn't just add 1 to the address. It adds sizeof(int) (typically 4 bytes). It steps to the next logical element.

Jumping (+ N)

Contiguous memory allows us to do something more powerful than stepping - we can jump.

If we want to access the 5th element, we don't need to step through the first 4. We can calculate the exact address instantly:

Target Address=Current Address+(5×Size)\text{Target Address} = \text{Current Address} + (5 \times \text{Size})

In C++, this is expressed as ptr + 5. This is a constant time O(1)O(1) operation. It costs the CPU the exact same amount of time to jump forward 5 elements as it does to jump forward 5,000,000.

The Index Operator []

We rarely perform pointer arithmetic directly. Instead, we use the [] operator, which abstracts this away for us. When we write arr[5], the compiler translates it to *(arr + 5) behind the scenes.

#include <iostream>
#include <vector>

int main() {
  std::vector<int> v{10, 20, 30, 40};

  // Get a raw pointer to the start of the array
  int* ptr = v.data();

  // "Stepping"
  ptr++;
  std::cout << *ptr << "\n"; // Output: 20

  // "Jumping"
  int* target = ptr + 2;
  std::cout << *target << "\n"; // Output: 40

  // Syntactic Sugar
  // ptr[2] is identical to *(ptr + 2)
  std::cout << ptr[2] << "\n"; // Output: 40
}

The Limitation

This arithmetic works because the array data is laid out contiguously in RAM.

But what about a linked list like std::list? The nodes are scattered randomly across the heap. There is no mathematical relationship between the address of Node A and Node B. You cannot calculate the address of the 5th element; you have to traverse the pointers one by one to find it.

This is where the raw pointer fails us, and where the iterator class takes over. Conceptually, we can think of an iterator as being a pointer that is associated with some container. Just like with raw pointers, we use the unary * operator to access the value that the iterator is currently pointing at:

int ValueA{*SomePointer};
int ValueB{*SomeIterator};

However, in addition to pointing at something, iterators also provides a unified interface to move around within the container they're associated with.

The iterators that the std::vector type provides are effectively a wrapper around a raw pointer. When we use the iterator's ++ operator, that just calls ++ on the underlying pointer.

For a std::list, the iterator that it provides is a class that holds a pointer to the current node, and when we call ++, it follows the next pointer to find the new address.

A Basic Example

Having different operations accessible using the same syntax (++) gives us a unified interface. It lets us tell an iterator "move to the next element in whatever container you're traversing - I don't care what type of container it is". This lets us create algorithms in a container-agnostic way.

Let's write some code that works with both std::vector and std::list. By convention, a container that supports iterators implements the begin() method, which returns an iterator that is initially pointing to the first element in the collection.

Both std::vector and std::forward_list implement this convention. So, we can work with both containers using code that looks identical, even though the underlying mechanics are different:

#include <iostream>
#include <vector>
#include <forward_list>

int main() {
  std::vector<int> vec{10, 20, 30, 40};
  std::forward_list<int> list{10, 20, 30, 40};

  auto vecIterator{vec.begin()};
  std::cout << *vecIterator << ", "
            << *(++vecIterator) << ", "
            << *(++vecIterator) << "\n";
  
  auto listIterator{list.begin()};
  std::cout << *listIterator << ", "
            << *(++listIterator) << ", "
            << *(++listIterator) << "\n";
}
10, 20, 30
10, 20, 30

When we combine this iterator pattern with templates, we can now write algorithms that work across different container types. Our logFirstThree() function works with arrays, linked lists, and any other type that follows the convention of returning an iterator from a method called begin():

#include <iostream>
#include <vector>
#include <forward_list>

// This algorithm assumes the container has at least 3 items
template <typename ContainerType>
void logFirstThree(ContainerType c) {
  auto It = c.begin();
  std::cout << *It << ", "
            << *(++It) << ", "
            << *(++It) << "\n";
}

int main() {
  std::vector<int> vec{10, 20, 30, 40, 50};
  std::forward_list<int> list{10, 20, 30, 40, 50};

  logFirstThree(vec);
  logFirstThree(list);
}
10, 20, 30
10, 20, 30

The Half-Open Range: [begin, end)

If an iterator represents a position, how do we represent a collection? We need two positions: a start and a finish.

In C++, we use the half-open range convention, denoted as [begin, end). This notation indicates that the collection ranges from begin to end, including begin, but not including end.

  • begin(): Points to the first element in the container.
  • end(): Points to the memory address one step past the last element.

This end() iterator is a sentinel. It does not point to valid data. We must never dereference it - it exists solely to act as a "stop" sign.

This design choice solves two problems - detecting empty ranges, and determining when to terminate a loop.

Empty Ranges

If a container is empty, begin() is equal to end(). We don't need a separate isEmpty check or special logic for null pointers:

#include <iostream>
#include <vector>
#include <forward_list>

template <typename ContainerType>
void complainIfEmpty(ContainerType c) {
  if (c.begin() == c.end()) {
    std::cout << "That is empty\n";
  }
}

int main() {
  std::vector<int> vec;
  std::forward_list<int> list;

  complainIfEmpty(vec);
  complainIfEmpty(list);
}
That is empty
That is empty

Loop Termination

Secondly, it allows us to write loops that naturally terminate when we hit the sentinel. When working with iterators, the following pattern is the canonical "raw loop":

// Iterate until 'it' hits the sentinel
for (auto it = v.begin(); it != v.end(); ++it) {
  std::cout << *it << "\n";
}

We can update our previous logFirstThree() algorithm to now safely log everything in the container:

#include <iostream>
#include <vector>
#include <forward_list>

template <typename ContainerType>
void logEverything(ContainerType c) {
  auto It = c.begin();
  auto End = c.end();
  while (It != End) {
    std::cout << *It << ", ";
    ++It;
  }
  std::cout << '\n';
}

int main() {
  std::vector<int> vec{10, 20, 30, 40};
  std::forward_list<int> list{10, 20, 30, 40};

  logEverything(vec);
  logEverything(list);
}
10, 20, 30, 40, 
10, 20, 30, 40,

The Range-Based For Loop

We have established that the standard, safe pattern for iterating through a container is to start at begin() and increment until we hit end().

This pattern is so fundamental that the language provides a dedicated syntax for it: the range-based for loop.

std::vector<int> v{10, 20, 30};

// Before: Raw loop
for (auto it = v.begin(); it != v.end(); ++it) {
  std::cout << *it << "\n";
}

// After: The "range-based" syntax
for (int value : v) {
  std::cout << value << "\n";
}

This capability is not restricted to the standard library containers - any container that implements appropriate begin() and end() methods becomes usable in this way.

This syntax is purely syntactic sugar. It does not introduce any new runtime behavior. When the compiler sees a range-based loop, it mechanically rewrites it into the raw iterator loop we looked at previously. It invokes begin() and end() methods on the value we provide - the std::vector called v, in this example - and throws a compilation error if the type doesn't have those methods.

Avoiding Copies

There is an important performance trap with these loops. By default, objects are copied into the loop body. This is fine if we're iterating over a collection of simple types like our int example above.

However, if the container contains objects that are expensive to copy, we should be more cautious. We rarely need to copy the objects we're working with so, to avoid this, we can combine the range-based loop with references:

std::vector<std::string> names{"Alice", "Bob", "Charlie"};

// Expensive: Copies each string into 'name'
for (std::string name : names) { /* ... */ }

// Better: 'name' is a reference to the data inside the vector
for (std::string& name : names) { /* ... */ }

// Best (for reading): Read-only reference
for (const std::string& name : names) { /* ... */ }

By using const auto&, we get the best of both worlds: the readability of the range-based loop and the zero-overhead access of the raw iterator.

The Hierarchy of Capabilities

Not all containers are created equal. A std::vector lets us jump between elements; a std::list only allows us to step.

If we wrote an algorithm that required jumping (like binary search), it wouldn't work on a linked list. To handle this, C++ categorizes iterators into a hierarchy of capabilities.

Algorithms explicitly state what kind of iterator they need. If you try to pass a std::list iterator to std::sort(), the compiler will generate an error. This is because the sort algorithm is implemented to require a contiguous or random access iterator (the ability to jump around) but the std::list iterator only provides a bidirectional iterator (the ability to move forward or backwards one step at a time).

Below, we've listed the primary categories of iterator, from weakest to strongest. These categories are hierarchical - each new category adds new capabilities, but it also includes the capabilities of the previous categories:

1. Input / Output Iterator

These are the weakest iterators. They may only be single-pass. You can read from them once, but if you increment them, the old value may be gone forever. These are generally associated with low-level data streams, such as values coming from a sensor.

2. Forward Iterator

These support multi-pass algorithms. You can read a value, increment the iterator, and keep a copy of the old iterator to read the old value again later. However, you can only move forward (++). The iterators from a std::forward_list (singly linked list) are in this category.

3. Bidirectional Iterator

These add the ability to move backwards (--). Doubly linked lists like std::list provide these iterators.

4. Random Access Iterator

These add the ability to jump (+ N, - N) and quickly calculate distance (it_a - it_b). They support the array index operator []. The most notable example of a random access container in the standard library is std::deque.

Iterators from std::vector and std::array are also random access, but they go beyond that - they are contiguous iterators.

5. Contiguous Iterator (C++20)

This is the strongest guarantee. It adds the physical guarantee that the elements are adjacent in RAM. In most practical cases, an algorithms might check for this to determine if it can use raw memory techniques like std::memcpy() or std::memset() for larger performance gains.

Aside: Iterator Concepts

As a brief detour, C++20 concepts are available for these iterator categories. Don't worry if this section doesn't make sense - it is not important to the rest of the course, and is safe to skip.

With concepts, we're just specifying the requirements of our template types in a way that the compiler can programatically check. This allows it to provide more useful feedback if someone tries to pass unexpected arguments to our algorithms - like passing a std::string instead of an iterator.

Our algorithm is performing a single pass through our data, so our first template type needs to be at least an input iterator. Our second template type needs to be a valid sentinel for that first type:

#include <iostream>
#include <vector>
#include <forward_list>
#include <iterator> // For Concepts 

template <
  std::input_iterator Iterator,
  std::sentinel_for<Iterator> Sentinel
>
void logEverything(Iterator It, Sentinel End) {
  while (It != End) {
    std::cout << *It << ", ";
    ++It;
  }
  std::cout << '\n';
}

int main() {
  std::vector<int> vec{10, 20, 30, 40};
  std::forward_list<int> list{10, 20, 30, 40};

  logEverything(vec.begin(), vec.end());
  logEverything(list.begin(), list.end());
}

Remember, these iterator categories are hierarchical. A more powerful iterator includes all the capabilities of less powerful iterators.

For example, the forward iterator provided by a std::forward_list is both a forward iterator and an input iterator. The iterator provided by a std::vector is a contiguous iterator, an input iterator, and everything in between.

The std::iter_value_t Type Trait

If we want to get the type that an iterator points to, we can use the std::iter_value_t type trait.

In our example, both instantiations of logEverything() have an Iterator type that traverses through a container of int values, so std::iter_value_t<Iterator> will be int in both cases.

Below, we use this type trait with a custom Streamable concept to require that the provided iterator is pointing to a type that is compatible with the std::cout << *It << ", " expression our algorithm uses:

template <typename T>
concept Streamable = requires(std::ostream& os, T value) {
  { os << value } -> std::convertible_to<std::ostream&>;
};

template 
  std::input_iterator Iterator,
  std::sentinel_for<Iterator> Sentinel
>
requires Streamable<std::iter_value_t<Iterator>>
void logEverything(Iterator It, Sentinel End) {
  while (It != End) {
    std::cout << *It << ", ";
    ++It;
  }
  std::cout << '\n';
}

We won't use concepts or similar techniques in our code samples in the rest of the course, as they add a lot of noise that can drown out the key points. However, in real world use cases, constraining templates and being explicit about their requirements is generally recommended.

Generic Navigation

When writing generic algorithms, we often run into a syntax problem. If we want to move an iterator forward by 5 steps, the syntax depends on the iterator type.

  • Vector (Random Access): it + 5 (Valid)
  • List (Bidirectional): it + 5 (Compiler Error)

The list iterator class simply does not define operator+(). To solve this, the standard library provides a helper function: std::advance(it, n).

#include <vector>
#include <list>
#include <iterator> // for std::advance 

template <std::input_iterator Iterator>
void MoveForwardFive(Iterator& it) {
  // This works for ANY iterator type
  std::advance(it, 5);
}

int main() {
  std::vector<int> v{1, 2, 3, 4, 5, 6};
  auto v_it = v.begin();
  MoveForwardFive(v_it); // v_it now points to 6

  std::list<int> l{1, 2, 3, 4, 5, 6};
  auto l_it = l.begin();
  MoveForwardFive(l_it); // l_it now points to 6
}

Under the Hood: Tag Dispatching

The std::advance() utility uses a technique called tag dispatching (or if constexpr in modern C++) to inspect the iterator's capabilities at compile time and choose the best implementation.

Behind the scenes, std::advance() might look like this:

template <typename Iterator>
void advance(Iterator& it, int n) {
  if constexpr (is_random_access_v<Iterator>) {
    // Fast path: Just do math
    // Complexity: O(1)
    it += n;
  } else {
    // Slow path: Step one by one
    // Complexity: O(n)
    while (n > 0) {
      ++it;
      --n;
    }
  }
}

Or using C++20 concepts:

template <std::random_access_iterator Iterator>
void advance(Iterator& it, int n) {
  // Fast path: Just do math
  // Complexity: O(1)
  it += n;
}

template <std::input_iterator Iterator>
void advance(Iterator& it, int n) {
  // Slow path: Step one by one
  // Complexity: O(n)
  while (n > 0) {
    ++it;
    --n;
  }
}

The abstraction allows your code to be correct for all containers. But it does not make the underlying operation fast. If you call std::advance(it, 1000) on a std::vector, it compiles to a single addition instruction.

If you call it on a std::list, it compiles to a loop that runs 1,000 times, chasing pointers and stalling on cache misses. The abstraction hides the syntax difference, but it cannot hide the hardware reality.

Modern Helpers

In addition to std::advance(), the standard library <iterator> header provides std::next() and std::prev(). These are often preferred in modern C++ because they return a new iterator rather than modifying the existing one, which can lead to cleaner code in expressions.

auto it = v.begin();

// Old way (modifies 'it')
std::advance(it, 2);
DoSomething(it);

// Modern way (leaves 'it' alone)
DoSomething(std::next(it, 2));

There is also std::distance(start, end), which calculates the number of hops between two iterators. Like advance(), this is O(1)O(1) for vectors (simple subtraction) and O(n)O(n) for lists (counting steps).

#include <iterator> // for std::size, std::distance 

template <typename ContainerType>
void getSize(ContainerType c) {
  // This works if ContainerType has a .size() method:
  auto size = c.size();
  
  // Alternative - still requires .size():
  size = std::size(c);
  
  // This works with any iterator pair:
  size = std::distance(c.begin(), c.end());
}

Summary

The iterator pattern is the glue that holds the containers and algorithms together together.

  1. Decoupling: By agreeing on a standard interface for traversal, we can write algorithms that work on any container.
  2. Hardware Mimicry: Iterators are designed to look and feel like pointers (*, ++), efficiently mapping to CPU instructions where possible.
  3. The Sentinel: The [begin, end) half-open range simplifies loop logic and empty state handling.
  4. Capabilities: Not all memory can be traversed equally. The hierarchy of capabilities (eg Forward -> Bidirectional -> Random Access) allows algorithms to demand specific features or optimize their approach based on container capabilities.
  5. Generic Utilities: Helpers like std::advance() allow us to write generic code that automatically selects the most efficient instruction sequence for the given data structure.

However, using iterators directly can be verbose. Managing begin() and end() pairs manually is prone to errors, and composing complex transformations (like filtering a list and then squaring the numbers) is difficult with standard iterators.

In the next lesson, we will introduce C++20 Ranges. We will see how this modern evolution of the iterator pattern allows us to compose powerful, lazy-evaluated data pipelines that are both more readable and safer than raw iterator manipulation.

Next Lesson
Lesson 2 of 5

Standard Library Algorithms

An introduction to the C++ Standard Library's algorithms, lambda expressions, and handling memory safely with insertion iterators.

Have a question about this lesson?
Answers are generated by AI models and may not be accurate