Polynomial Algorithms

Learn how input size affects execution speed and the most common class of algorithms.

Ryan McCombe

In the previous lesson, we looked at the cost of operations. We learned that not all operations are created equal; fetching data from the CPU cache is effectively free, while fetching data from main RAM is incredibly expensive.

Factors like locality and caching determine how expensive each operation is, whilst algorithm analysis focuses on how many of those operations are required to complete a task.

Big O notation is a high-level classification of how the time required to complete an algorithm changes based on how much data needs to be handled. It's the answer to questions like: "if the number of users doubles, does my system still perform the same, or does it get slower? And if it gets slower, by how much?"

When we analyze algorithms, this idea of the "amount of data", the "input size", or the "number of objects" an algorithm needs to handle is denoted by the variable n or N.

For example, if we want to find a specific user in our collection, the amount of work we need to do might depend on the number of users, n, in our collection.

Constant Time

Most algorithms fall into a category called polynomial. This means the runtime is proportional to n raised to some power k. Mathematically, this looks like O(n^k). In polynomial time, k is a constant, like 0, 1, 2, and so on.

Constant time is the gold standard of algorithmic efficiency. It means the execution time is independent of the input size. Whether you have ten elements or ten billion, the code takes the same approximate amount of time to complete.

In our polynomial model, this is n^0. Since any number to the power of 0 is 1, we write this as O(1).

The classic example of this is accessing an element in an array (or std::vector) by its index.

constant_time.cpp

#include <vector>
#include <iostream>

void PrintFirstElement(const std::vector<int>& Data) {
  if (Data.empty()) return;

  // This operation takes the same time regardless
  // of whether Data has 1 element or 1,000,000.
  std::cout << Data[0] << "\n";
}

Why is this constant? Because of the contiguous memory layout we discussed in the intro.

The compiler knows the memory address where the vector starts, and it knows the size of each element. To find the element at index i, it simply performs one calculation:

Address = Start + (i × Size)

This is a single instruction for the CPU. It doesn't need to "search" for the element; it teleports directly to it. This is why we prefer contiguous containers like std::vector whenever possible - they unlock O(1) access patterns.
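
To make this concrete, here is a minimal sketch of that calculation done by hand using pointer arithmetic. This is purely illustrative - in practice, writing Data[3] generates the equivalent instruction for you:

pointer_arithmetic.cpp

#include <vector>
#include <iostream>

int main() {
  std::vector<int> Data{10, 20, 30, 40, 50};

  // Data[3] boils down to: start address plus an offset
  // of 3 elements - one calculation, regardless of how
  // many elements the vector holds
  const int* Start = Data.data();
  const int* Element = Start + 3;

  std::cout << *Element << "\n"; // Prints 40
}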

Linear Time

Linear time means the work grows in direct proportion to the input. If we double the data, we double the amount of work our algorithm needs to do.

In our polynomial model, this is n^1, or simply O(n).

This is the complexity of "touching every element." If you need to print every player's name, or sum up a list of transaction values, you are doing O(n) work.

linear_time.cpp

#include <vector>
#include <iostream>

void PrintAll(const std::vector<int>& Data) {
  // A simple loop is the hallmark of O(n)
  for (int Value : Data) {
    std::cout << Value << "\n";
  }
}

bool Contains(const std::vector<int>& Data, int SearchKey) {
  // Even though this might return early, in the
  // worst case we have to check every element.
  for (int Value : Data) {
    if (Value == SearchKey) return true;
  }
  return false;
}

Linear algorithms are the workhorse of modern computing. Remember, CPUs are designed with prefetchers - components that predict what memory you will need next.

When you iterate linearly through a std::vector, the access pattern is perfectly predictable. The prefetcher pulls data from RAM into the L1 cache before your code even asks for it.

Because of these hardware realities, a linear scan over an array is often faster than theoretically superior algorithms, particularly for small to medium datasets.
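
If you want to see this for yourself, a rough sketch like the one below compares a linear scan against a tree-based lookup on a small dataset. The exact numbers depend heavily on your hardware, compiler flags, and data sizes, so treat it as an experiment to run rather than a guaranteed result:

small_n_comparison.cpp

#include <chrono>
#include <iostream>
#include <numeric>
#include <set>
#include <vector>

int main() {
  // A deliberately small dataset, where cache-friendliness
  // can outweigh the extra comparisons of a linear scan
  constexpr int Count = 64;
  constexpr int Repeats = 100'000;

  std::vector<int> Data(Count);
  std::iota(Data.begin(), Data.end(), 0);
  std::set<int> Tree(Data.begin(), Data.end());

  const int SearchKey = Count - 1; // Worst case for the scan
  long long Found = 0;

  using Clock = std::chrono::steady_clock;

  // O(n) linear scan over contiguous memory
  auto ScanStart = Clock::now();
  for (int i = 0; i < Repeats; ++i) {
    for (int Value : Data) {
      if (Value == SearchKey) { ++Found; break; }
    }
  }
  auto ScanTime = Clock::now() - ScanStart;

  // O(log n) lookup that follows pointers between tree nodes
  auto TreeStart = Clock::now();
  for (int i = 0; i < Repeats; ++i) {
    Found += Tree.count(SearchKey);
  }
  auto TreeTime = Clock::now() - TreeStart;

  std::cout << "Linear scan: " << ScanTime.count() << "ns\n"
            << "Tree lookup: " << TreeTime.count() << "ns\n"
            << "(" << Found << " matches found)\n";
}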

Quadratic Time

Quadratic time is where things start to get dangerous. This complexity occurs when the work grows proportional to the square of the input. If you double the data, the work quadruples. This is a quadratic algorithm - O(n^2).

This typically happens when you nest loops - processing the entire collection for each element in the collection. A classic example is checking for collisions between all entities in a game, or checking for duplicates in an unsorted list.

quadratic_time.cpp

#include <vector>
#include <iostream>

void FindDuplicates(const std::vector<int>& Entities) {
  // Outer loop: n iterations
  for (size_t i = 0; i < Entities.size(); ++i) {

    // Inner loop: n iterations
    for (size_t j = 0; j < Entities.size(); ++j) {

      // We perform this check n * n times
      if (i != j && Entities[i] == Entities[j]) {
        std::cout << "Duplicate detected!\n";
      }
    }
  }
}

To visualize this, imagine a grid. If you have a list of 5 items, a linear algorithm walks a line of 5 steps. A quadratic algorithm has to fill in a 5 × 5 grid, performing 25 operations.

If n increases, linear algorithms tend to scale in a controllable way, whilst the "grid" of work required by a quadratic algorithm explodes. With 1,000 items, a linear pass performs 1,000 operations; the quadratic version performs 1,000,000.

Quadratic algorithms are where we begin to enter the danger zone. An algorithm that feels instant in testing can bring systems to a crawl as n gets larger.

Naive Solutions

Our previous algorithm to find duplicates in a collection is an example of a "naive" or "brute force" solution. The same problem can be solved in many different ways. The following solution removes the nested loop:

#include <vector>
#include <iostream>
#include <unordered_set>

void FindDuplicates(const std::vector<int>& entities) {
  std::unordered_set<int> seen;
  for (int e : entities) { // O(n)
    if (seen.contains(e)) { // O(1) on average
      std::cout << "Duplicate detected!\n";
    }
    seen.insert(e); // O(1) on average
  }
}

The total complexity here is n multiplied by the (on average) constant costs of contains() and insert(). And, since Big O ignores multiplicative constants, we now have a linear solution.

For large values of n, this is likely to be an improvement over our initial attempt, as long as the additional memory required for the seen container isn't problematic.

Comparing Polynomial Algorithms

We've now seen the three most common types of polynomial algorithms:

  1. Constant: O(n^0), usually written as O(1)
  2. Linear: O(n^1), usually written as O(n)
  3. Quadratic: O(n^2)

This pattern continues. We can have cubic algorithms - O(n^3), quartic algorithms - O(n^4), and so on.

It is quite rare that the algorithms we create to solve real-world problems fit exactly into these equations. For example, an algorithm might require 50n^2 operations. Another might require n^2 + 10n + 30 operations.

Big O notation simplifies things to just the core classification - both of these algorithms would be categorized as O(n^2). This simplification allows us to communicate and compare approaches more easily.

Specifically, Big O makes the following simplifications:

Treat All Operations as Equal

As we've seen in the previous lesson, some operations can take much more time than others, particularly if the required data isn't in the cache. To keep things simple, Big O ignores that. It assumes all operations are equally expensive.

Ignore Constants

One algorithm might require n + 100 operations whilst another might require exactly n. We ignore the additive constant - both these algorithms are linear - O(n).

One algorithm might require 50n operations whilst another might require n. We also ignore multiplicative constants - these algorithms are still both linear - O(n).

Only Include the Dominant Term

One algorithm might require n^2 + n operations whilst another might require n^2 + 50. As n grows, the term with the highest order (that is, the largest exponent) dominates the behavior of the function. So, both of these are classified as quadratic - O(n^2).
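
As a concrete sketch of why this works, consider a hypothetical function that compares every pair of elements and then makes one extra pass over the data. Its operation count is roughly n^2 + n, but as n grows, the n^2 term is what matters:

dominant_term.cpp

#include <iostream>
#include <vector>

void Analyze(const std::vector<int>& Data) {
  // Nested loops: roughly n * n comparisons
  for (size_t i = 0; i < Data.size(); ++i) {
    for (size_t j = 0; j < Data.size(); ++j) {
      if (i != j && Data[i] == Data[j]) {
        std::cout << "Duplicate detected!\n";
      }
    }
  }

  // One extra pass: n more operations
  long long Sum = 0;
  for (int Value : Data) {
    Sum += Value;
  }
  std::cout << "Sum: " << Sum << "\n";

  // Total: about n^2 + n operations. The n^2 term dominates
  // as n grows, so this function is classified as O(n^2)
}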

Structure Dictates Complexity

Remember, the algorithm we use to work with our data is intrinsically linked to how that data is structured. The physical shape of our data sets the speed limit for how fast a task can be accomplished.

Consider the simple task of finding the last element in a collection. If our data is stored in a contiguous array (like std::vector), finding the element at the last index, or any index, is a simple arithmetic calculation based on where the array starts in memory and the size of each element.

It doesn't matter if the array has 5 items or 5 billion; the CPU performs one calculation and jumps instantly to the end. This is a constant time, O(1) algorithm.

However, if your data is stored in a linked list, this algorithm isn't available because the nodes could be scattered anywhere in RAM. To find the last node, we have to follow the links from one node to the next. The more objects we have in our list, the more nodes we need to traverse, so the algorithm scales linearly, O(n), with the input size.
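
As a sketch of the difference, here is how finding the last element might look for each layout. The Node type below is a minimal hand-rolled linked list, purely for illustration:

last_element.cpp

#include <vector>

// A minimal singly-linked node, purely for illustration
struct Node {
  int Value;
  Node* Next = nullptr;
};

// O(1): one address calculation, regardless of size
// (assumes Data is not empty)
int LastInVector(const std::vector<int>& Data) {
  return Data.back();
}

// O(n): we must follow every link to reach the final node
// (assumes Head is not null)
int LastInList(const Node* Head) {
  while (Head->Next) {
    Head = Head->Next;
  }
  return Head->Value;
}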

The task is the same for both containers, but the algorithmic complexity can be different depending on how that container organizes its collection in memory.

Theory vs. Engineering

Academic resources often disproportionately focus on abstract Big O analysis and underrepresent the importance of physical hardware, memory layout, and the "constant factors" that Big O ignores. In this abstract world, a linear solution is strictly superior to a quadratic solution, but that's not always true in the real world.

When nn is small, the overhead of a complex algorithm can dwarf the actual work being done. You can implement a complex algorithm that should be highly efficient in theory, but if it doesn't consider the physical reality of the hardware, it can perform terribly in the real world.

Meanwhile, a beginner with no academic knowledge might store everything in arrays and solve every problem by looping over those arrays. Their approach can create highly performant systems by accident because they're taking full advantage of the caches through locality, and they're consistently accessing their data in predictable ways that the prefetcher can help with.

We can use Big O to prevent architectural disasters, but day-to-day, we use profiling to test and optimize the actual implementation.

Summary

In this lesson, we explored polynomial time complexity:

  • Constant: The ideal - instant access, enabled by contiguous memory layouts and pointer arithmetic.
  • Linear: The typical reality - we need to touch every item. The hardware loves this pattern if the data is stored contiguously, making it surprisingly fast.
  • Quadratic: Where things begin to get dangerous - usually nested loops. Avoid this for large datasets.

We also learned that Big O is a model, not a law. For small data, a "bad" algorithm with good memory locality can beat a "good" algorithm with bad locality.

In the next lesson, we will look at logarithmic time - O(log n). We will see how we can exploit sorted data to discard half the workload with every single step, allowing us to search billions of items in nanoseconds.

Next Lesson
Lesson 4 of 5

Logarithmic and Linearithmic Algorithms

Working with massive datasets using divide and conquer strategies, and the required trade-offs.
