NP-Hard and Approximations

Learn why some problems are impractical to solve exactly and how we can use approximations like Monte Carlo simulations to cheat.

Ryan McCombe

In the previous lessons, we explored algorithms that slow down as data grows. An O(n^2) algorithm might get sluggish if you throw a million items at it, but with enough patience or a faster CPU, you can usually crunch through it.

Now, we hit the wall. We are going to look at algorithms that don't just slow down - they explode. These are problems where adding just a handful of elements to your input can turn a task that takes milliseconds into one that takes millennia.

The Recursion Trap

The first tier of intractability is exponential time, most commonly seen as O(2^n). This complexity implies that for every single element you add to the input, the work doubles.

The most common source of exponential complexity happens by accident when we try to use recursion. Let's look at the classic mathematical definition of the Fibonacci sequence:

F(n) = F(n-1) + F(n-2)

To find the 5th number, you add the 4th and the 3rd. It's simple, but if we translate this definition directly into a recursive function, we can create a disaster.

fibonacci.cpp

#include <cstdint>
#include <iostream>

// WARNING: Exponential Complexity
uint64_t fib(int n) {
  if (n <= 1) return n;

  // Each call spawns two more calls
  return fib(n - 1) + fib(n - 2);
}

int main() {
  // 2 microseconds
  std::cout << fib(10) << "\n";

  // 2 minutes
  std::cout << fib(50) << "\n";

  // Tens of thousands of years (and the result overflows uint64_t)
  std::cout << fib(100) << "\n";
}

At first glance, fib(5) doesn't look scary. But if we map out the execution, we'll see that we aren't building a list of function calls; we are building a tree.

Additionally, most of the function calls in this tree repeat work done elsewhere. For example, fib(5) computes fib(3) twice and fib(2) three times, and the duplication gets worse as n grows.

By the time we get to fib(50), this tree has roughly 40 billion function calls. On a typical CPU, this would take multiple minutes to solve.
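
Where does that number come from? If T(n) counts the calls needed to compute fib(n), each call triggers one call for fib(n - 1) and one for fib(n - 2), plus itself:

T(n) = T(n - 1) + T(n - 2) + 1

This recurrence works out to T(n) = 2 × fib(n + 1) - 1. Since fib(51) is roughly 20 billion, computing fib(50) costs about 40 billion calls.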

Increase n to 100, and the runtime stretches into tens of thousands of years.

Advanced: The Recursive Stack

The recursive Fibonacci solution is bad in theory, but it's even worse in reality. First, there is a limit to how large the call stack can grow, and it's often difficult to predict when our recursion will hit that limit - the famous stack overflow error. For this reason, recursion is often banned outright in safety-critical applications.

However, even when that isn't a concern, there is still a performance overhead associated with recursion. A recursive solution might have the same theoretical scaling as an iterative one but, in reality, an iteration of a loop and a function call are very different.

A function call is a heavy, physical interaction with memory. Every time your program calls fib(), the CPU cannot just "jump" to the code. It has to prepare the stack.

  1. Push Return Address: The CPU writes the address of the next instruction to the stack so it knows where to resume after the call returns.
  2. Push Registers: It saves the current state of the CPU registers to the stack to prevent the new function from overwriting them.
  3. Allocate Locals: It moves the stack pointer to make room for local variables (the n parameter, in this case).
  4. Branch: The CPU has to jump to a new location in code memory, which can disrupt the instruction pipeline and branch predictor.

In our fib() example, we aren't just doing 2^n additions. We are performing 2^n interactions with the stack memory. The sheer volume of memory traffic generated by these function calls creates massive latency before the functions themselves do anything useful.

This is why we usually prefer iterative solutions (loops) over recursive ones. A loop stays in one stack frame. It keeps the variables hot in the registers. It flows linearly through memory.
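
For comparison, here's what an iterative version might look like. It computes the same values using a single loop and two local variables, so fib(50) takes about 50 iterations instead of roughly 40 billion function calls:

iterative_fibonacci.cpp

#include <cstdint>
#include <iostream>

// Linear time, constant space - no recursion, no repeated work
uint64_t fib(int n) {
  uint64_t previous{0};
  uint64_t current{1};

  for (int i{0}; i < n; ++i) {
    uint64_t next{previous + current};
    previous = current;
    current = next;
  }

  return previous;
}

int main() {
  // Returns almost instantly
  std::cout << fib(50) << "\n";
}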

Factorial Time

If you thought exponential time was bad, factorial time is the final boss of complexity. This is O(n!).

n! = n × (n-1) × (n-2) × ... × 1

This level of complexity appears in combinatorial optimization problems - situations where you need to find the "best" arrangement of a set of items. The most famous example is the Traveling Salesman Problem (TSP).

Imagine you're a delivery driver with a list of n cities. You need to visit every city exactly once and return to the start, and you want to find the route that results in the shortest total driving distance.

The Brute Force Solution

To solve this problem using a brute-force approach, we need to measure every possible route.

  1. Pick a starting city - you have n choices.
  2. For the next stop, you have n - 1 choices.
  3. For the stop after that, you have n - 2 choices.
  4. And so on.

The total number of routes is n!.

For 5 cities, 5! = 120 routes. Your computer solves this in microseconds.

For 10 cities, 10! = 3,628,800 routes. Still solvable in milliseconds.

For 20 cities, 20! ≈ 2.4 × 10^18 routes.

If you had a supercomputer that could check one billion routes per second, it would still take 77 years to solve the route for just 20 cities. Add one more city (n = 21), and it takes 1,600 years.
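
For a handful of cities, the brute-force search is easy enough to write. Here's a minimal sketch - the Distances matrix, the city data, and the ShortestTour() function are all illustrative, with Distances[a][b] assumed to hold the travel cost from city a to city b:

brute_force_tsp.cpp

#include <algorithm>
#include <iostream>
#include <limits>
#include <numeric>
#include <vector>

// WARNING: Factorial complexity - only viable for a handful of cities
int ShortestTour(const std::vector<std::vector<int>>& Distances) {
  const int n = static_cast<int>(Distances.size());

  // Fix city 0 as the start and permute the remaining cities
  std::vector<int> order(n - 1);
  std::iota(order.begin(), order.end(), 1);

  int best{std::numeric_limits<int>::max()};
  do {
    // Measure this route, including the trip back to the start
    int total{0};
    int current{0};
    for (int city : order) {
      total += Distances[current][city];
      current = city;
    }
    total += Distances[current][0];

    best = std::min(best, total);
  } while (std::next_permutation(order.begin(), order.end()));

  return best;
}

int main() {
  // Four cities with made-up distances
  std::vector<std::vector<int>> Distances{
    {0, 2, 9, 10},
    {2, 0, 6, 4},
    {9, 6, 0, 8},
    {10, 4, 8, 0},
  };

  std::cout << "Shortest tour: " << ShortestTour(Distances) << "\n";
}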

Approximations

We generally don't have 1,600 years to solve a problem. We want our delivery route today, not in the next geologic epoch.

When an exact solution is intractable, we switch to approximation algorithms. We agree to accept a "good enough" answer in exchange for a solvable runtime.

Heuristics

A heuristic is a rule of thumb. It's a strategy that isn't guaranteed to be perfect, but is usually pretty good.

For the Traveling Salesman Problem, a common heuristic is the nearest neighbor algorithm: From your current city, just go to the closest city you haven't visited yet.

This changes the complexity from O(n!) to O(n^2). We can solve it for 10,000 cities in seconds. The resulting path might be 10% longer than the mathematically perfect path, but a decent solution that we can find is better than a perfect solution that we can't.
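
Here's a minimal sketch of the nearest neighbor heuristic, using the same kind of illustrative setup as before - a Distances matrix where Distances[a][b] holds the travel cost from city a to city b, and a hypothetical GreedyTour() function:

nearest_neighbor.cpp

#include <iostream>
#include <limits>
#include <vector>

// Nearest neighbor heuristic: O(n^2), fast but not guaranteed optimal
int GreedyTour(const std::vector<std::vector<int>>& Distances) {
  const int n = static_cast<int>(Distances.size());
  std::vector<bool> visited(n, false);

  int current{0};
  visited[0] = true;
  int total{0};

  // Repeatedly hop to the closest unvisited city
  for (int step{1}; step < n; ++step) {
    int next{-1};
    int nextDistance{std::numeric_limits<int>::max()};

    for (int city{0}; city < n; ++city) {
      if (!visited[city] && Distances[current][city] < nextDistance) {
        next = city;
        nextDistance = Distances[current][city];
      }
    }

    visited[next] = true;
    total += nextDistance;
    current = next;
  }

  // Return to the starting city
  return total + Distances[current][0];
}

int main() {
  std::vector<std::vector<int>> Distances{
    {0, 2, 9, 10},
    {2, 0, 6, 4},
    {9, 6, 0, 8},
    {10, 4, 8, 0},
  };

  std::cout << "Greedy tour: " << GreedyTour(Distances) << "\n";
}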

Monte Carlo Simulations

Another technique for cheating intractability is the Monte Carlo method. Instead of calculating an exact answer, we use repeated randomness to estimate it.

For example, if we throw darts at random positions on a board, we can estimate the approximate area of shapes on that board based on the proportion of darts that hit them.

The following simulation uses this technique to estimate the value of π based on how many darts hit a circle. The correct answer is around 3.14, and our simulation will get an answer pretty close to that without any prior knowledge.

The code for a Monte Carlo simulation usually involves a simple loop that generates a random sample on every iteration. A modern CPU can run millions of these iterations every second, with each sample further improving the accuracy of the estimate.
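
Here's one way that loop might look, using the standard library's random number facilities. The sample count here is an arbitrary choice - more samples gives a more accurate estimate:

monte_carlo_pi.cpp

#include <iostream>
#include <random>

int main() {
  constexpr int Samples{10'000'000};

  std::mt19937 rng{std::random_device{}()};
  std::uniform_real_distribution<double> dist{0.0, 1.0};

  int hits{0};
  for (int i{0}; i < Samples; ++i) {
    // Throw a dart at a random point in the unit square
    double x{dist(rng)};
    double y{dist(rng)};

    // Count it if it lands inside the quarter circle of radius 1
    if (x * x + y * y <= 1.0) ++hits;
  }

  // The quarter circle covers pi / 4 of the square, so scale up by 4
  double estimate{4.0 * hits / Samples};
  std::cout << "Estimated pi: " << estimate << "\n";
}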

Time-Bounded Execution

By using approximations, we can convert our solution into an anytime algorithm. This is a special class of algorithm that can be interrupted at any point during its execution and still return a valid solution.

This gives us control over the performance profile of our application. If we need a highly precise result for a scientific experiment, we can let the simulation throw darts for many hours, days, or even months. If we need a "good enough" estimate for a real-time application, we might cut it off after a few milliseconds. We are no longer at the mercy of the algorithm's complexity; we decide exactly how much time we are willing to pay for a solution.
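
Here's a sketch of that idea applied to the π estimator. The EstimatePi() function and its time budget parameter are illustrative; the key point is that the caller, not the algorithm's complexity, decides when to stop:

anytime_pi.cpp

#include <chrono>
#include <iostream>
#include <random>

// Keeps sampling until the time budget runs out, then returns the
// best estimate found so far
double EstimatePi(std::chrono::milliseconds Budget) {
  using Clock = std::chrono::steady_clock;
  const auto Deadline = Clock::now() + Budget;

  std::mt19937 rng{std::random_device{}()};
  std::uniform_real_distribution<double> dist{0.0, 1.0};

  long long hits{0};
  long long samples{0};

  while (Clock::now() < Deadline) {
    double x{dist(rng)};
    double y{dist(rng)};
    if (x * x + y * y <= 1.0) ++hits;
    ++samples;
  }

  return samples > 0 ? 4.0 * hits / samples : 0.0;
}

int main() {
  using namespace std::chrono_literals;

  // A rough answer right now, or a better one if we can wait
  std::cout << "5ms budget:   " << EstimatePi(5ms) << "\n";
  std::cout << "500ms budget: " << EstimatePi(500ms) << "\n";
}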

Summary

We have covered a lot of ground in this course. Here are the key points:

  1. Data Structure = Memory Layout: We learned that a data structure is a configuration of objects in memory. The two primary options are contiguous layouts (arrays) or layouts where objects are linked to each other by pointers.
  2. Hardware is Reality: We saw that the CPU doesn't talk to RAM directly. It talks to caches. Performance is dominated by latency. If you miss the cache, you stall.
  3. Polynomial Time: We explored the standard complexity classes like O(n) and O(n^2). We learned that for small data, hardware efficiency (prefetching, locality) often matters more than Big O.
  4. Logarithmic Time: We saw the power of divide and conquer - O(log n). We learned that sorting data allows us to search massive datasets almost instantly, but we pay a penalty in poor locality and branch mispredictions.
  5. Intractability: Finally, in this lesson, we looked into the abyss of exponential and factorial time. We learned that some problems cannot practically be solved by brute force, requiring us to use approximations.

Up to this point, we have relied on theory, diagrams, and Big O analysis to predict how fast code will run. But as we've noted repeatedly, theory often disagrees with reality. The only way to know the truth is to measure it.

The next lesson walks through setting up a lab to test the performance of data structures and algorithms.
