Amortized Costs and Crossover Points

Using benchmarks to calculate the 'break-even point' where investing in data preparation pays off.

Ryan McCombe

In the previous lessons, we built a benchmarking lab and learned how to map algorithmic complexity. Now, we are going to use those tools to answer a typical real-world question we might face in our project:

"I have an array of objects. I need to find specific items. Should I just scan the vector, or should I sort it and use binary search?"

We will benchmark these two strategies against each other. We'll imagine the data we're working with is a std::vector<Player>, where each Player looks like the following:

Player.h

struct Player {
  int id;
  int score;
  int health;
};

We frequently need to find specific players in our collection based on their ID, and we have two contending strategies. The first: we leave the vector exactly as it is and, when we need to find a player, we use linear search - an O(n) algorithm.

Alternatively, we sort the array by ID at an O(n \log n) cost. Then, we can use binary search at O(\log n) cost any time we need to find a player by ID.

So which is better? It depends. That answer isn't very helpful on its own, so we generally want to ask a different question: "under what conditions is each option better?" In this scenario, the answer we're trying to pin down is "under what conditions should I consider sorting my data?"

The Lookup Reality

Before we even consider the cost of sorting, let's check our premises. Will binary search ever be faster than linear search for the data we're working with?

We know linear search is hardware-friendly, so even if the data were already sorted and the sorting cost were already paid, we might still prefer linear search to binary search. Let's test.

We will create a benchmark that runs on already sorted data. We will compare find() (linear search) against binary_search():

benchmarks/main.cpp

#include <benchmark/benchmark.h>
#include <vector>
#include <algorithm>
#include <random>
#include <ranges>

struct Player {/*...*/};
std::vector<Player> GetSortedPlayers(int n) {/*...*/}

static void BM_LinearLookup(benchmark::State& state) {
  int n = state.range(0);
  // Player ids from GetSortedPlayers() range from 0 to n*2
  auto players = GetSortedPlayers(n);
  int target = n; // Look for a middle-ish value

  for (auto _ : state) {
    auto result = std::ranges::find(players, target, &Player::id);
    benchmark::DoNotOptimize(result);
  }
}

static void BM_BinaryLookup(benchmark::State& state) {
  int n = state.range(0);
  auto players = GetSortedPlayers(n);
  int target = n;

  for (auto _ : state) {
    auto result = std::ranges::binary_search(
      players, target, {}, &Player::id
    );
    benchmark::DoNotOptimize(result);
  }
}

BENCHMARK(BM_LinearLookup)->RangeMultiplier(2)->Range(4, 128);
BENCHMARK(BM_BinaryLookup)->RangeMultiplier(2)->Range(4, 128);
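The helper bodies are elided above. One possible GetSortedPlayers() consistent with the comment that ids range from 0 to n*2 might look like this (purely an assumption for illustration - the lesson doesn't show its implementation):

```cpp
#include <vector>

struct Player {
  int id;
  int score;
  int health;
};

// Hypothetical helper: n players, pre-sorted by id, with even
// ids 0, 2, 4, ... so the ids span roughly 0 to n*2
std::vector<Player> GetSortedPlayers(int n) {
  std::vector<Player> players;
  players.reserve(n);
  for (int i = 0; i < n; ++i) {
    players.push_back({i * 2, 0, 100});
  }
  return players;
}
```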

Benchmarking Results

On my machine, this experiment produced the following results:

----------------------------
Benchmark                CPU
----------------------------
BM_LinearLookup/4    2.14 ns
BM_LinearLookup/8    2.92 ns
BM_LinearLookup/16   4.60 ns
BM_LinearLookup/32   6.14 ns
BM_LinearLookup/64   11.5 ns
BM_BinaryLookup/4    3.08 ns
BM_BinaryLookup/8    3.30 ns
BM_BinaryLookup/16   4.01 ns
BM_BinaryLookup/32   4.60 ns
BM_BinaryLookup/64   5.94 ns

So, with 8 or fewer items in our collection, linear search wins. Binary search begins to outperform the linear scan when n \ge 16. Note that these results are only valid for this experiment; if we change the variables, the result will change. The two most important variables for this experiment are:

  • The hardware running the benchmarks: We should run our benchmarks on hardware representative of how our final code will be used
  • The data type being used: If our type were more expensive to compare for equality (the == operator), linear search would suffer, skewing the results in favor of binary search. If ordering comparisons (the < operator) were more expensive, binary search would suffer instead, and expensive moves would also make the eventual sort costlier.

But, for this scenario, we can see that if we're dealing with n \le 8, we wouldn't sort our data. Even if it were already sorted, we'd just use linear search. For n \ge 16, we should start considering binary search. But we haven't factored in the cost of sorting yet.

The Cost of Sorting

Let's switch our parameters - we'll assume we know that we're working with player counts in the region of 10,000. Based on our previous results, we know binary search will be much faster than linear scans on that data.

But binary search has a prerequisite: the data must be sorted. We have to pay a cost to sort it, and specifically it must be sorted by the thing we're searching on. If we want to find a player by id and the data is sorted by score, that isn't helpful; we'd need to fall back to linear search, or re-sort the data by id.

Additionally, if players are being added and removed from the array, or if the thing we're searching by can change (like the player's score), we have to pay the cost to sort it again if we want to keep using binary search.

If we sort the vector and then only look up one player before we need to resort it, we have made a terrible investment. We paid for a full sort just to avoid a single linear scan.

If we sort the vector and then look up one million players, we have probably made a fantastic investment. The massive savings on the searches should pay for the initial sort.

But where is the break-even point?

Implementing the Amortization Benchmark

Mathematically, we are looking for the number of queries K where:

\text{Sort} + (K \times \text{Binary}) < (K \times \text{Linear})
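Rearranging (valid as long as each linear search costs more than each binary search), the break-even query count is:

```latex
K > \frac{\text{Sort}}{\text{Linear} - \text{Binary}}
```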

To find K, we need a benchmark that simulates the full lifecycle: prepare + query.

We will fix our dataset size (n) to a number we know our project requires (e.g., n = 10,000). Then, we will vary the number of searches per sort (K) using the library's range feature.

benchmarks/main.cpp

#include <benchmark/benchmark.h>
#include <vector>
#include <algorithm>
#include <random>
#include <ranges>

struct Player {/*...*/};
std::vector<Player> GenerateRandomPlayers(int n) {/*...*/}
void GenerateRandomQueries(std::vector<int>& queries, int maxId) {/*...*/}

const int N = 10000;

// No setup cost, high query cost
static void BM_RepeatedLinear(benchmark::State& state) {
  auto original_data = GenerateRandomPlayers(N);

  // Create K random queries
  int K = state.range(0);
  std::vector<int> queries(K);
  GenerateRandomQueries(queries, N * 10);

  for (auto _ : state) {
    std::vector<Player> players = original_data;

    // Perform K linear searches
    for (int target_id : queries) {
      auto result = std::ranges::find(
        players, target_id, &Player::id
      );
      benchmark::DoNotOptimize(result);
    }
  }
}

// High setup cost (Sort), low query cost
static void BM_SortAndBinary(benchmark::State& state) {
  auto original_data = GenerateRandomPlayers(N);

  // Create K random queries
  int K = state.range(0);
  std::vector<int> queries(K);
  GenerateRandomQueries(queries, N * 10);

  for (auto _ : state) {
    std::vector<Player> players = original_data;

    // THE COST: Sort the vector
    std::ranges::sort(players, {}, &Player::id);

    // THE BENEFIT: K binary searches
    for (int target_id : queries) {
      auto result = std::ranges::binary_search(
        players, target_id, {}, &Player::id
      );
      benchmark::DoNotOptimize(result);
    }
  }
}

BENCHMARK(BM_RepeatedLinear)
  ->Range(1, 1<<10) // Number of Queries (K)
  ->Unit(benchmark::kMillisecond);

BENCHMARK(BM_SortAndBinary)
  ->Range(1, 1<<10) // Number of Queries (K)
  ->Unit(benchmark::kMillisecond);

Amortization Benchmark Results

Here are the results from my machine:

---------------------------------
Benchmark                     CPU
---------------------------------
BM_RepeatedLinear/1      0.004 ms
BM_RepeatedLinear/8      0.020 ms
BM_RepeatedLinear/64     0.153 ms
BM_RepeatedLinear/512     1.29 ms
BM_RepeatedLinear/1024    2.35 ms
BM_SortAndBinary/1       0.475 ms
BM_SortAndBinary/8       0.455 ms
BM_SortAndBinary/64      0.445 ms
BM_SortAndBinary/512     0.481 ms
BM_SortAndBinary/1024    0.502 ms

So, for these variables, on this hardware, our basic linear searching strategy wins for K \le 64, whilst sort + binary search wins for K \ge 512. We can run this again with a tighter range within these bounds to get a more accurate result if needed.

But from these results, we can conclude that if we typically get fewer than 64 reads of our data before we need to sort it again, the investment doesn't pay off. If we read our data 512 times or more between sorts, the sorts are a good investment.
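We can also sanity-check that bracket arithmetically, by plugging rough per-operation costs into the break-even inequality. The figures below are eyeballed from the results tables above, not new measurements:

```cpp
// Break-even: Sort + K * Binary < K * Linear
//   => K > Sort / (Linear - Binary)
constexpr double BreakEvenK(double sortMs, double linearMs,
                            double binaryMs) {
  return sortMs / (linearMs - binaryMs);
}

// Eyeballed estimates from the tables (milliseconds):
// - one sort of 10,000 players: ~0.45
// - one linear search: ~2.35 / 1024
// - one binary search: ~0.05 / 1024 (rough slope of BM_SortAndBinary)
constexpr double estimatedK =
  BreakEvenK(0.45, 2.35 / 1024, 0.05 / 1024);

// estimatedK lands around 200 - consistent with the measured
// bounds of 64 and 512
static_assert(estimatedK > 64 && estimatedK < 512);
```

A tighter benchmark range around that estimate would confirm the exact crossover.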

Obviously, there is a lot of nuance here, and real world software is a lot more complex. Perhaps we can sort the data in a different thread, making it much cheaper. Perhaps other components in our system have pointers to our players, making sorting more disruptive as it invalidates those pointers.

The most important thing for now is that we have a lab to test our ideas. The academic theory gives us intuition on what ideas are worth experimenting with, and the lab lets us test if those ideas hold water in our specific use cases, with our specific data types.
