When you write C++ code, you are writing text. That text needs to be translated into machine instructions - binary - that the CPU can execute. This translation is performed by a compiler toolchain, such as GCC (on Linux), Clang (on macOS/Linux), or MSVC (on Windows).

These compilers are incredibly complex tools. A compiler like GCC has hundreds of command-line flags that control code generation. You can tell it to prioritize file size over speed, to unroll loops, to assume strict aliasing rules, or to generate instructions for a specific processor architecture.

Manually invoking the compiler with these flags for every single file in your project quickly becomes impractical. This is where a build system comes in.

We will be using CMake. It is important to understand that CMake is not a build system itself; it is a build system generator. You describe your project in a file called CMakeLists.txt. CMake reads that file and generates the actual build instructions for your specific environment.

Diagram showing the CMake generation process

If you are on Linux, it might generate a Makefile. If you are on Windows and have Visual Studio installed, it might generate a Visual Studio Solution (.sln). This allows us to write our configuration once and have it work across platforms and toolchains.

We cover CMake in detail elsewhere. For this course, we will move much faster and focus specifically on the configurations that impact performance analysis.

The Project Architecture

In this course, we'll mostly be focused on setting up a benchmarking lab, but we'll use a somewhat realistic project structure - one that generates an executable that we'd ultimately ship to users.

An important thing to note is that our benchmarking lab will also be an executable.

Both executables need to share the same code - the algorithms and data structures we test in our benchmarks are the same ones we use in what we ship to our users.

To accomplish this, we need to place that shared code in a library, meaning our project needs to have at least three components:

Core Logic: A library containing the code that is used by both our primary executable and our benchmarking.
Application: The actual program that users run (e.g., a game) that consumes the core logic.
Benchmarks: A separate testing suite that also consumes the core logic to measure its speed.

Of course, more complex projects can span across many more libraries but, for our simple project, we will organize our filesystem into these three distinct subdirectories:

MyProject/
├── CMakeLists.txt        (The Root)
├── CMakePresets.json     (Build configurations)
├── dsa_core/             (The Library)
│   ├── CMakeLists.txt
│   ├── include/
│   │   └── dsa/
│   │       └── vector.h
│   └── src/
│       └── vector.cpp
├── dsa_app/              (The Main Application)
│   ├── CMakeLists.txt
│   └── main.cpp
└── benchmarks/           (We will add this next lesson)
    ├── CMakeLists.txt
    └── main.cpp

The Root Configuration

Let's start by creating the root CMakeLists.txt. Its primary job is to define the project-wide standards and orchestrate the subdirectories.

CMakeLists.txt

# 3.21 or newer is needed for the version 3 presets schema
# used in CMakePresets.json
cmake_minimum_required(VERSION 3.21)

# Define the project name and language
project(DsaCourse VERSION 1.0 LANGUAGES CXX)

# Enforce a global standard
set(CMAKE_CXX_STANDARD 23)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

# Add our subdirectories
add_subdirectory(dsa_core)
add_subdirectory(dsa_app)

# We will add this in the next lesson
# add_subdirectory(benchmarks)

Defining the Core Library

Now we move into the dsa_core folder, which is where we'll place all our shared code. We will define this as a library target.

In CMake, add_library() defines a library target: the code is compiled into a library (a static archive by default) rather than an executable program. This archive waits to be linked into other programs later.

We will also add a simple placeholder class here, MyVector, just so we can confirm everything is working.

The dsa_core directory contains three files: CMakeLists.txt, include/dsa/vector.h, and src/vector.cpp.
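The contents of the placeholder class are not reproduced in this text. As a rough sketch only, here is what it might look like, inferred from how the application's main.cpp uses it (a public data member and a push_back() method). The int element type and the exact split between header and source file are assumptions. Both parts are shown in one block so the sketch compiles on its own.

```cpp
// Hypothetical sketch of the placeholder class. The int element type and
// member layout are assumptions, not the course's actual file contents.
#include <vector>

// --- would live in dsa_core/include/dsa/vector.h ---
namespace dsa {

struct MyVector {
    // Public so the application can inspect it via vec.data.size()
    std::vector<int> data;

    // Declared here, defined in the source file
    void push_back(int value);
};

} // namespace dsa

// --- would live in dsa_core/src/vector.cpp ---
namespace dsa {

// Forwards to the underlying std::vector
void MyVector::push_back(int value) {
    data.push_back(value);
}

} // namespace dsa
```

Keeping the definition of push_back() in the source file rather than the header gives the library something to compile, so there is a real archive to link against.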

Note that the intermediate dsa directory between include/ and vector.h is intentional. In larger projects, this "namespaced include" technique makes it easier to tell what our #include directives are doing:

// Before - where is this file coming from?
#include <vector.h>

// After - we're clearly using the dsa library
#include <dsa/vector.h>

Defining the Application

Next, we define our main application. This will be an executable that links to our library. We use add_executable() to create the target, and target_link_libraries() to link it against our library:

dsa_app/CMakeLists.txt

# Create the executable binary
add_executable(dsa_app main.cpp)

# Link to our core library.
# This pulls in the code AND the include paths
target_link_libraries(dsa_app PRIVATE dsa_core)

Our library's configuration file specified where its header files are using target_include_directories(), and our application's configuration file used target_link_libraries() to declare that it needs the library.

CMake will now manage that dependency for us. We can #include our library's header files from our app:

dsa_app/main.cpp

#include <dsa/vector.h>
#include <iostream>

int main() {
    dsa::MyVector vec;
    vec.push_back(42);

    std::cout << "Application Running. Vector size: "
              << vec.data.size() << "\n";
    return 0;
}

When we build our project, CMake will coordinate with the build system to make sure the linking happens correctly, too. We'll build and confirm this soon, but let's handle some additional configuration first.

The Debug Deception

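At this point, we have a working build. But if you compile it using default settings, you are likely creating an unoptimized build. Visual Studio generators default to the Debug configuration, and single-configuration generators such as Makefiles leave CMAKE_BUILD_TYPE empty, which adds no optimization flags.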

In debug mode, the compiler's goal is not performance; it is observability. It wants to make it easy for you to step through the code line-by-line in a debugger. To achieve this, it does several things that hurt performance:

No Inlining: It compiles every function call as a literal call instruction. For small functions (like std::vector::size() or accessing an element), the overhead of the function call can be larger than the work done inside the function.
Memory Stack Usage: It keeps variables in stack memory rather than CPU registers so you can inspect their values at any time. This forces repeated loads and stores through the L1 cache instead of near-instant register access.
Safety Checks: The standard library often enables extra checks in debug mode. Accessing a vector element might trigger a bounds check. Incrementing an iterator might check if the iterator is valid.
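To make the inlining point concrete, here is a small sketch. The Sample type and sum_samples() function are hypothetical names used only for illustration. In an unoptimized build, compilers such as GCC and Clang typically emit a real call to get() on every loop iteration and keep total on the stack. With optimization enabled, the accessor is usually inlined and the loop works in registers.

```cpp
#include <vector>

// Hypothetical tiny wrapper type, used only for illustration.
struct Sample {
    int value = 0;

    // A one-line accessor: the work inside is a single load.
    int get() const { return value; }
};

// Sums all samples by calling the accessor once per element.
long long sum_samples(const std::vector<Sample>& samples) {
    long long total = 0;
    for (const Sample& s : samples) {
        total += s.get();
    }
    return total;
}
```

The result is identical in both build types. Only the generated instructions, and therefore the measured time, differ.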

These features are helpful for detecting and fixing bugs but, to maximize performance, we turn them off when we're creating a build that we'll release. So, if we're trying to measure the performance of what we'll eventually release, we must replicate that release configuration when running our benchmarks.

To simplify this, we will use a CMakePresets.json file in our root directory. This allows us to define standard configurations that we can apply with a single command later:

CMakePresets.json

{
  "version": 3,
  "configurePresets": [{
    "name": "release",
    "binaryDir": "${sourceDir}/build/release",
    "cacheVariables": {
      "CMAKE_BUILD_TYPE": "Release"
    }
  }],
  "buildPresets": [{
    "name": "release",
    "configurePreset": "release"
  }]
}
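Once a release preset exists, it can be useful to have the program itself report which kind of build it is, so a debug build is not measured by accident. The sketch below relies on the NDEBUG macro, which CMake's default Release flags define. The helper names are hypothetical, and treating NDEBUG as a "release-like" signal is an assumption that would not hold if a project overrides those default flags.

```cpp
#include <string>

// NDEBUG is defined by CMake's default Release flags (-DNDEBUG or /DNDEBUG).
// Using it as a "release-like" signal is an assumption: a project that
// overrides the default flags could behave differently.
constexpr bool built_with_ndebug() {
#ifdef NDEBUG
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: a label a benchmark could print before measuring.
std::string build_kind() {
    return built_with_ndebug() ? "release-like" : "debug-like";
}
```

Printing build_kind() at the start of a benchmark run is a cheap way to catch a misconfigured build before trusting its numbers.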

Unlocking the Hardware

Setting the build to Release usually enables the -O3 (on GCC/Clang) or /O2 (on MSVC) optimization level. This tells the compiler to apply aggressive transformations to your code, like unrolling loops and reordering instructions.

However, by default, compilers are conservative about architecture.

When you download a program like Chrome or Discord, it works on your computer, but it also works on a computer from 2015. To achieve this portability, the compiler only uses the "lowest common denominator" CPU instructions. It avoids using modern features like AVX2 or AVX-512 because older CPUs would crash if they tried to execute them.

But we aren't distributing our benchmarks to millions of users. We are running them on our machine and, assuming our test machine is similar to our users' machines, we want to optimize for that hardware.

Using `march=native`

We can tell the compiler to generate code specifically for the CPU it is currently running on. On GCC and Clang, this flag is -march=native. MSVC doesn't have a direct equivalent, but /arch:AVX2 is a fairly common standard for modern PCs.

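We will apply these flags specifically to our dsa_core library. Because dsa_core is linked into both our primary application and, later, our benchmarking application, both will use the optimized library code. Note that PRIVATE options affect only dsa_core's own source files, not the code compiled as part of the executables.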

Let's update dsa_core/CMakeLists.txt:

dsa_core/CMakeLists.txt

# ...

# Enable architecture-specific optimizations for Release builds.
# -march=native allows the compiler to use instructions specific
# to your CPU (like AVX).
# Each flag gets its own generator expression: an unquoted argument
# containing a space would be split and break the expression.
if(CMAKE_CXX_COMPILER_ID MATCHES "Clang" OR CMAKE_CXX_COMPILER_ID MATCHES "GNU")
  target_compile_options(dsa_core PRIVATE
    $<$<CONFIG:Release>:-O3>
    $<$<CONFIG:Release>:-march=native>
  )
elseif(MSVC)
  target_compile_options(dsa_core PRIVATE
    $<$<CONFIG:Release>:/O2>
    $<$<CONFIG:Release>:/arch:AVX2>
  )
endif()

Notice the use of generator expressions: $<$<CONFIG:Release>:...>. This ensures these flags are only applied when we are building in release mode. We don't want to optimize our debug builds, or we'll lose the ability to step through them easily.
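One way to check that the architecture flags took effect is to look at the feature macros the compiler predefines. The sketch below is illustrative only, and enabled_simd_features() is a hypothetical helper name. The macro names are the ones GCC and Clang use. MSVC defines the AVX-related ones when the matching /arch option is given, while __SSE2__ is a GCC/Clang macro.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: lists the SIMD extensions the compiler was allowed
// to use for this translation unit, based on predefined macros.
std::vector<std::string> enabled_simd_features() {
    std::vector<std::string> features;
#if defined(__SSE2__)
    features.push_back("SSE2");
#endif
#if defined(__AVX__)
    features.push_back("AVX");
#endif
#if defined(__AVX2__)
    features.push_back("AVX2");
#endif
#if defined(__AVX512F__)
    features.push_back("AVX-512F");
#endif
    return features;
}
```

Because the flags are applied only to dsa_core, a function like this would need to be compiled as part of that library to reflect them. The same function compiled in the application could report a shorter list.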

Link Time Optimization (LTO)

Traditionally, C++ compiles one file at a time. This unit is called a translation unit.

If you have a function MyVector::push_back() in vector.cpp and you call it from main.cpp, the compiler cannot optimize that call when compiling main.cpp. It doesn't know what push_back() does; it only sees the function declaration (the promise that it exists).

This means the compiler cannot inline the function. It has to generate a standard function call, jumping to a different memory address. This breaks locality and prevents optimizations.

Link Time Optimization (LTO) changes this. It delays the code generation step until the linker runs. The linker sees the entire program at once. It can see the code inside vector.cpp and inject it directly into main.cpp, deleting the function call entirely.
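The situation can be sketched in a single file. The names add_one() and sum_incremented() are hypothetical. In a real two-file layout, the definition at the bottom would live in a different .cpp file, and the compiler building the caller would see only the declaration at the top.

```cpp
#include <vector>

// Declaration only: in a two-file layout, this is all the compiler would
// see while compiling the caller.
int add_one(int x);

// The caller. In a two-file layout the compiler would have no body for
// add_one() at this point and would have to emit a real call.
long long sum_incremented(const std::vector<int>& values) {
    long long total = 0;
    for (int v : values) {
        total += add_one(v);
    }
    return total;
}

// In a real project this definition would sit in another translation
// unit. It is placed here only so the sketch builds as a single file.
int add_one(int x) { return x + 1; }
```

With the definition moved to a separate source file and LTO enabled, the linker-stage optimizer gets the chance to inline add_one() into the loop, which the per-file compile step alone could not do.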

CMake makes enabling this easy. We check if the compiler supports it, and if so, we turn it on for dsa_core.

dsa_core/CMakeLists.txt

# ...

# Enable Link Time Optimization (LTO) if supported.
include(CheckIPOSupported)
check_ipo_supported(RESULT result OUTPUT output)

if(result)
  set_target_properties(dsa_core PROPERTIES
    INTERPROCEDURAL_OPTIMIZATION TRUE
  )
else()
  message(WARNING "IPO is not supported: ${output}")
endif()

Complete Code

Here is the complete state of the project files at the end of this lesson. This structure establishes the library-application separation and configures the build system for high-performance release builds.

Files: CMakeLists.txt, CMakePresets.json, dsa_core/, dsa_app/

Configuring the Generator

Before we use CMake to generate the build system, we might want to configure what type of build system it generates. By default, it scans our system searching for what we have installed, and it will generally choose a sensible default.

However, if we have a complex environment, perhaps with many different toolchains installed, it might choose something we don't want.

We can get the list of the available generators using the following command:

cmake --help

To explicitly specify which one we use, we add a generator key to our CMakePresets.json. This value should be the generator we want to use, formatted in the exact same way it was in the cmake --help output. For example:

{
  "version": 3,
  "configurePresets": [{
    "name": "release",
    "binaryDir": "${sourceDir}/build",
    "generator": "Visual Studio 17 2022",
    "cacheVariables": {
      "CMAKE_BUILD_TYPE": "Release"
    }
  }],
  "buildPresets": [{
    "name": "release",
    "configurePreset": "release",
    "configuration": "Release"
  }]
}

Building the Project

Let's build our project and confirm everything is working. From our project directory (the same location as the root CMakeLists.txt) we can ask CMake to generate a build system using our release preset:

cmake --preset release

We can then use that build system to compile our project. CMake provides an interface via the --build flag that lets us communicate with that underlying build system in a standardized way.

To use it, from our project root:

cmake --build --preset release

Using this approach is recommended over interacting with the build system directly. This is primarily because cmake --build will first check if we changed anything in our CMakeLists.txt files, and will automatically regenerate the build system to include those changes.

If our build was successful, and we pay attention to the output, we should see where our executable was created. The location depends on our build system, but will often be something like build\dsa_app\Release\dsa_app, with an additional .exe extension on Windows.

We can run it from the same terminal window:

.\build\dsa_app\Release\dsa_app.exe

Application Running. Vector size: 1

Summary

We have now laid the foundations for a build environment that will let us do performance analysis. Here are the key points:

Structure: We separated our project into dsa_core (library) and dsa_app (executable). This modular design allows us to easily attach a benchmarking suite in the next lesson.
Build Type: We prioritized Release mode to strip away debug overhead and enable the optimizer.
Hardware Access: We used -march=native to allow the compiler to use every feature of our specific CPU.
Global Optimization: We enabled LTO to allow the compiler to inline functions across the boundary between our library and our applications.

In the next lesson, we will implement the third component of our architecture: the benchmarks directory. We will integrate Google Benchmark to measure our code.
