Object Files and the Linking Process

The linker's role in the build process, how it resolves symbols to combine object files, and how to troubleshoot common errors.

Greg Filak

In the last lesson, we saw how C++ source code is transformed into machine code. The compiler takes each .cpp file, preprocesses it into a translation unit, and compiles it into an object file, typically with a .o extension on Unix, and .obj on Windows.

We now have a collection of these object files, each an island of compiled code. An object file for main.cpp might contain a call to a function like calculateScore(), but it has no idea where the actual machine code for calculateScore() lives. The compiler just left a note saying, "I need the address of calculateScore()."

This is where the linker comes in. The linker is the final stage of the C++ build process. It takes all our separate, compiled object files and stitches them together to create a single, cohesive executable program.

In this lesson, we'll learn about this linking process. We'll explore the linker's primary job of resolving symbols, see how it combines object files into an executable, and, most importantly, learn to diagnose and fix the linker errors that every C++ developer eventually faces.

The Linker's Role and Symbol Resolution

After the compiler has done its job on each source file, we're left with a set of object files. Each object file is a container of machine code, but it's incomplete. It knows about the functions and variables it defines, but it only has placeholders for the ones it uses from other files.

The linker's job is to resolve these placeholders and merge everything into a final, runnable program.

It accomplishes this through two main tasks: symbol resolution and relocation.

Symbols and Symbol Tables

In the context of linking, a symbol is simply a name that refers to a function or a global variable. When the compiler processes main.cpp, it generates a symbol table for the resulting main.o object file. This table lists all the symbols the object file is aware of, categorized in two ways:

  1. Defined Symbols: These are the functions and global variables that are implemented within this object file. These symbols are "exported" and made available for other object files to use. For example, if math.cpp contains the definition of int add(int, int), then add is a defined symbol in math.o.
  2. Undefined Symbols (or External References): These are symbols that are used in this object file but are defined elsewhere. The object file essentially says, "I need to call a function named add, but I don't know where it is. Please find it for me."

The Resolution Process

Symbol resolution is the process where the linker connects the dots. It iterates through all the object files you give it and builds a global map of all defined symbols. Then, for every undefined symbol in each object file, it searches this global map to find a matching defined symbol.
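As a rough illustration, the two passes described above can be modeled in a few lines of C++. This is a toy sketch, not how a real linker is implemented: ObjectFile, resolveSymbols(), and the error strings are hypothetical names chosen for this example, and a real symbol table records far more than a name.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model of an object file's symbol table (not a real object format).
struct ObjectFile {
  std::string name;
  std::set<std::string> defined;    // symbols implemented in this file
  std::set<std::string> undefined;  // symbols used here, defined elsewhere
};

// Returns a map from each symbol name to the object file that defines it.
std::map<std::string, std::string> resolveSymbols(
    const std::vector<ObjectFile>& objects) {
  // Pass 1: build the global map of defined symbols.
  std::map<std::string, std::string> definedIn;
  for (const ObjectFile& obj : objects) {
    for (const std::string& symbol : obj.defined) {
      bool inserted = definedIn.emplace(symbol, obj.name).second;
      if (!inserted) {
        throw std::runtime_error("multiple definition of `" + symbol + "'");
      }
    }
  }

  // Pass 2: every undefined symbol must match an entry in the map.
  for (const ObjectFile& obj : objects) {
    for (const std::string& symbol : obj.undefined) {
      if (definedIn.find(symbol) == definedIn.end()) {
        throw std::runtime_error("undefined reference to `" + symbol + "'");
      }
    }
  }
  return definedIn;
}
```

The two exceptions in this sketch correspond to the two classes of linker error covered later in this lesson.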

Let's imagine a simple project with two files:

  • math.cpp defines the add() function.
  • main.cpp declares add() and calls it.
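A minimal sketch of what these two files could contain, shown as one block with a comment marking where each file begins. The function bodies and the computeTotal() helper are assumptions made for illustration; in a real main.cpp the call would normally sit inside main().

```cpp
// ----- math.cpp -----
// The definition of add. After compiling, math.o lists add as a defined symbol.
int add(int a, int b) {
  return a + b;
}

// ----- main.cpp -----
// Declaration only. It lets the compiler type-check the call below, but
// main.o records add as an undefined symbol for the linker to resolve.
int add(int a, int b);

// Hypothetical helper standing in for the code inside main().
int computeTotal() {
  return add(2, 3);
}
```

Built as separate files, the steps would typically be g++ -c math.cpp, then g++ -c main.cpp, and finally g++ main.o math.o -o my_app. Running the nm tool on each object file usually marks defined functions with T and undefined symbols with U.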

When we compile these, we get math.o and main.o.

  • The symbol table for math.o says: "I define a symbol named add."
  • The symbol table for main.o says: "I use a symbol named add, but it's undefined here."

When we link them, the linker sees the undefined reference to add in main.o, finds the definition of add in math.o, and connects them.

Once the linker knows the final memory address for the add function, it performs relocation. It goes back to the machine code in main.o and replaces the placeholder in the call instruction with the actual memory address of add. This "patching" process ensures that when the program runs, the function call jumps to the correct location.
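The patching step can be sketched with a toy model. Everything here is an assumption made for illustration: code is represented as a list of 4-byte words, a call site stores the absolute address of its target, and the names Relocation, ObjectFile, and linkObjects() are hypothetical. Real relocations come in many kinds and are often relative to the calling instruction.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A placeholder that must be patched once final addresses are known.
struct Relocation {
  std::size_t wordIndex;  // which word of `code` holds the placeholder
  std::string symbol;     // the symbol whose address belongs there
};

// Toy object file: code is a list of 4-byte words.
struct ObjectFile {
  std::vector<std::uint32_t> code;
  std::map<std::string, std::size_t> defined;  // symbol -> index of its first word
  std::vector<Relocation> relocations;
};

// Places the objects back to back starting at baseAddress, then overwrites
// every placeholder with the final address of the symbol it names.
std::vector<std::uint32_t> linkObjects(std::vector<ObjectFile> objects,
                                       std::uint32_t baseAddress) {
  // Step 1: assign each defined symbol its final address.
  std::map<std::string, std::uint32_t> addressOf;
  std::uint32_t nextAddress = baseAddress;
  for (const ObjectFile& obj : objects) {
    for (const auto& entry : obj.defined) {
      addressOf[entry.first] =
          nextAddress + 4 * static_cast<std::uint32_t>(entry.second);
    }
    nextAddress += 4 * static_cast<std::uint32_t>(obj.code.size());
  }

  // Step 2: patch placeholders and append each object's code to the image.
  std::vector<std::uint32_t> image;
  for (ObjectFile& obj : objects) {
    for (const Relocation& reloc : obj.relocations) {
      // at() throws std::out_of_range if the symbol was never defined.
      obj.code.at(reloc.wordIndex) = addressOf.at(reloc.symbol);
    }
    image.insert(image.end(), obj.code.begin(), obj.code.end());
  }
  return image;
}
```

For instance, if a three-word main.o is placed at 0x1000 and math.o follows it, add ends up at 0x100C, and that value replaces the placeholder in main.o's call site.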

Static Linking - Combining Objects into an Executable

The process we've been describing is known as static linking. It's the most straightforward way to build a program.

In static linking, the linker takes all the input object files and copies their machine code and data directly into the final executable file. The result is a single, self-contained program that includes all the necessary code from your project.

When you run a command like g++ main.cpp math.cpp -o my_app, the g++ driver first compiles main.cpp and math.cpp into object files behind the scenes. These are temporary files that play the role of main.o and math.o. Then, it invokes the linker (ld on Linux) with these object files.

The linker performs its magic:

  1. It combines the code sections (often called .text) from all object files into a single code section in the executable.
  2. It combines the data sections (.data, .rodata) similarly.
  3. It resolves all symbols and performs relocations as described above.
  4. It adds some essential runtime startup code. This is a small piece of code provided by the C++ runtime library that runs before your main() function. It sets up the stack, initializes global variables, and prepares the command-line arguments - argc and argv - before finally calling your main() function.
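One visible effect of the startup code is that global variables are ready by the time main() starts. A minimal sketch, assuming a hypothetical loadDefaultScore() helper and defaultScore global; the language rules allow implementations some freedom over the exact timing, so treat the ordering shown as typical behavior.

```cpp
#include <cstdio>

// Hypothetical helper with a visible side effect so the ordering can be seen.
int loadDefaultScore() {
  std::puts("initializing defaultScore");
  return 100;
}

// A global with a dynamic initializer. The compiler emits initialization code
// that the startup sequence normally runs before main() is entered.
int defaultScore = loadDefaultScore();

// Reads the global; by the time main() can call this, the value is set.
int readDefaultScore() {
  return defaultScore;
}
```

Calling readDefaultScore() from main() returns the initialized value, even though main() never called loadDefaultScore() itself.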

The final output is an executable file that the operating system can load into memory and run. Because all the necessary code from your object files is copied into it, it has no external dependencies on them at runtime.

This self-contained nature is the hallmark of static linking. We'll contrast this with dynamic linking in the next lesson.

Undefined Reference / Unresolved External Symbol

Since the linker works across your entire project, it's the first tool to spot inconsistencies between different source files. Compile-time errors happen within a single translation unit, but linker errors happen when the pieces don't fit together.

An undefined reference (or unresolved external symbol, as MSVC calls it) is the most frequent linker error, and it means exactly what it says: you used a symbol, but the linker couldn't find a definition for it anywhere.

The error message will look something like this:

main.o: In function `main':
main.cpp:(.text+0x15): undefined reference to `add(int, int)'
collect2: error: ld returned 1 exit status

On Windows with MSVC, it's "LNK2019: unresolved external symbol".

This error occurs when the linker can't find the implementation for a function or variable you've used. Here are the usual suspects:

You Forgot to Define a Function

You declared it in a header file, used it in main.cpp, but never wrote the function body anywhere.

math.h

#pragma once

int add(int a, int b); // Declaration only
// No definition!

You Forgot to Link a File

The function is defined in math.cpp, but you only told the compiler to build main.cpp:

g++ main.cpp -o my_app

We need to ensure we provide all the required files:

g++ main.cpp math.cpp -o my_app

A Name Mismatch

Another common reason for unresolved external symbols is a subtle difference between the declaration and the definition.

For example, if one takes an int where the other takes a long, the compiler "mangles" their names into different symbols. The linker sees them as two completely different functions.

math.h

#pragma once

// Declaration
int multiply(int a, int b);

math.cpp

// Definition has a different signature
int multiply(int a, long b) {
  return a * b;
}

To the linker, multiply(int, int) is a different symbol than multiply(int, long). The call in main.cpp will be looking for the first one, but only the second one is defined.

Not every difference counts. A top-level const on a by-value parameter, such as const int a, is not part of the function's signature, so multiply(const int, int) and multiply(int, int) are the same function. A const on the target of a reference or pointer parameter, such as const std::string&, does change the signature.
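As a small illustration of how parameter types feed into the symbol name, the following sketch defines two overloads of multiply that differ in one parameter type. The mangled names in the comments are what GCC and Clang typically produce on Linux; MSVC uses a different scheme. The byInts and byIntLong pointers are hypothetical names added for this example.

```cpp
// Two overloads that differ only in a parameter type. Each one is a separate
// function with its own symbol in the object file.
int multiply(int a, int b) {   // typically mangled as _Z8multiplyii
  return a * b;
}

int multiply(int a, long b) {  // typically mangled as _Z8multiplyil
  return a * static_cast<int>(b);
}

// Selecting each overload by its exact type shows that the compiler treats
// them as distinct functions.
int (*const byInts)(int, int) = multiply;
int (*const byIntLong)(int, long) = multiply;
```

A call such as multiply(2, 3) binds to the (int, int) version. If only the (int, long) version had a definition, that call would be left as an undefined reference.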

Multiple Definition / Symbol Redefined

The second most common linker error is the opposite problem: the linker found more than one definition for the same symbol. This violates a fundamental principle of C++, the One Definition Rule (ODR): across the whole program, a non-inline function or variable may have only one definition.

A "multiple definition" error usually means you've violated the ODR. The most common cause is defining a non-inline function or a global variable in a header file.

Why do I get a Multiple Definition Error?

When you #include a header, its contents are textually pasted into the source file. If that header contains a function definition, every .cpp file that includes it will get its own copy of that function's machine code.

util.h

#pragma once

// BAD: Non-inline function defined in a header file
int getID() {
  return 42;
}

Now imagine main.cpp and player.cpp both include util.h.

  • When main.cpp is compiled, main.o gets a definition of getID().
  • When player.cpp is compiled, player.o also gets a definition of getID().

When the linker tries to combine main.o and player.o, it sees two definitions of getID() and complains:

player.o: In function `getID()':
player.cpp:(.text+0x0): multiple definition of `getID()'
main.o:main.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

The fix is to follow the standard C++ practice:

  • Declare in the header file (.h).
  • Define in a corresponding source file (.cpp).

For our example, util.h keeps only the declaration of getID(), and util.cpp holds its definition.
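A minimal sketch of the corrected pair of files, shown as one block with a comment marking where each file begins. The layout is an assumption based on the declare-in-header, define-in-source rule above; the include line is written as a comment so the block stands on its own.

```cpp
// ----- util.h -----
// (starts with #pragma once, as before)
// Declaration only: safe to include from any number of .cpp files.
int getID();

// ----- util.cpp -----
// #include "util.h"
// The single definition in the whole program.
int getID() {
  return 42;
}
```

Both main.cpp and player.cpp can keep including util.h unchanged. Each now sees only the declaration, so neither main.o nor player.o carries its own copy of the function.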

Now, only util.o will contain the definition for getID(). Other object files will have an undefined reference, which the linker will happily resolve to the single definition in util.o.

For more on how linkage works with inline, static, and extern, we have a dedicated lesson:

Internal and External Linkage

A deeper look at the C++ linker and how it interacts with our variables and functions. We also cover how we can change those interactions, using the extern and inline keywords.

Handling Libraries During Linking

So far, we've only linked object files that we compiled from our own project's source code. But what about external code, like the C++ Standard Library or a third-party library like SDL or Boost?

We don't want to recompile these libraries from source every time we build our project. Instead, we link against their pre-compiled object files, which are conveniently bundled into libraries.

A library is just an archive of object files. When linking, you can tell the linker to look inside these libraries to find definitions for any symbols it can't find in your own object files.

The two main types of libraries are static and dynamic. We'll focus on the static linking process here and explore the differences in the next lesson.

Static Libraries - .a and .lib

A static library - usually with a .a extension on Unix or .lib on Windows - is an archive of object files. Think of it as a .zip file for .o files.

When you link your program against a static library, the linker doesn't just dump the entire library into your executable. Instead, it intelligently searches the library only for the object files that provide definitions for your program's unresolved symbols.

For example, if you use a function from a large math library, the linker will find the specific object file within the library that contains that function and copy only that object file's code into your executable. The rest of the library is ignored.
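This selective behavior can be sketched as a toy model. ArchiveMember and selectMembers() are hypothetical names, and the rescanning loop is a simplification of what real linkers do; the point is only that a member is copied when it defines something still unresolved, and that a copied member can bring new requirements of its own.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model of one object file stored inside a static library.
struct ArchiveMember {
  std::string name;
  std::set<std::string> defined;    // symbols this member provides
  std::set<std::string> undefined;  // symbols this member needs in turn
};

// Returns the names of the members that get pulled in, given the symbols the
// program still needs. Members that provide nothing needed are never copied.
std::vector<std::string> selectMembers(
    std::set<std::string> unresolved,
    const std::vector<ArchiveMember>& archive) {
  std::vector<std::string> pulled;
  std::set<std::string> resolved;
  std::vector<bool> used(archive.size(), false);

  bool pulledSomething = true;
  while (pulledSomething) {  // rescan until a full pass adds nothing
    pulledSomething = false;
    for (std::size_t i = 0; i < archive.size(); ++i) {
      if (used[i]) continue;
      const ArchiveMember& member = archive[i];

      bool needed = false;
      for (const std::string& symbol : member.defined) {
        if (unresolved.count(symbol) > 0) {
          needed = true;
          break;
        }
      }
      if (!needed) continue;

      used[i] = true;
      pulledSomething = true;
      pulled.push_back(member.name);
      for (const std::string& symbol : member.defined) {
        unresolved.erase(symbol);
        resolved.insert(symbol);
      }
      // A pulled member may introduce new requirements of its own.
      for (const std::string& symbol : member.undefined) {
        if (resolved.count(symbol) == 0) unresolved.insert(symbol);
      }
    }
  }
  return pulled;
}
```

Given an archive holding clamp.o, sqrt.o (which needs clamp), and matrix.o, a program that only needs fastSqrt from sqrt.o would pull in sqrt.o and clamp.o, and matrix.o would be left out.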

The final executable is still self-contained. The necessary code from the library has been statically linked - copied into your program - just as if you had compiled it from source yourself.

We'll cover this in more detail in the next lesson but, to use a static library, you typically need to tell the linker two things:

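  • Library Path: The directory where the linker can find the library file. With GCC and Clang, this is the -L option.
  • Library Name: The name of the library to link. With GCC and Clang, this is the -l option, so -lfoo looks for a file such as libfoo.a.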

When we're using many libraries, managing these paths and names manually is tedious and error-prone, especially for cross-platform projects. This is one of the key problems that CMake is designed to solve.

Summary

In this lesson, we completed our journey from source code to a runnable program by exploring the final step: linking.

  • The Linker's Role: The linker is a program that takes one or more object files and libraries, resolves symbol references between them, and combines them into a single executable.
  • Symbol Resolution: It builds a map of all defined symbols (functions and global variables) from the input files and uses it to fill in the placeholders (undefined symbols) in each object file.
  • Static Linking: This process copies the required machine code from your project's object files and any static libraries directly into the final executable, creating a self-contained program.
  • Common Linker Errors: We learned that "undefined reference" errors mean a definition was missing, while "multiple definition" errors mean a symbol was defined more than once, violating the One Definition Rule (ODR).
  • Libraries: Libraries are archives of pre-compiled object files. The linker can search them to find definitions for symbols not provided by your own code.
Next Lesson

Static and Shared Libraries

The difference between static and dynamic libraries, how to create them, and the trade-offs between them.
