Generating Files with Protobuf and add_custom_command()

Learn to integrate code generators like Protobuf into your build process using add_custom_command(), creating automated file-level dependencies in CMake.

Greg Filak

So far, we've operated under a key assumption: all our source files are written by hand. But what if they're not? In many modern C++ projects, a significant portion of the code isn't written by developers; it's generated by other tools.

This lesson introduces the add_custom_command() command, CMake's primary mechanism for integrating these external tools into your build process. We'll learn how to use its OUTPUT argument to transform a static build into a dynamic one where source code is generated on the fly.

Case Study: Protocol Buffers (Protobuf)

To make this concrete, we'll use a real-world example. The applications we write often need to communicate with other programs. Or, if we were building a chat program or a multiplayer game, our application might need to communicate with other instances of itself, run by different users.

Solving this is a three-step process:

  • We serialize the data we want to send into a format suitable for transmission.
  • We send this data to the program we want to communicate with, usually over a network such as the internet.
  • The receiving program deserializes it back into a format it can natively work with, such as a C++ object.

Protocol Buffers, or Protobuf for short, is Google's language-neutral serialization mechanism, designed to handle exactly these requirements.

The workflow is as follows:

  1. You define your data structures (called "messages") in a simple text file with a .proto extension. These describe the type of data you need to send between programs.
  2. You run the Protobuf compiler, protoc.
  3. protoc reads your .proto file and generates code to help us implement our messaging features. Protobuf supports a wide range of languages; for C++, it uses the familiar convention of generating a header and a source file.

These generated files contain all the boilerplate code for setting, getting, reading, and writing your data, saving you from a mountain of tedious work.

In these cases, the generated code becomes a dependency. If we change the definition file, the generated code must be recreated before our project can use it. Manually running the code generator is tedious and error-prone, so this is a perfect opportunity to apply what we've learned in this chapter to automate it using CMake.

Setting Up the Protobuf Target

Let's integrate Protobuf into our Greeter project. We'll create a new, self-contained CMake target called DataModel whose job is to manage our Protobuf definitions and the code generated from them.

Step 1: Install Protobuf

First, we need to set up Protobuf as a dependency. The official documentation lists some installation options, but it's also available through both vcpkg and Conan.

We'll use Conan in our examples. If you want to follow along using the same approach, a basic conanfile.txt and CMakePresets.json are below:

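Something along these lines should work; the exact protobuf version and preset details here are illustrative, so adjust them to your environment:

conanfile.txt

# Example only: any reasonably recent protobuf version should work
[requires]
protobuf/3.21.12

[generators]
CMakeDeps
CMakeToolchain

CMakePresets.json

{
  "version": 3,
  "configurePresets": [
    {
      "name": "conan",
      "binaryDir": "${sourceDir}/build",
      "toolchainFile": "${sourceDir}/build/conan_toolchain.cmake",
      "cacheVariables": { "CMAKE_BUILD_TYPE": "Release" }
    }
  ],
  "buildPresets": [
    {
      "name": "conan",
      "configurePreset": "conan"
    }
  ]
}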

Step 2: Create the .proto Definition

Next, we'll create the definition file for our data. We'll create a new datamodel directory and, inside it, a proto directory to hold our definition. Let's imagine our program needs to communicate information about a person, so we'll define a simple Person message type using protobuf's syntax:

datamodel/proto/person.proto

syntax = "proto3";

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}

We're mostly focused on CMake and just using protobuf as an example. If you want to learn about the language, their official guide is a good introduction.
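
If you ever wanted to run the generator by hand, the command would look something like this from the datamodel directory (assuming protoc is installed and on your PATH):

protoc --proto_path=proto --cpp_out=. proto/person.proto

This writes person.pb.h and person.pb.cc into the current directory. Remembering to re-run it every time person.proto changes is exactly the chore we're about to hand over to CMake.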

Step 3: Using Protobuf

Our program will create a message, serialize it to a string, and then send it. The receiving program will deserialize (or "parse") the message into a format it can use.

In our case, we'll simulate this by having the same program be both the sender and receiver of the message. In a realistic scenario, they'd be different programs, or different instances of the same program running on different machines. The serialized message (stored in a std::string in this example) would be transmitted between them, usually over a network.

Note that this file currently isn't compilable, as protobuf hasn't yet generated the person.pb.h header that defines the Person type we're trying to use here:

app/src/main.cpp

#include <iostream>
#include <string>
#include "person.pb.h" // This doesn't exist yet 

int main() {
  // Create a message
  Person p1;
  p1.set_name("Jane Doe");
  p1.set_id(1234);
  p1.set_email("jane.doe@example.com");

  // Sender serializes the message
  std::string serialized;
  p1.SerializeToString(&serialized);

  // Send the message
  // ...

  // Receiver deserializes and uses the message
  Person p2;
  p2.ParseFromString(serialized);
  std::cout << "Hello " << p2.name();

  return 0;
}
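
As a small aside on the generated API: both SerializeToString() and ParseFromString() return a bool indicating whether they succeeded, which real code should check. For example, the receiving side might guard the parse like this (std::cerr comes from the <iostream> header we already include):

if (!p2.ParseFromString(serialized)) {
  std::cerr << "Failed to parse Person message\n";
  return 1;
}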

Step 4: Set up the CMakeLists.txt for the DataModel Library

Finally, let's create the CMakeLists.txt for our new datamodel component and add it to the root CMakeLists.txt. This is where we'll do the heavy lifting but, for now, we'll just make sure it can find the protobuf package:

datamodel/CMakeLists.txt

cmake_minimum_required(VERSION 3.21)

find_package(protobuf REQUIRED)

CMakeLists.txt

cmake_minimum_required(VERSION 3.21)
project(Greeter)

add_subdirectory(app)
add_subdirectory(datamodel)

With this setup, CMake should now be able to configure and generate our project. If you're following along using Conan and the presets provided above, this can be tested by running the following commands from the project root:

conan install . --output-folder=build --build=missing
cmake --preset conan

The build step will currently fail because person.pb.h doesn't exist:

cmake --build --preset conan
fatal error: person.pb.h: No such file or directory
    3 | #include "person.pb.h"
      |          ^~~~~~~~~~~~~~
compilation terminated.

Next, we'll write the CMake command to automatically generate this file as part of our build process.

Calling protoc using a Custom Command

To generate the files containing the source code we need for our Person type, we need to invoke the protoc compiler. We'll use CMake to automate this, with the help of add_custom_command().

The add_custom_command(OUTPUT ...) signature defines a rule to generate files. It also establishes a dependency in the build graph: if another target needs one of the output files, CMake will ensure that this command runs first to generate it.

Let's build the command piece by piece in datamodel/CMakeLists.txt.

Step 1: Define the Input and Output Files

Let's start by creating variables for the files we're working with. We already have our input person.proto file, which protoc will use to generate a person.pb.h header and a person.pb.cc source file:

datamodel/CMakeLists.txt

cmake_minimum_required(VERSION 3.21)

find_package(protobuf REQUIRED)

# Define the path to our input .proto file
set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/proto/person.proto")

# Define the paths for the generated output files
set(GENERATED_H "${CMAKE_CURRENT_BINARY_DIR}/person.pb.h")
set(GENERATED_CC "${CMAKE_CURRENT_BINARY_DIR}/person.pb.cc")

Step 2: Write the Command

Now we can construct the add_custom_command() call.

datamodel/CMakeLists.txt

cmake_minimum_required(VERSION 3.21)

find_package(protobuf REQUIRED)
set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/proto/person.proto")
set(GENERATED_H "${CMAKE_CURRENT_BINARY_DIR}/person.pb.h")
set(GENERATED_CC "${CMAKE_CURRENT_BINARY_DIR}/person.pb.cc")

add_custom_command(
  OUTPUT ${GENERATED_H} ${GENERATED_CC}
  DEPENDS ${PROTO_FILE}
  COMMENT "Using protobuf to generate C++ files from person.proto"
  COMMAND
    ${Protobuf_PROTOC_EXECUTABLE}
    --cpp_out=${CMAKE_CURRENT_BINARY_DIR}
    --proto_path=${CMAKE_CURRENT_SOURCE_DIR}/proto
    ${PROTO_FILE}
)

Let's break down each argument:

OUTPUT: This is the most important part. We use this to provide a list of one or more files that this command is expected to create.

DEPENDS: This lists the input files the command depends on. In our case, if person.proto is modified, the build system will need to re-run this command because the previously generated files are now stale. If it's unchanged, the command is skipped.

COMMENT: An optional string that explains what the command is doing. This may be printed to the console during the build, letting the user know what's happening.

COMMAND: The actual command to execute.

  • ${Protobuf_PROTOC_EXECUTABLE}: We use a variable provided by find_package(protobuf) to get the path to the protoc compiler (an alternative way to locate protoc is sketched after this list).
  • --cpp_out: An argument to protoc telling it where to place the generated C++ files. We point it to our current build directory.
  • --proto_path: An argument to protoc telling it where to look for .proto files, similar to a C++ include path.
  • ${PROTO_FILE}: The final argument is the input file to process.
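
Depending on how protobuf was installed, the Protobuf_PROTOC_EXECUTABLE variable may or may not be defined. If it turns out to be empty in your setup, one variation (assuming your protobuf package exports the protobuf::protoc imported target, as modern protobuf CMake packages do) is to locate the compiler through a generator expression instead:

add_custom_command(
  OUTPUT ${GENERATED_H} ${GENERATED_CC}
  DEPENDS ${PROTO_FILE}
  COMMENT "Using protobuf to generate C++ files from person.proto"
  COMMAND
    # Assumes the package defines the protobuf::protoc imported target
    $<TARGET_FILE:protobuf::protoc>
    --cpp_out=${CMAKE_CURRENT_BINARY_DIR}
    --proto_path=${CMAKE_CURRENT_SOURCE_DIR}/proto
    ${PROTO_FILE}
)

The $<TARGET_FILE:...> generator expression resolves to the full path of the imported executable, so the rest of the command is unchanged.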

Integrating the Generated Files

We've defined the rule to generate the files, but we haven't told any of our C++ targets to actually use them. This is the final step.

Step 1: Create the DataModel Library

We'll create a library that encapsulates our generated code. This library will be responsible for compiling the generated .pb.cc file.

datamodel/CMakeLists.txt

cmake_minimum_required(VERSION 3.21)

find_package(protobuf REQUIRED)
set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/proto/person.proto")
set(GENERATED_H "${CMAKE_CURRENT_BINARY_DIR}/person.pb.h")
set(GENERATED_CC "${CMAKE_CURRENT_BINARY_DIR}/person.pb.cc")

add_custom_command(
  OUTPUT ${GENERATED_H} ${GENERATED_CC}
  DEPENDS ${PROTO_FILE}
  COMMENT "Using protobuf to generate C++ files from person.proto"
  COMMAND
    ${Protobuf_PROTOC_EXECUTABLE}
    --cpp_out=${CMAKE_CURRENT_BINARY_DIR}
    --proto_path=${CMAKE_CURRENT_SOURCE_DIR}/proto
    ${PROTO_FILE}
)

add_library(DataModel ${GENERATED_CC})

By listing ${GENERATED_CC} as a source file for DataModel, we have created the file-level dependency. CMake now knows: "To build DataModel, I first need the file person.pb.cc. I see a custom command that OUTPUTs that file, so I must run that command first."

Step 2: Set Usage Requirements

Any target that uses our DataModel library will need the location of the header files it generates - person.pb.h in our case.

This will be the same location we passed to protoc's --cpp_out argument, which was ${CMAKE_CURRENT_BINARY_DIR}.

Consumers of our DataModel will also depend on the protobuf runtime library itself, so we need to link against it. Its target is called protobuf::libprotobuf, which we can confirm from the documentation or by checking the log output of the find_package() command.

We declare this include directory and link using target_include_directories() and target_link_libraries() as usual. Both need to be PUBLIC so they propagate to any consumers of our target:

datamodel/CMakeLists.txt

cmake_minimum_required(VERSION 3.21)

find_package(protobuf REQUIRED)
set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/proto/person.proto")
set(GENERATED_H "${CMAKE_CURRENT_BINARY_DIR}/person.pb.h")
set(GENERATED_CC "${CMAKE_CURRENT_BINARY_DIR}/person.pb.cc")

add_custom_command(
  OUTPUT ${GENERATED_H} ${GENERATED_CC}
  DEPENDS ${PROTO_FILE}
  COMMENT "Using protobuf to generate C++ files from person.proto"
  COMMAND
    ${Protobuf_PROTOC_EXECUTABLE}
    --cpp_out=${CMAKE_CURRENT_BINARY_DIR}
    --proto_path=${CMAKE_CURRENT_SOURCE_DIR}/proto
    ${PROTO_FILE}
)

add_library(DataModel ${GENERATED_CC})

target_include_directories(DataModel PUBLIC
  ${CMAKE_CURRENT_BINARY_DIR}
)

target_link_libraries(DataModel PUBLIC
  protobuf::libprotobuf
)

Step 3: Link the Application

Now that we have all our protobuf plumbing abstracted away in our DataModel target, we just need to update our application to link against it:

app/CMakeLists.txt

cmake_minimum_required(VERSION 3.21)

add_executable(GreeterApp src/main.cpp)

target_link_libraries(GreeterApp PRIVATE DataModel)

Because we set the include directories and Protobuf library link as PUBLIC on DataModel, GreeterApp inherits them automatically. The app/CMakeLists.txt remains clean; it doesn't need to know any of the messy details about code generation. It just consumes the DataModel library.

Step 4: Building and Testing

Now, if we configure and build our project, everything should work seamlessly. If you're following along using Conan and the provided presets, the full build process will look like this from the project root:

conan install . --output-folder=build --build=missing
cmake --preset conan
cmake --build --preset conan

We can then run GreeterApp or GreeterApp.exe to confirm everything is working:

./build/app/GreeterApp.exe
Hello Jane Doe
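
It's also worth confirming that the file-level dependency behaves as intended. Modifying (or simply touching) person.proto and rebuilding should cause the custom command to run again and DataModel to be recompiled, while rebuilding with no changes should do nothing. Assuming a Unix-like shell, that check looks something like this from the project root:

touch datamodel/proto/person.proto
cmake --build --preset conan

Depending on your generator, you should see the "Using protobuf to generate C++ files from person.proto" message from our COMMENT argument appear before person.pb.cc is recompiled.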

The protobuf_generate() Command

This lesson walked through an example of using add_custom_command(OUTPUT) because it is the generic mechanism for handling file-generation tasks within CMake builds.

However, some libraries offer convenient helpers that make them easier to integrate into CMake builds. Protobuf in particular includes the protobuf_generate() command that wraps the more verbose add_custom_command() invocations in a friendlier interface.

It also accepts our target as an argument and automatically adds the files it generates to that target's sources.

A variation of our datamodel/CMakeLists.txt that uses this helper might look something like this:

datamodel/CMakeLists.txt

cmake_minimum_required(VERSION 3.21)

find_package(protobuf REQUIRED)

# Create the target first as protobuf_generate requires it
add_library(DataModel)

# This replaces our add_custom_command() call
protobuf_generate(
  TARGET DataModel
  LANGUAGE cpp
  PROTOS proto/person.proto
  IMPORT_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/proto
)

target_include_directories(DataModel PUBLIC
  ${CMAKE_CURRENT_BINARY_DIR}
)

target_link_libraries(DataModel PUBLIC
  protobuf::libprotobuf
)

Summary

The add_custom_command(OUTPUT ...) signature is the primary way of integrating external tools that generate files into a CMake build.

  • Code Generation: Generating code rather than writing it by hand reduces boilerplate and ensures consistency; Protobuf is a classic example.
  • File-Level Dependencies: The OUTPUT argument tells CMake what files the command creates, allowing it to build a dependency graph.
  • Integration: After defining the command, you integrate the generated files into a target just like any other source file, using commands like add_library(), add_executable(), or target_sources() (which we cover later in the course).
  • Encapsulation: Wrapping complexity behind a friendly interface is a good practice. We can wrap the entire code generation process and its dependencies inside a self-contained library target.
Next Lesson
Lesson 43 of 47

Custom Targets and Build Utilities

Learn to create standalone, non-code targets using add_custom_target() for utility tasks like code formatting and documentation generation.
