Padding and Alignment

Learn how memory alignment affects data serialization and how to handle it safely

Ryan McCombe

When we write code, we often think about memory as a simple sequence of bytes. However, modern processors work with memory in larger chunks for efficiency. Two key concepts drive this behavior: cache lines and memory pages.

Cache lines, typically 64 bytes, are the smallest unit of data that can be transferred between the CPU cache and main memory. Similarly, memory pages are the smallest unit managed by the operating system's virtual memory system.

For optimal performance, we typically want our data to be aligned to minimise the frequency with which a single value crosses one of these boundaries. An example of a boundary cross might be a 4-byte integer whose first two bytes are at the end of one cache line, and whose last two bytes are at the start of the next.

The boundary between our two cache lines might look like the following, where X represents the integer we're interested in, and A and B represent other arbitrary variables:

Line 1  | Line 2
A A X X | X X B B

Most systems handle this scenario gracefully - they perform multiple reads to grab both blocks of memory, then take the appropriate bytes from each and combine them to reconstruct our integer X.

However, this comes at a performance cost. Instead, we want to align our data to maximise the chances that it is stored entirely within the same cache line or page, eliminating the need for this additional processing.

Aligning data means we simply add additional bytes in strategic positions within the memory layout of our objects. These bytes, which contain no useful data and exist only to push subsequent bytes into later memory addresses, are called padding.

We could align our previous structure by adding 2 bytes of padding after A, thereby pushing X entirely onto the next line, where it can be accessed in a single read operation.

We'll represent padding by underscores, _, and the boundary between our cache lines would now look like this:

Line 1  | Line 2
A A _ _ | X X X X B B
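
The arithmetic behind this diagram can be sketched in a few lines. The helper names below (CrossesBoundary and PaddingFor) are hypothetical, the 64-byte line size is the typical value mentioned earlier, and the starting offset of 62 is simply chosen to match the diagram. The sketch assumes C++17 or later:

```cpp
#include <cstddef>

// Hypothetical helpers for illustration; not part of any library.

// True if a value of Size bytes (Size > 0) starting at byte Offset
// spans more than one block of BlockSize bytes, such as a
// 64-byte cache line
constexpr bool CrossesBoundary(
  std::size_t Offset, std::size_t Size, std::size_t BlockSize
) {
  return Offset / BlockSize != (Offset + Size - 1) / BlockSize;
}

// Number of padding bytes needed to move Offset forward to the
// next multiple of Alignment
constexpr std::size_t PaddingFor(
  std::size_t Offset, std::size_t Alignment
) {
  return (Alignment - Offset % Alignment) % Alignment;
}

// A 4-byte X starting at offset 62 straddles two 64-byte lines
static_assert(CrossesBoundary(62, 4, 64));

// Two bytes of padding push X to offset 64
static_assert(PaddingFor(62, 4) == 2);

// From there, X sits entirely within a single line
static_assert(!CrossesBoundary(62 + PaddingFor(62, 4), 4, 64));
```

In practice, we rarely compute this ourselves. The compiler inserts the padding for us, as we'll see in the next section.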

Alignment and Padding

Let's see an example where our compiler will likely intervene, adding some padding to achieve a specific alignment:

#include <iostream>

struct MyStruct {
  char A; // 1 byte
  int B; // 4 bytes
};

int main() {
  std::cout << sizeof(MyStruct) << " bytes";
}

Given instances of MyStruct require 1 byte for the char and 4 for the int, we might expect the overall size to be 5 bytes. However, in most scenarios, 3 bytes of padding are added to objects of this type, bringing their total size to 8:

8 bytes

This additional padding is added to ensure the B integer is placed at its natural alignment - that is, a memory address divisible by 4.

As such, we can imagine the memory layout of an instance of MyStruct looking like the following, where we have 1 byte assigned to storing the char called A, followed by 3 bytes of padding, and finally 4 bytes assigned to the int value B:

A _ _ _ B B B B
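
If we'd rather check this layout than imagine it, the standard offsetof macro reports where each member starts within the struct. The following is a minimal sketch. PrintLayout is a hypothetical helper name, and the values it prints depend on the platform and compiler:

```cpp
#include <cstddef> // for offsetof
#include <iostream>

struct MyStruct {
  char A;
  int B;
};

// Hypothetical helper, for illustration only
void PrintLayout() {
  std::cout << "A starts at offset "
    << offsetof(MyStruct, A)
    << "\nB starts at offset "
    << offsetof(MyStruct, B)
    << "\nTotal size: " << sizeof(MyStruct)
    << " bytes\n";
}
```

On a platform where int is 4 bytes with 4-byte alignment, we'd expect B to be reported at offset 4, matching the diagram above.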

Natural alignment refers to placing data at memory addresses that match the size of the data type - 32-bit integers are typically aligned to 4-byte boundaries, 64-bit doubles to 8-byte boundaries, and so on.

This alignment strategy comes from the CPU's memory access patterns: modern processors are designed to read data most efficiently when it's placed at these aligned addresses.

This typically allows them to fetch the entire value in a single operation, rather than the more expensive process of multiple memory reads and then reconstructing the required value by combining them.
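
We can ask the compiler what alignment it uses for a type with the alignof operator. The sketch below prints the alignment of a few built-in types. PrintAlignments is a hypothetical helper name, the printed values vary between platforms, and the static_assert checks assume C++17 or later:

```cpp
#include <iostream>

// Hypothetical helper, for illustration only
void PrintAlignments() {
  std::cout << "char:   " << alignof(char)
    << "\nint:    " << alignof(int)
    << "\ndouble: " << alignof(double) << '\n';
}

// A char can be placed at any address
static_assert(alignof(char) == 1);

// The size of a type is always a multiple of its alignment,
// which is what keeps every element of an array aligned
static_assert(sizeof(int) % alignof(int) == 0);
static_assert(sizeof(double) % alignof(double) == 0);
```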

Let's see another example, where we simply reorder the A and B members within our struct definition:

#include <iostream>

struct MyStruct {
  int B; // 4 bytes
  char A; // 1 byte
};

int main() {
  std::cout << sizeof(MyStruct) << " bytes";
}

Perhaps surprisingly, the compiler adds 3 bytes of padding here too:

8 bytes

In this case, the padding is added to the end of our memory layout. It looks like this:

B B B B A _ _ _

The primary reason for this padding is to deal with the common scenario where multiple instances of our objects are stored contiguously in memory, such as in a std::vector<MyStruct>.

In that context, the memory layout of two objects in an array looks like this:

B B B B A _ _ _ B B B B A _ _ _

The additional padding was added to maintain alignment in scenarios like this. The B integer in the first object is correctly aligned to byte offset 0, whilst the B in the second object is correctly aligned to byte offset 8, and so on.
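
As a rough check of this claim, we can measure the distance in bytes between the B member of two neighbouring array elements. StrideBetweenElements is a hypothetical helper name used only for this sketch, which assumes C++17 or later:

```cpp
#include <cstddef>

struct MyStruct {
  int B;
  char A;
};

// The trailing padding exists to make this true: the size of
// the struct is a multiple of its alignment
static_assert(sizeof(MyStruct) % alignof(MyStruct) == 0);

// Hypothetical helper: the number of bytes between the B member
// of the first element and the B member of the second
std::ptrdiff_t StrideBetweenElements() {
  MyStruct Objects[2]{};
  const auto* First{
    reinterpret_cast<const char*>(&Objects[0].B)};
  const auto* Second{
    reinterpret_cast<const char*>(&Objects[1].B)};
  return Second - First;
}
```

The returned distance equals sizeof(MyStruct), trailing padding included. That is why the second B lands on an aligned address whenever the first one does.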

Packing

We can order the members of our type to make more efficient use of memory. That is, to reduce the amount of padding the compiler requires to maintain alignment.

For example, let's consider the following struct:

#include <iostream>

struct MyStruct {
  char A; // 1 byte
  int B;  // 4 bytes
  char C; // 1 byte
};

int main() {
  std::cout << sizeof(MyStruct) << " bytes";
}

Objects of this type only contain 6 bytes of useful data. However, to correctly align the integer B (including for the array context), 6 additional bytes of padding are required, taking its size to 12:

12 bytes

The memory layout of an instance of this struct looks like this:

A _ _ _ B B B B C _ _ _

By reordering our members, we can pack memory more efficiently. The following version of MyStruct contains all the same data, but only requires 8 bytes of storage:

#include <iostream>

struct MyStruct {
  int B;  // 4 bytes
  char A; // 1 byte
  char C; // 1 byte
};

int main() {
  std::cout << sizeof(MyStruct) << " bytes";
}
8 bytes

This is more efficient because only 2 bytes of padding are required to align the integer for use in arrays:

B B B B A C _ _
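
To compare the two orderings side by side, we can compute how much of each struct is padding. This is a sketch under a few assumptions: the type names Unordered and Reordered and the constants below are hypothetical, and the final static_assert reflects what typical compilers do rather than a guarantee from the C++ standard. It assumes C++17 or later:

```cpp
#include <cstddef>

struct Unordered {
  char A;
  int B;
  char C;
};

struct Reordered {
  int B;
  char A;
  char C;
};

// Both types store the same useful data
constexpr std::size_t UsefulBytes{
  sizeof(char) * 2 + sizeof(int)};

// Everything beyond the useful data is padding
constexpr std::size_t PaddingUnordered{
  sizeof(Unordered) - UsefulBytes};
constexpr std::size_t PaddingReordered{
  sizeof(Reordered) - UsefulBytes};

// Expected on typical platforms, not guaranteed by the standard
static_assert(sizeof(Reordered) <= sizeof(Unordered));
```

A check like the final static_assert can also act as a guard, causing compilation to fail if a later change to the struct makes it larger than we intended.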

Serializing with Padding

As we might expect, these padding and alignment behaviors have implications when it comes to serializing our objects. If we're not mindful that these "gaps" exist between our variables, our serialization and deserialization code can contain serious bugs and result in data loss.

Below, we attempt to serialize MyStruct without being aware that padding is added between A and B. We assume, therefore, that writing 5 bytes will capture all of the data:

#include <SDL.h>
#include <iostream>

struct MyStruct {
  char A;
  int B;
};

int main(int argc, char** argv) {
  SDL_RWops* rw{
    SDL_RWFromFile("example.bin", "wb")};
  if (!rw) {
    std::cerr << "Failed to open file: "
      << SDL_GetError();
    return 1;
  }

  MyStruct Serialized{'A', 42};
  
  // Assume MyStruct is 5 bytes
  SDL_RWwrite(rw, &Serialized, 1, 5);
  SDL_RWclose(rw);
  
  std::cout << "Serialized: A = "
    << Serialized.A
    << ", B = " << Serialized.B;

  return 0;
}
Serialized: A = A, B = 42

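If we later read this file using the same assumptions, we'll see our B integer doesn't have the correct value. On a platform where 3 bytes of padding follow A, the 5 bytes we wrote were A, the padding, and only the first byte of B. The remaining bytes of B were never saved, so the value we see after loading depends on whatever happened to be in that memory: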

#include <SDL.h>
#include <iostream>

struct MyStruct {
  char A;
  int B;
};

int main(int argc, char** argv) {
  SDL_RWops* rw{
    SDL_RWFromFile("example.bin", "rb")};
  if (!rw) {
    std::cerr << "Failed to open file: "
      << SDL_GetError();
    return 1;
  }

  MyStruct Deserialized;
  SDL_RWread(rw, &Deserialized, 1, 5);
  SDL_RWclose(rw);
  
  std::cout << "Deserialized: A = "
    << Deserialized.A
    << ", B = " << Deserialized.B;
  return 0;
}
Deserialized: A = A, B = -859045846
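
The same failure can be reproduced without a file, which makes it easier to see which bytes survive the round trip. This is a minimal sketch: RoundTripFiveBytes is a hypothetical helper, the 5-byte array stands in for the file, and the destination object is pre-filled with 0xFF bytes so the result doesn't depend on uninitialized memory:

```cpp
#include <cstddef>
#include <cstring>

struct MyStruct {
  char A;
  int B;
};

// Assumption: the struct is at least 5 bytes, as on typical platforms
static_assert(sizeof(MyStruct) >= 5,
  "This sketch copies the first 5 bytes of MyStruct");

// Hypothetical helper: "saves" only the first 5 bytes of Source,
// then "loads" them into an object whose bytes were all 0xFF
MyStruct RoundTripFiveBytes(const MyStruct& Source) {
  unsigned char File[5];
  std::memcpy(File, &Source, sizeof(File));

  MyStruct Result;
  std::memset(&Result, 0xFF, sizeof(Result));
  std::memcpy(&Result, File, sizeof(File));
  return Result;
}
```

On a platform where B is a 4-byte int stored at offset 4, RoundTripFiveBytes(MyStruct{'A', 42}) returns an object whose A is still 'A', but whose B is no longer 42. Three of B's four bytes were never copied.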

To solve this problem, we need to approach the serialization of class and struct instances differently.

Adding Save and Load Methods

The standard way of serializing and deserializing objects in a way that respects alignment across a variety of platforms is to handle their data members as individual values.

Rather than serializing a MyStruct object in a single operation, we'd serialize each of its variables individually. In large programs, this is typically done by adding dedicated serialization and deserialization methods to our class or struct:

// MyStruct.h
#pragma once
#include <iostream>
#include <string>
#include <SDL.h>

class MyStruct {
 public:
  char A;
  int B;

  void Save(const std::string& path) const {
    SDL_RWops* Handle{SDL_RWFromFile(
      path.c_str(), "wb")};

    if (!Handle) {
      std::cout << "Error opening file: "
        << SDL_GetError();
      return;
    }

    SDL_RWwrite(Handle, &A, sizeof(char), 1);
    SDL_RWwrite(Handle, &B, sizeof(int), 1);

    SDL_RWclose(Handle);
  }

  void Load(const std::string& path) {
    SDL_RWops* Handle{SDL_RWFromFile(
      path.c_str(), "rb")};

    if (!Handle) {
      std::cout << "Error opening file: "
        << SDL_GetError();
      return;
    }

    SDL_RWread(Handle, &A, sizeof(char), 1);
    SDL_RWread(Handle, &B, sizeof(int), 1);

    SDL_RWclose(Handle);
  }
};

Elsewhere in our program, we can now instruct MyStruct instances to save their state to a file using the Save() method, or load their state from a file using the Load() method:

#include <iostream>
#include "MyStruct.h"

int main(int argc, char** argv) {
  MyStruct MyObject{'A', 42};
  MyObject.Save("example.bin");
  std::cout << "Serialized: A = "
    << MyObject.A << ", B = " << MyObject.B;

  MyObject.Load("example.bin");
  std::cout << "\nDeserialized: A = "
    << MyObject.A << ", B = " << MyObject.B;

  return 0;
}
Serialized: A = A, B = 42
Deserialized: A = A, B = 42
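
The member-by-member idea doesn't depend on SDL. As a sketch of the same technique, the following version writes to an in-memory byte buffer instead of a file. The WriteValue and ReadValue helpers, and the free functions Save and Load, are hypothetical names for this example. It performs no bounds checking and, like the SDL version, it assumes the data is read back on a platform with the same int size and byte order:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct MyStruct {
  char A;
  int B;
};

// Hypothetical helper: append the bytes of one value to the buffer
template <typename T>
void WriteValue(
  std::vector<unsigned char>& Buffer, const T& Value
) {
  const auto* Bytes{
    reinterpret_cast<const unsigned char*>(&Value)};
  Buffer.insert(Buffer.end(), Bytes, Bytes + sizeof(T));
}

// Hypothetical helper: copy sizeof(T) bytes from the buffer into
// Value, then advance Offset past them (no bounds checking)
template <typename T>
void ReadValue(
  const std::vector<unsigned char>& Buffer,
  std::size_t& Offset, T& Value
) {
  std::memcpy(&Value, Buffer.data() + Offset, sizeof(T));
  Offset += sizeof(T);
}

// Each member is written individually, so no padding is stored
std::vector<unsigned char> Save(const MyStruct& Object) {
  std::vector<unsigned char> Buffer;
  WriteValue(Buffer, Object.A);
  WriteValue(Buffer, Object.B);
  return Buffer;
}

// Members are read back in the same order they were written
MyStruct Load(const std::vector<unsigned char>& Buffer) {
  MyStruct Object{};
  std::size_t Offset{0};
  ReadValue(Buffer, Offset, Object.A);
  ReadValue(Buffer, Offset, Object.B);
  return Object;
}
```

The buffer returned by Save() contains sizeof(char) + sizeof(int) bytes, regardless of how much padding the compiler places inside MyStruct.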

Summary

In this lesson, we've seen how memory alignment affects our C++ programs and why padding is necessary. Understanding these concepts helps us write more efficient code and avoid common pitfalls when working with data serialization.

Key takeaways:

  • Alignment requirements come from hardware design
  • Padding maintains proper alignment
  • Structure layout affects memory usage
  • Careful serialization is essential
  • Tools exist for custom alignment control