Padding and Alignment
Learn how memory alignment affects data serialization and how to handle it safely
When we write code, we often think about memory as a simple sequence of bytes. However, modern processors work with memory in larger chunks for efficiency. Two key concepts drive this behavior: cache lines and memory pages.
Cache lines, typically 64 bytes, are the smallest unit of data that can be transferred between the CPU cache and main memory. Similarly, memory pages, commonly 4 kilobytes, are the smallest unit of memory managed by the operating system's virtual memory system.
For optimal performance, we typically want our data to be aligned so that a single value rarely crosses one of these boundaries. An example of a boundary cross might be a 4-byte integer whose first two bytes are at the end of one cache line, and whose last two bytes are at the start of the next.
The boundary between our two cache lines might look like the following, where X represents the integer we're interested in, and A and B represent other arbitrary variables:
Line 1 | Line 2
A A X X | X X B B
Most systems handle this scenario gracefully - they perform multiple reads to grab both blocks of memory, then take the appropriate bytes from each and combine them to reconstruct our integer X.
However, this comes at a performance cost. Instead, we want to align our data to maximise the chances that it is stored entirely within the same cache line or page, eliminating the need for this additional processing.
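If we're curious what cache line size our implementation assumes, C++17 added std::hardware_destructive_interference_size in the <new> header. The following is a minimal sketch; not every standard library ships this constant yet, so we guard it with its feature-test macro:

#include <iostream>
#include <new>

int main() {
#ifdef __cpp_lib_hardware_interference_size
  // The implementation's suggested cache line size (commonly 64 bytes)
  std::cout << std::hardware_destructive_interference_size << " bytes";
#else
  std::cout << "hardware_destructive_interference_size not available";
#endif
}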
Aligning data means we simply add additional bytes in strategic positions within the memory layout of our objects. These bytes, which contain no useful data and exist only to push subsequent bytes into later memory addresses, are called padding.
We could align our previous structure by adding 2 bytes of padding after A, thereby pushing X entirely onto the next line, where it can be accessed in a single read operation.
We'll represent padding by underscores, _, and the boundary between our cache lines would now look like this:
Line 1 | Line 2
A A _ _ | X X X X B B
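In practice, we rarely hand-place padding like this. When we do need a specific alignment, standard C++ provides the alignas specifier, which asks the compiler to place every instance of a type on a chosen boundary. Here's a minimal sketch; the CacheAligned name is purely for illustration:

#include <iostream>

// Request that every instance starts on a 64-byte boundary - a typical
// cache line size. The compiler pads the type out to a multiple of this.
struct alignas(64) CacheAligned {
  int X;
};

int main() {
  std::cout << "alignment: " << alignof(CacheAligned)
            << ", size: " << sizeof(CacheAligned) << " bytes";
}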
Alignment and Padding
Let's see an example where our compiler will likely intervene, adding some padding to achieve a specific alignment:
#include <iostream>

struct MyStruct {
  char A; // 1 byte
  int B;  // 4 bytes
};

int main() {
  std::cout << sizeof(MyStruct) << " bytes";
}
Given instances of MyStruct require 1 byte for the char and 4 for the int, we might expect the overall size to be 5 bytes. However, in most scenarios, 3 bytes of padding are added to objects of this type, bringing their total size to 8:
8 bytes
This additional padding is added to ensure the B integer is placed in its natural alignment - that is, a memory address divisible by 4.
As such, we can imagine the memory layout of an instance of MyStruct looking like the following, where we have 1 byte assigned to storing the char called A, followed by 3 bytes of padding, and finally 4 bytes assigned to the int value B:
A _ _ _ B B B B
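We can check where each member actually lands using offsetof from <cstddef>. This is a small sketch; the exact offsets depend on the compiler and target, but on most desktop platforms they match the layout above:

#include <cstddef>
#include <iostream>

struct MyStruct {
  char A; // 1 byte
  int B;  // 4 bytes
};

int main() {
  // Typically prints: A at 0, B at 4, size 8 - confirming the
  // 3 bytes of padding between A and B
  std::cout << "A at " << offsetof(MyStruct, A)
            << ", B at " << offsetof(MyStruct, B)
            << ", size " << sizeof(MyStruct);
}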
Natural alignment refers to placing data at memory addresses that match the size of the data type - 32-bit integers are typically aligned to 4-byte boundaries, 64-bit doubles to 8-byte boundaries, and so on.
This alignment strategy comes from the CPU's memory access patterns: modern processors are designed to read data most efficiently when it's placed at these aligned addresses.
This typically allows them to fetch the entire value in a single operation, rather than performing multiple memory reads and then combining the results to reconstruct the required value.
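These per-type requirements can be queried directly with the alignof operator. A brief sketch; the values in the comment are typical for 64-bit desktop platforms, but they aren't guaranteed by the standard:

#include <iostream>

struct MyStruct {
  char A; // 1 byte
  int B;  // 4 bytes
};

int main() {
  // Commonly prints: char: 1, int: 4, double: 8, MyStruct: 4.
  // A struct adopts the strictest alignment among its members.
  std::cout << "char: " << alignof(char)
            << ", int: " << alignof(int)
            << ", double: " << alignof(double)
            << ", MyStruct: " << alignof(MyStruct);
}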
Let's see another example, where we simply reorder the A and B members within our struct definition:
#include <iostream>

struct MyStruct {
  int B;  // 4 bytes
  char A; // 1 byte
};

int main() {
  std::cout << sizeof(MyStruct) << " bytes";
}
Perhaps surprisingly, the compiler adds 3 bytes of padding here too:
8 bytes
In this case, the padding is added to the end of our memory layout. It looks like this:
B B B B A _ _ _
The primary reason for this padding is to deal with the common scenario where multiple instances of our objects are stored contiguously in memory, such as in a std::vector<MyStruct>.
In that context, the memory layout of two objects in an array looks like this:
B B B B A _ _ _ B B B B A _ _ _
The additional padding was added to maintain alignment in scenarios like this. The B integer in the first object is correctly aligned to byte offset 0, whilst the B in the second object is correctly aligned to byte offset 8, and so on.
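We can observe this spacing directly by measuring the distance between two adjacent elements of an array. A minimal sketch, assuming the same 8-byte layout described above:

#include <iostream>

struct MyStruct {
  int B;  // 4 bytes
  char A; // 1 byte
};

int main() {
  MyStruct Items[2]{};
  // Adjacent elements are always exactly sizeof(MyStruct) bytes apart,
  // so the trailing padding keeps every B member correctly aligned
  auto Spacing{
    reinterpret_cast<const char*>(&Items[1]) -
    reinterpret_cast<const char*>(&Items[0])};
  std::cout << "sizeof(MyStruct): " << sizeof(MyStruct)
            << ", element spacing: " << Spacing << " bytes";
}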
Packing
We can order the members of our type to make more efficient use of memory. That is, to reduce the amount of padding the compiler requires to maintain alignment.
For example, let's consider the following struct:
#include <iostream>

struct MyStruct {
  char A; // 1 byte
  int B;  // 4 bytes
  char C; // 1 byte
};

int main() {
  std::cout << sizeof(MyStruct) << " bytes";
}
Objects of this type only contain 6 bytes of useful data. However, to correctly align the integer B (including for the array context), 6 additional bytes of padding are required, taking its size to 12:
12 bytes
The memory layout of an instance of this struct looks like this:
A _ _ _ B B B B C _ _ _
By reordering our members, we can pack memory more efficiently. The following version of MyStruct contains all the same data, but only requires 8 bytes of storage:
#include <iostream>

struct MyStruct {
  int B;  // 4 bytes
  char A; // 1 byte
  char C; // 1 byte
};

int main() {
  std::cout << sizeof(MyStruct) << " bytes";
}
8 bytes
This is more efficient because only 2 bytes of padding are required to align the integer for use in arrays:
B B B B A C _ _
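If we need to eliminate padding entirely - for example, to match an externally defined binary format - most major compilers (MSVC, GCC, Clang) support the non-standard #pragma pack directive. This is a sketch rather than a recommendation: unaligned members can be slower to access, and on some architectures unaligned access isn't safe at all. The PackedStruct name is just for illustration:

#include <iostream>

// Non-standard, but widely supported: force 1-byte packing so the
// compiler inserts no padding between members
#pragma pack(push, 1)
struct PackedStruct {
  char A; // 1 byte
  int B;  // 4 bytes
  char C; // 1 byte
};
#pragma pack(pop)

int main() {
  // Typically prints 6 bytes - the raw data with no padding at all
  std::cout << sizeof(PackedStruct) << " bytes";
}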
Serializing with Padding
As we might expect, these padding and alignment behaviors have implications when it comes to serializing our objects. If we're not mindful that these "gaps" exist between our variables, our serialization and deserialization code can contain serious bugs and result in data loss.
Below, we attempt to serialize MyStruct without being aware that padding is added between A and B. We assume, therefore, that writing 5 bytes will capture all of the data:
#include <SDL.h>
#include <iostream>

struct MyStruct {
  char A;
  int B;
};

int main(int argc, char** argv) {
  SDL_RWops* rw{
    SDL_RWFromFile("example.bin", "wb")};
  if (!rw) {
    std::cerr << "Failed to open file: "
              << SDL_GetError();
    return 1;
  }

  MyStruct Serialized{'A', 42};

  // Assume MyStruct is 5 bytes
  SDL_RWwrite(rw, &Serialized, 1, 5);
  SDL_RWclose(rw);

  std::cout << "Serialized: A = "
            << Serialized.A
            << ", B = " << Serialized.B;
  return 0;
}
Serialized: A = A, B = 42
If we later read this file using the same assumptions, we'll see our B integer doesn't have the correct value:
#include <SDL.h>
#include <iostream>

struct MyStruct {
  char A;
  int B;
};

int main(int argc, char** argv) {
  SDL_RWops* rw{
    SDL_RWFromFile("example.bin", "rb")};
  if (!rw) {
    std::cerr << "Failed to open file: "
              << SDL_GetError();
    return 1;
  }

  MyStruct Deserialized;
  SDL_RWread(rw, &Deserialized, 1, 5);
  SDL_RWclose(rw);

  std::cout << "Deserialized: A = "
            << Deserialized.A
            << ", B = " << Deserialized.B;
  return 0;
}
Deserialized: A = A, B = -859045846
To solve this problem, we need to approach alignment of class and struct instances differently.
Adding Save and Load Methods
The standard way to serialize and deserialize objects while respecting alignment across a variety of platforms is to handle their data members as individual values.
Rather than serializing a MyStruct object in a single operation, we'd serialize each of its variables individually. In large programs, this is typically done by adding dedicated serialization and deserialization methods to our class or struct:
// MyStruct.h
#pragma once
#include <iostream>
#include <string>
#include <SDL.h>

class MyStruct {
public:
  char A;
  int B;

  void Save(const std::string& path) const {
    SDL_RWops* Handle{SDL_RWFromFile(
      path.c_str(), "wb")};
    if (!Handle) {
      std::cout << "Error opening file: "
                << SDL_GetError();
      return;
    }
    SDL_RWwrite(Handle, &A, sizeof(char), 1);
    SDL_RWwrite(Handle, &B, sizeof(int), 1);
    SDL_RWclose(Handle);
  }

  void Load(const std::string& path) {
    SDL_RWops* Handle{SDL_RWFromFile(
      path.c_str(), "rb")};
    if (!Handle) {
      std::cout << "Error opening file: "
                << SDL_GetError();
      return;
    }
    SDL_RWread(Handle, &A, sizeof(char), 1);
    SDL_RWread(Handle, &B, sizeof(int), 1);
    SDL_RWclose(Handle);
  }
};
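In a larger program, we'd likely also want to confirm each write succeeded. SDL_RWwrite returns the number of objects it transferred, so we can compare that against what we asked for. The following is a hedged sketch, assuming the MyStruct definition above; SaveChecked is a hypothetical free-function variant of Save():

// SaveChecked.h (hypothetical helper, not part of the lesson's MyStruct)
#pragma once
#include <iostream>
#include <string>
#include <SDL.h>
#include "MyStruct.h"

inline void SaveChecked(const MyStruct& Object,
                        const std::string& path) {
  SDL_RWops* Handle{SDL_RWFromFile(path.c_str(), "wb")};
  if (!Handle) {
    std::cout << "Error opening file: " << SDL_GetError();
    return;
  }

  // SDL_RWwrite returns how many objects it wrote; anything other
  // than 1 per call here means the write failed part-way
  bool Success{
    SDL_RWwrite(Handle, &Object.A, sizeof(char), 1) == 1 &&
    SDL_RWwrite(Handle, &Object.B, sizeof(int), 1) == 1};

  if (!Success) {
    std::cout << "Error writing file: " << SDL_GetError();
  }
  SDL_RWclose(Handle);
}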
Elsewhere in our program, we can now instruct MyStruct instances to save their state to a file using the Save() method, or load their state from a file using the Load() method:
#include <iostream>
#include "MyStruct.h"

int main(int argc, char** argv) {
  MyStruct MyObject{'A', 42};
  MyObject.Save("example.bin");
  std::cout << "Serialized: A = "
            << MyObject.A << ", B = " << MyObject.B;

  MyObject.Load("example.bin");
  std::cout << "\nDeserialized: A = "
            << MyObject.A << ", B = " << MyObject.B;
  return 0;
}
Serialized: A = A, B = 42
Deserialized: A = A, B = 42
Summary
In this lesson, we've seen how memory alignment affects our C++ programs and why padding is necessary. Understanding these concepts helps us write more efficient code and avoid common pitfalls when working with data serialization.
Key takeaways:
- Alignment requirements come from hardware design
- Padding maintains proper alignment
- Structure layout affects memory usage
- Careful serialization is essential
- Tools exist for custom alignment control