Parsing Data using std::string

Learn how to read, parse, and use config files in your game using std::string and SDL_RWops

Ryan McCombe

When we're loading text files into our program, we need the ability to programmatically understand that text data. This process is commonly called parsing the data.

In this lesson, we'll imagine the configuration of our program is provided in a serialized form - perhaps a file or network response - that looks like this:

WINDOW_TITLE: Example Window
WINDOW_WIDTH: 800
WINDOW_HEIGHT: 600
LEVELS: 1, 2, 3, 4

We want to deserialize this payload into a C++ object, defined by a struct that looks like this:

struct Config {
  std::string WindowTitle;
  int WindowWidth;
  int WindowHeight;
  std::vector<int> Levels;
};

We'll cover the most important techniques to achieve results like this, using the standard library's std::string type.

Loading Content into a std::string

Behind the scenes, a std::string object manages a contiguous block of memory. This memory holds the collection of char values that make up the string.

The data() method of std::string returns the memory address where this block of memory starts:

std::string Content{"Hello"};
void* DataAddress{Content.data()};
std::cout << DataAddress;
000000C4E77BF760

In most use cases, we don't need to worry about this block of memory. In fact, the main benefit of the std::string class over something like a char* is that std::string mostly absolves us of the need to manage memory. The std::string class takes care of it for us.

For example, if we use a std::string method or operation that would cause the string to no longer fit within the allocated memory, the std::string class will automatically allocate a new, larger memory block, and automatically copy our string content to the new location.

The following program shows an example of this behavior:

#include <iostream>
#include <string>

int main() {
  std::string Content{"Hello"};
  std::cout << "Memory Location is       "
    << static_cast<void*>(Content.data());

  Content += "World";
  std::cout << "\nMemory Location is still "
    << static_cast<void*>(Content.data());

  Content += "This is too much content for the "
    "current memory location";
  std::cout << "\nString has moved to      "
    << static_cast<void*>(Content.data());
}
Memory Location is       0000007F5EFEFCA0
Memory Location is still 0000007F5EFEFCA0
String has moved to      000001D3BCF98150

However, when we're working with a string's underlying memory address (by accessing it through the data() method, for example), we're bypassing the string's automatic memory management. As such, responsibility falls back on us to ensure that whatever we're doing with that pointer isn't going to introduce memory-related bugs.

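The most common use case for retrieving the data() pointer is that we want to write content directly to the location that the std::string is managing. Therefore, we need to ensure that the std::string is large enough to accommodate the data we're going to write into its memory. Two properties matter here: the capacity, which is how many characters the allocated block can hold, and the size, which is how many characters the string currently treats as its content.

  • We can find the current capacity of a std::string using the capacity() method.
  • We can increase the capacity by passing an integer to the reserve() method, which ensures our std::string can hold at least that number of characters, and usually slightly more. This does not change the size.
  • We can change the size by passing an integer to the resize() method, which also grows the capacity if needed. Only the characters within the string's size are valid to write to through data().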

Here's an example:

std::string Content{"Hello"};
std::cout << "Capacity: " << Content.capacity();
Content.reserve(100);
std::cout << "\nCapacity: " << Content.capacity();
Capacity: 15
Capacity: 111
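
Note that capacity is not the same as size: reserve() allocates room but leaves size() unchanged, while resize() changes how many characters the string actually contains. The following is a minimal sketch of why that matters when writing through data(). The helper names CopyViaData() and SizeAfterReserve() are hypothetical, and the non-const data() overload assumes C++17 or later:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies N bytes from Source into
// a new std::string by writing through data()
std::string CopyViaData(const char* Source, std::size_t N) {
  std::string Result;

  // resize() makes the string N characters long, so the
  // range [data(), data() + N) is valid to overwrite
  Result.resize(N);
  std::memcpy(Result.data(), Source, N);
  return Result;
}

// Hypothetical helper: reports the size of a string
// that has only had reserve() called on it
std::size_t SizeAfterReserve(std::size_t N) {
  std::string Result;

  // capacity() is now at least N, but size() is still 0
  Result.reserve(N);
  return Result.size();
}
```

Had CopyViaData() used reserve() instead of resize(), the returned string would still report a size of 0, and writing past the string's size through data() is undefined behavior.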

Let's use these techniques alongside SDL_RWops to write the content of our config.txt file directly into a std::string. We use resize() rather than reserve() here, because reserve() leaves the string's size at 0, so anything written through data() would not be treated as part of the string:

SDL_RWops* File{SDL_RWFromFile(
  "config.txt", "rb")};

// How many bytes are in the file?
Sint64 Size{SDL_RWsize(File)};

// The string that we'll use to store
// our file's content
std::string Content;

// Resize the string so it contains exactly
// as many characters as the file has bytes
Content.resize(static_cast<size_t>(Size));

// Overwrite those characters with everything
// from the file
SDL_RWread(File, Content.data(), 1, Size);

// Cleanup
SDL_RWclose(File);

std::cout << "Content:\n" << Content;
Content:
WINDOW_TITLE: Example Window
WINDOW_WIDTH: 800
WINDOW_HEIGHT: 600
LEVELS: 1, 2, 3, 4

Finding Substrings

One of the main techniques we will need to understand the content of a string is to determine if the string contains some sequence of characters - that is, some other string. A specific string that is part of a larger string is sometimes called a substring.

We can find if our string contains a specific substring, and the position of that substring, using the find() method. Below, we search our "Hello World!" string for the "World" substring:

std::string Content{"Hello World!"};
size_t SubstringLocation{Content.find("World")};
std::cout << "Location: " << SubstringLocation;

We find that Content does contain the "World" substring, and the substring starts at position 6:

Location: 6

Like arrays, string indices start at 0, so this 6 value indicates that "World" begins at the 7th character.

In this example, we find where the first line break in our string is:

std::string Content{
  "WINDOW_TITLE: Example Window\n"
  "WINDOW_WIDTH : 800\n"
  "WINDOW_HEIGHT : 600\n"
  "LEVELS: 1, 2, 3, 4"};

std::cout << "First Line Break: "
  << Content.find('\n');
First Line Break: 28

If our string does not contain the substring we're searching for, the find() method will return a value equal to std::string::npos:

std::string Content{
  "WINDOW_TITLE: Example Window\n"
  "WINDOW_WIDTH : 800\n"
  "WINDOW_HEIGHT : 600\n"
  "LEVELS: 1, 2, 3, 4"};

if (Content.find("Hi") == std::string::npos) {
  std::cout << "Cannot find \"Hi\"";
}
Cannot find "Hi"
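
The comparison against std::string::npos is easy to wrap in a small helper. This is only a sketch, and the name Contains() is hypothetical rather than part of the standard library:

```cpp
#include <string>

// Hypothetical helper: true if Text contains Needle
bool Contains(
  const std::string& Text, const std::string& Needle
) {
  // find() returns a position, not a bool, so we must
  // compare against npos. Position 0 is a valid match.
  return Text.find(Needle) != std::string::npos;
}
```

A common mistake is to write if (Text.find(Needle)), which treats a match at position 0 as "not found" and treats npos as "found".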

Finding Multiple Substrings

The find() method accepts a second optional argument, indicating where we want our search to begin. In the following example, we skip the first 5 characters in our search, allowing us to find the second "Hello" in the string:

std::string Content{"Hello World Hello"};

std::cout << "First Hello: "
  << Content.find("Hello");

std::cout << "\nNext Hello: "
  << Content.find("Hello", 5);
First Hello: 0
Next Hello: 12

This is primarily useful when we want to find every instance of a substring within our larger string. Below, we continuously call the find() method in a loop, until find() returns std::string::npos.

On every successful find(), we set Start to the position just past the line break we found, so the next search begins after it and we advance through the string:

std::string Content{
  "WINDOW_TITLE: Example Window\n"
  "WINDOW_WIDTH : 800\n"
  "WINDOW_HEIGHT : 600\n"
  "LEVELS: 1, 2, 3, 4"
};

size_t Start{0};
size_t LineBreak{Content.find("\n")};

while (LineBreak != std::string::npos) {
  std::cout << "Line Break at "
    << LineBreak << '\n';
  Start = LineBreak + 1;
  LineBreak = Content.find("\n", Start);
}
Line Break at 28
Line Break at 47
Line Break at 67
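
The same loop can be packaged so that it collects the positions instead of printing them. Below is a minimal sketch, assuming we want non-overlapping matches. The FindAll() name is hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the starting position of
// every non-overlapping occurrence of Needle in Text
std::vector<std::size_t> FindAll(
  const std::string& Text, const std::string& Needle
) {
  std::vector<std::size_t> Positions;

  // An empty needle matches everywhere, so we treat it
  // as "no results" to keep the loop meaningful
  if (Needle.empty()) return Positions;

  std::size_t Position{Text.find(Needle)};
  while (Position != std::string::npos) {
    Positions.push_back(Position);
    // Resume the search just past the match we found
    Position = Text.find(
      Needle, Position + Needle.size());
  }
  return Positions;
}
```

Calling FindAll() with the four-line content above and "\n" as the needle would yield the same three positions that the loop printed.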

Creating Substrings

Beyond finding substrings, it can be helpful to create a new std::string based on the contents of some other std::string. We can do this using the substr() method, passing two arguments:

  • The position within the original string that we want our substring to begin
  • How many characters we want the substring to contain

Below, we create a substring by starting at position 6 of our Content, and copying 5 characters to our new string:

std::string Content{"Hello World!"};
std::string Substring{Content.substr(6, 5)};
std::cout << Substring;
World

The second argument is optional. If we omit it, our substring will continue until the end of the original string:

std::string Content{"Hello World!"};
std::string Substring{Content.substr(6)};
std::cout << Substring;
World!
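
Two edge cases are worth knowing about. If the requested length runs past the end of the string, substr() simply returns as many characters as are available. If the starting position is greater than the string's size, however, substr() throws a std::out_of_range exception. The sketch below guards against the second case. SafeSubstr() is a hypothetical name, and returning an empty string is just one possible policy:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: like substr(), but returns an
// empty string instead of throwing when Position is
// beyond the end of Text
std::string SafeSubstr(
  const std::string& Text,
  std::size_t Position,
  std::size_t Count = std::string::npos
) {
  if (Position > Text.size()) return "";

  // A Count that runs past the end is clamped by
  // substr() itself, so no extra check is needed
  return Text.substr(Position, Count);
}
```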

Example: Combining find() and substr()

It's common for the arguments we pass to substr() to be determined based on the value returned by a find() call.

Below, we have a string comprising two lines. We find where the line break \n is, and use this index to create substrings containing the first and second lines of our original content:

std::string Content{
  "WINDOW_TITLE: Example Window\n"
  "WINDOW_WIDTH : 800"};

size_t LineBreak{Content.find('\n')};
if (LineBreak != std::string::npos) {
  std::string Line1{Content.substr(0, LineBreak)};
  std::cout << "Line 1: " << Line1;
  std::string Line2{Content.substr(LineBreak + 1)};
  std::cout << "\nLine 2: " << Line2;
}
Line 1: WINDOW_TITLE: Example Window
Line 2: WINDOW_WIDTH : 800

Example: Splitting Strings into Lines

Below, we expand our previous example to split a string into a dynamic number of substrings, based on how many line breaks our content has:

std::string Content{
  "WINDOW_TITLE: Example Window\n"
  "WINDOW_WIDTH : 800\n"
  "WINDOW_HEIGHT : 600\n"
  "LEVELS: 1, 2, 3, 4"
};

size_t Start{0};
size_t End{Content.find("\n", Start)};
while (End != std::string::npos) {
  std::cout << "Line: "
    << Content.substr(Start, End - Start)
    << '\n';

  Start = End + 1;
  End = Content.find("\n", Start);
}

// There are no more line breaks, but we haven't
// printed the final line yet - let's do it
std::cout << "Final Line: "
  << Content.substr(Start);
Line: WINDOW_TITLE: Example Window
Line: WINDOW_WIDTH : 800
Line: WINDOW_HEIGHT : 600
Final Line: LEVELS: 1, 2, 3, 4
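
If we'd rather collect the lines than print them, the same loop can fill a std::vector. This is a minimal sketch, and SplitLines() is a hypothetical name:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits Content on '\n' and
// returns each line as its own std::string
std::vector<std::string> SplitLines(
  const std::string& Content
) {
  std::vector<std::string> Lines;
  std::size_t Start{0};
  std::size_t End{Content.find('\n', Start)};

  while (End != std::string::npos) {
    Lines.push_back(
      Content.substr(Start, End - Start));
    Start = End + 1;
    End = Content.find('\n', Start);
  }

  // Everything after the last line break
  Lines.push_back(Content.substr(Start));
  return Lines;
}
```

One consequence of this approach is that content ending in a line break produces an empty string as its final entry, so callers may want to skip empty lines.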

Example: Splitting Strings into Keys and Values

Each line of our sample content contains a key and value pair formatted as KEY: VALUE. We can split these lines into the key and value substrings by finding the ": " delimiter:

std::string Content{
  "WINDOW_TITLE: Example Window"
};

size_t Delim{Content.find(": ")};
if (Delim == std::string::npos) {
  std::cout << "Invalid Input";
  return 0;
}

std::string Key{Content.substr(0, Delim)};
std::string Value{Content.substr(Delim + 2)};

std::cout << "Key: " << Key
  << "\nValue: " << Value;
Key: WINDOW_TITLE
Value: Example Window
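
The snippet above assumes it is running inside main(), which is why it can return 0 on invalid input. In a reusable function, one option is to report the failure through the return type instead. The sketch below assumes C++17 for std::optional, and the SplitKeyValue() name and KeyValue alias are hypothetical:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

using KeyValue = std::pair<std::string, std::string>;

// Hypothetical helper: splits "KEY: VALUE" at the first
// ": ". Returns std::nullopt if there is no delimiter.
std::optional<KeyValue> SplitKeyValue(
  const std::string& Line
) {
  const std::string Delimiter{": "};
  std::size_t Delim{Line.find(Delimiter)};
  if (Delim == std::string::npos) return std::nullopt;

  return KeyValue{
    Line.substr(0, Delim),
    // Skip over the delimiter itself
    Line.substr(Delim + Delimiter.size())};
}
```

Because find() returns the first match, a value that itself contains ": " stays intact. Only the first delimiter on the line is used for the split.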

Example: Splitting Comma-Separated Strings

Finally, one of the values in our sample content is a comma-separated list of integers. Here's one way we can parse such a string into its individual parts, using 1,2,3,4 as a simplified example:

std::string Content{"1,2,3,4"};
size_t Start{0};
size_t End {Content.find(",")};

while (End != std::string::npos) {
  std::cout << "Value: "
    << Content.substr(Start, End - Start)
    << '\n';
  Start = End + 1;
  End = Content.find(",", Start);
}
// End is npos here, so we take everything
// from Start to the end of the string
std::cout << "Final Value: "
  << Content.substr(Start);
Value: 1
Value: 2
Value: 3
Final Value: 4
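
Our actual config value has a space after each comma (1, 2, 3, 4). That turns out not to matter once we convert each piece with std::stoi(), because std::stoi() skips leading whitespace. Here is a minimal sketch that returns the parsed integers. ParseIntList() is a hypothetical name, and the function assumes every piece is a valid number, as std::stoi() throws otherwise:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: converts "1, 2, 3" to {1, 2, 3}.
// Throws (via std::stoi) if any piece is not a number.
std::vector<int> ParseIntList(const std::string& Text) {
  std::vector<int> Values;
  std::size_t Start{0};
  std::size_t End{Text.find(',', Start)};

  while (End != std::string::npos) {
    // std::stoi ignores the space that follows a comma
    Values.push_back(
      std::stoi(Text.substr(Start, End - Start)));
    Start = End + 1;
    End = Text.find(',', Start);
  }

  // The final number has no comma after it
  Values.push_back(std::stoi(Text.substr(Start)));
  return Values;
}
```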

This is just a basic overview of string handling that solves some of the more common problems. In our advanced course, we have a dedicated chapter on text processing.

This includes a deeper overview of the std::string class, as well as more advanced techniques such as regular expressions, which offer much more flexibility for processing and understanding text content.

The first lesson in that chapter is available here:

Characters, Unicode and Encoding

An introduction to C++ character types, the Unicode standard, character encoding, and C-style strings

Example: Loading a Configuration File

Let's combine these techniques to load our configuration file config.txt into a C++ Config struct. Our configuration file looks like this:

// config.txt
WINDOW_TITLE: Example Window
WINDOW_WIDTH: 800
WINDOW_HEIGHT: 600
LEVELS: 1, 2, 3, 4

We'll add a public Load() method to our struct, which accepts the path to the file to load as an argument:

// Config.h
#pragma once
#include <SDL.h>
#include <iostream>
#include <string>
#include <vector>

struct Config {
  std::string WindowTitle;
  int WindowWidth;
  int WindowHeight;
  std::vector<int> Levels;

  void Load(const std::string& Path) {
    // ...
  }
};

We'll add two private helper functions to break up our load process into multiple steps. The ReadFile() function will load the file into a std::string and return it. The ParseConfig() function will then receive this string, and use it to update our data members:

// Config.h
// ...

struct Config {
  std::string WindowTitle;
  int WindowWidth;
  int WindowHeight;
  std::vector<int> Levels;

  void Load(const std::string& Path) {
    std::string Content{ReadFile(Path)};
    if (Content.empty()) return;
    ParseConfig(Content);
  }

 private:
  std::string ReadFile(const std::string& Path) {
    // TODO
    return "";
  }

  void ParseConfig(const std::string& Content) {
    // TODO
  }
};

Our ReadFile() function will use the techniques we covered earlier, loading our content into a std::string and returning it:

// Config.h
// ...

struct Config {
  // ...
 private:
  std::string ReadFile(const std::string& Path) {
    SDL_RWops* File{SDL_RWFromFile(
      Path.c_str(), "rb")};
    if (!File) {
      std::cout << "Failed to open config file: "
        << SDL_GetError() << "\n";
      return "";
    }

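    Sint64 Size{SDL_RWsize(File)};
    if (Size < 0) {
      // The size could not be determined
      SDL_RWclose(File);
      return "";
    }

    // resize(), not reserve(): the string's size
    // must cover the bytes we write through data()
    std::string Content;
    Content.resize(static_cast<size_t>(Size));
    SDL_RWread(
      File, Content.data(), 1, Content.size());
    SDL_RWclose(File);
    return Content;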
  }
};

Our ParseConfig() function will find each line of text within the string, by finding \n characters in a loop. We'll create a new std::string for each line in our original std::string using the substr() method. We'll then hand it off to the ProcessLine() function:

// Config.h
// ...

struct Config {
  // ...
 private:
  void ParseConfig(const std::string& Content) {
    size_t Start{0};
    size_t End{Content.find("\n", Start)};
    while (End != std::string::npos) {
      ProcessLine(Content.substr(
        Start, End - Start));
      Start = End + 1;
      End = Content.find("\n", Start);
    }
    
    // There are no more line breaks, but we
    // still need to process the last line
    ProcessLine(Content.substr(Start));
  }

  void ProcessLine(const std::string& Line) {
    // ...
  }
};
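
One thing this loop does not account for is Windows-style line endings. Our file is opened in binary mode ("rb") and we split on \n only, so if config.txt was saved with \r\n line endings, every line would keep a trailing \r, which would end up in values such as WindowTitle. A minimal sketch of one way to handle this is below. StripCarriageReturn() is a hypothetical helper that could be applied to each line before it is processed:

```cpp
#include <string>

// Hypothetical helper: removes a single trailing '\r',
// if present, and returns the result
std::string StripCarriageReturn(std::string Line) {
  if (!Line.empty() && Line.back() == '\r') {
    Line.pop_back();
  }
  return Line;
}
```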

Our ProcessLine() function will split each line into a Key and Value substring, based on the position of the ": " substring within the line.

Then, based on the Key, we'll update the appropriate variable using the Value substring. For numeric types, we'll use std::stoi() to convert the string to an integer. Note that std::stoi() throws an exception if the text does not begin with a valid number. For the Levels array, we'll offload the work to our final helper function, ParseLevels().

// Config.h
// ...

struct Config {
  // ...
 private:
  void ProcessLine(const std::string& Line) {
    size_t Delim{Line.find(": ")};
    if (Delim == std::string::npos) return;

    std::string Key{Line.substr(0, Delim)};
    std::string Value{Line.substr(Delim + 2)};

    if (Key == "WINDOW_TITLE") {
      WindowTitle = Value;
    } else if (Key == "WINDOW_WIDTH") {
      WindowWidth = std::stoi(Value);
    } else if (Key == "WINDOW_HEIGHT") {
      WindowHeight = std::stoi(Value);
    } else if (Key == "LEVELS") {
      ParseLevels(Value);
    }
  }

  void ParseLevels(const std::string& Value) {
    // TODO
  }
};
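
Since a config file is user-editable, values like WINDOW_WIDTH: abc are a realistic possibility, and std::stoi() reports them by throwing. The following sketch shows one way to convert without letting the exception escape. It assumes C++17 for std::optional, and TryParseInt() is a hypothetical name:

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the parsed integer, or
// std::nullopt if Text cannot be converted to an int
std::optional<int> TryParseInt(const std::string& Text) {
  try {
    return std::stoi(Text);
  } catch (const std::invalid_argument&) {
    // Text does not start with a number
    return std::nullopt;
  } catch (const std::out_of_range&) {
    // The number does not fit in an int
    return std::nullopt;
  }
}
```

Be aware that std::stoi() stops at the first character that is not part of the number, so a value like 800px converts to 800 rather than being rejected.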

Our ParseLevels() function will start by clearing out existing values from our Levels array using the clear() method.

Then, we'll iterate through our levels string to find all of the , characters, and create substrings based on these positions.

We'll convert these numeric substrings to integers using the std::stoi() function, and push them onto our Levels array using the push_back() method:

// Config.h
// ...

struct Config {
  // ...
 private:
  void ParseLevels(const std::string& Value) {
    Levels.clear();
    size_t Start{0};
    size_t End{Value.find(",", Start)};
    
    while (End != std::string::npos) {
      Levels.push_back(std::stoi(Value.substr(
        Start, End - Start)));
      Start = End + 1;
      End = Value.find(",", Start);
    }
    
    // There are no more commas, but we still need
    // to push the final number to our array
    Levels.push_back(std::stoi(
      Value.substr(Start)));
  }
};
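
None of the parsing steps depend on SDL, which means they can be checked by passing in a string directly rather than reading a file. The sketch below restates the same logic as free functions for that purpose. The names ParsedConfig, ParseLevelList(), ApplyLine(), and ParseConfigText() are all hypothetical, and the zero defaults for width and height are an assumption:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Mirrors the data members of the lesson's Config struct
struct ParsedConfig {
  std::string WindowTitle;
  int WindowWidth{0};
  int WindowHeight{0};
  std::vector<int> Levels;
};

// Same approach as ParseLevels(): split on commas
std::vector<int> ParseLevelList(const std::string& Value) {
  std::vector<int> Levels;
  std::size_t Start{0};
  std::size_t End{Value.find(',', Start)};
  while (End != std::string::npos) {
    Levels.push_back(
      std::stoi(Value.substr(Start, End - Start)));
    Start = End + 1;
    End = Value.find(',', Start);
  }
  Levels.push_back(std::stoi(Value.substr(Start)));
  return Levels;
}

// Same approach as ProcessLine(): split on ": "
void ApplyLine(
  ParsedConfig& Config, const std::string& Line
) {
  std::size_t Delim{Line.find(": ")};
  if (Delim == std::string::npos) return;

  std::string Key{Line.substr(0, Delim)};
  std::string Value{Line.substr(Delim + 2)};

  if (Key == "WINDOW_TITLE") {
    Config.WindowTitle = Value;
  } else if (Key == "WINDOW_WIDTH") {
    Config.WindowWidth = std::stoi(Value);
  } else if (Key == "WINDOW_HEIGHT") {
    Config.WindowHeight = std::stoi(Value);
  } else if (Key == "LEVELS") {
    Config.Levels = ParseLevelList(Value);
  }
}

// Same approach as ParseConfig(): split on '\n'
ParsedConfig ParseConfigText(const std::string& Content) {
  ParsedConfig Config;
  std::size_t Start{0};
  std::size_t End{Content.find('\n', Start)};
  while (End != std::string::npos) {
    ApplyLine(Config, Content.substr(Start, End - Start));
    Start = End + 1;
    End = Content.find('\n', Start);
  }
  // Process whatever follows the last line break
  ApplyLine(Config, Content.substr(Start));
  return Config;
}
```

Passing the four lines of our sample config.txt to ParseConfigText() as a single string should populate all four members, which makes the parsing logic straightforward to test in isolation from file loading.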

Our final class is given below, and we've also included an example of using it within a basic SDL application.

Summary

In this lesson, we learned how to load a configuration file into a C++ struct. We used SDL_RWops to read the file contents into a std::string, and then applied various string methods to parse the data. Here are the key takeaways:

  • We used SDL_RWFromFile() to open the config file.
  • We determined the file size with SDL_RWsize().
  • We used std::string::find() and std::string::substr() to locate and extract keys and values.
  • We used std::stoi to convert string values to integers.
  • We created a Config struct to encapsulate these capabilities and hold our parsed settings.