Working with C-Style Strings

A guide to working with and manipulating C-style strings, using the cstring header file
This lesson is part of the course:

Professional C++

Comprehensive course covering advanced concepts, and how to use them on large-scale projects.

DreamShaper_v7_hipster_Sidecut_hair_modest_clothes_fully_cloth_3 (1).jpg
Ryan McCombe
Ryan McCombe
Posted

In this lesson, we’ll focus on some helper functions that can help us work with C-style strings. In most cases, we should not use C-Style strings. Even with these helper functions, they are still unpleasant to work with and have many potential pitfalls.

More modern approaches such as std::string are typically easier and safer to work with, and they come with more capabilities out of the box.

However, C-style strings remain in common use with external libraries and tools that we will integrate with. Therefore, a basic knowledge of what they are and how they work is extremely useful.

Character Arrays

In the previous lesson, we covered how C-strings (and other null-terminated strings) are stored in an array-like layout in memory. A C-style string is typically a pointer - a char* to the first character in the array:

Diagram showing a character pointer pointing at a c-string in memory

This gives us another way to create C-style strings. We can use the C-style array syntax:

int main(){
  char MyString[6]{"Hello"};
}

This allows us to specify how much memory is available for the string, remembering to include space for the null terminator.

The ability to specify the size is useful if we aren’t going to provide an initial value for the string, or if we’re going to modify the string to be longer later in our application.

The array form of strings (char[]) and the pointer form (char*) are mostly interchangeable. This is perhaps not surprising if we recall that C-style arrays aren’t that different from a pointer to the first element, and often decay to exactly that.

But, there are some differences. Most notably, the array form will allocate the characters onto the stack by default. We can change that in the usual ways, with the new keyword and/or smart pointers:

#include <memory>

int main(){
  // Dynamic allocation of string
  auto MyStringA{std::make_unique<char[]>(50)};

  // Dynamic allocation of string with initial value
  std::unique_ptr<char[]> MyStringB{
    new char[50]("Hello")};
}

The <cstring> Header

The standard library includes a collection of functions that can be useful for working with C-style strings.

They’re available by including the <cstring> header file.

#include <cstring>

The full collection of utilities available can be seen from a standard library reference, such as cppreference.com. but we’ll cover the most useful ones below, alongside some issues and pitfalls to consider.

Getting the String Length using strlen()

The strlen() function returns the number of characters in a C-style string, not counting the null terminator:

#include <iostream>
#include <cstring>

int main(){
  const char* MyString{"Hello"};
  std::cout << "String Length: "
    << strlen(MyString);
}
String Length: 5

Note, when using a character array, the length of the string is not necessarily the same as the length of the array.

The array may have additional space, after the string’s null terminator. strlen() returns the length of the string, not the length of the array:

#include <iostream>
#include <cstring>

int main(){
  char MyString[100]{"Hello"};
  std::cout << "Array Length: "
    << sizeof(MyString) / sizeof(char);
  std::cout << "\nString Length: "
    << strlen(MyString);
}
Array Length: 100
String Length: 5

Comparing Strings using strcmp()

The strcmp() function accepts two strings as arguments. It then compares them alphabetically, ie, in dictionary order. This is also sometimes referred to as lexical order or lexicographic order.

strcmp() will return an integer, which we should interpret in the following way:

  • If the integer is negative, the first string comes before the second string in the dictionary order
  • If the integer is zero, the two strings are the same
  • If the integer is positive, the first string comes after the second string in the dictionary order

In this example, we do a simple comparison to check if two strings are equal:

#include <cstring>
#include <iostream>

int main(){
  const char* A{"Bear"};
  const char* B{"Bear"};
  const char* C{"Zebra"};

  if (strcmp(A, B) == 0) {
    std::cout << "A and B are equal";
  }

  if (strcmp(B, C) != 0) {
    std::cout << "\nB and C are not equal\n";
  }

  if (strcmp(B, C) < 0) {
    std::cout << B << " comes before " << C;
  }
}
A and B are equal
B and C are not equal
Bear comes before Zebra

Here, we use the strcmp() function as a predicate for the std::ranges::sort algorithm, to put a collection of strings in alphabetical order:

#include <algorithm>
#include <cstring>
#include <iostream>
#include <vector>

int main(){
  std::vector Animals{
    "Bear", "Zebra", "Chicken", "Alligator"};

  auto Predicate{
    [](const char* A, const char* B){
      return strcmp(A, B) < 0;
    }
  };

  std::ranges::sort(Animals, Predicate);

  for (const char* Animal : Animals) {
    std::cout << Animal << '\n';
  }
}
Alligator
Bear
Chicken
Zebra

We introduced std::ranges::sort in our algorithm section earlier in this course:

Concatenating Strings using strcat_s()

The strcat_s() function appends one C-style string onto another. Combining strings in this way is referred to as concatenation.

The function takes care of copying the characters for us and repositions the null terminator into the correct place.

However, we are responsible for ensuring the string has enough surplus memory to store what we’re concatenating onto it.

#include <iostream>
#include <cstring>

int main(){
  char MyString[50]{"Hello"};
  strcat_s(MyString, " World");
  std::cout << MyString;
}
Hello World

The strcat_s() function replaces an earlier strcat() function, which was deprecated for safety reasons. Specifically, it could be used in cases where the destination did not have enough memory to store everything we were concatenating.

This resulted in buffer overflow issues, where our program could corrupt the memory that falls after our string.

The original strcat() function was not doing enough to prevent this from happening, so it was replaced with the strcat_s() function.

strcat_s() includes additional checks to ensure the destination has enough space to perform the requested concatenation.

In many cases, the compiler can determine the size of the destination automatically, but not always. Given the propensity of C-style arrays to lose track of their size by decaying to a pointer, we sometimes have to intervene.

If the compiler cannot determine the size of the destination automatically, it will throw an error. We can provide the size manually using an alternative function signature, where we provide it as the second argument:

#include <iostream>
#include <memory>
#include <cstring>

int main(){
  std::unique_ptr<char[]> MyString{
    new char[50]("Hello")};

  strcat_s(MyString.get(), 50, " World");

  std::cout << MyString;
}
Hello World

At run time, our program will check if the size is not big enough before doing the concatenation. If it isn’t, our program will throw a runtime error rather than corrupting memory.

In release configurations, this check is disabled for performance reasons, but the noisy failure should be enough for us to catch the problem during the development cycle.

Copying Strings using strcpy_s()

Creating a copy of a C-style string is not as easy as we might expect. If we create a copy in the normal way, using the = operator, what we’re actually creating is a copy of the pointer.

int main(){
  const char* Source{"Hello"};
  const char* Destination{Source};
}

After running this code, both Source and Dest point to the exact same location in memory.

Diagram showing the effect of copying a C-style string

As such, modifications to one string would affect the other.

#include <iostream>
#include <cstring>

int main(){
  char Source[50]{"Hello"};
  const char* Copy{Source};

  std::cout << "Copy Content is: " << Copy;

  // We modify the source string...
  strcat_s(Source, " World");

  // ...but the copy will change too
  std::cout << "\nCopy Content is: " << Copy;
}
Copy Content is: Hello
Copy Content is: Hello World

This type of copy is often referred to as a shallow copy. We have multiple variables, but below the surface, those variables share one or more underlying resources.

To create a full, deep copy of the string, we first need to allocate enough space in memory for the copy. We can do that by creating a character array, with enough space for the string, and the null terminator.

We can then use the should use the strcpy_s method, passing the destination first, and the source second. We are responsible for ensuring the destination has enough space for the source string, as well as the null terminator.:

#include <iostream>
#include <cstring>

int main(){
  char Source[50]{"Hello"};
  char Dest[50];

  strcpy_s(Dest, Source);

  std::cout << "Dest Content is: " << Dest;

  // We modify the source string...
  strcat_s(Source, " World");
  std::cout << "Source Content is: " << Source;

  // ...but now the copy will remain the same
  std::cout << "\nDest Content is: " << Dest;
}
Destination Content is: Hello
Source Content is: Hello World
Destination Content is: Hello

As we can see from the output, modifications to the source string no longer affect the destination. This is because, unlike the earlier example, we are now doing a “deep copy”. The entire string is copied in memory, not just the pointer:

Diagram showing the effect of copying a C-style string using strcpy or strcpy_s

Similar to strcat_s(), the strcpy_s() function replaces the older strcpy() function, which was deprecated due to poor protection against buffer overflows.

If the compiler believes the destination does not have enough space for the copy, it will throw an error to alert us to the danger.

If the compiler cannot determine the amount of space available at the destination, it will also throw an error. We can address this by switching to an alternative form of the strcpy_s() function, which allows us to provide the size as the second argument:

#include <iostream>
#include <cstring>

int main(){
  char Source[50]{"Hello"};
  auto Dest{std::make_unique<char[]>(50)};

  strcpy_s(Dest.get(), 50, Source);

  std::cout << "Dest Content is: " << Dest;
}
Dest Content is: Hello

Similar to the strcat_s() behavior, the strcpy_s() function includes a runtime assertion that ensures this size is big enough to hold the copy. This hopefully allows us to catch any potential problems during development. In release configurations, this check is disabled in order to maximize performance.

Memory Corruption (Bypassing strcpy and strcat Warnings)

Even though strcpy() and strcat() have been deprecated, we may still want to use them. A likely scenario for this is if we’re using an old library that hasn’t been updated.

We can ask the compiler to ignore these warnings by defining _CRT_SECURE_NO_WARNINGS within our project settings, or as a #define directive that occurs before any #include directives:

#define _CRT_SECURE_NO_WARNINGS
#include <cstring>

int main(){
  const char* Source{"Hello"};
  char Dest[5];

  strcpy(Dest, Source);
}

This example has a memory corruption issue, as Dest is not big enough to store what we’re trying to copy. We forgot to include space for the null terminator.

The compiler will not detect an error here. Once built in release mode, our program will corrupt memory unrelated to our string, and keep running.

The following simple program demonstrates a buffer overflow, by specifically stating what we want to be in a memory location, and then overflowing that location:

#define _CRT_SECURE_NO_WARNINGS
#include <cstring>
#include <iostream>

int main(){
  char Name[5]{"Abby"};

  // Storing bank balance in the memory location
  // that is immediately after the string
  int* Balance = reinterpret_cast<int*>
    (Name + sizeof(Name));
  *Balance = 500;

  std::cout << "Hi " << Name <<
    ", your balance is $" << *Balance;

  // We don't have enough space to store this string
  // so it will overflow to adjacent memory
  std::cout << "\nUpdating name...\n";
  strcpy(Name, "Abbey"); 

  // Balance has now been corrupted
  std::cout << "Hi " << Name <<
    ", your balance is $" << *Balance;
}
Hi Abby, your balance is $500
Updating name...
Hi Abbey, your balance is $256

But, in real programs, the implications of memory corruption are highly variable. Each time it happens, it can look like a totally new bug that has never been seen before and will never be seen again.

Its effects depend on what is in the memory location that was corrupted, and that can be different every time. This is what makes memory corruption so insidious. It is something we should be aware of, and proactively be on the lookout for.

In the previous example, we’re doing a few things that are considered bad practices. Modern C++ gives us many better options that make bugs like this much more difficult to introduce. One of those options is a much better implementation of strings the std::string object.

Previously, we’ve been using std::string where possible. We’ll go back to that convention in the next lesson, where we dive a bit deeper into how std::string works, and the powers it gives us.

Was this lesson useful?

Ryan McCombe
Ryan McCombe
Posted
This lesson is part of the course:

Professional C++

Comprehensive course covering advanced concepts, and how to use them on large-scale projects.

7a.jpg
This lesson is part of the course:

Professional C++

Comprehensive course covering advanced concepts, and how to use them on large-scale projects.

Free, unlimited access!

This course includes:

  • 106 Lessons
  • 550+ Code Samples
  • 96% Positive Reviews
  • Regularly Updated
  • Help and FAQ
Next Lesson

A Deeper Look at the std::string Class

A detailed guide to std::string, covering the most essential methods and operators
832.jpg
Contact|Privacy Policy|Terms of Use
Copyright © 2023 - All Rights Reserved