In this lesson, we'll look at some standard library functions that help us work with C-style strings. In most cases, we should avoid C-style strings: even with these helper functions, they are unpleasant to work with and have many potential pitfalls.

More modern approaches such as std::string are typically easier and safer to work with, and they come with more capabilities out of the box.

However, C-style strings remain in common use with external libraries and tools that we will integrate with. Therefore, a basic knowledge of what they are and how they work is extremely useful.

Character Arrays

In the previous lesson, we covered how C-strings (and other null-terminated strings) are stored in an array-like layout in memory. A C-style string is typically a pointer - a char* to the first character in the array:

Diagram showing a character pointer pointing at a c-string in memory

This gives us another way to create C-style strings using C-style array syntax:

int main(){
  char MyString[6]{"Hello"};
}

This allows us to specify how much memory is available for the string, remembering to include space for the null terminator.

The ability to specify the size is useful if we aren't going to provide an initial value for the string, or if we're going to modify the string to be longer later in our application.
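As a small illustration of the sizing rule, the sketch below lets the compiler work out the array size from the initializer and then checks the result at compile time. The names Greeting and Capacity are illustrative choices rather than anything from the standard library. The point is that "Hello" needs six characters of storage, not five, because of the null terminator.

```cpp
#include <cstddef>

// Omitting the size lets the compiler count the characters for us,
// including the null terminator.
char Greeting[]{"Hello"};

// 5 visible characters + 1 null terminator
static_assert(sizeof(Greeting) == 6);

// A string literal has the same size as the array it initializes
static_assert(sizeof("Hello") == sizeof(Greeting));

// Hypothetical helper: reports the capacity of a character array.
// Taking the array by reference preserves its size in the type.
template <std::size_t N>
constexpr std::size_t Capacity(const char (&)[N]) {
  return N;
}
```

With an explicit size such as char Roomy[50]{"Hello"}, Capacity(Roomy) would report 50, with the unused characters set to zero.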

The array form of strings (char[]) and the pointer form (char*) are mostly interchangeable. This is perhaps not surprising if we recall that C-style arrays aren't that different from a pointer to the first element, and often decay to exactly that.

Related lesson: C-Style Arrays. A detailed guide to working with classic C-style arrays within C++, and why we should avoid them where possible.
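To illustrate how interchangeable the two forms are, here is a minimal sketch of a function that accepts a const char*. CountChars is a hypothetical name used only for this example. Passing an array works because the array decays to a pointer to its first element at the call site.

```cpp
#include <cstddef>

// Hypothetical helper: walks the characters until it reaches the
// null terminator. The parameter is a pointer, so an array argument
// decays to a pointer to its first element when passed in.
std::size_t CountChars(const char* Str) {
  std::size_t Count{0};
  while (Str[Count] != '\0') {
    ++Count;
  }
  return Count;
}
```

Both char Array[6]{"Hello"} and const char* Pointer{"Hello"} can be passed to this function, and each yields 5.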

But, there are some differences. Most notably, the array form will allocate the characters onto the stack by default. We can change that in the usual ways, with the new keyword and/or smart pointers:

#include <memory>

int main(){
  // Dynamic allocation of string
  auto MyStringA{std::make_unique<char[]>(50)};

  // Dynamic allocation of string with initial value
  std::unique_ptr<char[]> MyStringB{
    new char[50]("Hello")};
}

The `cstring` Header

The standard library includes a collection of functions that can be useful for working with C-style strings.

They're available by including the <cstring> header file.

1#include <cstring>

The full collection of utilities available can be seen in a standard library reference, such as cppreference.com, but we'll cover the most useful ones below, alongside some issues and pitfalls to consider.

String Length using `strlen()`

The strlen() function returns the number of characters in a C-style string, not counting the null terminator:

#include <iostream>
#include <cstring>

int main(){
  const char* MyString{"Hello"};
  std::cout << "String Length: "
    << strlen(MyString);
}

String Length: 5

Note that when using a character array, the length of the string is not necessarily the same as the length of the array allocated to store the string.

The array may have additional space, after the string's null terminator. strlen() returns the length of the string, not the length of the array:

#include <iostream>
#include <cstring>

int main(){
  char MyString[100]{"Hello"};
  std::cout << "Array Length: "
    << sizeof(MyString) / sizeof(char);
  std::cout << "\nString Length: "
    << strlen(MyString);
}

Array Length: 100
String Length: 5
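The gap between the two numbers is what tells us how much room is left in the array. As a rough sketch, the hypothetical helper below (RemainingCapacity is a name chosen for this example) combines the array size with strlen() to work out how many more characters could be stored. It assumes the array already holds a null-terminated string.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: how many more characters fit in Buffer
// after its current contents, keeping one slot for the terminator.
// Assumes Buffer already holds a null-terminated string.
template <std::size_t N>
std::size_t RemainingCapacity(const char (&Buffer)[N]) {
  return N - std::strlen(Buffer) - 1;
}
```

For the char MyString[100]{"Hello"} array above, this would report 94: 100 characters in total, minus the 5 in use, minus 1 for the null terminator.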

Comparison using `strcmp()`

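The strcmp() function accepts two strings as arguments and compares them character by character, i.e., in dictionary order. This is also sometimes referred to as lexical order or lexicographic order. The comparison uses the numeric value of each character, so it is case-sensitive: in ASCII, every uppercase letter sorts before every lowercase letter.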

strcmp() will return an integer, which we should interpret in the following way:

If the integer is negative, the first string comes before the second string in the dictionary order
If the integer is zero, the two strings are the same
If the integer is positive, the first string comes after the second string in the dictionary order

In this example, we do a simple comparison to check if two strings are equal:

#include <cstring>
#include <iostream>

int main(){
  const char* A{"Bear"};
  const char* B{"Bear"};
  const char* C{"Zebra"};

  if (strcmp(A, B) == 0) {
    std::cout << "A and B are equal";
  }

  if (strcmp(B, C) != 0) {
    std::cout << "\nB and C are not equal\n";
  }

  if (strcmp(B, C) < 0) {
    std::cout << B << " comes before " << C;
  }
}

A and B are equal
B and C are not equal
Bear comes before Zebra
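It may be worth spelling out why we reach for strcmp() rather than the == operator here. Applied to two char pointers, == compares the addresses, not the characters. The sketch below contrasts the two checks. SameContent and SameAddress are hypothetical helper names introduced only for this illustration.

```cpp
#include <cstring>

// Hypothetical helper: true when both strings hold the same characters
bool SameContent(const char* A, const char* B) {
  return std::strcmp(A, B) == 0;
}

// Hypothetical helper: true only when both pointers refer to the
// same location in memory, regardless of what is stored there
bool SameAddress(const char* A, const char* B) {
  return A == B;
}
```

Given two separate arrays, char A[]{"Bear"} and char B[]{"Bear"}, SameContent(A, B) is true while SameAddress(A, B) is false, because each array has its own storage.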

Here, we use strcmp() inside a predicate for the std::ranges::sort() algorithm, to put a range of strings in alphabetical order:

#include <algorithm>
#include <cstring>
#include <iostream>
#include <vector>

int main(){
  std::vector Animals{
    "Bear", "Zebra", "Chicken", "Alligator"};

  auto Predicate{
    [](const char* A, const char* B){
      return strcmp(A, B) < 0;
    }
  };

  std::ranges::sort(Animals, Predicate);

  for (const char* Animal : Animals) {
    std::cout << Animal << '\n';
  }
}

Alligator
Bear
Chicken
Zebra

We introduced range-based algorithms and std::ranges::sort() earlier in this course:

Related lesson: Iterator and Range-Based Algorithms. An introduction to iterator and range-based algorithms, using examples from the standard library.

Concatenation using `strcat_s()`

The strcat_s() function appends one C-style string onto another. Combining strings in this way is referred to as concatenation.

The function takes care of copying the characters for us and repositions the null terminator into the correct place.

However, we are responsible for ensuring the string has enough surplus memory to store what we're concatenating onto it.

#include <iostream>
#include <cstring>

int main(){
  char MyString[50]{"Hello"};
  strcat_s(MyString, " World");
  std::cout << MyString;
}

Hello World

The strcat_s() function is a bounds-checked alternative to the older strcat() function. strcat() is still part of the C and C++ standard libraries, but Microsoft's compiler flags it as deprecated for safety reasons. Specifically, it can be used in cases where the destination does not have enough memory to store everything we are concatenating.

This results in buffer overflows, where our program corrupts the memory that falls after our string.

strcat() does nothing to prevent this from happening, so Microsoft's C runtime provides strcat_s() as a replacement. It is also specified in the optional Annex K of the C11 standard, but many other toolchains do not implement it.

strcat_s() includes additional checks to ensure the destination has enough space to perform the requested concatenation.

In many cases, the compiler can determine the size of the destination automatically, but not always. Given the propensity of C-style arrays to lose track of their size by decaying to a pointer, we sometimes have to intervene.

If the compiler cannot determine the size of the destination automatically, compilation fails with an error. We can provide the size manually using an alternative function signature, where we provide it as the second argument:

#include <iostream>
#include <memory>
#include <cstring>

int main(){
  std::unique_ptr<char[]> MyString{
    new char[50]("Hello")};

  strcat_s(MyString.get(), 50, " World");

  std::cout << MyString;
}

Hello World

At run time, strcat_s() checks whether the destination is big enough before doing the concatenation. If it isn't, the function reports the failure rather than corrupting memory. With Microsoft's runtime, this invokes the invalid parameter handler, which terminates the program by default.

In debug configurations, the failure also triggers an assertion, which makes the problem noisy enough for us to catch during the development cycle.
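To make the idea of a bounds-checked concatenation more concrete, here is a portable sketch built only from standard functions. AppendChecked is a hypothetical helper written for this illustration. It is not how strcat_s() is implemented, and unlike strcat_s() it simply returns false on failure. The second overload shows one way a size can be picked up automatically when the destination is a true array.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not the real strcat_s(): appends Source to
// Dest only if the result, including the null terminator, fits in
// DestSize bytes. Returns false and leaves Dest untouched otherwise.
// Assumes Dest holds a null-terminated string.
bool AppendChecked(char* Dest, std::size_t DestSize,
                   const char* Source) {
  const std::size_t DestLength{std::strlen(Dest)};
  const std::size_t SourceLength{std::strlen(Source)};

  // The existing contents must leave room for a terminator
  if (DestLength >= DestSize) return false;

  // Reject the append if the extra characters would not fit
  if (SourceLength > DestSize - DestLength - 1) return false;

  // Copy the characters plus the source's null terminator
  std::memcpy(Dest + DestLength, Source, SourceLength + 1);
  return true;
}

// When the destination is a true array, a template can deduce its
// size from the type, so the caller doesn't need to pass it.
template <std::size_t N>
bool AppendChecked(char (&Dest)[N], const char* Source) {
  return AppendChecked(Dest, N, Source);
}
```

With char Buffer[12]{"Hello"}, the call AppendChecked(Buffer, " World") succeeds because the result needs exactly 12 bytes. A further AppendChecked(Buffer, "!") is rejected and leaves the buffer unchanged. Once the array has decayed to a char*, only the three-argument form is usable, which mirrors the situation described above.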

Copying using `strcpy_s()`

Creating a copy of a C-style string is not as easy as we might expect. If we create a copy in the normal way, by initializing or assigning one pointer from another, what we're creating is a copy of the pointer.

int main(){
  const char* Source{"Hello"};
  const char* Destination{Source};
}

After running this code, both Source and Destination point to the same location in memory.

Diagram showing the effect of copying a C-style string

As such, modifications to one string would affect the other.

#include <iostream>
#include <cstring>

int main(){
  char Source[50]{"Hello"};
  const char* Copy{Source};

  std::cout << "Copy Content is: " << Copy;

  // We modify the source string...
  strcat_s(Source, " World");

  // ...but the copy will change too
  std::cout << "\nCopy Content is: " << Copy;
}

Copy Content is: Hello
Copy Content is: Hello World

This type of copy is often referred to as a shallow copy. We have multiple variables, but below the surface, those variables share one or more underlying resources.

To create a full, deep copy of the string, we first need to allocate enough space in memory for the copy. We can do this by creating a character array with enough space for the string and the null terminator.

We can then use the strcpy_s() function, passing the destination first and the source second. We are responsible for ensuring the destination has enough space for the source string, as well as the null terminator:

#include <iostream>
#include <cstring>

int main(){
  char Source[50]{"Hello"};
  char Dest[50];

  strcpy_s(Dest, Source);

  std::cout << "Dest Content: " << Dest;

  // We modify the source string...
  strcat_s(Source, " World");
  std::cout << "\nSource Content: " << Source;

  // ...but now the copy will remain the same
  std::cout << "\nDest Content: " << Dest;
}

Dest Content: Hello
Source Content: Hello World
Dest Content: Hello

As we can see from the output, modifications to the source string no longer affect the destination. This is because, unlike the earlier example, we are now doing a "deep copy". The entire string is copied in memory, not just the pointer:

Diagram showing the effect of copying a C-style string using strcpy or strcpy_s
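The same two steps, allocating enough space and then copying the characters, can be sketched with standard functions only. DuplicateString is a hypothetical helper written for this illustration. It sizes the new buffer from strlen() plus one for the null terminator, and returns it in a smart pointer so the memory is released automatically.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: returns a deep copy of Source in its own
// dynamically allocated buffer.
std::unique_ptr<char[]> DuplicateString(const char* Source) {
  // +1 so the copy has room for the null terminator
  const std::size_t Size{std::strlen(Source) + 1};
  auto Copy{std::make_unique<char[]>(Size)};

  // Copy every character, including the terminator
  std::memcpy(Copy.get(), Source, Size);
  return Copy;
}
```

Because the copy owns separate storage, later changes to the source string do not show up when reading the copy through Copy.get().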

Similar to strcat_s(), the strcpy_s() function is a bounds-checked replacement for the older strcpy() function, which Microsoft's compiler flags as deprecated due to its poor protection against buffer overflows.

When the destination is an array, as in the example above, the compiler deduces its size from the array's type and passes it along, so strcpy_s() knows how much space is available and can reject a copy that would not fit.

If the compiler cannot determine the amount of space available at the destination, for example because we only have a pointer to it, compilation fails with an error. We can address this by switching to an alternative form of the strcpy_s() function, which allows us to provide the size as the second argument:

#include <iostream>
#include <memory>
#include <cstring>

int main(){
  char Source[50]{"Hello"};
  auto Dest{std::make_unique<char[]>(50)};

  strcpy_s(Dest.get(), 50, Source);

  std::cout << "Dest Content is: " << Dest;
}

Dest Content is: Hello

Similar to strcat_s(), the strcpy_s() function checks at run time that this size is big enough to hold the copy. If it isn't, the copy is rejected rather than written past the end of the buffer, and with Microsoft's runtime the program is terminated by default. In debug configurations, the failure also raises an assertion, which helps us catch the problem during development.

Memory Corruption with `strcpy()`, `strcat()`

Even though Microsoft's compiler flags strcpy() and strcat() as deprecated, we may still need to use them. A likely scenario is that we're working with an old library that hasn't been updated.

We can ask the compiler to ignore these warnings by defining _CRT_SECURE_NO_WARNINGS within our project settings, or as a #define directive that occurs before any #include directives:

#define _CRT_SECURE_NO_WARNINGS
#include <cstring>

int main(){
  const char* Source{"Hello"};
  // Too small: "Hello" needs 6 bytes including the null terminator
  char Dest[5];

  // Writes one byte past the end of Dest
  strcpy(Dest, Source);
}

This example has a memory corruption issue, as Dest is not big enough to store what we're trying to copy. We forgot to include space for the null terminator.

The compiler will not report an error here. The behavior is undefined: our program may corrupt memory unrelated to our string and keep running.

The following simple program demonstrates a buffer overflow by deliberately placing a value in the memory immediately after a string, and then overflowing the string. It relies on undefined behavior, so the exact output depends on the compiler and platform:

#define _CRT_SECURE_NO_WARNINGS
#include <cstring>
#include <iostream>

int main(){
  // Room for exactly "Ann" plus the null terminator
  char Name[4]{"Ann"};

  // Storing bank balance in the memory location
  // that is immediately after the string
  int* Balance = reinterpret_cast<int*>
    (Name + sizeof(Name));
  *Balance = 500;

  std::cout << "Hi " << Name <<
    ", your balance is $" << *Balance;

  // We don't have enough space to store this string
  // so it will overflow to adjacent memory
  std::cout << "\nUpdating name...\n";
  strcpy(Name, "Anna");

  // Balance has now been corrupted
  std::cout << "Hi " << Name <<
    ", your balance is $" << *Balance;
}

Hi Ann, your balance is $500
Updating name...
Hi Anna, your balance is $256

But, in real programs, the implications of memory corruption are highly variable. Each time it happens, it can look like a new bug that has never been seen before and will never be seen again.

Its effects depend on what is in the memory location that was corrupted, and that can be different every time. This is what makes memory corruption so insidious. It is something we should be aware of, and proactively be on the lookout for.

In the previous example, we're doing a few things that are considered bad practice. Modern C++ gives us many better options that make bugs like this much more difficult to introduce. One of those options is a much better implementation of strings: the std::string class.
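As a brief preview, here is roughly how the name-and-balance scenario might look with std::string. The Account type and Rename function are hypothetical names invented for this sketch. The string manages its own storage and grows as needed, so assigning a longer name cannot overwrite the balance stored next to it.

```cpp
#include <string>

// Hypothetical account type for illustration
struct Account {
  std::string Name;
  int Balance;
};

// std::string manages its own storage and grows as needed, so a
// longer name cannot spill into the neighbouring Balance member.
void Rename(Account& Target, const char* NewName) {
  Target.Name = NewName;
}
```

Starting from Account User{"Ann", 500}, calling Rename(User, "Anna") updates the name and leaves the balance at 500.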

Previously, we've been using std::string where possible. We'll go back to that convention in the next lesson, where we dive a bit deeper into how std::string works, and the powers it gives us.

Summary

In this lesson, we've explored working with C-style strings in C++, focusing on how to manipulate them safely using functions from the <cstring> header.

We've emphasized the transition from older, less secure functions to their safer counterparts and highlighted best practices to avoid pitfalls associated with manual memory management.

Main Points Learned

C-style strings are stored as null-terminated character arrays and can be manipulated using functions from the <cstring> header.
The strlen() function is used to find the length of a C-style string, excluding the null terminator.
strcmp() allows for lexicographical comparison of two strings, returning an integer to indicate their relative order.
strcat_s() and strcpy_s() are bounds-checked alternatives to strcat() and strcpy(), designed to prevent buffer overflows by checking that the destination has sufficient space.
Memory corruption and buffer overflow issues can arise from improper use of C-style strings.
