Converting Between Character Encodings

How can I convert between different character encodings in C++?

Converting between character encodings in C++ can be tricky because the standard library offers little support for it: the conversion facilities it did have (std::wstring_convert and the <codecvt> facets) were deprecated in C++17. In practice, conversions are usually done with a third-party library or a platform-specific API. Both approaches are shown below.

Using a Third-Party Library

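One popular library for handling character encoding conversions is ICU (International Components for Unicode). Its icu::UnicodeString class stores text as UTF-16 internally, so constructing one from UTF-8 performs the conversion. Here's an example of how you might use ICU to convert from UTF-8 to UTF-16 and back: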

#include <unicode/ucnv.h>
#include <unicode/unistr.h>

#include <iostream>
#include <string>

int main() {
  std::string utf8String = "Hello, !";

  // Convert UTF-8 to UTF-16
  icu::UnicodeString utf16String =
    icu::UnicodeString::fromUTF8(utf8String);

  // Convert back to UTF-8 for display
  std::string convertedString;
  utf16String.toUTF8String(convertedString);

  std::cout << "Original: "
    << utf8String << '\n';
  std::cout << "Converted: "
    << convertedString << '\n';
}
Original: Hello, !
Converted: Hello, !

Using Platform-Specific APIs

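On Windows, you can use the MultiByteToWideChar() and WideCharToMultiByte() API functions for encoding conversion. Because wchar_t is 16 bits wide on Windows, a std::wstring holds UTF-16 there. Here's an example that converts from UTF-8 to UTF-16 and back: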

#include <windows.h>
#include <iostream>
#include <string>

int main() {
  std::string utf8String = "Hello, !";

  // Get the required buffer size. Passing the explicit
  // length instead of -1 keeps the null terminator out
  // of the count, so no stray '\0' ends up in the result
  int size = MultiByteToWideChar(
    CP_UTF8, 0, utf8String.data(),
    static_cast<int>(utf8String.size()),
    nullptr, 0
  );

  // Allocate the buffer
  std::wstring utf16String(size, 0);

  // Perform the conversion
  MultiByteToWideChar(
    CP_UTF8, 0, utf8String.data(),
    static_cast<int>(utf8String.size()),
    &utf16String[0], size);

  // Convert back to UTF-8 for display
  size = WideCharToMultiByte(
    CP_UTF8, 0, utf16String.data(),
    static_cast<int>(utf16String.size()),
    nullptr, 0, nullptr, nullptr
  );

  std::string convertedString(size, 0);
  WideCharToMultiByte(
    CP_UTF8, 0, utf16String.data(),
    static_cast<int>(utf16String.size()),
    &convertedString[0], size, nullptr, nullptr
  );

  // Set console output to UTF-8
  SetConsoleOutputCP(CP_UTF8);

  std::cout << "Original: "
    << utf8String << '\n';
  std::cout << "Converted: "
    << convertedString << '\n';
}
Original: Hello, !
Converted: Hello, !

Remember, when working with different encodings, it's crucial to be aware of the encoding of your source files and how your compiler interprets string literals. Always use libraries or APIs that are well-tested and widely used to avoid potential encoding errors.
