Calculating Levenshtein Distance

How can I efficiently calculate the Levenshtein distance between two std::strings?

The Levenshtein distance, also known as the edit distance, is a measure of the difference between two strings. It represents the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into another.

Calculating this distance efficiently using std::string is an excellent exercise in dynamic programming and string manipulation.

Let's implement an efficient Levenshtein distance calculator:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

int levenshteinDistance(const std::string& s1,
                        const std::string& s2) {
  const size_t m{s1.length()};
  const size_t n{s2.length()};

  std::vector<std::vector<int>> dp(
    m + 1, std::vector<int>(n + 1)
  );

  // Initialize first row and column
  for (size_t i{0}; i <= m; ++i) dp[i][0] = i;
  for (size_t j{0}; j <= n; ++j) dp[0][j] = j;

  // Fill the dp table
  for (size_t i{1}; i <= m; ++i) {
    for (size_t j{1}; j <= n; ++j) {
      if (s1[i - 1] == s2[j - 1]) {
        dp[i][j] = dp[i - 1][j - 1];  
      } else {
        dp[i][j] = 1 + std::min({
          dp[i - 1][j],     // deletion
          dp[i][j - 1],     // insertion
          dp[i - 1][j - 1]  // substitution
        });
      }
    }
  }

  return dp[m][n];
}

int main() {
  std::string word1{"kitten"};
  std::string word2{"sitting"};

  int distance{levenshteinDistance(word1, word2)};

  std::cout
    << "The Levenshtein distance between '"
    << word1 << "' and '" << word2 << "' is: "
    << distance << '\n';
}
The Levenshtein distance between 'kitten' and 'sitting' is: 3

Let's break down the levenshteinDistance() function:

  1. We create a 2D vector dp to store our dynamic programming table.
  2. We initialize the first row and column of dp to represent the distance from an empty string.
  3. We iterate through both strings, filling the dp table:
    • If the characters match, we copy the diagonal value (no edit needed).
    • If they don't match, we take the minimum of the three possible operations (deletion, insertion, substitution) and add 1.
  4. The final cell dp[m][n] gives us the Levenshtein distance.

This implementation has a time complexity of O(mn)O(mn) and a space complexity of O(mn)O(mn), where mm and nn are the lengths of the input strings.

We can optimize this further to use only O(min(m,n))O(min(m,n)) space:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

int levenshteinDistance(
  const std::string& s1, const std::string& s2
) {
  const size_t m{s1.length()};
  const size_t n{s2.length()};

  // Ensure s1 is the shorter string
  if (m > n) return levenshteinDistance(s2, s1);

  std::vector<int> prevRow(m + 1);
  std::vector<int> currRow(m + 1);

  for (size_t i{0}; i <= m; ++i) prevRow[i] = i;

  for (size_t j{1}; j <= n; ++j) {
    currRow[0] = j;
    for (size_t i{1}; i <= m; ++i) {
      if (s1[i - 1] == s2[j - 1]) {
        currRow[i] = prevRow[i - 1];
      } else {
        currRow[i] = 1 + std::min(
          {prevRow[i],
          currRow[i - 1],
          prevRow[i - 1]
        });
      }
    }
    std::swap(prevRow, currRow);
  }

  return prevRow[m];
}

int main() {/*...*/}
The Levenshtein distance between 'kitten' and 'sitting' is: 3

This optimized version uses only two rows of the DP table at a time, significantly reducing memory usage for long strings.

The Levenshtein distance has many practical applications, including:

  • Spell checking and correction
  • DNA sequence alignment in bioinformatics
  • Fuzzy string matching in search algorithms

By implementing this algorithm efficiently with std::string, we've demonstrated advanced string manipulation techniques and the power of dynamic programming in C++.

A Deeper Look at the std::string Class

A detailed guide to std::string, covering the most essential methods and operators

Questions & Answers

Answers are generated by AI models and may not have been reviewed. Be mindful when running any code on your device.

Efficient String Concatenation in Loops
How can I efficiently concatenate multiple strings in a loop without excessive memory allocation?
Converting String Case in C++
What's the best way to convert all characters in a std::string to uppercase or lowercase?
Splitting a String into a Vector
How can I split a std::string into a vector of substrings based on a delimiter?
Removing Whitespace from a String
What's the most efficient way to remove all whitespace from a std::string?
Case-Insensitive String Comparison
How can I implement a case-insensitive string comparison using std::string?
Replacing All Substrings in a String
What's the best approach to replace all occurrences of a substring within a std::string?
Reversing a String in C++
What's the most efficient way to reverse a std::string?
Validating Email Addresses with Regex
How can I check if a std::string is a valid email address using regular expressions?
Checking if a String is a Palindrome
How can I efficiently check if a std::string is a palindrome?
Implementing Basic Autocomplete
How can I implement a basic autocomplete feature using a list of std::strings?
Or Ask your Own Question
Get an immediate answer to your specific question using our AI assistant