How can I efficiently calculate the Levenshtein distance between two `std::string`s?

Question

Ryan McCombe · Accepted Answer

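The Levenshtein distance, also known as the edit distance, is a measure of the difference between two strings. It is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other. For example, turning "kitten" into "sitting" takes three edits: substitute "k" with "s", substitute "e" with "i", and insert "g" at the end.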

Calculating this distance efficiently using std::string is an excellent exercise in dynamic programming and string manipulation.

Let's implement an efficient Levenshtein distance calculator:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

int levenshteinDistance(const std::string& s1,
                        const std::string& s2) {
  const size_t m{s1.length()};
  const size_t n{s2.length()};

  std::vector<std::vector<int>> dp(
    m + 1, std::vector<int>(n + 1)
  );

  // Initialize first row and column
  for (size_t i{0}; i <= m; ++i) dp[i][0] = i;
  for (size_t j{0}; j <= n; ++j) dp[0][j] = j;

  // Fill the dp table
  for (size_t i{1}; i <= m; ++i) {
    for (size_t j{1}; j <= n; ++j) {
      if (s1[i - 1] == s2[j - 1]) {
        dp[i][j] = dp[i - 1][j - 1];
      } else {
        dp[i][j] = 1 + std::min({
          dp[i - 1][j],     // deletion
          dp[i][j - 1],     // insertion
          dp[i - 1][j - 1]  // substitution
        });
      }
    }
  }

  return dp[m][n];
}

int main() {
  std::string word1{"kitten"};
  std::string word2{"sitting"};

  int distance{levenshteinDistance(word1, word2)};

  std::cout
    << "The Levenshtein distance between '"
    << word1 << "' and '" << word2 << "' is: "
    << distance << '\n';
}

The Levenshtein distance between 'kitten' and 'sitting' is: 3

Let's break down the levenshteinDistance() function:

1. We create a 2D vector `dp` to store our dynamic programming table. The entry `dp[i][j]` holds the distance between the first `i` characters of `s1` and the first `j` characters of `s2`.
2. We initialize the first row and column of `dp` to represent the distance from an empty string: reaching a prefix of length `k` from an empty string takes `k` insertions.
3. We iterate through both strings, filling the `dp` table:
   - If the characters match, we copy the diagonal value (no edit needed).
   - If they don't match, we take the minimum of the three possible operations (deletion, insertion, substitution) and add 1.
4. The final cell `dp[m][n]` gives us the Levenshtein distance.
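
The steps above can also be written as a recurrence. The symbol $d_{i,j}$ is a label chosen for this note and stands for the table entry `dp[i][j]`; the string indices are zero-based, as in the code:

$$
d_{i,0} = i, \qquad d_{0,j} = j, \qquad
d_{i,j} =
\begin{cases}
d_{i-1,j-1} & \text{if } s_1[i-1] = s_2[j-1] \\
1 + \min\left(d_{i-1,j},\ d_{i,j-1},\ d_{i-1,j-1}\right) & \text{otherwise}
\end{cases}
$$

Each entry depends only on its left, upper, and upper-left neighbours, which is why filling the table row by row works.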

This implementation has a time complexity of $O(mn)$ and a space complexity of $O(mn)$, where $m$ and $n$ are the lengths of the input strings.

Each row of the table depends only on the row before it, so we can keep just two rows, sized by the shorter string, and reduce the space to $O(\min(m, n))$:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

int levenshteinDistance(
  const std::string& s1, const std::string& s2
) {
  const size_t m{s1.length()};
  const size_t n{s2.length()};

  // Ensure s1 is the shorter string
  if (m > n) return levenshteinDistance(s2, s1);

  std::vector<int> prevRow(m + 1);
  std::vector<int> currRow(m + 1);

  for (size_t i{0}; i <= m; ++i) prevRow[i] = i;

  for (size_t j{1}; j <= n; ++j) {
    currRow[0] = j;
    for (size_t i{1}; i <= m; ++i) {
      if (s1[i - 1] == s2[j - 1]) {
        currRow[i] = prevRow[i - 1];
      } else {
        currRow[i] = 1 + std::min({
          prevRow[i],
          currRow[i - 1],
          prevRow[i - 1]
        });
      }
    }
    std::swap(prevRow, currRow);
  }

  return prevRow[m];
}

int main() {/*...*/}

The Levenshtein distance between 'kitten' and 'sitting' is: 3

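This optimized version keeps only two rows of the DP table in memory at a time, so memory grows with the length of the shorter string rather than with the product of both lengths. As a rough illustration, assuming a 4-byte `int`, two 10,000-character strings need about 400 MB for the full table but only about 80 KB for two rows. The running time is still $O(mn)$.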
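
The same idea can be pushed one step further. The following is a minimal sketch, not part of the answer's code, that keeps a single row plus one saved diagonal value. The name `levenshteinOneRow` and the use of `std::size_t` for the result are choices made for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical one-row variant of the two-row function above.
std::size_t levenshteinOneRow(const std::string& a,
                              const std::string& b) {
  // Index the row by the shorter string to keep it small.
  const std::string& shorter = a.size() <= b.size() ? a : b;
  const std::string& longer = a.size() <= b.size() ? b : a;

  // row[i] starts as the distance from the first i characters
  // of `shorter` to the empty string: 0, 1, 2, ...
  std::vector<std::size_t> row(shorter.size() + 1);
  std::iota(row.begin(), row.end(), std::size_t{0});

  for (std::size_t j{1}; j <= longer.size(); ++j) {
    // Upper-left neighbour of row[1], taken from the old row
    std::size_t diagonal{row[0]};
    row[0] = j;
    for (std::size_t i{1}; i <= shorter.size(); ++i) {
      // Old value at this position, saved before we overwrite it
      const std::size_t above{row[i]};
      if (shorter[i - 1] == longer[j - 1]) {
        row[i] = diagonal;
      } else {
        row[i] = 1 + std::min({above, row[i - 1], diagonal});
      }
      diagonal = above;
    }
  }
  return row.back();
}
```

Calling `levenshteinOneRow("kitten", "sitting")` should return 3, matching the versions above. The saving over two rows is a constant factor, so whether the extra bookkeeping is worth it depends on the use case.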

The Levenshtein distance has many practical applications, including:

- Spell checking and correction
- DNA sequence alignment in bioinformatics
- Fuzzy string matching in search algorithms
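
As a rough illustration of the spell-checking and fuzzy-matching use cases, the sketch below picks the candidate word with the smallest edit distance to a possibly misspelled input. The names `editDistance` and `closestMatch` are hypothetical helpers written for this example; real spell checkers typically add dictionaries, frequency data, and faster search structures on top of the distance itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Two-row Levenshtein distance, following the optimized version above.
std::size_t editDistance(const std::string& a, const std::string& b) {
  // Make sure `a` is the shorter string so the rows stay small
  if (a.size() > b.size()) return editDistance(b, a);

  std::vector<std::size_t> prev(a.size() + 1);
  std::vector<std::size_t> curr(a.size() + 1);
  for (std::size_t i{0}; i <= a.size(); ++i) prev[i] = i;

  for (std::size_t j{1}; j <= b.size(); ++j) {
    curr[0] = j;
    for (std::size_t i{1}; i <= a.size(); ++i) {
      curr[i] = (a[i - 1] == b[j - 1])
        ? prev[i - 1]
        : 1 + std::min({prev[i], curr[i - 1], prev[i - 1]});
    }
    std::swap(prev, curr);
  }
  return prev[a.size()];
}

// Hypothetical helper: returns the candidate closest to `word`,
// or an empty string if there are no candidates.
// Ties go to the earliest candidate in the list.
std::string closestMatch(
  const std::string& word,
  const std::vector<std::string>& candidates
) {
  if (candidates.empty()) return {};
  return *std::min_element(
    candidates.begin(), candidates.end(),
    [&](const std::string& x, const std::string& y) {
      return editDistance(word, x) < editDistance(word, y);
    });
}
```

For example, `closestMatch("kiten", {"mitten", "kitten", "sitting"})` should return `"kitten"`, which is one insertion away from the input. This sketch recomputes distances inside the comparator for brevity; for a large candidate list, it would be cheaper to compute each distance once.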

Both versions only need read access to the characters of each `std::string` through `operator[]` and `length()`. The difference between them is how much of the dynamic programming table they keep in memory.

A Deeper Look at the `std::string` Class
