Edit Distance ( CSES ) Problem

and space optimized Dynamic Programming Solution using tabulation and memoization in Rust Language.

Introduction

Edit distance is another very commonly seen Dynamic Programming Problem and is very similar to Longest Common Subsequence Problem.

In this problem, we have to determine minimum number of operations required to transform one string into the other.

Now, the operations allowed are different for different websites or problems. We are solving CSES Edit Distance problem in this article.

In this problem, allowed operations are

  1. Add one character to the string.
  2. Remove one character from the string.
  3. Replace one character in the string.

For example, strings are ABC and XYA. For this, answer should be 3, because you can simply replace each character.

Observation

The observation tells us that either the last character of string matches, or does not matches.

1. Last character of string is same : Then, we do not have to change it. We can simply solve the answer for n-1 and m-1 length strings. It can be proved it is always optimal.

2. Last character of string is not same : Then we can perform either of 3 operations, and take minimum of them. It requires 1 operation.

Recursive Solution

Recursive solution to this problem should be pretty straightforward now.

Algorithm

  1. If either of string is empty, return length of other string. Because, you have to either add all the characters or remove all the characters, depending on which string is empty.
  2. If the last character is same, simply return the function for strings of n-1 and m-1 length. This requires no operation.
  3. Otherwise, return 1 + minimum of all three operations.

Function

Here is the function using above algorithm

Note : In Rust Language, the String is UTF-8 encoded by default, hence, indexing is not possible in String type. Hence, we use Vec<char> instead.

fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize) -> usize{
// If either of string is empty, return length of other string
if n==0 || m==0 {
return n+m;
}
// If the character is same, we don't have to change it
// And skip the character
if str1[n-1] == str2[m-1] {
return edit_distance(str1, str2, n-1, m-1);
}
// Else, we check all the 3 possibilities,
// Return minimum of them as answer
return 1+ min(
min(
// 1. If we add character, we can skip this character from
// string 2, because, string 1 and 2 has same last character now
edit_distance(str1, str2, n, m-1),
// 2. If we remove the character, we skip it from string 1
edit_distance(str1, str2, n-1, m)),
// 3. If we replace, both string 1 and 2 have same last character
edit_distance(str1, str2, n-1, m-1)
);
}

With driver code

fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize) -> usize{
// If either of string is empty, return length of other string
if n==0 || m==0 {
return n+m;
}
// If the character is same, we don't have to change it
// And skip the character
if str1[n-1] == str2[m-1] {
return edit_distance(str1, str2, n-1, m-1);
}
// Else, we check all the 3 possibilities,
// Return minimum of them as answer
return 1+ min(
min(
// 1. If we add character, we can skip this character from
// string 2, because, string 1 and 2 has same last character now
edit_distance(str1, str2, n, m-1),
// 2. If we remove the character, we skip it from string 1
edit_distance(str1, str2, n-1, m)),
// 3. If we replace, both string 1 and 2 have same last character
edit_distance(str1, str2, n-1, m-1)
);
}
// Driver Code
use std::io::stdin;
use std::cmp::min;
fn take_string() -> Vec<char> {
let mut input = String::new();
stdin().read_line(&mut input).unwrap();
let vec:Vec<char> = input.trim().chars().collect();
return vec;
}
fn main() {
let str1 = take_string();
let str2 = take_string();
println!("{}", edit_distance(&str1, &str2, str1.len(), str2.len()));
}

Input

ABC
XYA

Output

3

Time Complexity : O( 3m+n )
Space Complexity : O( m+n )

( Space complexity includes recursive stack space )

Overlapping Sub-problems

If we have a look carefully on recursive approach, we computed multiple results many times.

For example, if the strings are ABCDEFG and MNOPQRST, the result for substrings ABC and MNOP is computed hundreds of times, and take hundreds of recursions each time. These are called overlapping sub-problems, because it is a sub-problem of actual problem and is overlapping in multiple recursions.

To prevent this, we can store the output and each result will be calculated only once.

In Edit Distance problem, we can create a dp matrix, and store each value by lengths of first and second string. That is, answer for ABC and MNOP will be stored at dp[3][4].

This is called memoization or Top-down Dynamic Programming.

Memoization ( Top-down DP ) Method

In memoization method, we simply take a DP matrix, and store the computed result.

Algorithm

  1. First, take a dp matrix of size (n+1) × (m+1) and set all elements to None.
  2. If the now, if the value for given n and m is already stored, return it.
  3. Else, calculate the value using recursion and store it in the matrix and return it.

Function

Here is the function using above algorithm

fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize, dp : &mut Vec<Vec<Option<usize>>>) -> usize{
// If either of string is empty, return length of other string
if n==0 || m==0 {
return n+m;
}
// If result is already calculated, return it
if dp[n][m].is_some() { return dp[n][m].unwrap(); }
// If the character is same, we don't have to change it
// And skip the character. Store and return it
if str1[n-1] == str2[m-1] {
dp[n][m] = Option::from(edit_distance(str1, str2, n-1, m-1, dp));
return dp[n][m].unwrap();
}
// Else, we check all the 3 possibilities,
// Return minimum of them as answer
dp[n][m] = Option::from(1+min(
min(
// 1. If we add character, we can skip this character from
// string 2, because, string 1 and 2 has same last character now
edit_distance(str1, str2, n, m-1, dp),
// 2. If we remove the character, we skip it from string 1
edit_distance(str1, str2, n-1, m, dp)),
// 3. If we replace, both string 1 and 2 have same last character
edit_distance(str1, str2, n-1, m-1, dp)
));
return dp[n][m].unwrap();
}

With driver code

fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize, dp : &mut Vec<Vec<Option<usize>>>) -> usize{
// If either of string is empty, return length of other string
if n==0 || m==0 {
return n+m;
}
// If result is already calculated, return it
if dp[n][m].is_some() { return dp[n][m].unwrap(); }
// If the character is same, we don't have to change it
// And skip the character. Store and return it
if str1[n-1] == str2[m-1] {
dp[n][m] = Option::from(edit_distance(str1, str2, n-1, m-1, dp));
return dp[n][m].unwrap();
}
// Else, we check all the 3 possibilities,
// Return minimum of them as answer
dp[n][m] = Option::from(1+min(
min(
// 1. If we add character, we can skip this character from
// string 2, because, string 1 and 2 has same last character now
edit_distance(str1, str2, n, m-1, dp),
// 2. If we remove the character, we skip it from string 1
edit_distance(str1, str2, n-1, m, dp)),
// 3. If we replace, both string 1 and 2 have same last character
edit_distance(str1, str2, n-1, m-1, dp)
));
return dp[n][m].unwrap();
}
// Driver Code
use std::io::stdin;
use std::cmp::min;
fn take_string() -> Vec<char> {
let mut input = String::new();
stdin().read_line(&mut input).unwrap();
let vec:Vec<char> = input.trim().chars().collect();
return vec;
}
fn main() {
let str1 = take_string();
let str2 = take_string();
let mut dp = vec![vec![None; str2.len()+1]; str1.len()+1];
println!("{}", edit_distance(&str1, &str2, str1.len(), str2.len(), &mut dp));
}

Input

ABC
XYA

Output

3

Time Complexity : O( m×n )
Space Complexity : O( m×n )

Tabulation ( Bottom-up DP ) Method

Although time and space complexities of tabulation as well as memoization method are same, tabulation is much more efficient as there are a lot of expensive recursive calls in memoization.

In tabulation method, we make the matrix, and fill it first on the basis of base condition, and then on the basis of previous row.

Algorithm

  1. Firstly, for the first row, set the value of each cell of the matrix as corresponding column number, because if length of string1 is 0, the answer is length of string2. Similarly, fill the first column.
  2. Now, traverse the matrix row wise ( or column wise, as per your preference ), and if the given character of both strings is same, set the dp[i][j] as dp[i-1][j-1].
  3. Else, set the element as 1 + minimum of all three previous elements, that is, dp[i-1][j], dp[i-1][j-1] and dp[i][j-1].
  4. Finally, return the last value of matrix, that is, dp[n][m].

Function

Here is the function using above algorithm

fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize) -> usize{
// Create a dp matrix
let mut dp = vec![vec![0; str2.len()+1]; str1.len()+1];
// Set the first row as corresponding column number
for i in 1..str2.len()+1 {
dp[0][i]=i;
}
// Set the first column as corresponding row number
for i in 1..str1.len()+1 {
dp[i][0] = i;
}
// Now, traverse the matrix for each row and column.
for i in 1..str1.len()+1 {
for j in 1..str2.len()+1 {
// If the character is same, just copy dp[i-1][j-1]
if str1[i-1] == str2[j-1] { dp[i][j] = dp[i-1][j-1]; }
// Else, set it to 1 + minimum of previous 3 cells
else {
dp[i][j] = 1 + min(dp[i][j-1],
min(dp[i-1][j], dp[i-1][j-1]));
}
}
}
return dp[n][m];
}

With Driver code

fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize) -> usize{
// Create a dp matrix
let mut dp = vec![vec![0; str2.len()+1]; str1.len()+1];
// Set the first row as corresponding column number
for i in 1..str2.len()+1 {
dp[0][i]=i;
}
// Set the first column as corresponding row number
for i in 1..str1.len()+1 {
dp[i][0] = i;
}
// Now, traverse the matrix for each row and column.
for i in 1..str1.len()+1 {
for j in 1..str2.len()+1 {
// If the character is same, just copy dp[i-1][j-1]
if str1[i-1] == str2[j-1] { dp[i][j] = dp[i-1][j-1]; }
// Else, set it to 1 + minimum of previous 3 cells
else {
dp[i][j] = 1 + min(dp[i][j-1],
min(dp[i-1][j], dp[i-1][j-1]));
}
}
}
return dp[n][m];
}
// Driver Code
use std::io::stdin;
use std::cmp::min;
fn take_string() -> Vec<char> {
let mut input = String::new();
stdin().read_line(&mut input).unwrap();
let vec:Vec<char> = input.trim().chars().collect();
return vec;
}
fn main() {
let str1 = take_string();
let str2 = take_string();
println!("{}", edit_distance(&str1, &str2, str1.len(), str2.len()));
}

Input

ABC
XYA

Output

3

Time Complexity : O( m×n )
Space Complexity : O( m×n )

Space Optimized Tabulation Method

If we observe the above tabulation method carefully, we find that for calculating edit distance for two strings, only current and previous row are required.

Hence, we can optimize our space complexity, by storing only the previous row instead of the whole matrix.

Function

Here is the function using space optimization of tabulation method.

fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize) -> usize{
// Create a dp matrix
let mut prev = vec![0; m+1];
let mut curr = vec![0; m+1];
// Set the first row as corresponding column number
for i in 1..m+1 {
prev[i] = i;
}
// Now, traverse for each character of both the strings.
for i in 1..n+1 {
// Set current's first element as corresponding row number
curr[0] = i;
// Now traverse for each column in given row
for j in 1..m+1 {
// If the character is same, just copy previous[j-1]
if str1[i-1] == str2[j-1] { curr[j] = prev[j-1]; }
// Else, set it to 1 + minimum of previous 3 cells
else {
curr[j] = 1 + min(curr[j-1],
min(prev[j], prev[j-1]));
}
}
// Now, copy current row to previous row
// No need to reinitialise current row,
// Because we are not using those values
prev=curr.clone();
}
return curr[m];
}

Use the same driver code.

Input

ABC
XYA

Output

3

Time Complexity : O( m×n )
Space Complexity : O( m )

Conclusion

Edit distance is very commonly seen Dynamic Programming Problem and is very similar to Longest Common Subsequence Problem.

In this problem, we have to determine minimum number of operations required to transform one string into the other.

In this article, we saw how to solve the CSES Edit Distance, first using recursion and then using Dynamic Programming, memoization as well as tabulation method, and latter the space optimized tabulation method in Rust Language.

Here is the optimized function for easy access

fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize) -> usize{
let mut prev = vec![0; m+1];
let mut curr = vec![0; m+1];
for i in 1..m+1 { prev[i] = i; }
for i in 1..n+1 {
curr[0] = i;
for j in 1..m+1 {
if str1[i-1] == str2[j-1] { curr[j] = prev[j-1]; }
else {
curr[j] = 1 + min(curr[j-1],
min(prev[j], prev[j-1])); } }
prev=curr.clone();
}
return curr[m];
}

Thank You