Edit Distance ( CSES ) Problem
and space optimized Dynamic Programming Solution using tabulation and memoization in Rust Language.
Introduction
Edit distance is another very commonly seen Dynamic Programming Problem and is very similar to Longest Common Subsequence Problem.
In this problem, we have to determine minimum number of operations required to transform one string into the other.
Now, the operations allowed are different for different websites or problems. We are solving CSES Edit Distance problem in this article.
In this problem, allowed operations are
- Add one character to the string.
- Remove one character from the string.
- Replace one character in the string.
For example, strings are ABC
and XYA
. For this, answer should be 3, because you can simply replace each character.
Observation
The observation tells us that either the last character of string matches, or does not matches.
1. Last character of string is same : Then, we do not have to change it. We can simply solve the answer for n-1 and m-1 length strings. It can be proved it is always optimal.
2. Last character of string is not same : Then we can perform either of 3 operations, and take minimum of them. It requires 1 operation.
Recursive Solution
Recursive solution to this problem should be pretty straightforward now.
Algorithm
- If either of string is empty, return length of other string. Because, you have to either add all the characters or remove all the characters, depending on which string is empty.
- If the last character is same, simply return the function for strings of n-1 and m-1 length. This requires no operation.
- Otherwise, return 1 + minimum of all three operations.
Function
Here is the function using above algorithm
Note : In Rust Language, the String
is UTF-8 encoded by default, hence, indexing is not possible in String
type. Hence, we use Vec<char>
instead.
fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize) -> usize{// If either of string is empty, return length of other stringif n==0 || m==0 {return n+m;}// If the character is same, we don't have to change it// And skip the characterif str1[n-1] == str2[m-1] {return edit_distance(str1, str2, n-1, m-1);}// Else, we check all the 3 possibilities,// Return minimum of them as answerreturn 1+ min(min(// 1. If we add character, we can skip this character from// string 2, because, string 1 and 2 has same last character nowedit_distance(str1, str2, n, m-1),// 2. If we remove the character, we skip it from string 1edit_distance(str1, str2, n-1, m)),// 3. If we replace, both string 1 and 2 have same last characteredit_distance(str1, str2, n-1, m-1));}
With driver code
fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize) -> usize{// If either of string is empty, return length of other stringif n==0 || m==0 {return n+m;}// If the character is same, we don't have to change it// And skip the characterif str1[n-1] == str2[m-1] {return edit_distance(str1, str2, n-1, m-1);}// Else, we check all the 3 possibilities,// Return minimum of them as answerreturn 1+ min(min(// 1. If we add character, we can skip this character from// string 2, because, string 1 and 2 has same last character nowedit_distance(str1, str2, n, m-1),// 2. If we remove the character, we skip it from string 1edit_distance(str1, str2, n-1, m)),// 3. If we replace, both string 1 and 2 have same last characteredit_distance(str1, str2, n-1, m-1));}// Driver Codeuse std::io::stdin;use std::cmp::min;fn take_string() -> Vec<char> {let mut input = String::new();stdin().read_line(&mut input).unwrap();let vec:Vec<char> = input.trim().chars().collect();return vec;}fn main() {let str1 = take_string();let str2 = take_string();println!("{}", edit_distance(&str1, &str2, str1.len(), str2.len()));}
Input
ABC
XYA
Output
3
Time Complexity : O( 3m+n )
Space Complexity : O( m+n )
( Space complexity includes recursive stack space )
Overlapping Sub-problems
If we have a look carefully on recursive approach, we computed multiple results many times.
For example, if the strings are ABCDEFG and MNOPQRST, the result for substrings ABC and MNOP is computed hundreds of times, and take hundreds of recursions each time. These are called overlapping sub-problems, because it is a sub-problem of actual problem and is overlapping in multiple recursions.
To prevent this, we can store the output and each result will be calculated only once.
In Edit Distance problem, we can create a dp matrix, and store each value by lengths of first and second string. That is, answer for ABC and MNOP will be stored at dp[3][4].
This is called memoization or Top-down Dynamic Programming.
Memoization ( Top-down DP ) Method
In memoization method, we simply take a DP matrix, and store the computed result.
Algorithm
- First, take a dp matrix of size (n+1) × (m+1) and set all elements to None.
- If the now, if the value for given n and m is already stored, return it.
- Else, calculate the value using recursion and store it in the matrix and return it.
Function
Here is the function using above algorithm
fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize, dp : &mut Vec<Vec<Option<usize>>>) -> usize{// If either of string is empty, return length of other stringif n==0 || m==0 {return n+m;}// If result is already calculated, return itif dp[n][m].is_some() { return dp[n][m].unwrap(); }// If the character is same, we don't have to change it// And skip the character. Store and return itif str1[n-1] == str2[m-1] {dp[n][m] = Option::from(edit_distance(str1, str2, n-1, m-1, dp));return dp[n][m].unwrap();}// Else, we check all the 3 possibilities,// Return minimum of them as answerdp[n][m] = Option::from(1+min(min(// 1. If we add character, we can skip this character from// string 2, because, string 1 and 2 has same last character nowedit_distance(str1, str2, n, m-1, dp),// 2. If we remove the character, we skip it from string 1edit_distance(str1, str2, n-1, m, dp)),// 3. If we replace, both string 1 and 2 have same last characteredit_distance(str1, str2, n-1, m-1, dp)));return dp[n][m].unwrap();}
With driver code
fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize, dp : &mut Vec<Vec<Option<usize>>>) -> usize{// If either of string is empty, return length of other stringif n==0 || m==0 {return n+m;}// If result is already calculated, return itif dp[n][m].is_some() { return dp[n][m].unwrap(); }// If the character is same, we don't have to change it// And skip the character. Store and return itif str1[n-1] == str2[m-1] {dp[n][m] = Option::from(edit_distance(str1, str2, n-1, m-1, dp));return dp[n][m].unwrap();}// Else, we check all the 3 possibilities,// Return minimum of them as answerdp[n][m] = Option::from(1+min(min(// 1. If we add character, we can skip this character from// string 2, because, string 1 and 2 has same last character nowedit_distance(str1, str2, n, m-1, dp),// 2. If we remove the character, we skip it from string 1edit_distance(str1, str2, n-1, m, dp)),// 3. If we replace, both string 1 and 2 have same last characteredit_distance(str1, str2, n-1, m-1, dp)));return dp[n][m].unwrap();}// Driver Codeuse std::io::stdin;use std::cmp::min;fn take_string() -> Vec<char> {let mut input = String::new();stdin().read_line(&mut input).unwrap();let vec:Vec<char> = input.trim().chars().collect();return vec;}fn main() {let str1 = take_string();let str2 = take_string();let mut dp = vec![vec![None; str2.len()+1]; str1.len()+1];println!("{}", edit_distance(&str1, &str2, str1.len(), str2.len(), &mut dp));}
Input
ABC
XYA
Output
3
Time Complexity : O( m×n )
Space Complexity : O( m×n )
Tabulation ( Bottom-up DP ) Method
Although time and space complexities of tabulation as well as memoization method are same, tabulation is much more efficient as there are a lot of expensive recursive calls in memoization.
In tabulation method, we make the matrix, and fill it first on the basis of base condition, and then on the basis of previous row.
Algorithm
- Firstly, for the first row, set the value of each cell of the matrix as corresponding column number, because if length of string1 is 0, the answer is length of string2. Similarly, fill the first column.
- Now, traverse the matrix row wise ( or column wise, as per your preference ), and if the given character of both strings is same, set the dp[i][j] as dp[i-1][j-1].
- Else, set the element as 1 + minimum of all three previous elements, that is, dp[i-1][j], dp[i-1][j-1] and dp[i][j-1].
- Finally, return the last value of matrix, that is, dp[n][m].
Function
Here is the function using above algorithm
fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize) -> usize{// Create a dp matrixlet mut dp = vec![vec![0; str2.len()+1]; str1.len()+1];// Set the first row as corresponding column numberfor i in 1..str2.len()+1 {dp[0][i]=i;}// Set the first column as corresponding row numberfor i in 1..str1.len()+1 {dp[i][0] = i;}// Now, traverse the matrix for each row and column.for i in 1..str1.len()+1 {for j in 1..str2.len()+1 {// If the character is same, just copy dp[i-1][j-1]if str1[i-1] == str2[j-1] { dp[i][j] = dp[i-1][j-1]; }// Else, set it to 1 + minimum of previous 3 cellselse {dp[i][j] = 1 + min(dp[i][j-1],min(dp[i-1][j], dp[i-1][j-1]));}}}return dp[n][m];}
With Driver code
fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize) -> usize{// Create a dp matrixlet mut dp = vec![vec![0; str2.len()+1]; str1.len()+1];// Set the first row as corresponding column numberfor i in 1..str2.len()+1 {dp[0][i]=i;}// Set the first column as corresponding row numberfor i in 1..str1.len()+1 {dp[i][0] = i;}// Now, traverse the matrix for each row and column.for i in 1..str1.len()+1 {for j in 1..str2.len()+1 {// If the character is same, just copy dp[i-1][j-1]if str1[i-1] == str2[j-1] { dp[i][j] = dp[i-1][j-1]; }// Else, set it to 1 + minimum of previous 3 cellselse {dp[i][j] = 1 + min(dp[i][j-1],min(dp[i-1][j], dp[i-1][j-1]));}}}return dp[n][m];}// Driver Codeuse std::io::stdin;use std::cmp::min;fn take_string() -> Vec<char> {let mut input = String::new();stdin().read_line(&mut input).unwrap();let vec:Vec<char> = input.trim().chars().collect();return vec;}fn main() {let str1 = take_string();let str2 = take_string();println!("{}", edit_distance(&str1, &str2, str1.len(), str2.len()));}
Input
ABC
XYA
Output
3
Time Complexity : O( m×n )
Space Complexity : O( m×n )
Space Optimized Tabulation Method
If we observe the above tabulation method carefully, we find that for calculating edit distance for two strings, only current and previous row are required.
Hence, we can optimize our space complexity, by storing only the previous row instead of the whole matrix.
Function
Here is the function using space optimization of tabulation method.
fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize) -> usize{// Create a dp matrixlet mut prev = vec![0; m+1];let mut curr = vec![0; m+1];// Set the first row as corresponding column numberfor i in 1..m+1 {prev[i] = i;}// Now, traverse for each character of both the strings.for i in 1..n+1 {// Set current's first element as corresponding row numbercurr[0] = i;// Now traverse for each column in given rowfor j in 1..m+1 {// If the character is same, just copy previous[j-1]if str1[i-1] == str2[j-1] { curr[j] = prev[j-1]; }// Else, set it to 1 + minimum of previous 3 cellselse {curr[j] = 1 + min(curr[j-1],min(prev[j], prev[j-1]));}}// Now, copy current row to previous row// No need to reinitialise current row,// Because we are not using those valuesprev=curr.clone();}return curr[m];}
Use the same driver code.
Input
ABC
XYA
Output
3
Time Complexity : O( m×n )
Space Complexity : O( m )
Conclusion
Edit distance is very commonly seen Dynamic Programming Problem and is very similar to Longest Common Subsequence Problem.
In this problem, we have to determine minimum number of operations required to transform one string into the other.
In this article, we saw how to solve the CSES Edit Distance, first using recursion and then using Dynamic Programming, memoization as well as tabulation method, and latter the space optimized tabulation method in Rust Language.
Here is the optimized function for easy access
fn edit_distance(str1:&Vec<char>, str2:&Vec<char>, n:usize, m:usize) -> usize{let mut prev = vec![0; m+1];let mut curr = vec![0; m+1];for i in 1..m+1 { prev[i] = i; }for i in 1..n+1 {curr[0] = i;for j in 1..m+1 {if str1[i-1] == str2[j-1] { curr[j] = prev[j-1]; }else {curr[j] = 1 + min(curr[j-1],min(prev[j], prev[j-1])); } }prev=curr.clone();}return curr[m];}
Thank You