Longest Common Subsequence ( LCS )
and program to find Longest Common Subsequence using Dynamic Programming in Rust.
Introduction
Longest Common Subsequence is a classical Dynamic Programming problem, in which we have to find the length of longest common subsequence of the given strings.
Subsequence of a string is defined as a string, that is formed by removing 1 or more elements from the original string.
For Example : For the string "RUSTP", the strings "RUST", "RP", "R", "UP" etc. are subsequences, but the strings "PT", "Z", "TS"
So, we are given 2 strings, say string1
and string2
we have to find length of Longest Common Subsequence.
For Example : If string1
= "abcdef" and string2
= "acfbde", the longest common subsequence is "abde", so answer is 4.
Note : In Rust Language, the String
is UTF-8 encoded by default, hence, indexing is not possible in String
type. Hence, we use Vec<char>
instead.
Recursive Solution
Before jumping into dynamic programming solution, we will first have a look at recursive solution.
Algorithm
In recursive solution, we apply the given algorithm.
- If the last character of both the string is same, we must include it to our solution and recursively find the LCS of remaining string.
- Else, we first check by removing last character from first string and then the last character from second string, and return the maximum of both the cases.
- Repeat steps 1 and 2 till either of string is empty.
Function
Here is the function using above algorithm
fn longest_common_subsequence(string1:&Vec<char>, string2:&Vec<char>, n:usize, m:usize) -> usize {// If there is no character in any of the string, return 0if n==0 || m==0 {return 0;}// If last character of both strings is same, we include that in our lcs// Hence, we return 1 + lcs of remaining parts of both stringsif string1[n-1] == string2[m-1] {return 1+longest_common_subsequence(string1, string2, n-1, m-1);}// Now, if the last character is not same, we check by removing 1 letter from each string// And return their max as answerreturn max( longest_common_subsequence(string1, string2, n-1, m),longest_common_subsequence(string1, string2, n, m-1));}
With driver code
use std::cmp::max;use std::io;fn longest_common_subsequence(string1:&Vec<char>, string2:&Vec<char>, n:usize, m:usize) -> usize {// If there is no character in any of the string, return 0if n==0 || m==0 {return 0;}// If last character of both strings is same, we include that in our lcs// Hence, we return 1 + lcs of remaining parts of both stringsif string1[n-1] == string2[m-1] {return 1+longest_common_subsequence(string1, string2, n-1, m-1);}// Now, if the last character is not same, we check by removing 1 letter from each string// And return their max as answerreturn max( longest_common_subsequence(string1, string2, n-1, m),longest_common_subsequence(string1, string2, n, m-1));}// Driver Code// Take vector of charactersfn take_string() -> Vec<char> {let mut input = String::new();io::stdin().read_line(&mut input).unwrap();let vec:Vec<char> = input.trim().chars().collect();return vec;}fn main() {// Input Stringlet string1 = take_string();let string2 = take_string();// Print the length of longest common subsequenceprintln!("Length of longest common subsequence is : {}",longest_common_subsequence(&string1, &string2, string1.len(), string2.len()));}
Input
abcdef
acfbde
Output
Length of longest common subsequence is : 4
Time Complexity : O( 2m+n )
Space Complexity : O( m+n )
( Space complexity includes recursive stack space )
Overlapping Sub-problems
If we have a look carefully on recursive approach, we computed the LCS of multiple subsequence or substrings of original strings again and again. This is called overlapping sub-problems.
For example, if we have to compute LCS of strings abcdefghi
and jklmnopqr
, the LCS of strings abcd
and jklm
is computed more than 1000 times, and takes hundreds of iterations each time.
So, we simply store the result for each sub-problem, and return it if it is already computed, hence preventing it computing again and again.
In the LCS problem, we take 2D matrix, such that matrix[i][j] contains the length of LCS of first i
characters of string1 with first j
characters of string2.
For example, if the string1 is abcdef
and string 2 is acfbde
, then matrix[3][4] will contain length of LCS of strings abc
and acfb
Memoization ( Top-down DP ) Method
In memoization method, we simply take a DP matrix, and store the computed result.
Algorithm
- Initially, set all the elements of dp matrix to -1. ( We do not set it to 0 because LCS of completely mismatch strings is also 0. Hence it will compute again and again)
- If the length of LCS has already been found and stored in matrix, return the length of LCS.
- Else, compute the length of LCS and store it in DP matrix at dp[i][j], where i is length of string1 and j is length of string2.
Function
Here is the function using above algorithm
fn longest_common_subsequence(string1:&Vec<char>, string2:&Vec<char>, n:usize, m:usize, dp:&mut Vec<Vec<i64>>) -> i64 {// If there is no character in any of the string, return 0if n==0 || m==0 {return 0;}// If already computed, return computed valueif dp[n][m]!=-1 {return dp[n][m];}// If last character of both strings is same, we include that in our lcs// Hence, we return 1 + lcs of remaining parts of both stringsif string1[n-1] == string2[m-1] {dp[n][m] = 1+longest_common_subsequence(string1, string2, n-1, m-1, dp);return dp[n][m];}// Now, if the last character is not same, we check by removing 1 letter from each string// And return their max as answerdp[n][m] = max( longest_common_subsequence(string1, string2, n-1, m, dp),longest_common_subsequence(string1, string2, n, m-1, dp));return dp[n][m];}
With Driver code
use std::cmp::max;use std::io;fn longest_common_subsequence(string1:&Vec<char>, string2:&Vec<char>, n:usize, m:usize, dp:&mut Vec<Vec<i64>>) -> i64 {// If there is no character in any of the string, return 0if n==0 || m==0 {return 0;}// If already computed, return computed valueif dp[n][m]!=-1 {return dp[n][m];}// If last character of both strings is same, we include that in our lcs// Hence, we return 1 + lcs of remaining parts of both stringsif string1[n-1] == string2[m-1] {dp[n][m] = 1+longest_common_subsequence(string1, string2, n-1, m-1, dp);return dp[n][m];}// Now, if the last character is not same, we check by removing 1 letter from each string// And return their max as answerdp[n][m] = max( longest_common_subsequence(string1, string2, n-1, m, dp),longest_common_subsequence(string1, string2, n, m-1, dp));return dp[n][m];}// Driver Code// Take vector of charactersfn take_string() -> Vec<char> {let mut input = String::new();io::stdin().read_line(&mut input).unwrap();let vec:Vec<char> = input.trim().chars().collect();return vec;}fn main() {// Input Stringlet string1 = take_string();let string2 = take_string();// Make a DP arraylet n = string1.len();let m = string2.len();// We are making a vector of vectors with n rows and m columns// We set each element to -1 initially// If we take 0, its complexity will become 2^(m+n) in the worst case// Because if strings mismatch, there LCS is also 0let mut dp = vec![vec![-1 as i64; m+1];n+1];// Print the length of longest common subsequenceprintln!("Length of longest common subsequence is : {}",longest_common_subsequence(&string1, &string2, string1.len(), string2.len() , &mut dp));}
Input
abcdef
acfbde
Output
Length of longest common subsequence is : 4
Time Complexity : O( m*n )
Space Complexity : O( m*n )
Tabulation ( Bottom-up DP ) Method
Although time and space complexities of tabulation as well as memoization method are same, tabulation is much more efficient as there are a lot of expensive recursive calls in memoization.
In tabulation method, we make the matrix, and fill it on the basis of base condition, and then on the basis of LCS of previous values.
Algorithm
- Initially, set all the elements of all the columns and rows to 0, because LCS = 0 if either of string is empty.
- For ith row and jth column, if string1[i] == string2[j], then set dp[i][j] = 1+dp[i-1][j-1].
- Else, set dp[i][j] = maximum of dp[i][j-1] and dp[i-1][j]
If we traverse the matrix either row-wise or column-wise, it is always guaranteed that dp[i-1][j-1], dp[i][j-1] and dp[i-1][j] are already processed, hence ensuring our answer is correct.
Function
Here is the function using above algorithm
fn longest_common_subsequence(string1:&Vec<char>, string2:&Vec<char>, n:usize, m:usize, dp:&mut Vec<Vec<i64>>) -> i64 {// Set all the elements in 0th row and column to 0for i in 0..m+1 { dp[0][i] = 0; }for i in 0..n+1 { dp[i][0] = 0; }// Traverse each row and column, row wisefor i in 1..n+1 {for j in 1..m+1 {// If both string have same character, dp[i][j] = dp[i-1][j-1]if string1[i-1] == string2[j-1] {dp[i][j] = 1+dp[i-1][j-1];}// Else, find maximum of dp[i-1][j] and dp[i][j-1]else {dp[i][j] = max(dp[i-1][j], dp[i][j-1]);}}}return dp[n][m];}
Use the same driver code.
Input
abcdef
acfbde
Output
Length of longest common subsequence is : 4
Time Complexity : O( m*n )
Space Complexity : O( m*n )
Though time and space complexities are same as memoization method, yet it is much more optimized than memoization method.
Conclusion
Longest Common Subsequence is a classical Dynamic Programming problem, in which we have to find the length of longest common subsequence of the given strings.
In this article, we saw how to find Longest Common Subsequence ( LCS ) of two strings, first using Recursion and then using Dynamic Programming methods, Memoization as well as Tabulation method in Rust Language.
Here is the optimized function for easy access
fn longest_common_subsequence(string1:&Vec<char>, string2:&Vec<char>, n:usize, m:usize, dp:&mut Vec<Vec<i64>>) -> i64 {for i in 0..m+1 { dp[0][i] = 0; }for i in 0..n+1 { dp[i][0] = 0; }for i in 1..n+1 {for j in 1..m+1 {if string1[i-1] == string2[j-1] { dp[i][j] = 1+dp[i-1][j-1]; }else { dp[i][j] = max(dp[i-1][j], dp[i][j-1]); }}}return dp[n][m];}
Thank You