Longest Common Subsequence ( LCS )

and program to find Longest Common Subsequence using Dynamic Programming in Rust.

Introduction

Longest Common Subsequence is a classical Dynamic Programming problem, in which we have to find the length of longest common subsequence of the given strings.

Subsequence of a string is defined as a string, that is formed by removing 1 or more elements from the original string.

For Example : For the string "RUSTP", the strings "RUST", "RP", "R", "UP" etc. are subsequences, but the strings "PT", "Z", "TS"

So, we are given 2 strings, say string1 and string2 we have to find length of Longest Common Subsequence.

For Example : If string1 = "abcdef" and string2 = "acfbde", the longest common subsequence is "abde", so answer is 4.

Note : In Rust Language, the String is UTF-8 encoded by default, hence, indexing is not possible in String type. Hence, we use Vec<char> instead.

Recursive Solution

Before jumping into dynamic programming solution, we will first have a look at recursive solution.

Algorithm

In recursive solution, we apply the given algorithm.

  1. If the last character of both the string is same, we must include it to our solution and recursively find the LCS of remaining string.
  2. Else, we first check by removing last character from first string and then the last character from second string, and return the maximum of both the cases.
  3. Repeat steps 1 and 2 till either of string is empty.

Function

Here is the function using above algorithm

fn longest_common_subsequence(string1:&Vec<char>, string2:&Vec<char>, n:usize, m:usize) -> usize {
// If there is no character in any of the string, return 0
if n==0 || m==0 {
return 0;
}
// If last character of both strings is same, we include that in our lcs
// Hence, we return 1 + lcs of remaining parts of both strings
if string1[n-1] == string2[m-1] {
return 1+longest_common_subsequence(string1, string2, n-1, m-1);
}
// Now, if the last character is not same, we check by removing 1 letter from each string
// And return their max as answer
return max( longest_common_subsequence(string1, string2, n-1, m),
longest_common_subsequence(string1, string2, n, m-1));
}

With driver code

use std::cmp::max;
use std::io;
fn longest_common_subsequence(string1:&Vec<char>, string2:&Vec<char>, n:usize, m:usize) -> usize {
// If there is no character in any of the string, return 0
if n==0 || m==0 {
return 0;
}
// If last character of both strings is same, we include that in our lcs
// Hence, we return 1 + lcs of remaining parts of both strings
if string1[n-1] == string2[m-1] {
return 1+longest_common_subsequence(string1, string2, n-1, m-1);
}
// Now, if the last character is not same, we check by removing 1 letter from each string
// And return their max as answer
return max( longest_common_subsequence(string1, string2, n-1, m),
longest_common_subsequence(string1, string2, n, m-1));
}
// Driver Code
// Take vector of characters
fn take_string() -> Vec<char> {
let mut input = String::new();
io::stdin().read_line(&mut input).unwrap();
let vec:Vec<char> = input.trim().chars().collect();
return vec;
}
fn main() {
// Input String
let string1 = take_string();
let string2 = take_string();
// Print the length of longest common subsequence
println!("Length of longest common subsequence is : {}",
longest_common_subsequence(&string1, &string2, string1.len(), string2.len()));
}

Input

abcdef
acfbde

Output

Length of longest common subsequence is : 4

Time Complexity : O( 2m+n )
Space Complexity : O( m+n )

( Space complexity includes recursive stack space )

Overlapping Sub-problems

If we have a look carefully on recursive approach, we computed the LCS of multiple subsequence or substrings of original strings again and again. This is called overlapping sub-problems.

For example, if we have to compute LCS of strings abcdefghi and jklmnopqr, the LCS of strings abcd and jklm is computed more than 1000 times, and takes hundreds of iterations each time.

So, we simply store the result for each sub-problem, and return it if it is already computed, hence preventing it computing again and again.

In the LCS problem, we take 2D matrix, such that matrix[i][j] contains the length of LCS of first i characters of string1 with first j characters of string2.

For example, if the string1 is abcdef and string 2 is acfbde, then matrix[3][4] will contain length of LCS of strings abc and acfb

Memoization ( Top-down DP ) Method

In memoization method, we simply take a DP matrix, and store the computed result.

Algorithm

  1. Initially, set all the elements of dp matrix to -1. ( We do not set it to 0 because LCS of completely mismatch strings is also 0. Hence it will compute again and again)
  2. If the length of LCS has already been found and stored in matrix, return the length of LCS.
  3. Else, compute the length of LCS and store it in DP matrix at dp[i][j], where i is length of string1 and j is length of string2.

Function

Here is the function using above algorithm

fn longest_common_subsequence(string1:&Vec<char>, string2:&Vec<char>, n:usize, m:usize, dp:&mut Vec<Vec<i64>>) -> i64 {
// If there is no character in any of the string, return 0
if n==0 || m==0 {
return 0;
}
// If already computed, return computed value
if dp[n][m]!=-1 {
return dp[n][m];
}
// If last character of both strings is same, we include that in our lcs
// Hence, we return 1 + lcs of remaining parts of both strings
if string1[n-1] == string2[m-1] {
dp[n][m] = 1+longest_common_subsequence(string1, string2, n-1, m-1, dp);
return dp[n][m];
}
// Now, if the last character is not same, we check by removing 1 letter from each string
// And return their max as answer
dp[n][m] = max( longest_common_subsequence(string1, string2, n-1, m, dp),
longest_common_subsequence(string1, string2, n, m-1, dp));
return dp[n][m];
}

With Driver code

use std::cmp::max;
use std::io;
fn longest_common_subsequence(string1:&Vec<char>, string2:&Vec<char>, n:usize, m:usize, dp:&mut Vec<Vec<i64>>) -> i64 {
// If there is no character in any of the string, return 0
if n==0 || m==0 {
return 0;
}
// If already computed, return computed value
if dp[n][m]!=-1 {
return dp[n][m];
}
// If last character of both strings is same, we include that in our lcs
// Hence, we return 1 + lcs of remaining parts of both strings
if string1[n-1] == string2[m-1] {
dp[n][m] = 1+longest_common_subsequence(string1, string2, n-1, m-1, dp);
return dp[n][m];
}
// Now, if the last character is not same, we check by removing 1 letter from each string
// And return their max as answer
dp[n][m] = max( longest_common_subsequence(string1, string2, n-1, m, dp),
longest_common_subsequence(string1, string2, n, m-1, dp));
return dp[n][m];
}
// Driver Code
// Take vector of characters
fn take_string() -> Vec<char> {
let mut input = String::new();
io::stdin().read_line(&mut input).unwrap();
let vec:Vec<char> = input.trim().chars().collect();
return vec;
}
fn main() {
// Input String
let string1 = take_string();
let string2 = take_string();
// Make a DP array
let n = string1.len();
let m = string2.len();
// We are making a vector of vectors with n rows and m columns
// We set each element to -1 initially
// If we take 0, its complexity will become 2^(m+n) in the worst case
// Because if strings mismatch, there LCS is also 0
let mut dp = vec![vec![-1 as i64; m+1];n+1];
// Print the length of longest common subsequence
println!("Length of longest common subsequence is : {}",
longest_common_subsequence(&string1, &string2, string1.len(), string2.len() , &mut dp));
}

Input

abcdef
acfbde

Output

Length of longest common subsequence is : 4

Time Complexity : O( m*n )
Space Complexity : O( m*n )

Tabulation ( Bottom-up DP ) Method

Although time and space complexities of tabulation as well as memoization method are same, tabulation is much more efficient as there are a lot of expensive recursive calls in memoization.

In tabulation method, we make the matrix, and fill it on the basis of base condition, and then on the basis of LCS of previous values.

Algorithm

  1. Initially, set all the elements of all the columns and rows to 0, because LCS = 0 if either of string is empty.
  2. For ith row and jth column, if string1[i] == string2[j], then set dp[i][j] = 1+dp[i-1][j-1].
  3. Else, set dp[i][j] = maximum of dp[i][j-1] and dp[i-1][j]

If we traverse the matrix either row-wise or column-wise, it is always guaranteed that dp[i-1][j-1], dp[i][j-1] and dp[i-1][j] are already processed, hence ensuring our answer is correct.

Function

Here is the function using above algorithm

fn longest_common_subsequence(string1:&Vec<char>, string2:&Vec<char>, n:usize, m:usize, dp:&mut Vec<Vec<i64>>) -> i64 {
// Set all the elements in 0th row and column to 0
for i in 0..m+1 { dp[0][i] = 0; }
for i in 0..n+1 { dp[i][0] = 0; }
// Traverse each row and column, row wise
for i in 1..n+1 {
for j in 1..m+1 {
// If both string have same character, dp[i][j] = dp[i-1][j-1]
if string1[i-1] == string2[j-1] {
dp[i][j] = 1+dp[i-1][j-1];
}
// Else, find maximum of dp[i-1][j] and dp[i][j-1]
else {
dp[i][j] = max(dp[i-1][j], dp[i][j-1]);
}
}
}
return dp[n][m];
}

Use the same driver code.

Input

abcdef
acfbde

Output

Length of longest common subsequence is : 4

Time Complexity : O( m*n )
Space Complexity : O( m*n )

Though time and space complexities are same as memoization method, yet it is much more optimized than memoization method.

Conclusion

Longest Common Subsequence is a classical Dynamic Programming problem, in which we have to find the length of longest common subsequence of the given strings.

In this article, we saw how to find Longest Common Subsequence ( LCS ) of two strings, first using Recursion and then using Dynamic Programming methods, Memoization as well as Tabulation method in Rust Language.

Here is the optimized function for easy access

fn longest_common_subsequence(string1:&Vec<char>, string2:&Vec<char>, n:usize, m:usize, dp:&mut Vec<Vec<i64>>) -> i64 {
for i in 0..m+1 { dp[0][i] = 0; }
for i in 0..n+1 { dp[i][0] = 0; }
for i in 1..n+1 {
for j in 1..m+1 {
if string1[i-1] == string2[j-1] { dp[i][j] = 1+dp[i-1][j-1]; }
else { dp[i][j] = max(dp[i-1][j], dp[i][j-1]); }
}
}
return dp[n][m];
}

Thank You