Description
Problem 1: Pattern-matching: The brute-force
Problem 1.1: The brute-force pattern-matching algorithm [10 pt.]
Describe a text D and a pattern P such that the brute-force pattern-matching algorithm runs in
Ω(dp) time.The lengths of D and P are d and p, respectively.
Problem 1.2: Python’s str class and pattern-matching [20 pt]
In this part, you are asked to modify three pattern matching programs given to you (See appendix). Run your modified programs for varying-length patterns and show your results.
The count method in Python’s str class takes a text D and a pattern P and returns the
maximum number of non-overlapping occurrences of a P within D. As an example ‘cdcdcdcdc’.count(‘cdc’) returns 2.
1. Modify the brute-force pattern-matching to return non-overlapping occurrences of a P
within D.
2. Similar to the previous question (Problem 1.2.1), do the same on the Boyer-Moore program.
3. Similar to problem 1.2.1, modify the KMP program.
Problem 2: Experimental Analysis of Pattern-Matching Algorithms
[20 pt.]
Perform an experimental analysis of pattern matching algorithms in terms of:
1. Number of character comparison: Perform an experimental analysis of the efficiency of
the brute-force, the KMP and Boyer-Moore pattern matching algorithms for varying-length
patterns.
2. Relative speed comparison: Perform an experimental comparison of the brute-force, KMP,
and Boyer-Moore pattern-matching algorithms. Run each algorithm against large text documents using varying-length patterns and report the relative running times.
Assignment № 5 Page 2
Problem 3: Matrix-chain Multiplication
The matrix-chain multiplication problem: Given a chain of < D1, D2, . . . , Dn > of n matrices fully
parenthesize the product < D1 ·D2 · · · Dn > in a way so that the number of scalar multiplications
is minimized. Each Di has a pi−1 × pi dimension and i = 1, 2, . . . , n.
1. The Brute-Force: [10 pt.]: Implement a Python program to solve the matrix-chain multiplication problem by the brute force algorithm.
2. Bottom-up Dynamic Programming [20 pt.]: Implement a Python program to solve the
matrix-chain multiplication problem using bottom-up dynamic programming approach.
3. Dynamic Programming with Memoization [Extra Credit, 10 pt.]: Implement a Python program to solve the matrix-chain multiplication problem using dynamic programming with
memoization.
Problem 4: Longest Common Sub-sequence (LCS) Problem [20 pt.]
Implement a Python program to solve LCS problem using dynamic programming. Run your
program to find the best sequence alignment between DNA strings. Show your results.
Longest Common Sub-sequence (LCS) problem: Given two character strings over some
alphabet, find a longest string that is a sub-sequence of given two strings.
Data source: https://www.ncbi.nlm.nih.gov/genbank/
Directions
Please follow the syllabus guidelines in turning in your homework. While testing your programs,
run them with a variety of inputs and discuss your findings. This homework is due Sunday, Nov
14, 2021 10:00pm. OBSERVE THE TIME. Absolutely no homework will be accepted after that
time. All the work should be your own.
Assignment № 5 Page 3
Appendix
Python program for the Brute-Force pattern-matching algorithm
1 # Brute force
2 def find_brute (T , P ) :
3 n , m = len( T ) , len ( P )
4 # every starting position
5 for i in range (n – m +1) :
6 k = 0
7 # conduct O(k) comparisons
8 while k < m and T [ i + k ] == P [ k ]:
9 k += 1
10 if k == m :
11 return i
12 return -1
Python program for the Boyer-Moore pattern-matching algorithm
1 # Boyer - Moore
2 def find_boyer_moore (T , P ) :
3 n , m = len( T ) , len ( P )
4 if m == 0:
5 return 0
6 last = {}
7 for k in range ( m ) :
8 last [ P [ k ]] = k
9 i = m -1
10 k = m -1
11 while i < n :
12 # If match , decrease i,k
13 if T [ i ] == P [ k ]:
14 if k == 0:
15 return i
16 else :
17 i -= 1
18 k -= 1
19 # Not match , reset the positions
20 else :
21 j = last . get ( T [ i ] , -1)
22 i += m - min (k , j +1)
23 k = m -1
24 return -1
Assignment № 5 Page 4
Python program for the Knuth-Morris-Pratt pattern-matching algorithm
1 # KMP failure function
2 def compute_kmp_fail ( P ) :
3 m = len( P )
4 fail = [0] * m
5 j = 1
6 k = 0
7 while j < m :
8 if P [ j ] == P [ k ]:
9 fail [ j ] = k +1
10 j += 1
11 k += 1
12 elif k > 0:
13 k = fail [k -1]
14 else :
15 j += 1
16 return fail
1 # KMP
2 def find_kmp (T , P ) :
3 n , m = len( T ) , len ( P )
4 if m == 0:
5 return 0
6 fail = compute_kmp_fail ( P )
7 # print ( fail )
8 j = 0
9 k = 0
10 while j < n :
11 if T [ j ] == P [ k ]:
12 if k == m -1:
13 return j - m +1
14 j += 1
15 k += 1
16 elif k > 0:
17 k = fail [k -1]
18 else :
19 j += 1
20 return -1
Assignment № 5 Page 5