C6 Dynamic Programming
Dated: 31-01-2025
Fibonacci Sequence
History
Leonardo Pisano (nicknamed Fibonacci, son of Bonacci; born 1170, died 1250) published a book named Liber Abaci in 1202. The book was based on arithmetic and algebra, and its third section contained a rabbits problem which led to the introduction of the Fibonacci numbers and the Fibonacci sequence.
The problem: if we start with two rabbits that produce a new pair every month, how many rabbits will there be at the end of the year?
FIB(n) {
    if (n < 2)
        then return n
    else return FIB(n - 1) + FIB(n - 2)
}
recursion calls during computation of Fibonacci number
A single recursive call to \(fib(n)\) results into one recursive call to \(fib(n - 1)\), two recursive calls to \(fib(n - 2)\) and in general, \(k\) recursive calls to \(fib(n - k)\).
With each call, we compute the number from scratch. To avoid that, we store the results of the recursive calls and look them up when we need them later. This process is called memoization.
MemoFib(n) {
    if (n < 2)
        then return n
    if (F[n] is undefined)
        then F[n] <- MemoFib(n - 1) + MemoFib(n - 2)
    return F[n]
}
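As a sketch, the memoized version in Python, using a dictionary for the table \(F\) (the name `memo_fib` is illustrative):

```python
def memo_fib(n, F=None):
    """Fibonacci with memoization: each F[n] is computed at most once."""
    if F is None:
        F = {}                          # the lookup table
    if n < 2:
        return n
    if n not in F:                      # F[n] is undefined
        F[n] = memo_fib(n - 1, F) + memo_fib(n - 2, F)
    return F[n]
```

Because every value is computed once and then reused, even large arguments such as `memo_fib(50)` return immediately.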
Tracing through the recursive calls to \(MemoFib(n)\), we find that the array \(F[]\) gets filled bottom up starting from \(F[2]\) up to \(F[n]\).
We can replace the recursion with a simple for loop. This gives us our first explicit dynamic programming algorithm:
IterFib(n) {
    F[0] <- 0
    F[1] <- 1
    for i <- 2 to n
        do F[i] <- F[i - 1] + F[i - 2]
    return F[n]
}
This algorithm takes \(O(n)\) time to compute \(F_n\). The original recursive algorithm takes \(\Theta(\phi^n)\) time, where \(\phi = \frac{1 + \sqrt 5}{2} \approx 1.618\) is the golden ratio.
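IterFib translates directly into Python (a minimal sketch; the name `iter_fib` is illustrative):

```python
def iter_fib(n):
    """Bottom-up Fibonacci: fill F[0..n] with a simple for loop, O(n) time."""
    F = [0] * (n + 1)
    if n >= 1:
        F[1] = 1
    for i in range(2, n + 1):
        F[i] = F[i - 1] + F[i - 2]
    return F[n]
```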
Dynamic Programming
Dynamic programming is recursion without repetition.
Developing such algorithms generally involves 2 steps:
Formulate the Problem Recursively
Write the formula for the whole problem as a combination of answers to smaller sub problems.
Build Solution to Recurrence from Bottom up
Write an algorithm which starts from base cases and works itself up to the final answer.
Dynamic programming algorithms involve saving the answers to sub problems in some sort of table.
Edit Distance
If we are given 2 strings \(S_1\) and \(S_2\), then the edit distance is the minimum number of point mutations required to convert \(S_1 \rightarrow S_2\) (or \(S_2 \rightarrow S_1\)).
Where the point mutations can be:
- Modifying a letter
- Inserting a letter
- Deleting a letter
Example
The edit distance between "Food" and "Money" is at most 4.
- \(\text{\textbf{F}ood} \to \text{\textbf{M}ood}\) (Modification)
- \(\text{Mo\textbf{o}d} \to \text{Mo\textbf{n}d}\) (Modification)
- \(\text{Mon\textbf{d}} \to \text{Mon\textbf{y}}\) (Modification)
- \(\text{Mony} \to \text{Mon\textbf{e}y}\) (Insertion)
Applications
Spell Checking
When a word is not found in the dictionary, other dictionary words with a small edit distance to it are suggested.
Plagiarism Detection
If an essay is suspected to be copied, the edit distance can be used to determine how similar the given copy is to the original.
Edit Distance Algorithm
S D I M D M
-----------
M A _ T H S
A _ R T _ S
The string SDIMDM is the edit transcript, which describes the transformation of one string into another.
- S for substitution
- I for Insertion
- D for deletion
- M for match
\(M\) (match) is not counted when computing the edit distance.
Since the edit transcript is \(SDIMDM\), the edit distance is \(4\) (one substitution, one insertion, and two deletions).
Dynamic Programming Algorithm
Imagine 2 strings \(A\) and \(B\) with sizes \(m\) and \(n\) respectively then the edit distance between the entire strings can be defined as \(E(m, n)\).
There are some base cases.
- The only way to convert an empty string into a string of \(j\) characters is by doing \(j\) insertions, so \(E(0, j) = j\).
- The only way to convert a string of \(i\) characters into an empty string is by doing \(i\) deletions, so \(E(i, 0) = i\).
There are 4 possibilities for the last column in the shortest possible edit sequence.
Deletion
Last entry in bottom row is empty.
- - - i = 3
A L G O R I T H M
A L _ T R U I S T I C
- - - j = 2
In this case, \(E(i, j) = E(i - 1, j) + 1\).
Insertion
Last entry in top row is empty
- - - - - - i = 5
A L G O R _ I T H M
A L _ T R U I S T I C
- - - - - - j = 5
In this case, \(E(i, j) = E(i, j - 1) + 1\).
Substitution
Both rows have last character in the last column
Different
- - - - - i = 5
A L G O R I T H M
A L T R _ U I S T I C
- - - - - j = 4
In this case, \(E(i, j) = E(i - 1, j - 1) + 1\).
Same
If they are the same, then no substitution is needed and \(E(i, j) = E(i - 1, j - 1)\).
The edit distance \(E(i, j)\) is the smallest of the 4 possibilities:
\[E(i, j) = \min \begin{cases} E(i - 1, j) + 1 & \text{(deletion)} \\ E(i, j - 1) + 1 & \text{(insertion)} \\ E(i - 1, j - 1) + 1 & \text{(substitution, last characters differ)} \\ E(i - 1, j - 1) & \text{(match, last characters same)} \end{cases}\]
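The recurrence and base cases translate into a bottom-up table fill; a Python sketch (the function name is illustrative):

```python
def edit_distance(a, b):
    """Compute the edit distance between strings a and b bottom up."""
    m, n = len(a), len(b)
    # E[i][j] = edit distance between a[:i] and b[:j]
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        E[0][j] = j                                  # base case: j insertions
    for i in range(m + 1):
        E[i][0] = i                                  # base case: i deletions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1   # match or substitution
            E[i][j] = min(E[i - 1][j] + 1,           # deletion
                          E[i][j - 1] + 1,           # insertion
                          E[i - 1][j - 1] + sub)
    return E[m][n]
```

For example, `edit_distance("ARTS", "MATHS")` returns 3, matching the solution paths shown later, and `edit_distance("Food", "Money")` returns 4.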
Example
Consider the edit distance between "ARTS" and "MATHS"
A R T S
M A T H S
The edit distance would be \(E(4, 5)\).
We will use the dynamic programming approach where base case \(E(0, j)\) is to fill the first row and base case \(E(i, 0)\) is to fill the first column.
First row and first column entries using the base cases
The table not only shows the values of \(E[i, j]\) but also shows arrows to indicate how it was computed using \(E[i - 1, j], E[i , j - 1], E[i - 1, j - 1]\).
Thus if a cell \(E[i, j]\) has a:
- Down arrow, the minimum was found using \(E[i - 1, j]\).
- Right arrow, the minimum was found using \(E[i, j - 1]\).
- Diagonal down arrow, the minimum was found using \(E[i - 1, j - 1]\)
These arrows are later used to determine the edit transcript
.
Computing \(E[1, 1]\) and \(E[1, 2]\)
Computing \(E[1, 3]\) and \(E[1, 4]\)
The final table with all \(E[i, j]\) entries computed
An edit transcript
can be extracted by following a path from \(E[0, 0]\) to \(E[4, 5]\).
All red paths show the possible edit transcripts which could be extracted.
Solution Path 1
1 0 1 1 0 = 3
D M S S M
---------
M A T H S
_ A R T S
Solution Path 2
1 1 0 1 0 = 3
S S M D M
---------
M A T H S
A R T _ S
Solution Path 3
1 0 1 0 1 0 = 3
D M I M D M
-----------
M A _ T H S
_ A R T _ S
Analysis of Dynamic Programming Edit Distance
There are \(\Theta(m n)\) entries in the matrix and each entry \(E(i, j)\) takes \(\Theta(1)\) time to compute.
Hence the total running time is \(\Theta(m n)\), which is \(\Theta(n^2)\) when both strings have length \(n\).
Chain Matrix Multiply
Suppose we want to multiply a series of matrices.
If \(A\) is a matrix of dimensions \(p \times q\) and \(B\) is a matrix of dimensions \(q \times r\), then \(C = A \times B\) has dimensions \(p \times r\).
There are \(p \times r\) entries in \(C\) and each takes \(O(q)\) time to compute.
Therefore, the total number of multiplications is \(p \times q \times r\).
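A naive multiply of a \(p \times q\) matrix by a \(q \times r\) matrix performs exactly \(p \times q \times r\) scalar multiplications, which a small Python sketch (names are illustrative) can confirm by counting:

```python
def matmul_with_count(A, B):
    """Multiply A (p x q) by B (q x r), counting scalar multiplications."""
    p, q, r = len(A), len(A[0]), len(B[0])
    assert len(B) == q, "inner dimensions must agree"
    count = 0
    C = [[0] * r for _ in range(p)]
    for i in range(p):
        for j in range(r):
            for k in range(q):                  # O(q) work per entry of C
                C[i][j] += A[i][k] * B[k][j]
                count += 1
    return C, count
```

For \(p, q, r = 5, 4, 6\) the count is \(5 \cdot 4 \cdot 6 = 120\).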
Consider 3 matrices:
- \(A_1\) which is \(5 \times 4\).
- \(A_2\) which is \(4 \times 6\).
- \(A_3\) which is \(6 \times 2\).
Their multiplication can be carried out either as \((A_1 A_2) A_3\) or as \(A_1 (A_2 A_3)\); both yield the same result.
The cost, however, differs:
- \((A_1 A_2) A_3\) costs \(5 \cdot 4 \cdot 6 + 5 \cdot 6 \cdot 2 = 180\) multiplications.
- \(A_1 (A_2 A_3)\) costs \(4 \cdot 6 \cdot 2 + 5 \cdot 4 \cdot 2 = 88\) multiplications.
The chain matrix multiplication problem is stated as follows:
Chain Matrix Multiplication Problem
Given a sequence \(A_1, A_2, \ldots, A_n\) and dimensions \(p_0, p_1, \ldots, p_n\) where \(A_i\) is of dimension \(p_{i - 1} \times p_i\), determine the order of multiplication that minimizes the number of operations.
We could use brute force to figure out the best way to parenthesize the multiplication, but for \(n\) items there are \(n - 1\) places where the outermost parentheses can be placed, and the total number of orderings grows very large.
Once we split after \(k\) matrices, we create 2 sub-lists, one with \(k\) matrices and the other with \(n - k\) matrices.
If there are \(L\) ways to parenthesize the left sub-list and \(R\) ways to parenthesize the right sub-list, then the total number of ways is \(L \times R\).
Therefore, the number of different ways to parenthesize \(n\) items is:
\[P(n) = \begin{cases} 1 & n = 1 \\ \sum_{k = 1}^{n - 1} P(k)\, P(n - k) & n \ge 2 \end{cases}\]
This is related to a famous sequence in combinatorics called the Catalan numbers, given by \(P(n) = C(n - 1)\), where
\[C(n) = \frac{1}{n + 1}\binom{2n}{n} = \Omega\!\left(\frac{4^n}{n^{3/2}}\right)\]
The dominating factor is \(4^n\), which suggests that \(P(n)\) grows very quickly; hence this approach is not practical.
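A small Python sketch can check numerically that the recurrence for \(P(n)\) matches the Catalan numbers (function names are illustrative):

```python
from math import comb

def num_parenthesizations(n):
    """P(n): number of ways to parenthesize a product of n matrices."""
    P = [0] * (n + 1)
    P[1] = 1
    for m in range(2, n + 1):
        # split the chain of m matrices after the k-th one
        P[m] = sum(P[k] * P[m - k] for k in range(1, m))
    return P[n]

def catalan(n):
    """C(n) = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)
```

For instance, \(P(4) = 5 = C(3)\): five ways to parenthesize four matrices.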
Chain Matrix Multiplication Dynamic Programming Formulation
Let \(A_{i .. j}\) be the result of multiplying matrices \(i\) through \(j\).
It is easy to see that \(A_{i .. j}\) is a \(p_{i - 1} \times p_j\) matrix.
At the highest level of parenthesizing, we multiply 2 matrices.
For \(1 \le i \le j \le n\), let \(m[i, j]\) denote the minimum number of multiplications needed to compute \(A_{i .. j}\).
The optimum can be described by the following recursive formulation:
- If \(i = j\) then there is only one matrix and thus \(m[i, i] = 0\) (the diagonal entries).
- If \(i < j\) then we are asking for the product \(A_{i .. j}\). This can be split by considering each \(k\), \(i \le k < j\), as \(A_{i .. k}\) times \(A_{k + 1 .. j}\).
The optimum time to compute \(A_{i .. k}\) is \(m[i, k]\) and for \(A_{k + 1 .. j}\) it is \(m[k + 1, j]\).
Since \(A_{i .. k}\) is a \(p_{i - 1} \times p_k\) matrix and \(A_{k + 1 .. j}\) is a \(p_k \times p_j\) matrix, the time to multiply them is \(p_{i - 1} \times p_k \times p_j\).
This suggests:
\[m[i, j] = \begin{cases} 0 & i = j \\ \min_{i \le k < j} \left( m[i, k] + m[k + 1, j] + p_{i - 1}\, p_k\, p_j \right) & i < j \end{cases}\]
We will fill \(m\) along the diagonals.
Set all \(m[i, i] = 0\) using the base condition.
Compute the cost of multiplying each sequence of 2 matrices: \(m[i, i + 1] = p_{i - 1}\, p_i\, p_{i + 1}\).
Example:
Product of 5 matrices
Next we compute the cost of multiplication for sequences of 3 matrices.
These are \(m[1, 3], m[2, 4], m[3, 5], \ldots, m[n - 2, n]\).
We repeat this for 4, 5 and higher numbers of matrices. The final result ends up in \(m[1, n]\).
Example
Imagine 5 matrices which we want to multiply, with dimensions:
- \(5 \times 4\)
- \(4 \times 6\)
- \(6 \times 2\)
- \(2 \times 7\)
- \(7 \times 3\)
We will compute the entries of the \(m\) matrix starting with the base condition: we first fill up the main diagonal with \(m[1, 1] = m[2, 2] = \cdots = m[5, 5] = 0\).
Next, we compute the entries in the first super diagonal, the diagonal above the main diagonal: \(m[1, 2] = 5 \cdot 4 \cdot 6 = 120\), \(m[2, 3] = 4 \cdot 6 \cdot 2 = 48\), \(m[3, 4] = 6 \cdot 2 \cdot 7 = 84\), \(m[4, 5] = 2 \cdot 7 \cdot 3 = 42\).
Now we proceed to the second super diagonal, but this time we try two possible values of \(k\). There are 2 possible splits to compute \(m[1, 3]\), and we choose the one which yields the minimum:
- \(k = 1\): \(m[1, 1] + m[2, 3] + 5 \cdot 4 \cdot 2 = 0 + 48 + 40 = 88\)
- \(k = 2\): \(m[1, 2] + m[3, 3] + 5 \cdot 6 \cdot 2 = 120 + 0 + 60 = 180\)
The minimum \(m[1, 3] = 88\) occurs with \(k = 1\).
Similarly for \(m[2, 4]\) and \(m[3, 5]\):
- minimum \(m[2, 4] = 104\) at \(k = 3\)
- minimum \(m[3, 5] = 78\) at \(k = 3\)
We repeat the same process for the other diagonals, where the number of \(k\) splits increases:
- minimum \(m[1, 4] = 158\) at \(k = 3\)
- minimum \(m[2, 5] = 114\) at \(k = 3\)
The matrix \(m\) at this stage is:

\(i \setminus j\) | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
1 | 0 | 120 | 88 | 158 | |
2 | | 0 | 48 | 104 | 114 |
3 | | | 0 | 84 | 78 |
4 | | | | 0 | 42 |
5 | | | | | 0 |
That leaves \(m[1, 5]\), which can be computed as follows:
- \(k = 1\): \(m[1, 1] + m[2, 5] + 5 \cdot 4 \cdot 3 = 0 + 114 + 60 = 174\)
- \(k = 2\): \(m[1, 2] + m[3, 5] + 5 \cdot 6 \cdot 3 = 120 + 78 + 90 = 288\)
- \(k = 3\): \(m[1, 3] + m[4, 5] + 5 \cdot 2 \cdot 3 = 88 + 42 + 30 = 160\)
- \(k = 4\): \(m[1, 4] + m[5, 5] + 5 \cdot 7 \cdot 3 = 158 + 0 + 105 = 263\)
The minimum \(m[1, 5] = 160\) occurs at \(k = 3\). Thus the final cost matrix looks like:

\(i \setminus j\) | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
1 | 0 | 120 | 88 | 158 | 160 |
2 | | 0 | 48 | 104 | 114 |
3 | | | 0 | 84 | 78 |
4 | | | | 0 | 42 |
5 | | | | | 0 |
Here is the order in which the \(m\) entries are calculated, together with the split values \(k\) that lead to the minimum \(m[i, j]\).
Based on the computation, the minimum cost for multiplying the five matrices is 160 and the optimal order for multiplication is \((A_1 (A_2 A_3))(A_4 A_5)\).
This can be represented by a binary tree
.
Optimum matrix multiplication order for the five matrices example.
MatrixChain(p, N) {
    for i = 1, N
        do m[i, i] <- 0
    for L = 2, N
        do
            for i = 1, N - L + 1
                do
                    j <- i + L - 1
                    m[i, j] <- infinity
                    for k = i, j - 1
                        do
                            t <- m[i, k] + m[k + 1, j] + p_{i - 1} * p_k * p_j
                            if (t < m[i, j])
                                then
                                    m[i, j] <- t
                                    s[i, j] <- k
}
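The pseudocode translates into Python roughly as follows (a sketch; `m` and `s` use 1-based indices to match the text, and the order-recovery helper is an added illustration):

```python
def matrix_chain(p):
    """m[i][j]: min multiplications for A_i..A_j; s[i][j]: best split k."""
    n = len(p) - 1                       # number of matrices
    INF = float("inf")
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for L in range(2, n + 1):            # chain length
        for i in range(1, n - L + 2):
            j = i + L - 1
            m[i][j] = INF
            for k in range(i, j):        # try every split point
                t = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if t < m[i][j]:
                    m[i][j] = t
                    s[i][j] = k

    def order(i, j):
        """Recover the optimal parenthesization from s."""
        if i == j:
            return f"A{i}"
        k = s[i][j]
        return f"({order(i, k)}{order(k + 1, j)})"

    return m[1][n], order(1, n)
```

On the running example, `matrix_chain([5, 4, 6, 2, 7, 3])` returns `(160, "((A1(A2A3))(A4A5))")`, matching the table computed above.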
Analysis
There are 3 nested loops and each runs maximum of \(n\) times.
Therefore, total time is \(\Theta(n^3)\).
The \(s\) matrix stores the values of \(k\) at which each minimum was achieved. It can be used to extract the order in which the matrices are to be multiplied.
Hence the following algorithm carries out the matrix multiplication to compute \(A_{i .. j}\):
Multiply(i, j) {
    if (i = j)
        then return A[i]
    else
        k <- s[i, j]
        X <- Multiply(i, k)
        Y <- Multiply(k + 1, j)
        return X * Y
}
0/1 Knapsack Problem
Given a bag with maximum capacity \(W\) and a set \(S\) consisting of \(n\) items, where each item \(i\) has weight \(w_i\) and value \(v_i\): how should we pack the bag to maximize the total value of the packed items?
This is an optimization problem.
Mathematically, the problem is:
\[\text{maximize } \sum_{i = 1}^{n} v_i x_i \quad \text{subject to} \quad \sum_{i = 1}^{n} w_i x_i \le W, \quad x_i \in \{0, 1\}\]
The problem is called the "0 - 1" problem, because each item has to be either entirely accepted or rejected.
- Since there are \(n\) items, there are \(2^n\) combinations of items in total, because each item is either accepted or rejected.
- We go through all combinations and find the one with the most total value whose total weight is less than or equal to \(W\).
The running time for this brute-force algorithm is \(O(2^n)\), and it can be improved by using dynamic programming.
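The brute-force search can be sketched in Python with bitmasks; the weights, values, and capacity \(W = 20\) used in the test below are illustrative (matching the five-item table later in this section):

```python
def knapsack_brute_force(weights, values, W):
    """Try all 2^n subsets; keep the best value with total weight <= W."""
    n = len(weights)
    best_value, best_subset = 0, []
    for mask in range(1 << n):           # each bit: item accepted or rejected
        w = sum(weights[i] for i in range(n) if mask & (1 << i))
        if w > W:
            continue                     # over capacity, discard
        v = sum(values[i] for i in range(n) if mask & (1 << i))
        if v > best_value:
            best_value = v
            best_subset = [i + 1 for i in range(n) if mask & (1 << i)]
    return best_value, best_subset
```

With weights \([2, 3, 4, 5, 9]\), values \([3, 4, 5, 8, 10]\), and \(W = 20\), the best packing is items \(\{1, 3, 4, 5\}\) with value 26, matching the \(S_5\) example below.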
Simple sub Problems
We should be able to break the problem into sub problems with same structure.
Principle of Optimality
Recursively define the value of an optimal solution.
Express the solution of the original problem in terms of optimal solutions for smaller problems.
Bottom up Computation
Compute the value of an optimal solution in a bottom-up fashion by using a table structure.
Construction of Optimal Solution
Construct an optimal solution from computed information
If the items are labelled \(1, 2, \ldots, n\), then a natural sub problem would be to find an optimal solution \(S_k\) for the first \(k\) items.
Now the question is: can we define the final solution \(S_n\) in terms of the sub problems \(S_k\)? The following example shows why we cannot.
Consider the optimal solution if we can choose items 1 through 4 only.
Solution \(S_4\)
- Items chosen: \(1, 2, 3, 4\)
- Total Weight: \(2 + 3 + 4 + 5 = 14\)
- Total value: \(3 + 4 + 5 + 8 = 20\)
Item | \(w_i\) | \(v_i\) |
---|---|---|
1 | 2 | 3 |
2 | 3 | 4 |
3 | 4 | 5 |
4 | 5 | 8 |
Now consider optimal solution when items 1 to 5 are available
Solution \(S_5\)
- Items chosen: \(1, 3, 4, 5\)
- Total weight: \(2 + 4 + 5 + 9 = 20\)
- Total value: \(3 + 5 + 8 + 10 = 26\)
\(S_4\) is not part of \(S_5\).
Item | \(w_i\) | \(v_i\) |
---|---|---|
1 | 2 | 3 |
2 | 3 | 4 |
3 | 4 | 5 |
4 | 5 | 8 |
5 | 9 | 10 |
The solution for \(S_4\) is not part of the solution for \(S_5\).
Hence our definition of the sub problem is flawed and we need another one.