C6 Dynamic Programming

Dated: 31-01-2025

Fibonacci Sequence

History

Leonardo Pisano (nickname: Fibonacci, son of Bonacci born 1170, died 1250) published a book named Liber abaci in 1202 which was based on arithmetic and algebra which had a rabbits problem in its 3rd section which led to introduction to Fibonacci numbers and Fibonacci sequence.
The problem was that if we start from 2 rabbits and each month they create a new one, how many rabbits we will get in the end of the year?

\[F_0 = 0\]

\[F_1 = 1\]

\[F_n = F_{n - 1} + F_{n - 2}\]

FIB(n){
    if (n < 2)
    then return n
    else return FIB(n - 1) + FIB(n - 2)
}

recursion calls during computation of `Fibonacci number`

A single recursive call to \(fib(n)\) results into one recursive call to \(fib(n - 1)\), two recursive calls to \(fib(n - 2)\) and in general, \(k\) recursive calls to \(fib(n - k)\).
With each call, we are computing the number from scratch and to avoid that, we will store the results of the recursive calls and look them up again when we need them later. This process is called memorization.

MemoFib(n) {
    if (n < 2)
        then return n
    if (F[n] is undefined)
        then F[n] <- MemoFib(n - 1) + MemoFib(n + 2)
    return F[n]
}

Tracing through the recursive calls to \(MemoFib(n)\), we find that the array \(F[]\) gets filled bottom up starting from \(F[2]\) up to \(F[n]\).
We can replace recursion with simple for loop.
This gives us our first explicit dynamic programming algorithm

IterFib(n) {
    F[0] <- 0
    F[1] <- 1

    for i <- 2 to n
    do
        F[i] <- F[i - 1] + F[i - 2]

    return F[n]
}

This algorithm takes \(O(n)\) to compute \(F_n\).
The original algorithm with recursion takes \(\Theta(\phi^n), \phi = \frac{1 + \sqrt 5}{2} \approx 1.618\)

Dynamic Programming

Dynamic programming is recursion without repetition.
Developing such algorithms generally involves 2 steps:

Formulate the Problem Recursively

Write the formula for the whole problem as a combination of answers to smaller sub problems.

Build Solution to Recurrence from Bottom up

Write an algorithm which starts from base cases and works itself up to the final answer.

Dynamic programming algorithms involve saving the answers to sub problems in some sort of table.

Edit Distance

If we are given 2 strings \(S_1\) and \(S_2\) then edit distance is the minimum number of point mutations which allows us to convert \(S_1 \rightarrow S_2\) or \(S_2 \rightarrow S_1\).
Where the point mutations can be:

Modifying a letter
Inserting a letter
Deleting a letter

Example

The edit distance between "Food" and "Money" is at most 4.

\(\text{\textbf{F}ood} \to \text{\textbf{M}ood}\) (Modification)
\(\text{Mo\textbf{o}d} \to \text{Mo\textbf{n}d}\) (Modification)
\(\text{Mon\textbf{d}} \to \text{Mon\textbf{y}}\) (Modification)
\(\text{Mony} \to \text{Mon\textbf{e}y}\) (Insertion)

Applications

Spell Checking

When a word is found such that it is not in the dictionary then other words with small edit distance which are part of the dictionary are suggested.

Plagiarism Detection

Assume an essay has been copied then the edit distance can be used to determine how much similar the given copy of essay is to the original one.

Edit Distance Algorithm

S D I M D M
-----------
M A _ T H S
A _ R T _ S

The string SDIMDM is the edit transcript which describes the transformation of one string to another one.

S for substitution
I for Insertion
D for deletion
M for match

\(M\) is not counted in when considering the edit distance.
Since the edit transcript is \(SDIMDM\), therefore the edit distance is

\[1 + 1 + 1 + 0 + 1 + 0 = 4\]

Dynamic Programming Algorithm

Imagine 2 strings \(A\) and \(B\) with sizes \(m\) and \(n\) respectively then the edit distance between the entire strings can be defined as \(E(m, n)\).
There are some base cases.

The only way to convert an empty string into a string of \(j\) characters is by doing \(j\) insertions.

\[E(0, j) = j\]

The only way to convert a string of \(i\) characters into an empty string is by doing \(i\) deletions.

\[E(i, 0) = i\]

There are 4 possibilities for the last column in shortest possible edit sequence.

Deletion

Last entry in bottom row is empty.

- - - i = 3
A L G O R I T H M 
A L _ T R U I S T I C
- - - j = 2

In this case

\[E(i, j) = E(i - 1, j) + 1\]

Insertion

Last entry in top row is empty

- - - - - - i = 5
A L G O R _ I T H M
A L _ T R U I S T I C
- - - - - - j = 5

In this case

\[E(i, j) = E(i, j - 1) + 1\]

Substitution

Both rows have last character in the last column

Different

\[E(i, j) = E(i - 1, j - 1) + 1\]

- - - - - i = 5
A L G O R I T H M
A L T R _ U I S T I C
- - - - - j = 4

Same

If they are same then no substitution is needed

\[E(i, j) = E(i - 1, j - 1)\]

The edit distance \(E(i, j)\) is the smallest of 4 possibilities

\[E(i, j) = \min\left(\begin{matrix}E(i - 1, j) + 1\\E(i, j - 1) + 1\\ E(i - 1, j - 1) + 1 &\text{if } A[i] \ne B[j]\\ E(i - 1, j - 1) &\text{if } A[i] = B[j]\\ \end{matrix}\right)\]

Example

Consider the edit distance between "ARTS" and "MATHS"

A R T S
M A T H S

The edit distance would be \(E(4, 5)\).

\[ E(4, 5) = \min\left( \begin{matrix} E(3, 5) + 1\\ E(4, 4) + 1\\ E(3, 4) + 1 &\text{if } A[4] \ne B[5]\\ E(3, 4) &\text{if } A[4] = B[5] \end{matrix} \right) \]

We will use the dynamic programming approach where base case \(E(0, j)\) is to fill the first row and base case \(E(i, 0)\) is to fill the first column.

Pasted image 20250131095326.png — First row and first column entries using the base cases

The table not only shows the values of \(E[i, j]\) but also shows arrows to indicate how it was computed using \(E[i - 1, j], E[i , j - 1], E[i - 1, j - 1]\).
Thus if a cell \(E[i, j]\) has a:

Down arrow, the minimum was found using \(E[i - 1, j]\).
Right arrow, the minimum was found using \(E[i, j - 1]\).
Diagonal down arrow, the minimum was found using \(E[i - 1, j - 1]\)

These arrows are later used to determine the edit transcript.

Pasted image 20250131100203.png — Computing \(E[1, 1]\) and \(E[1, 2]\)

Pasted image 20250131100257.png — Computing \(E[1, 3]\) and \(E[1, 4]\)

Pasted image 20250131100440.png — The final table with all \(E[i, j]\) entries computed

An edit transcript can be extracted by following a path from \(E[0, 0]\) to \(E[4, 5]\).

Pasted image 20250131100801.png — all red paths show the possible `edit transcripts` which could be extracted

Solution Path 1

1 0 1 1 0 = 3
D M S S M
---------
M A T H S
_ A R T S

Solution Path 2

1 1 0 1 0 = 3
S S M D M
---------
M A T H S
A R T _ S

Solution Path 3

1 0 1 0 1 0 = 3
D M I M D M
-----------
M A _ T H S
_ A R T _ S

Analysis of Dynamic Programming Edit Distance

There are \(\Theta(n^2)\) entries in the matrix and each entry \(E(i, j)\) takes \(\Theta(1)\) time to compute.
Hence the total running time is \(\Theta(n^2)\).

Chain Matrix Multiply

Suppose we want to multiply a series of matrices.

\[A_1 A_2 \ldots A_n\]

If \(A\) is a matrix of dimensions \(p \times q\) and \(B\) is a matrix with dimensions \(q \times r\) then \(C = A \times B\) has dimensions \(p \times r\).

\[1 \le i \le p, \quad 1 \le j \le r\]

There are \(p \times r\) entries in \(C\) and each takes \(O(q)\) time to compute.

\[C[i, j] = \sum_{k = 1}^q A[i, k]B[k, j]\]

Therefore, the total number of multiplications are \(p \times r \times q\).
Consider 3 matrices.

\(A_1\) which is \(5 \times 4\).
\(A_2\) which is \(4 \times 6\).
\(A_3\) which is \(6 \times 2\).

Their multiplication can be carried out as such

\[(A_1 A_2) A_3 \quad \text{or} \quad A_1 (A_2 A_3)\]

The cost however is

\[(A_1 A_2) A_3 = (5 \times 4 \times 6) + (5 \times 6 \times 2) = 180\]

\[A_1(A_2 A_3) = (4 \times 6 \times 2) + (5 \times 4 \times 2) = 88\]

The chain matrix multiplication problem is stated as follows:

Chain Matrix Multiplication Problem

Given a sequence \(A_1, A_2, \ldots, A_n\) and dimensions \(p_0, p_1, \ldots, p_n\) where \(A_i\) is of dimension \(p_{i - 1} \times p_i\), determine the order of multiplication that minimizes the number of operations.

We could brute force to figure out the best way to parenthesis the multiplication but for \(n\) items, there are \(n - 1\) ways in which outer most parentheses can be placed which is large.

Once we split after \(k\) matrices.

\[(A_1 A_2 \ldots A_k)(A_{k + 1} \ldots A_n)\]

We create 2 sub lists, one with \(k\) matrices and other with \(n - k\) matrices.
If there are \(L\) ways to parenthesize left sub list and \(R\) ways to parenthesize the right sub list then the total ways is \(L \times R\).
Therefore, the number of different ways to parenthesize \(n\) items is:

\[P(n) = \begin{cases} 1 &\text{if } n = 1\\ \sum_{k = 1}^{n - 1}P(k)P(n - k) &\text{if } n \ge 2 \end{cases}\]

This is related to a famous function¹ in combinatorics called catalan numbers which is given by

\[C(n) = \frac 1 {n + 1}\binom{2n}n\]

\[P(n) = C(n - 1)C(n) \in \Omega \left(\frac{4^n}{n^{3/2}}\right)\]

The dominating factor is \(4^n\) which suggests that \(P(n)\) will grow large very quickly hence this approach is not practical.

Chain Matrix Multiplication Dynamic Programming Formulation

Let \(A_{i .. j}\) be the result of multiplying matrices through \(i\) to \(j\).
It is easy to see that \(A_{i .. j}\) is a \(p_{i - 1} \times p_j\) matrix.
At the highest level of parenthesizing, we multiply 2 matrices.

\[A_{1 .. n} = A_{1 .. k} \cdot A_{k + 1 .. n} \quad 1 \le k \le n - 1\]

For \(1 \le i \le j \le n\), let \(m[i, j]\) denote the minimum number of multiplications needed to compute \(A_{i .. j}\)
The optimum can be described by the following recursion formation:

If \(i = j\) then there is only one matrix and thus \(m[i, i] = 0\) (the diagonal entries).
If \(i < j\) then we are asking for the product \(A_{i .. j}\)
This can be split by considering each \(k\), \(i \le k \le j\) as \(A_{i .. k}\) times \(A_{k + 1 .. j}\)

The optimum time to compute \(A_{i .. k}\) is \(m[i, k]\) and for \(A_{k + 1 .. j}\), it is \(m[k + 1, j]\).
Since \(A_{i .. k}\) is a \(p_{i - 1} \times p_k\) matrix and \(A_{k + 1 .. j}\) is a \(p_k \times p_j\) matrix, the time to multiply them is \(p_{i - 1} \times p_k \times p_j\).
This suggests

\[m[i, i] = 0\]

\[m[i, j] = \min_{i \le k < j}(m[i, k] + m[k + 1, j] + p_{i - 1} p_k p_j)\]

We will fill \(m\) along the diagonals.
Set all \(m[i, i] = 0\) using the base condition.
Compute cost of multiplication of sequence of 2 matrices.
Example:

\[m[1, 2] = m[1, 1] + m[2, 2] + p_0 \cdot p_1 \cdot p_2\]

Pasted image 20250131143947.png — Product of 5 `matrices`

Next we compute cost of multiplication for sequences of 3 matrices.
These are \(m[1, 3], m[2, 4], m[3, 5], \ldots, m[n - 2, n] \cdot m[1, 3]\).

\[m[1, 3] = \min \begin{cases} m[1, 1] + m[2, 3] + p_0 \cdot p_1 \cdot p_3\\ m[1, 2] + m[3, 3] + p_0 \cdot p_2 \cdot p_3 \end{cases}\]

We repeat this for 4, 5 and higher number of matrices. The final result will end up in \(m[1, n]\).

Example

Imagine 5 matrices which we want to multiply, with dimensions:

\(5 \times 4\)
\(4 \times 6\)
\(6 \times 2\)
\(2 \times 7\)
\(7 \times 3\)

We will compute the entries of \(m\) matrix starting with base condition.
We will first fill up the diagonal

Next, we compute the entries in the first super diagonal, which is the diagonal above the main diagonal.

\[m[1, 2] = m[1, 1] + m[2, 2] + p_0 \cdot p_1 \cdot p_2 = 0 + 0 + 5 \cdot 4 \cdot 6 = 120\]

\[m[2, 3] = m[2, 2] + m[3, 3] + p_1 \cdot p_2 \cdot p_3 = 0 + 0 + 4 \cdot 6 \cdot 2 = 48\]

\[m[3, 4] = m[3, 3] + m[4, 4] + p_2 \cdot p_3 \cdot p_4 = 0 + 0 + 6 \cdot 2 \cdot 7 = 84\]

\[m[4, 5] = m[4, 4] + m[5, 5] + p_3 \cdot p_4 \cdot p_5 = 0 + 0 + 2 \cdot 7 \cdot 3 = 42\]

The \(m\) matrix now looks as follows:

Now we proceed to second super diagonal but this time we will try two possible values of \(k\).
There are 2 possible splits to compute \(m[1, 3]\) but we will choose the one which yields the minimum.

\[m[1, 3] = m[1, 1] + m[2, 3] + p_0 \cdot p_1 \cdot p_3 = 0 + 48 + 5 \cdot 4 \cdot 2 = 88\]

\[m[1, 3] = m[1, 2] + m[3, 3] + p_0 \cdot p_2 \cdot p_3 = 120 + 0 + 5 \cdot 6 \cdot 2 = 180\]

The minimum \(m[1, 3] = 88\) occurs with \(k = 1\)
Similarly for \(m[2, 4]\) and \(m[3, 5]\)

\[m[2,4] = m[2,2] + m[3,4] + p_1 \cdot p_2 \cdot p_4 = 0 + 84 + 4 \cdot 6 \cdot 7 = 252\]

\[m[2,4] = m[2,3] + m[4,4] + p_1 \cdot p_3 \cdot p_4 = 48 + 0 + 4 \cdot 2 \cdot 7 = 104\]

minimum \(m[2, 4] = 104\) at \(k = 3\)

\[m[3,5]=m[3,3]+m[4,5]+p_2 \cdot p_3 \cdot p_5=0+42+6 \cdot 2 \cdot 3=78\]

\[m[3,5]=m[3,4]+m[5,5]+p_2 \cdot p_4 \cdot p_5=84+0+6 \cdot 7 \cdot 3=210\]

minimum \(m[3, 5] = 78\) at \(k = 3\)

We repeat the same process for other diagonals but the number of \(k\) splits increases.

\[m[1,4] = m[1,1] + m[2,4] + p_0 \cdot p_1 \cdot p_4 = 0 + 104 + 5 \cdot 4 \cdot 7 = 244\]

\[m[1,4] = m[1,2] + m[3,4] + p_0 \cdot p_2 \cdot p_4 = 120 + 84 + 5 \cdot 6 \cdot 7 = 414\]

\[m[1,4] = m[1,3] + m[4,4] + p_0 \cdot p_3 \cdot p_4 = 88 + 0 + 5 \cdot 2 \cdot 7 = 158\]

minimum \(m[1, 4] = 158\) at \(k = 3\)

\[m[2,5] = m[2,2] + m[3,5] + p_1 \cdot p_2 \cdot p_5 = 0 + 78 + 4 \cdot 6 \cdot 3 = 150\]

\[m[2,5] = m[2,3] + m[4,5] + p_1 \cdot p_3 \cdot p_5 = 48 + 42 + 4 \cdot 2 \cdot 3 = 114\]

\[m[2,5] = m[2,4] + m[5,5] + p_1 \cdot p_4 \cdot p_5 = 104 + 0 + 4 \cdot 7 \cdot 3 = 188\]

minimum \(m[2, 5] = 114\) at \(k = 3\)

The matrix \(m\) at this stage is:

That leaves \(m[1, 5]\) which can be computed as follows:

\[m[1,5] = m[1,1] + m[2,5] + p_0 \cdot p_1 \cdot p_5 = 0 + 114 + 5 \cdot 4 \cdot 3 = 174\]

\[m[1,5] = m[1,2] + m[3,5] + p_0 \cdot p_2 \cdot p_5 = 120 + 78 + 5 \cdot 6 \cdot 3 = 288\]

\[m[1,5] = m[1,3] + m[4,5] + p_0 \cdot p_3 \cdot p_5 = 88 + 42 + 5 \cdot 2 \cdot 3 = 160\]

\[m[1,5] = m[1,4] + m[5,5] + p_0 \cdot p_4 \cdot p_5 = 158 + 0 + 5 \cdot 7 \cdot 3 = 263\]

minimum \(m[1, 5] = 160\) at \(k = 3\)

Thus the final cost matrix looks like

Here is the order in which m entries are calculated

and the split \(k\) values that lead to a minimum \(m[i, j]\) value

Based on the computation, the minimum cost for multiplying the five matrices is 160 and the optimal order for multiplication is

\[A_1 (A_2 A_3) (A_4 A_5)\]

This can be represented by a binary tree.

Optimum `matrix` multiplication order for the five matrices example

MatrixChain(p, N) {
    for i = 1, N
    do m[i, i] <- 0

    for L = 2, N
    do
        for i = 1, n - L + 1
        do
            j <- i + L - 1
            m[i, j] <- infinity

            for k = 1, j - 1
            do 
                t <- m[i, k] + m[k + 1, j] + p_{i - 1} * p_k * p_j

                if (t < m[i, j])
                then
                    m[i, j] <- t
                    s[i, j] <- k
}

Analysis

There are 3 nested loops and each runs maximum of \(n\) times.
Therefore, total time is \(\Theta(n^3)\).
The \(s\) matrix stores the values \(k\).
The \(s\) matrix can be used to extract the order in which matrices are to be multiplied.
Hence the algorithm which carries out matrix multiplication to compute \(A[i .. j]\)

if (i = j)
then return A[i]
else
    k <- s[i, j]
    X <- Multiply(i, k)
    Y <- Multiply(k + 1, j)
    return X + Y

0/1 Knapsack Problem

Given a bag with maximum capacity \(W\) and a set \(S\) consisting of \(n\) items.
Each item \(i\) has some weight \(w_i\) and value \(v_i\).
How to pack the bag to achieve maximum amount of value of packed items?
This problem belongs to optimization problems.
Mathematically, the problem is

\[\text{maximize } \sum_{i \in T} v_i \text{ subject to } \sum_{i \in T} w_i \le W\]

The problem is called the "0 - 1" problem, because each item has to be either entirely accepted or rejected.

Since there are \(n\) items so there are total \(2^n\) combinations of items because item is either accepted or rejected.
We go through all combinations and find the one with most total value with total weight less than or equal to \(W\).

The running time for this algorithm is \(O(n^2)\) and can be improved by using dynamic programming.

Simple sub Problems

We should be able to break the problem into sub problems with same structure.

Principle of Optimality

Recursively define the value of an optimal solution.
Express the solution of the original problem in terms of optimal solutions for smaller problems.

Bottom up Computation

Compute the value of an optimal solution in a bottom-up fashion by using a table structure.

Construction of Optimal Solution

Construct an optimal solution from computed information

If the items are labelled \(1, 2, \ldots, n\) then sub problem would be to find an optimal solution for

\[S_k = \text{items labelled } 1, 2, \ldots, k\]

Now the question is: can we define the final solution \(S_n\) in terms of sub problems \(S_k\)? Here's why we cannot do that.
Consider the optimal solution if we can choose items 1 through 4 only.

Solution \(S_4\)

Items chosen: \(1, 2, 3, 4\)
Total Weight: \(2 + 3 + 4 + 5 = 14\)
Total value: \(3 + 4 + 5 + 8 = 20\)

Item	\(w_i\)	\(v_i\)
1	2	3
2	3	4
3	4	5
4	5	8

Now consider optimal solution when items 1 to 5 are available

Solution \(S_5\)

Items chosen: \(1, 3, 4, 5\)
Total weight: \(2 + 4 + 5 + 9 = 20\)
Total value: \(3 + 5 + 8 + 10 = 26\)

\(S_4\) is not part of \(S_5\).

Item	\(w_i\)	\(v_i\)
1	2	3
2	3	4
3	4	5
4	5	8
5	9	10

The solution for \(S_4\) is not part of solution of \(S_5\).
Hence our definition of the sub problem is flawed and we need another one.

Dynamic Programming Approach

References

Read more about functions. ↩