
42. Least Squares Solution

Dated: 20-06-2025

Least Squares Solution

\[A \vec x = \vec b\]

The least squares solution is the value of \(\vec x\) which makes \(A \vec x\) the closest point to \(\vec b\) in \(\text{Col } A\).

Solution of the General Least Squares Problem

Apply the best approximation theorem1 to the subspace2 \(\text{Col }A\):

\[\vec b^\prime = \text{Proj}_{\text{Col }A} \vec b\]

Since \(\vec b^\prime \in \text{Col } A\), the equation \(A \vec x = \vec b^\prime\) is consistent3 and there is a solution \(\vec x^\prime \in \mathbb R^n\) such that

\[A \vec x^\prime = \vec b^\prime\]

Since \(\vec b^\prime\) is the closest point in \(\text{Col }A\) to \(\vec b\), a vector \(\vec x^\prime\) is a least squares solution of \(A \vec x = \vec b\) if and only if \(\vec x^\prime\) satisfies \(A \vec x^\prime = \vec b^\prime\).
Such an \(\vec x^\prime \in \mathbb R^n\) is a list of weights that builds \(\vec b^\prime\) out of the columns of \(A\).
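As a quick numerical illustration (a minimal NumPy sketch; the particular \(A\) and \(\vec b\) are made-up for this example, not from the note), the overdetermined system below has no exact solution, and the least squares weights \(\vec x^\prime\) combine the columns of \(A\) into the projection \(\vec b^\prime\):

```python
import numpy as np

# Made-up overdetermined system: 3 equations, 2 unknowns, no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Least squares solution x': the weights that build b' = Proj_{Col A} b
# out of the columns of A.
x_prime, *_ = np.linalg.lstsq(A, b, rcond=None)
b_prime = A @ x_prime                 # projection of b onto Col A

print(x_prime)                        # [ 5. -3.]
print(b_prime)                        # [ 5.  2. -1.], closest point to b in Col A
print(np.linalg.norm(b - b_prime))    # least squares error, sqrt(6)
```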

Normal Equations for \(\vec x^\prime\)

Suppose that \(\vec x^\prime\) satisfies \(A \vec x^\prime = \vec b^\prime\). Then, by the orthogonal decomposition theorem,4 \(\vec b\) has the property that \((\vec b - \vec b^\prime) \perp \text{Col }A\).

\[\because A \vec x^\prime = \vec b^\prime\]
\[\therefore (\vec b - A \vec x^\prime) \perp \text{Col } A\]

This means that \(\vec b - A \vec x^\prime\) is perpendicular to each column of \(A\).
Let \(a_j\) be a column of \(A\).

\[\implies a_j \cdot (\vec b - A \vec x^\prime) = 0\]
\[\implies a_j^T \cdot (\vec b - A \vec x^\prime) = 0\]

Since \(a^T_j\) is a row of \(A^T\), collecting these equations over all columns of \(A\) gives

\[A^T (\vec b - A \vec x^\prime) = \vec 0\]
\[A^T \vec b - A^TA \vec x^\prime = \vec 0\]
\[A^T \vec b = A^TA \vec x^\prime\]

This matrix equation represents a system of linear equations5 called the normal equations for \(\vec x^\prime\).
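The normal equations give a direct way to compute \(\vec x^\prime\): form \(A^TA\) and \(A^T \vec b\) and solve the resulting \(n \times n\) system. A minimal sketch, reusing the made-up \(A\) and \(\vec b\) from above:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Form and solve the normal equations  A^T A x' = A^T b.
AtA = A.T @ A          # 2x2
Atb = A.T @ b          # length 2
x_prime = np.linalg.solve(AtA, Atb)

print(x_prime)         # [ 5. -3.], same as np.linalg.lstsq(A, b, rcond=None)
```

Note that forming \(A^TA\) squares the condition number of \(A\), which is why the QR-based theorem further below is often preferred in practice.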

Decomposition of \(\vec b\) into the sum of a vector6 from \(\text{Col } A\) and a vector6 perpendicular to \(\text{Col }A\)

\[\vec b = A \vec x^\prime + (\vec b - A \vec x^\prime)\]
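A quick numerical check of this decomposition (same assumed \(A\) and \(\vec b\) as above): the residual \(\vec b - A \vec x^\prime\) should satisfy \(A^T(\vec b - A \vec x^\prime) = \vec 0\), i.e. it is orthogonal to every column of \(A\).

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_prime = np.linalg.solve(A.T @ A, A.T @ b)
b_hat = A @ x_prime                   # the part of b that lies in Col A
r = b - b_hat                         # the part of b perpendicular to Col A

print(np.allclose(b, b_hat + r))      # True: b = A x' + (b - A x')
print(A.T @ r)                        # ~[0. 0.]: residual is orthogonal to Col A
```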

Definition

If \(A\) is an \(m \times n\) matrix7 and \(\vec b \in \mathbb R^m\), a least squares solution of \(A \vec x = \vec b\) is an \(\vec x^\prime \in \mathbb R^n\) such that

\[||\vec b - A \vec x^\prime|| \le ||\vec b - A \vec x|| \quad \forall \vec x \in \mathbb R^n\]

Theorem

The matrix7 \(A^TA\) is invertible8 if and only if the columns of \(A\) are linearly independent.9 In this case,

\[\vec x^\prime = (A^TA)^{-1} A^T \vec b\]
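A direct transcription of this formula (a sketch only, reusing the made-up \(A\) and \(\vec b\); forming the explicit inverse is fine for a tiny example, but numerically it is better to use a solver or np.linalg.lstsq):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Columns of A are linearly independent, so A^T A is invertible and
# the least squares solution is unique: x' = (A^T A)^{-1} A^T b.
x_prime = np.linalg.inv(A.T @ A) @ A.T @ b
print(x_prime)    # [ 5. -3.]
```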

Theorem

Given an \(m \times n\) matrix7 \(A\) with linearly independent columns,9 let \(A = QR\) be a QR factorization of \(A\). Then for each \(\vec b \in \mathbb R^m\), the equation \(A \vec x = \vec b\) has a unique least squares solution, given by

\[\vec x^\prime = R^{-1} Q^T \vec b\]
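A sketch of the QR route using NumPy's reduced QR factorization (same made-up \(A\) and \(\vec b\)); since \(R\) is upper triangular, the system \(R \vec x^\prime = Q^T \vec b\) is solved directly rather than forming \(R^{-1}\) explicitly:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Reduced QR: Q (3x2) has orthonormal columns, R (2x2) is upper triangular.
Q, R = np.linalg.qr(A)

# x' = R^{-1} Q^T b, computed by solving the triangular system R x' = Q^T b.
x_prime = np.linalg.solve(R, Q.T @ b)
print(x_prime)    # [ 5. -3.]
```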

References

Read more about notations and symbols.


  1. Read more about best approximation theorem

  2. Read more about vector spaces

  3. Read more about consistency of a linear system

  4. Read more about orthogonal decomposition

  5. Read more about linear systems

  6. Read more about vectors

  7. Read more about matrices

  8. Read more about invertible matrices

  9. Read more about linear dependency