Let \(\boldsymbol{x}\) be the solution of \({\bf A} \boldsymbol{x} = \boldsymbol{b}\) and \(\hat{\boldsymbol{x}}\) be the solution of the perturbed problem \({\bf A} \hat{\boldsymbol{x}} = \boldsymbol{b} + \Delta \boldsymbol{b}\).
In addition, let \(\Delta \boldsymbol{x} = \hat{\boldsymbol{x}} - \boldsymbol{x}\) be the absolute error in output.
Then we have \({\bf A} \boldsymbol{x} + {\bf A} \Delta \boldsymbol{x} = \boldsymbol{b} + \Delta \boldsymbol{b},\) so \({\bf A} \Delta \boldsymbol{x} = \Delta \boldsymbol{b}.\)
Using the above equations, we'll see how the relative error in output \(\left(\frac{\|\Delta \boldsymbol{x}\|}{\|\boldsymbol{x}\|}\right)\) is related to the relative error in input \(\left(\frac{\|\Delta \boldsymbol{b}\|}{\|\boldsymbol{b}\|}\right)\):
\[\begin{align} \frac{\|\Delta \boldsymbol{x}\| / \|\boldsymbol{x}\|}{\|\Delta \boldsymbol{b}\| / \|\boldsymbol{b}\|} &= \frac{\|\Delta \boldsymbol{x}\| \|\boldsymbol{b}\|}{\|\boldsymbol{x}\| \|\Delta \boldsymbol{b}\|}\\ &= \frac{\|{\bf A}^{-1} \Delta \boldsymbol{b}\| \|{\bf A} \boldsymbol{x}\|}{\|\boldsymbol{x}\| \|\Delta \boldsymbol{b}\|}\\ &\le \frac{\|{\bf A}^{-1}\| \|\Delta \boldsymbol{b}\| \|{\bf A}\| \|\boldsymbol{x}\|}{\|\boldsymbol{x}\| \|\Delta \boldsymbol{b}\|} \\ &= \|{\bf A}^{-1}\| \|{\bf A}\|\\ &= \text{cond}({\bf A}) \end{align}\] where we used \(\|{\bf A}\boldsymbol{x}\| \le \|{\bf A}\| \|\boldsymbol{x}\|, \forall \boldsymbol{x}.\)
Then
\[\frac{\|\Delta \boldsymbol{x}\|}{\|\boldsymbol{x}\|} \le \text{cond}({\bf A})\frac{\|\Delta \boldsymbol{b}\|}{\|\boldsymbol{b}\|} \qquad (1)\]
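As a quick numerical check of this bound, here is a minimal sketch using NumPy (the particular system and perturbation are arbitrary choices for illustration):

```python
import numpy as np

# Arbitrary example system (chosen only for illustration).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
db = 1e-8 * np.array([1.0, -1.0])      # small input perturbation

x = np.linalg.solve(A, b)              # solution of A x = b
x_hat = np.linalg.solve(A, b + db)     # solution of the perturbed problem

rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
bound = np.linalg.cond(A) * np.linalg.norm(db) / np.linalg.norm(b)
print(rel_err <= bound)                # True: equation (1) holds here
```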
Matrix Perturbation and Error Bound
Instead of perturbing the right-hand side \(\boldsymbol{b}\), we can perturb the matrix \(\boldsymbol{A}\) (also an input) by a small amount \(\boldsymbol{E}\), such that
\[\begin{align} (\boldsymbol{A} + \boldsymbol{E}) \hat{\boldsymbol{x}} = \boldsymbol{b} \end{align}\] and in a similar way obtain:
\[\frac{\|\Delta \boldsymbol{x}\|}{\|\boldsymbol{x}\|} \le \text{cond}({\bf A})\frac{\| \boldsymbol{E}\|}{\|\boldsymbol{A}\|}\] Therefore, if we know the relative error in input, then we can use the condition number of the system to obtain an upper bound for the relative error of our computed solution (output).
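A matching sketch for a perturbation of the matrix, using the same illustrative system as above:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
E = 1e-8 * np.array([[0.0, 1.0],
                     [-1.0, 0.0]])     # small matrix perturbation

x = np.linalg.solve(A, b)
x_hat = np.linalg.solve(A + E, b)

rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
bound = np.linalg.cond(A) * np.linalg.norm(E, 2) / np.linalg.norm(A, 2)
print(rel_err <= bound)                # True for this example
```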
Example
A well-conditioned matrix is one whose condition number is favorable for computation. Which of the following is the best possible condition number of a matrix?
\[\begin{flalign} & \text{A) } \text{cond}({\bf A}) < 0 \\ & \text{B) } \text{cond}({\bf A}) = 0 \\ & \text{C) } 0 < \text{cond}({\bf A}) < 1 \\ & \text{D) } \text{cond}({\bf A}) = 1 \\ & \text{E) } \text{cond}({\bf A}) = \text{large numbers} \\ && \end{flalign}\]
Answer
\(\bf D\)
The condition number of a matrix \({\bf A}\) cannot be less than 1, and a smaller condition number gives a smaller upper bound on the relative error in output \(\left(\frac{\|\Delta \boldsymbol{x}\|}{\|\boldsymbol{x}\|}\right)\). Since we want to minimize the error in computation, choice \(\text{D}\) is the correct answer.
Condition Number
Condition Number Definition
Condition Number: a measure of sensitivity of solving a linear system of equations to variations in the input.
The condition number of a square nonsingular matrix \({\bf A}\) is defined by \(\text{cond}({\bf A}) = \kappa({\bf A}) = \|{\bf A}\| \|{\bf A}^{-1}\|\). This is also the condition number associated with solving the linear system \({\bf A} \boldsymbol{x} = \boldsymbol{b}\). A matrix with a large condition number is said to be ill-conditioned, while a matrix with a small condition number is said to be well-conditioned.
The condition number of any nonsingular matrix is at least 1; in particular, the identity matrix is perfectly conditioned, with \(\text{cond}({\bf I}) = 1\). We show the lower bound as follows:
Assuming \({\bf A}^{-1}\) exists, \(\text{cond}({\bf A}) = \|{\bf A}\| \|{\bf A}^{-1}\| \geq \|{\bf A}{\bf A}^{-1}\| = \|{\bf I}\| = 1.\)
This is the smallest possible condition number. Small condition numbers correspond to little error amplification. Remember, small condition numbers are good!
If \({\bf A}\) is singular (\({\bf A}^{-1}\) does not exist), we can define \(\text{cond}({\bf A}) = \infty\) by convention.
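In practice the condition number is computed by a library routine rather than from \({\bf A}^{-1}\) directly. A minimal sketch with NumPy's `numpy.linalg.cond` (the matrices are arbitrary examples):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.linalg.cond(A))          # 2-norm condition number (the default)
print(np.linalg.cond(A, 1))       # 1-norm condition number
print(np.linalg.cond(A, np.inf))  # infinity-norm condition number

S = np.array([[1.0, 1.0],
              [1.0, 1.0]])        # singular matrix
print(np.linalg.cond(S))          # inf (or a huge number, up to rounding)
```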
Induced Matrix Norms
Recall that the induced matrix norm is given by:
\[\begin{align} \|{\bf A}\|_p := \max_{\|\mathbf{x}\|=1} \|{\bf A}\mathbf{x}\|_p \end{align}\] The condition number can be measured with any \(p\)-norm, so to be precise we typically specify the norm being used, e.g., \(\text{cond}_2\), \(\text{cond}_1\), \(\text{cond}_{\infty}\).
For the 1-norm, we take the maximum absolute column sum of the matrix \(\boldsymbol{A}\)
\[\|{\bf A}\|_1 = \max_j \sum_{i=1}^n \vert a_{ij} \vert.\] For the \(\infty\)-norm, we take the maximum absolute row sum of the matrix \(\boldsymbol{A}\)
\[\|{\bf A}\|_{\infty} = \max_i \sum_{j=1}^n \vert a_{ij} \vert.\] For the 2-norm, we take the largest of the singular values \(\sigma_k\) of the matrix \(\boldsymbol{A}\)
\[\|{\bf A}\|_{2} = \max_k \sigma_k.\]
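These formulas are easy to check against a library routine. A minimal sketch (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# 1-norm: maximum absolute column sum.
print(np.abs(A).sum(axis=0).max(), np.linalg.norm(A, 1))
# Infinity-norm: maximum absolute row sum.
print(np.abs(A).sum(axis=1).max(), np.linalg.norm(A, np.inf))
# 2-norm: largest singular value.
print(np.linalg.svd(A, compute_uv=False).max(), np.linalg.norm(A, 2))
```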
Condition Number of Orthogonal Matrices
What is the 2-norm condition number of an orthogonal matrix \(\boldsymbol{A}\)?
\[\begin{align} \text{cond}_2({\bf A}) = \|{\bf A}\|_2 \|{\bf A}^{-1}\|_2 = \|{\bf A}\|_2 \|{\bf A}^{T}\|_2 = 1, \end{align}\] since \({\bf A}^{-1} = {\bf A}^{T}\) and every singular value of an orthogonal matrix equals 1. Hence, orthogonal matrices have optimal conditioning.
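A quick sketch of this fact: take the orthogonal factor \(Q\) from a QR factorization of a random matrix and check its 2-norm condition number.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # Q is orthogonal

print(np.allclose(Q.T @ Q, np.eye(5)))  # True: Q^T Q = I
print(np.linalg.cond(Q))                # 1.0, up to rounding
```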
Things to Remember About Condition Numbers
- For any matrix \({\bf A}\), \(\text{cond}({\bf A}) \geq 1.\)
- For the identity matrix \({\bf I}\), \(\text{cond}({\bf I}) = 1.\)
- For any matrix \({\bf A}\) and a nonzero scalar \(\gamma\), \(\text{cond}(\gamma {\bf A}) = \text{cond}({\bf A}).\)
- For any diagonal matrix \({\bf D}\), \(\text{cond}({\bf D}) = \frac{\max \vert d_{i} \vert}{\min \vert d_{i} \vert}.\)
- The condition number is a measure of how close a matrix is to being singular: a matrix with large condition number is nearly singular, whereas a matrix with a condition number close to 1 is far from being singular.
- The determinant of a matrix is NOT a good indicator of whether a matrix is near singularity; the sketch below illustrates this.
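For instance, scaling down the identity matrix makes the determinant arbitrarily small while the conditioning remains perfect. A minimal sketch:

```python
import numpy as np

A = 0.1 * np.eye(100)      # a scaled identity matrix

print(np.linalg.det(A))    # 1e-100: "looks" nearly singular
print(np.linalg.cond(A))   # 1.0: in fact optimally conditioned
```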
Example
What is the 2-norm-based condition number of the diagonal matrix
\[\mathbf{A} = \begin{bmatrix} 100 & 0 & 0 \\ 0 & 13 & 0 \\ 0 & 0 & 0.5 \end{bmatrix}?\] \[\begin{flalign} & \text{A) } 1 \\ & \text{B) } 50 \\ & \text{C) } 100 \\ & \text{D) } 200 \\ && \end{flalign}\]
Answer
\(\bf D\)
As mentioned above, for any diagonal matrix \({\bf D}\), \(\text{cond}({\bf D}) = \frac{\max \vert d_{i} \vert}{\min \vert d_{i} \vert}\). Hence, the answer is \(100 / 0.5 = 200\).
Another way to go about this question is by obtaining the inverse of the diagonal matrix \({\bf A}\):
\[\mathbf{A}^{-1} = \begin{bmatrix} 1/100 & 0 & 0 \\ 0 & 1/13 & 0 \\ 0 & 0 & 2 \end{bmatrix}\]
\[\begin{align} \text{cond}({\bf A}) = \|{\bf A}\|_2 \|{\bf A}^{-1}\|_2 = 100 \cdot 2 = 200 \end{align}\]
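The same result as a quick NumPy check:

```python
import numpy as np

A = np.diag([100.0, 13.0, 0.5])
print(np.linalg.cond(A))   # 200.0 = 100 / 0.5
```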
Residual vs Error
The residual vector \(\boldsymbol{r}\) of an approximate solution \(\hat{\boldsymbol{x}}\) for the linear system \({\bf A} \boldsymbol{x} = \boldsymbol{b}\) is defined as \(\boldsymbol{r} = \boldsymbol{b} - {\bf A} \hat{\boldsymbol{x}}\). Since \({\bf A} \hat{\boldsymbol{x}} = \boldsymbol{b} + \Delta \boldsymbol{b}\), we have
\[\boldsymbol{r} = \boldsymbol{b} - (\boldsymbol{b} + \Delta \boldsymbol{b}) = -\Delta \boldsymbol{b}\] Therefore, equation (1) can also be written as
\[\frac{\|\Delta \boldsymbol{x}\|}{\|\boldsymbol{x}\|} \le \text{cond}({\bf A})\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{b}\|}\] If we define relative residual as \(\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{b}\|}\), we can see that a small relative residual implies small relative error in approximate solution only if \({\bf A}\) is well-conditioned (\(\text{cond}({\bf A})\) is small).
In addition, it's important to note the difference between relative residual and relative error. The relative residual \(\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{b}\|}\) tells us how well the approximate solution \(\hat{\boldsymbol{x}}\) satisfies the linear system \({\bf A} \boldsymbol{x} = \boldsymbol{b}\). The relative error \(\frac{\|\Delta \boldsymbol{x}\|}{\|\boldsymbol{x}\|}\) tells us how close the approximate solution \(\hat{\boldsymbol{x}}\) is to the exact solution \(\boldsymbol{x}\). Keep in mind that we don't know the exact solution \(\boldsymbol{x}\); this is why we work with the residual vector \(\boldsymbol{r}\), which is computable.
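The gap between the two is easiest to see on an ill-conditioned system. In the sketch below (using the notoriously ill-conditioned Hilbert matrix as an example), the computed solution has a tiny relative residual yet a much larger relative error:

```python
import numpy as np

n = 12
# Hilbert matrix: entries 1/(i+j+1); cond(A) is astronomically large.
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x = np.ones(n)                  # choose the exact solution ...
b = A @ x                       # ... and build b to match it
x_hat = np.linalg.solve(A, b)   # computed (approximate) solution

r = b - A @ x_hat
print(np.linalg.norm(r) / np.linalg.norm(b))          # tiny relative residual
print(np.linalg.norm(x_hat - x) / np.linalg.norm(x))  # much larger relative error
```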
Example
\[\mathbf{A} = \begin{bmatrix} 13 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 15 \end{bmatrix}\] Suppose we have \({\boldsymbol{x}}\) as a solution of \({\bf A} \boldsymbol{x} = \boldsymbol{b}\), and \(\hat {\boldsymbol{x}}\) as a solution of \({\bf A} \hat{\boldsymbol{x}} = \boldsymbol{b} + \Delta \boldsymbol{b}\). We define \({\bf r} = {\bf A} \hat{\boldsymbol{x}} - \boldsymbol{b}\) (the opposite sign convention from above, which does not change \(\|\boldsymbol{r}\|\)). If we know that the relative residual \(\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{b}\|}\) is \(10^{-4}\), determine the upper bound for the output relative error. Assume the 2-norm.
Answer
\(\frac{\|\Delta \boldsymbol{x}\|}{\|\boldsymbol{x}\|} \le \text{cond}({\bf A})\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{b}\|} = 15 \times 10^{-4}\), since \(\text{cond}_2({\bf A}) = 15/1 = 15\).
Alternative Definitions of Relative Residual
As a reminder, the relative residual \(\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{b}\|}\) tells us how well the approximate solution \(\hat{\boldsymbol{x}}\) satisfies the linear system \({\bf A} \boldsymbol{x} = \boldsymbol{b}\).
There are other closely related quantities that also have the name "relative residual". Note that
\[\begin{align} \|\Delta \boldsymbol{x}\| &= \|\hat{\boldsymbol{x}} - \boldsymbol{x}\| \\ &= \|\boldsymbol{A}^{-1}\boldsymbol{A}\hat{\boldsymbol{x}} - \boldsymbol{A}^{-1}\boldsymbol{b}\| \\ &= \|\boldsymbol{A}^{-1}(\boldsymbol{A}\hat{\boldsymbol{x}} - \boldsymbol{b})\| \\ &= \|\boldsymbol{A}^{-1}\boldsymbol{r}\|\\ &\leq \|\boldsymbol{A}^{-1}\|\cdot \| \boldsymbol{r}\| \\ &= \|\boldsymbol{A}^{-1}\|\cdot \|\boldsymbol{A}\| \frac{\|\boldsymbol{r}\|}{\|\boldsymbol{A}\|} \\ &= \text{cond}(\boldsymbol{A})\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{A}\|}. \end{align}\] In summary,
\[\|\Delta \boldsymbol{x}\| \leq \text{cond}(\boldsymbol{A})\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{A}\|}\qquad (2)\] We can divide this inequality by \(\|\boldsymbol{x}\|\) to obtain
\[\frac{\|\Delta \boldsymbol{x}\|}{\|\boldsymbol{x}\|} \le \text{cond}({\bf A})\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{A}\|\cdot\|\boldsymbol{x}\|}.\] The quantity \(\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{A}\|\cdot\|\boldsymbol{x}\|}\) is also known as the relative residual. This inequality is useful mathematically, but involves the norm \(\|\mathbf{x}\|\) of the unknown solution, so it isn't a practical way to bound the relative error. Since \(\|\boldsymbol{b}\| = \|\boldsymbol{A}\boldsymbol{x}\| \leq \|\boldsymbol{A}\|\cdot \|\boldsymbol{x}\|\), we have
\[\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{A}\|\cdot\|\boldsymbol{x}\|} \leq \frac{\|\boldsymbol{r}\|}{\|\boldsymbol{b}\|}\] Note that the two sides are sometimes equal for certain choices of \(\boldsymbol{b}\).
We can also divide equation (2) by \(\|\hat{\boldsymbol{x}}\|\) to obtain
\[\frac{\|\Delta \boldsymbol{x}\|}{\|\hat{\boldsymbol{x}}\|} \le \text{cond}({\bf A})\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{A}\|\cdot\|\hat{\boldsymbol{x}}\|}.\] The left-hand side is no longer the relative error (the denominator is the norm of the approximate solution now, not the exact solution), but the right-hand side can still provide a reasonable estimate of the relative error. It is also computable, since the norm of the true solution does not appear on the right-hand side.
For this reason, the quantity \(\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{A}\|\cdot\|\hat{\boldsymbol{x}}\|}\) is also known as the relative residual. This is used in the next section to describe the relationship between the residual and errors in the matrix \(\boldsymbol{A}\).
While three different quantities all being named "relative residual" may be confusing, you should be able to determine which one is being discussed from context.
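A short sketch computing all three quantities for a hypothetical approximate solution, to make the distinction concrete:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)
x_hat = x + 1e-8               # stand-in for an approximate solution

r = b - A @ x_hat
normA = np.linalg.norm(A, 2)

print(np.linalg.norm(r) / np.linalg.norm(b))                 # ||r|| / ||b||
print(np.linalg.norm(r) / (normA * np.linalg.norm(x)))       # ||r|| / (||A|| ||x||)
print(np.linalg.norm(r) / (normA * np.linalg.norm(x_hat)))   # ||r|| / (||A|| ||x_hat||)
```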
Gaussian Elimination (with Partial Pivoting) is Guaranteed to Produce a Small Residual
When we use Gaussian elimination with partial pivoting to compute the solution for the linear system \({\bf A} \boldsymbol{x} = \boldsymbol{b}\) and obtain an approximate solution \(\hat{\boldsymbol{x}}\), the residual vector \(\boldsymbol{r}\) satisfies:
\[\frac{\|\boldsymbol{r}\|}{\|{\bf A}\| \|\hat{\boldsymbol{x}}\|} \le \frac{\|\boldsymbol{E}\|}{\|{\bf A}\|} \le c \epsilon_{mach}\] where \(\boldsymbol{E}\) is the backward error in \({\bf A}\) (defined by \(({\bf A}+\boldsymbol{E})\hat{\boldsymbol{x}} = \boldsymbol{b}\)), \(c\) is a coefficient that depends on \({\bf A}\), and \(\epsilon_{mach}\) is machine epsilon.
Typically \(c\) is small with partial pivoting, but \(c\) can be arbitrarily large without pivoting.
Therefore, Gaussian elimination with partial pivoting yields a small relative residual regardless of the conditioning of the system.
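A sketch of this behavior, assuming the solver performs Gaussian elimination with partial pivoting (as LAPACK-backed routines such as `numpy.linalg.solve` do): even on a badly conditioned Vandermonde matrix, the relative residual stays near \(\epsilon_{mach}\).

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.vander(np.linspace(0.1, 1.0, 12))   # ill-conditioned Vandermonde matrix
b = rng.standard_normal(12)

x_hat = np.linalg.solve(A, b)   # LU with partial pivoting under the hood
r = b - A @ x_hat

print(np.linalg.cond(A))        # huge condition number
print(np.linalg.norm(r) / (np.linalg.norm(A, 2) * np.linalg.norm(x_hat)))  # ~ eps_mach
```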
For more details, see Gaussian Elimination & Roundoff Error.
Accuracy Rule of Thumb for Conditioning
Suppose we apply Gaussian elimination with partial pivoting and back substitution to the linear system \({\bf A} \boldsymbol{x} = \boldsymbol{b}\) and obtain a computed solution \(\hat{\boldsymbol{x}}\). If the entries in \({\bf A}\) and \(\boldsymbol{b}\) are accurate to \(s\) decimal digits, and \(\text{cond}({\bf A}) \approx 10^w\), then the elements of the solution vector \(\hat{\boldsymbol{x}}\) will be accurate to about \(s-w\) decimal digits.
For a proof of this rule of thumb, please see Fundamentals of Matrix Computations by David S. Watkins.
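A back-of-the-envelope sketch of the rule of thumb (the condition number here is an assumed value for illustration):

```python
import numpy as np

eps = np.finfo(np.float64).eps   # machine epsilon, ~2.2e-16
s = abs(np.log10(eps))           # ~15.7 accurate input digits
w = np.log10(1e10)               # cond(A) ~ 10^10, so w = 10
print(s - w)                     # expect roughly 6 accurate digits
```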
Example
How many accurate decimal digits in the solution can we expect to obtain if we solve a linear system \({\bf A} \boldsymbol{x} = \boldsymbol{b}\) where \(\text{cond}({\bf A}) = 10^{10}\) using Gaussian elimination with partial pivoting, assuming we are using IEEE double precision and the inputs are accurate to machine precision?
Answer
In IEEE double precision, \(\epsilon_{mach} \approx 2.2\times 10^{-16}\), which means the entries in \({\bf A}\) and \(\boldsymbol{b}\) are accurate to \(\vert\log_{10}(2.2\times 10^{-16})\vert \approx 16\) decimal digits.
Then, using the rule of thumb, we know the entries in \(\hat{\boldsymbol{x}}\) will be accurate to about \(16-10 = 6\) decimal digits.
Review Questions
- What is the definition of a condition number?
- What is the condition number of solving \({\bf A}\mathbf{x} = \mathbf{b}\)?
- What is the condition number of matrix-vector multiplication?
- Calculate the \(p\)-norm condition number of a matrix for a given \(p\).
- Do you want a small condition number or a large condition number?
- What is the condition number of an orthogonal matrix?
- If you have \(p\) accurate digits in \({\bf A}\) and \(\mathbf{b}\), how many accurate digits do you have in the solution of \({\bf A}\mathbf{x} = \mathbf{b}\) if the condition number of \({\bf A}\) is \(\kappa\)?
- When solving a linear system \({\bf A}\mathbf{x} = \mathbf{b}\), does a small residual guarantee an accurate result?
- Consider solving a linear system \({\bf A}\mathbf{x} = \mathbf{b}\). When does Gaussian elimination with partial pivoting produce a small residual?
- How does the condition number of a matrix \({\bf A}\) relate to the condition number of \({\bf A}^{-1}\)?