Higham, Accuracy and Stability of Numerical Algorithms (excerpt: Chapter 7, Perturbation Theory for Linear Systems)
7.1. Normwise Analysis

First, we present some classical normwise perturbation results. We denote by $\|\cdot\|$ any vector norm and the corresponding subordinate matrix norm. As usual, $\kappa(A) = \|A\|\,\|A^{-1}\|$ is the matrix condition number. Throughout this chapter the matrix $E$ and the vector $f$ are arbitrary and represent tolerances against which the perturbations are measured (their role becomes clear when we consider componentwise results).

Our first result makes precise the intuitive feeling that if the residual is small then we have a "good" approximate solution.

Theorem 7.1 (Rigal and Gaches). The normwise backward error
$$\eta_{E,f}(y) := \min\{\,\epsilon : (A+\Delta A)y = b+\Delta b,\ \|\Delta A\| \le \epsilon\|E\|,\ \|\Delta b\| \le \epsilon\|f\|\,\} \tag{7.1}$$
is given by
$$\eta_{E,f}(y) = \frac{\|r\|}{\|E\|\,\|y\| + \|f\|}, \tag{7.2}$$
where $r = b - Ay$.

Proof.
It is straightforward to show that the right-hand side of (7.2) is a lower bound for $\eta_{E,f}(y)$. This lower bound is attained for the perturbations
$$\Delta A = \frac{\|E\|\,\|y\|}{\|E\|\,\|y\|+\|f\|}\,r z^T, \qquad \Delta b = -\frac{\|f\|}{\|E\|\,\|y\|+\|f\|}\,r, \tag{7.3}$$
where $z$ is a vector dual to $y$ (see §6.1).

For the particular choice $E = A$ and $f = b$, $\eta_{E,f}(y)$ is called the normwise relative backward error.

The next result measures the sensitivity of the system.

Theorem 7.2. Let $Ax = b$ and $(A+\Delta A)y = b+\Delta b$, where $\|\Delta A\| \le \epsilon\|E\|$ and $\|\Delta b\| \le \epsilon\|f\|$, and assume that $\epsilon\|A^{-1}\|\,\|E\| < 1$. Then
$$\frac{\|y-x\|}{\|x\|} \le \frac{\epsilon}{1-\epsilon\|A^{-1}\|\,\|E\|}\left(\frac{\|A^{-1}\|\,\|f\|}{\|x\|} + \|A^{-1}\|\,\|E\|\right), \tag{7.4}$$
and this bound is attainable to first order in $\epsilon$.

Proof. The bound (7.4) follows easily from the equation $A(y-x) = \Delta b - \Delta A x + \Delta A(x-y)$.
It is attained to first order in $\epsilon$ for $\Delta A = \epsilon\|E\|\,\|x\|\,wv^T$ and $\Delta b = -\epsilon\|f\|\,w$, where $\|w\| = 1$, $\|A^{-1}w\| = \|A^{-1}\|$, and $v$ is a vector dual to $x$.

Associated with the way of measuring perturbations used in these two theorems is the normwise condition number
$$\kappa_{E,f}(A,x) := \lim_{\epsilon\to 0}\sup\left\{\frac{\|\Delta x\|}{\epsilon\|x\|} : (A+\Delta A)(x+\Delta x) = b+\Delta b,\ \|\Delta A\| \le \epsilon\|E\|,\ \|\Delta b\| \le \epsilon\|f\|\right\}.$$
Because the bound of Theorem 7.2 is sharp, it follows that
$$\kappa_{E,f}(A,x) = \frac{\|A^{-1}\|\,\|f\|}{\|x\|} + \|A^{-1}\|\,\|E\|.$$
For the choice $E = A$ and $f = b$ we have $\kappa(A) \le \kappa_{E,f}(A,x) \le 2\kappa(A)$, and the bound (7.4) can be weakened slightly to yield the familiar form
$$\frac{\|y-x\|}{\|x\|} \le \frac{2\epsilon\kappa(A)}{1-\epsilon\kappa(A)}.$$

A numerical example illustrates the above results. Let $A$ be the $8\times 8$ Vandermonde matrix with $(i,j)$ element $j^{2(i-1)}$, and let $b = e_1$ be the first unit vector, so that $x$ is the first column of $A^{-1}$.
We take $y$ to be the approximate solution to $Ax = b$ computed by Gaussian elimination with partial pivoting. Computations are performed in MATLAB ($u \approx 1.1\times 10^{-16}$). We find that $\eta_{A,b}(y) = 3.05\times 10^{-21}$ for the $\infty$-norm, and $\kappa_\infty(A) = 1.68\times 10^{13}$. This is an admirably small backward error, but it may be uninformative for two reasons. First, the elements of $A$ vary over 12 orders of magnitude, so while our backward error perturbations are small compared with the large elements of $A$, we may be making large perturbations in the small elements (indeed we are in this particular example).
Second, we are perturbing the zero elements of $b$ (as can be seen from (7.3) together with the fact that for this example the residual $r$ has no zero entries); this is unsatisfactory if we wish to regard $y$ as the first column of the inverse of a perturbed matrix.

Next,
let $b = Ae$, where $e = [1,1,\dots,1]^T$, and let $z$ be the solution to the perturbed system $(A+\Delta A)z = b+\Delta b$, where $\Delta A = \mathrm{tol}\,|A|$ and $\Delta b = \mathrm{tol}\,|b|$, with $\mathrm{tol} = 8u$. We find that the forward error
$$\frac{\|x-z\|_\infty}{\|x\|_\infty} \tag{7.5}$$
is very much smaller than the corresponding bound from (7.4), which, with $\epsilon = \mathrm{tol}$, $E = A$, and $f = b$, is $3.03\times 10^{-2}$. Thus the normwise forward error bound is extremely pessimistic for this special choice of perturbation.

To obtain a more satisfactory backward error measure and a sharper perturbation bound in this example, we need componentwise analysis.

7.2. Componentwise Analysis

The componentwise backward error is defined as
$$\omega_{E,f}(y) := \min\{\,\epsilon : (A+\Delta A)y = b+\Delta b,\ |\Delta A| \le \epsilon E,\ |\Delta b| \le \epsilon f\,\}, \tag{7.6}$$
where $E$ and $f$ are now assumed to have nonnegative entries.
Inequalities between matrices or vectors are understood to hold componentwise. In this definition each element of a perturbation is measured relative to its individual tolerance, so, unlike in the normwise definition, we are making full use of the $n^2 + n$ parameters in $E$ and $f$.

How should $E$ and $f$ be chosen? The most common choice of tolerances is $E = |A|$ and $f = |b|$, which yields the componentwise relative backward error. For this choice the constraints in (7.6) become $|\Delta a_{ij}| \le \epsilon|a_{ij}|$ and $|\Delta b_i| \le \epsilon|b_i|$, and so if $\omega_{E,f}(y)$ is small then $y$ solves a problem that is close to the original one in the sense of componentwise relative perturbations and has the same sparsity pattern. Another attractive property of the componentwise relative backward error is that it is insensitive to the scaling of the system: if $Ax = b$ is scaled to $(S_1AS_2)(S_2^{-1}x) = S_1b$, where $S_1$ and $S_2$ are diagonal, and $y$ is scaled to $S_2^{-1}y$, then $\omega$ remains unchanged.

The choice $E = |A|ee^T$, $f = |b|$ gives a row-wise backward error.
The constraint $|\Delta A| \le \epsilon E$ is now $|\Delta a_{ij}| \le \epsilon\alpha_i$, where $\alpha_i$ is the 1-norm of the $i$th row of $A$, so perturbations to the $i$th row of $A$ are being measured relative to the norm of that row. A columnwise backward error can be formulated in a similar way, by taking $E = ee^T|A|$ and $f = \|b\|_1 e$.

The third natural choice of tolerances is $E = \|A\|ee^T$ and $f = \|b\|e$, for which $\omega_{E,f}(y)$ is the same as the normwise backward error $\eta_{E,f}(y)$, up to a constant.

As for the normwise backward error in Theorem 7.1, there is a simple formula for $\omega_{E,f}(y)$.

Theorem 7.3 (Oettli and Prager). The componentwise backward error is given by
$$\omega_{E,f}(y) = \max_i \frac{|r_i|}{(E|y| + f)_i}, \tag{7.7}$$
where $r = b - Ay$, and $\xi/0$ is interpreted as zero if $\xi = 0$ and infinity otherwise.

Proof. It is easy to show that the right-hand side of (7.7) is a lower bound for $\omega_{E,f}(y)$, and that this bound is attained for the perturbations
$$\Delta A = D_1ED_2, \qquad \Delta b = -D_1f, \tag{7.8}$$
where $D_1 = \mathrm{diag}(r_i/(E|y|+f)_i)$ and $D_2 = \mathrm{diag}(\mathrm{sign}(y_i))$.

The next result gives a forward error bound corresponding to the componentwise backward error.

Theorem 7.4.
Let $Ax = b$ and $(A+\Delta A)y = b+\Delta b$, where $|\Delta A| \le \epsilon E$ and $|\Delta b| \le \epsilon f$, and assume that $\epsilon\|\,|A^{-1}|E\,\| < 1$, where $\|\cdot\|$ is an absolute norm. Then
$$\frac{\|y-x\|}{\|x\|} \le \frac{\epsilon}{1-\epsilon\|\,|A^{-1}|E\,\|}\,\frac{\|\,|A^{-1}|E|x| + |A^{-1}|f\,\|}{\|x\|}, \tag{7.9}$$
and for the $\infty$-norm this bound is attainable to first order in $\epsilon$.

Proof. The bound (7.9) follows easily from the equation $A(y-x) = \Delta b - \Delta Ax + \Delta A(x-y)$. For the $\infty$-norm the bound is attained, to first order in $\epsilon$, for $\Delta A = \epsilon D_1ED_2$ and $\Delta b = -\epsilon D_1f$, where $D_2 = \mathrm{diag}(\mathrm{sign}(x_i))$ and $D_1 = \mathrm{diag}(\xi_j)$, where $\xi_j = \mathrm{sign}((A^{-1})_{kj})$ and $k$ is an index for which $(|A^{-1}|(E|x|+f))_k = \|\,|A^{-1}|(E|x|+f)\,\|_\infty$.

Theorem 7.4 implies that the condition number
$$\mathrm{cond}_{E,f}(A,x) := \lim_{\epsilon\to 0}\sup\left\{\frac{\|\Delta x\|_\infty}{\epsilon\|x\|_\infty} : (A+\Delta A)(x+\Delta x) = b+\Delta b,\ |\Delta A| \le \epsilon E,\ |\Delta b| \le \epsilon f\right\}$$
is given by
$$\mathrm{cond}_{E,f}(A,x) = \frac{\|\,|A^{-1}|E|x| + |A^{-1}|f\,\|_\infty}{\|x\|_\infty}. \tag{7.10}$$
For the special case $E = |A|$ and $f = |b|$ we have the condition numbers introduced by Skeel [919, 1979]:
$$\mathrm{cond}(A,x) := \frac{\|\,|A^{-1}||A||x|\,\|_\infty}{\|x\|_\infty},$$
which differs from $\mathrm{cond}_{|A|,|b|}(A,x)$ by at most a factor 2,
and
$$\mathrm{cond}(A) := \mathrm{cond}(A,e) = \|\,|A^{-1}||A|\,\|_\infty. \tag{7.11}$$

How does cond compare with $\kappa$? Since $\mathrm{cond}(A)$ is invariant under row scaling $Ax = b \to (DA)x = Db$, where $D$ is diagonal, it can be arbitrarily smaller than $\kappa_\infty(A)$. In fact, it is straightforward to show that
$$\mathrm{cond}(A) = \min\{\,\kappa_\infty(DA) : D\ \text{diagonal}\,\} = \kappa_\infty(D_RA), \tag{7.12}$$
where the optimal scaling $D_R$ equilibrates the rows of $A$, that is, $D_RA$ has rows of unit 1-norm ($D_R|A|e = e$).

Chandrasekaran and Ipsen [197, 1995] note the following inequalities. First, with $D_R$ as just defined,
$$\frac{\kappa_\infty(A)}{\kappa_\infty(D_R)} \le \mathrm{cond}(A) \le \kappa_\infty(A) \tag{7.13}$$
(these inequalities imply (7.12)). Thus $\mathrm{cond}(A)$ can be much smaller than $\kappa_\infty(A)$ only when the rows of $A$ are badly scaled. Second, if $D_C$ equilibrates the columns of $A$ ($e^T|A|D_C = e^T$), then analogous inequalities relate $\mathrm{cond}(A,x)$ to $\kappa_\infty(A)$, $\kappa_\infty(D_C)$, and the column scaling of $A^{-1}$. These inequalities show that $\mathrm{cond}(A,x)$ can be much smaller than $\kappa_\infty(A)$ only when the columns of either $A$ or $A^{-1}$ are badly scaled.

Returning to the numerical example of §7.1, we find that $\omega_{E,f}(y) = 1.10\times 10^{-12}$ for $E = |A|$ and $f = |b|$ or $f = 0$.
This tells us that if we measure changes to $A$ in a componentwise relative sense, then for $y$ to be the first column of the inverse of a perturbed matrix we must make relative changes to $A$ four orders of magnitude larger than the unit roundoff. For the perturbed system, Theorem 7.4 with $\epsilon = \mathrm{tol}$, $E = |A|$, and $f = |b|$ gives a bound that is eight orders of magnitude smaller than the normwise bound from Theorem 7.2, and only a factor 170 larger than the actual forward error (7.5).

An example of Kahan [626, 1966] is also instructive. Let
$$A = \begin{bmatrix} 2 & -1 & 1 \\ -1 & \epsilon & \epsilon \\ 1 & \epsilon & \epsilon \end{bmatrix}, \qquad b = \begin{bmatrix} 2(1+\epsilon) \\ -\epsilon \\ \epsilon \end{bmatrix}, \tag{7.14}$$
where $0 < \epsilon \ll 1$, so that $x = [\epsilon, -1, 1]^T$. The normwise condition number is $\kappa_\infty(A) = 2(1+\epsilon^{-1})$, so the system is very sensitive to arbitrary perturbations in $A$ and $b$. Moreover, $\mathrm{cond}(A) = 3 + (2\epsilon)^{-1}$, which implies that the system is also very sensitive to componentwise perturbations for some right-hand sides. However, $\mathrm{cond}(A,x) = 5/2 + \epsilon$, so for this particular $b$ the system is very well conditioned under componentwise perturbations.

A word is in order concerning the choice of condition number.
Every condition number for a linear system is defined with respect to a particular class of perturbations. It is important to use the right condition number for the occasion. For example, if $\widehat{x}$ is a computed solution to $Ax = b$ and we know its normwise backward error $\eta_{A,b}(\widehat{x})$, then it is the condition number $\kappa(A)$ that appears in the relevant forward error bound (multiplying $\eta_{A,b}(\widehat{x})$) and therefore tells us something about the accuracy of $\widehat{x}$. The componentwise condition number $\mathrm{cond}(A,x)$ is relevant only if we are dealing with the componentwise relative backward error, $\omega_{|A|,|b|}(\widehat{x})$. Looked at another way, each algorithm has an associated error analysis that determines the condition number relevant to that algorithm.

7.3. Scaling to Minimize the Condition Number

In the last section we noted the invariance of $\mathrm{cond}(A)$ under row scaling, which contrasts with the strong dependence of $\kappa_\infty(A)$ upon the row scaling.
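This contrast is easy to check numerically. The following is a minimal sketch, assuming NumPy is available; the $2\times 2$ matrix and the row scaling are arbitrary choices for illustration:

```python
import numpy as np

def skeel_cond(A):
    """Skeel condition number cond(A) = || |A^{-1}| |A| ||_inf."""
    return np.linalg.norm(np.abs(np.linalg.inv(A)) @ np.abs(A), np.inf)

def kappa_inf(A):
    """Normwise condition number kappa_inf(A) = ||A||_inf ||A^{-1}||_inf."""
    return np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # any nonsingular matrix
D = np.diag([1e8, 1.0])                  # a badly scaled row scaling

# cond is invariant under row scaling, since |(DA)^{-1}||DA| = |A^{-1}||A|;
# kappa_inf, by contrast, blows up with the scaling.
print(skeel_cond(A), skeel_cond(D @ A))  # equal (up to roundoff)
print(kappa_inf(A), kappa_inf(D @ A))    # second value is much larger
```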
The opportunity to scale the rows or columns of $A$ arises in various applications, so we now take a closer look at the effect of scaling on the normwise condition number.

First, we consider one-sided scaling, by giving a generalization of a well-known result of van der Sluis [1039, 1969]. It shows that, for one-sided scaling in a Hölder $p$-norm, equilibrating the rows or columns is a nearly optimal strategy. We state the result for rectangular matrices $A$, for which we define
$$\kappa_p(A) := \|A\|_p\,\|A^+\|_p,$$
where $A^+$ is the pseudo-inverse of $A$ (see Problem 19.3).

Theorem 7.5 (van der Sluis).
Let $A \in \mathbb{R}^{m\times n}$ have full rank, let $\mathcal{D}_k$ denote the set of nonsingular diagonal $k\times k$ matrices, and define $D_C = \mathrm{diag}(\|Ae_j\|_p)^{-1}$ and $D_R = \mathrm{diag}(\|e_i^TA\|_q)^{-1}$, where $p^{-1}+q^{-1} = 1$. Then
$$\kappa_p(AD_C) \le n^{1-1/p}\min_{D\in\mathcal{D}_n}\kappa_p(AD), \tag{7.15}$$
$$\kappa_p(D_RA) \le m^{1/p}\min_{D\in\mathcal{D}_m}\kappa_p(DA). \tag{7.16}$$

Proof. For any $X \in \mathbb{R}^{m\times n}$ we have, from (6.12),
$$\max_j \|Xe_j\|_p \le \|X\|_p \le n^{1-1/p}\max_j \|Xe_j\|_p. \tag{7.17}$$
Therefore, since the columns of $AD_C$ have unit $p$-norm,
$$\|AD_C\|_p \le n^{1-1/p}. \tag{7.18}$$
Now, for any $D \in \mathcal{D}_n$, we have $(AD_C)^+ = D_C^{-1}D\,(AD)^+$, and the $j$th diagonal entry of $D_C^{-1}D$ is $\|ADe_j\|_p$ in absolute value, so
$$\|(AD_C)^+\|_p \le \|D_C^{-1}D\|_p\,\|(AD)^+\|_p \le \|AD\|_p\,\|(AD)^+\|_p, \tag{7.19}$$
using the first inequality in (7.17). Multiplying (7.18) and (7.19) and minimizing over $D$, we obtain (7.15). Inequality (7.16) follows by noting that $\kappa_p(DA) = \kappa_q(A^TD)$, where $p^{-1}+q^{-1} = 1$ (see (6.21)).

For $p = \infty$, (7.16) confirms what we already know from (7.12) and (7.13): that in the $\infty$-norm, row equilibration is an optimal row scaling strategy. Similarly, for $p = 1$, column equilibration is the best column scaling, by (7.15).
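Theorem 7.5 can be checked numerically for $p = 2$. A minimal sketch, assuming NumPy is available (the test matrix and the trial scalings are arbitrary): column equilibration in the 2-norm stays within the factor $\sqrt{n}$ of every other column scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))            # full rank with probability 1

def kappa2(M):
    """2-norm condition number of a full-rank rectangular matrix."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

# D_C scales each column of A to unit 2-norm (column equilibration).
Dc = np.diag(1.0 / np.linalg.norm(A, axis=0))

# Theorem 7.5 with p = 2: kappa2(A Dc) <= sqrt(n) * kappa2(A D) for every D.
for _ in range(100):
    D = np.diag(np.exp(rng.uniform(-3, 3, n)))   # random positive diagonal
    assert kappa2(A @ Dc) <= np.sqrt(n) * kappa2(A @ D) * (1 + 1e-10)
```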
Theorem 7.5 is usually stated for the 2-norm, for which it shows that row and column equilibration produce condition numbers within factors $\sqrt{m}$ and $\sqrt{n}$, respectively, of the minimum 2-norm condition numbers achievable by row and column scaling.

As a corollary of Theorem 7.5 we have the result that, among two-sided diagonal scalings of a symmetric positive definite matrix, the one that gives $A$ a unit diagonal is not far from optimal.

Corollary 7.6 (van der Sluis). Let $A \in \mathbb{R}^{n\times n}$ be symmetric positive definite and let $D^* = \mathrm{diag}(a_{ii}^{-1/2})$. Then
$$\kappa_2(D^*AD^*) \le n\min_{D\in\mathcal{D}_n}\kappa_2(DAD). \tag{7.20}$$

Proof. Let $A = R^TR$ be a Cholesky factorization, note that $\kappa_2(DAD) = \kappa_2(RD)^2$, and apply Theorem 7.5 to $RD$.

Is the scaling $D^*$ in Corollary 7.6 ever optimal? Forsythe and Straus [386, 1955] show that it is optimal if $A$ is symmetric positive definite with property A (that is, there exists a permutation matrix $P$ such that $PAP^T$ can be expressed as a block $2\times 2$ matrix whose (1,1) and (2,2) blocks are diagonal). Thus, for example, any symmetric positive definite tridiagonal matrix with unit diagonal is optimally scaled.

We note that by using (6.22) in place of (7.17), the inequalities of Theorem 7.5 and Corollary 7.6 can be strengthened by replacing $m$ and $n$ with the maximum number of nonzeros per column and row, respectively.

Here is an independent result for the Frobenius norm.

Theorem 7.7 (Stewart and Sun).
Let $A = [a_1,\dots,a_n] \in \mathbb{R}^{n\times n}$ be nonsingular, with $B := A^{-1} = [b_1,\dots,b_n]^T$. Then
$$\min_{D\in\mathcal{D}_n}\kappa_F(AD) = \sum_{j=1}^n \|a_j\|_2\,\|b_j\|_2.$$

Proof. For $D = \mathrm{diag}(d_i) \in \mathcal{D}_n$ we have, using the Cauchy–Schwarz inequality,
$$\kappa_F(AD) = \|AD\|_F\,\|D^{-1}B\|_F = \Bigl(\sum_j d_j^2\|a_j\|_2^2\Bigr)^{1/2}\Bigl(\sum_j d_j^{-2}\|b_j\|_2^2\Bigr)^{1/2} \ge \sum_j \|a_j\|_2\,\|b_j\|_2,$$
with equality if $d_j\|a_j\|_2 = \alpha\,d_j^{-1}\|b_j\|_2$ for all $j$, for some $\alpha \ne 0$. There is equality for $d_j = (\|b_j\|_2/\|a_j\|_2)^{1/2}$.

As we have seen in this and the previous section, the minimum value of $\kappa_\infty(DA)$ over all row scalings $D$ is $\mathrm{cond}(A) = \|\,|A^{-1}||A|\,\|_\infty$. The next result shows that for two-sided scalings the matrix $|A^{-1}||A|$ again features in the formula for the minimal condition number.
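Theorem 7.7 and its equality condition can be checked numerically. A minimal sketch, assuming NumPy is available (the matrix is an arbitrary well-conditioned example): with $d_j = (\|b_j\|_2/\|a_j\|_2)^{1/2}$, the Frobenius-norm condition number $\kappa_F(AD) = \|AD\|_F\,\|(AD)^{-1}\|_F$ attains the lower bound $\sum_j \|a_j\|_2\|b_j\|_2$, and no other diagonal scaling does better.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
B = np.linalg.inv(A)

col = np.linalg.norm(A, axis=0)   # ||a_j||_2, columns of A
row = np.linalg.norm(B, axis=1)   # ||b_j||_2, rows of A^{-1}
bound = np.sum(col * row)         # Stewart-Sun lower bound on kappa_F(AD)

def kappa_F(M):
    """Frobenius-norm condition number ||M||_F ||M^{-1}||_F."""
    return np.linalg.norm(M, 'fro') * np.linalg.norm(np.linalg.inv(M), 'fro')

# The optimal scaling from the equality condition in the proof.
D = np.diag(np.sqrt(row / col))
assert np.isclose(kappa_F(A @ D), bound)

# Any other diagonal scaling is no better than the bound.
for _ in range(100):
    D2 = np.diag(np.exp(rng.uniform(-2, 2, n)))
    assert kappa_F(A @ D2) >= bound * (1 - 1e-10)
```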