Higham - Accuracy and Stability of Numerical Algorithms (523152), page 52
as $A^{-1} = U^{-1} \times L^{-1} \times P$, and as the solution to $UA^{-1} = L^{-1} \times P$. These methods generally achieve different levels of efficiency on high-performance computers, and they propagate rounding errors in different ways. We concentrate in this chapter on the numerical stability, but comment briefly on performance issues.

The quality of an approximation $Y \approx A^{-1}$ can be assessed by looking at the right and left residuals, $AY - I$ and $YA - I$, and the forward error, $Y - A^{-1}$. Suppose we perturb $A \to A + \Delta A$ with $|\Delta A| \le \epsilon |A|$; thus, we are making relative perturbations of size at most $\epsilon$ to the elements of $A$.
If $Y = (A + \Delta A)^{-1}$ then $(A + \Delta A)Y = Y(A + \Delta A) = I$, so that

$$|AY - I| \le \epsilon |A|\,|Y|, \tag{13.1}$$

$$|YA - I| \le \epsilon |Y|\,|A|, \tag{13.2}$$

and, since $(A + \Delta A)^{-1} = A^{-1} - A^{-1}\Delta A\,A^{-1} + O(\epsilon^2)$,

$$|Y - A^{-1}| \le \epsilon |A^{-1}|\,|A|\,|A^{-1}| + O(\epsilon^2). \tag{13.3}$$

(Note that (13.3) can also be derived from (13.1) or (13.2).) The bounds (13.1)-(13.3) represent "ideal" bounds for a computed approximation $Y$ to $A^{-1}$, if we regard $\epsilon$ as a small multiple of the unit roundoff $u$. We will show that, for triangular matrix inversion, appropriate methods do indeed achieve (13.1) or (13.2) (but not both) and (13.3).

It is important to note that neither (13.1), (13.2), nor (13.3) implies that $Y + \Delta Y = (A + \Delta A)^{-1}$ with $\|\Delta A\| \le \epsilon\|A\|$ and $\|\Delta Y\| \le \epsilon\|Y\|$; that is, $Y$ need not be close to the inverse of a matrix near to $A$, even in the norm sense.
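The bounds (13.1) and (13.2) are easy to probe numerically. The following NumPy sketch (our own illustration, not from the book) makes an elementwise relative perturbation of size $\epsilon$ and takes $Y$ to be the computed inverse of the perturbed matrix; the factor 2 and the small additive term in the checks absorb the $O(u)$ rounding errors committed in forming $Y$ and the residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 8, 1e-8
# A well-conditioned test matrix (identity plus a small random part).
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))
# Elementwise relative perturbation: |dA| <= eps * |A|.
dA = eps * np.abs(A) * rng.uniform(-1.0, 1.0, (n, n))
Y = np.linalg.inv(A + dA)          # inverse of a nearby matrix

right = np.abs(A @ Y - np.eye(n))  # |AY - I|
left = np.abs(Y @ A - np.eye(n))   # |YA - I|

# (13.1): |AY - I| <= eps |A||Y|, up to rounding error in forming Y.
assert np.all(right <= 2 * eps * np.abs(A) @ np.abs(Y) + 1e-10)
# (13.2): |YA - I| <= eps |Y||A|.
assert np.all(left <= 2 * eps * np.abs(Y) @ np.abs(A) + 1e-10)
```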
Indeed, such a result would imply that both the left and right residuals are bounded in norm by $2\epsilon\|A\|\,\|Y\| + O(\epsilon^2)$, and this is not the case for any of the methods we will consider.

To illustrate the latter point we give a numerical example. Define the matrix $A_n$ as triu(qr(vand(n))), in MATLAB notation (vand is a routine from the Test Matrix Toolbox—see Appendix E); in other words, $A_n$ is the upper triangular QR factor of the $n \times n$ Vandermonde matrix based on equispaced points on $[0, 1]$. We inverted $A_n$, for $n = 1{:}80$, using MATLAB's INV function, which uses GEPP.

264 MATRIX INVERSION

[Figure 13.1. Residuals for inverses computed by MATLAB's INV function.]
The left and right normwise relative residuals are plotted in Figure 13.1. We see that while the left residual is always less than the unit roundoff, the right residual becomes large as $n$ increases. These matrices are very ill conditioned (singular to working precision for $n > 20$), yet it is still reasonable to expect a small residual, and we will prove in §13.3.2 that the left residual must be small, independent of the condition number.

In most of this chapter we are not concerned with the precise values of constants (§13.4 is the exception); thus $c_n$ denotes a constant of order $n$.
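The experiment is easy to reproduce in outline with NumPy (a sketch under the assumption that numpy.linalg.inv, which inverts via a GEPP factorization as MATLAB's INV does, shows the same behaviour; our `vand` stands in for the Test Matrix Toolbox routine):

```python
import numpy as np

def vand(n):
    # Stand-in for the Test Matrix Toolbox routine `vand`: the n-by-n
    # Vandermonde matrix based on equispaced points on [0, 1].
    return np.vander(np.linspace(0.0, 1.0, n), n, increasing=True)

n = 40
A = np.triu(np.linalg.qr(vand(n))[1])    # A_n = triu(qr(vand(n)))
Y = np.linalg.inv(A)
I = np.eye(n)
denom = np.linalg.norm(A) * np.linalg.norm(Y)
left = np.linalg.norm(Y @ A - I) / denom    # stays at roundoff level
right = np.linalg.norm(A @ Y - I) / denom   # grows large with n
assert left < 1e-11
assert right > 100 * left
```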
To simplify the presentation we introduce a special notation. Let $A_i$, $i = 1{:}k$, be matrices such that the product $A_1 A_2 \dots A_k$ is defined. Then $\Delta(A_1, A_2, \dots, A_k)$ denotes a matrix bounded according to

$$|\Delta(A_1, A_2, \dots, A_k)| \le c_n u\, |A_1|\,|A_2| \dots |A_k|.$$

This notation is chosen so that if $\hat{C} = fl(A_1 A_2 \dots A_k)$, with the product evaluated in any order, then

$$\hat{C} = A_1 A_2 \dots A_k + \Delta(A_1, A_2, \dots, A_k).$$

13.2. Inverting a Triangular Matrix

We consider the inversion of a lower triangular matrix $L \in \mathbb{R}^{n \times n}$, treating unblocked and blocked methods separately. We do not make a distinction between partitioned and block methods in this section. All the results in this and the next section are from Du Croz and Higham [322, 1992].

13.2.1. Unblocked Methods

We focus our attention on two "j" methods that compute $L^{-1}$ a column at a time.
Analogous "i" and "k" methods exist, which compute $L^{-1}$ row-wise or use outer products, respectively, and we comment on them at the end of the section.

The first method computes each column of $X = L^{-1}$ independently, using forward substitution. We write it as follows, to facilitate comparison with the second method.

Method 1.
for j = 1:n
    x_jj = 1/l_jj
    X(j+1:n, j) = -x_jj L(j+1:n, j)
    Solve L(j+1:n, j+1:n) X(j+1:n, j) = X(j+1:n, j), by forward substitution.
end

In BLAS terminology, this method is dominated by $n$ calls to the level-2 BLAS routine xTRSV (Triangular SolVe).

The second method computes the columns in the reverse order. On the $j$th step it multiplies by the previously computed inverse $L(j+1{:}n, j+1{:}n)^{-1}$ instead of solving a system with coefficient matrix $L(j+1{:}n, j+1{:}n)$.

Method 2.
for j = n:-1:1
    x_jj = 1/l_jj
    X(j+1:n, j) = X(j+1:n, j+1:n) L(j+1:n, j)
    X(j+1:n, j) = -x_jj X(j+1:n, j)
end

Method 2 uses $n$ calls to the level-2 BLAS routine xTRMV (Triangular Matrix times Vector).
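A direct NumPy transcription of the two methods may be helpful (our own sketch, not from the book; np.linalg.solve stands in for the triangular solve xTRSV, and the two assignments of Method 2 are fused into one line):

```python
import numpy as np

def inv_method1(L):
    """Method 1: each column of X = inv(L) independently, by
    forward substitution (the role played by xTRSV)."""
    n = L.shape[0]
    X = np.zeros((n, n))
    for j in range(n):
        X[j, j] = 1.0 / L[j, j]
        if j < n - 1:
            X[j+1:, j] = np.linalg.solve(L[j+1:, j+1:],
                                         -X[j, j] * L[j+1:, j])
    return X

def inv_method2(L):
    """Method 2: columns in reverse order, multiplying by the
    previously computed inverse X(j+1:n, j+1:n) (the role of xTRMV)."""
    n = L.shape[0]
    X = np.zeros((n, n))
    for j in range(n - 1, -1, -1):
        X[j, j] = 1.0 / L[j, j]
        X[j+1:, j] = -X[j, j] * (X[j+1:, j+1:] @ L[j+1:, j])
    return X

rng = np.random.default_rng(0)
n = 6
L = np.tril(rng.standard_normal((n, n))) + 3.0 * np.eye(n)
X1, X2 = inv_method1(L), inv_method2(L)
assert np.allclose(L @ X1, np.eye(n))   # small right residual, cf. (13.4)
assert np.allclose(X2 @ L, np.eye(n))   # small left residual, cf. (13.8)
```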
On most high-performance machines xTRMV can be implemented to run faster than xTRSV, so Method 2 is generally preferable to Method 1 from the point of view of efficiency (see the performance figures at the end of §13.2.2). We now compare the stability of the two methods.

Theorem 8.5 shows that the $j$th column of the computed $\hat{X}$ from Method 1 satisfies

$$(L + \Delta L_j)\hat{x}_j = e_j, \qquad |\Delta L_j| \le c_n u |L|.$$

It follows that we have the componentwise residual bound

$$|L\hat{X} - I| \le c_n u\, |L|\,|\hat{X}|, \tag{13.4}$$

and the componentwise forward error bound

$$|\hat{X} - L^{-1}| \le c_n u\, |L^{-1}|\,|L|\,|\hat{X}|. \tag{13.5}$$

Since $\hat{X} = L^{-1} + O(u)$, (13.5) can be written as

$$|\hat{X} - L^{-1}| \le c_n u\, |L^{-1}|\,|L|\,|L^{-1}| + O(u^2), \tag{13.6}$$

which is invariant under row and column scaling of $L$.
If we take norms we obtain normwise relative error bounds that are either row or column scaling independent: from (13.6) we have

$$\frac{\|\hat{X} - L^{-1}\|_\infty}{\|L^{-1}\|_\infty} \le c_n u\, \mathrm{cond}(L^{-1}) + O(u^2), \tag{13.7}$$

and the same bound holds with $\mathrm{cond}(L^{-1})$ replaced by $\mathrm{cond}(L)$.

Notice that (13.4) is a bound for the right residual, $L\hat{X} - I$. This is because Method 1 is derived by solving $LX = I$. Conversely, Method 2 can be derived by solving $XL = I$, which suggests that we should look for a bound on the left residual for this method.

Lemma 13.1. The computed inverse $\hat{X}$ from Method 2 satisfies

$$|\hat{X}L - I| \le c_n u\, |\hat{X}|\,|L|. \tag{13.8}$$

Proof. The proof is by induction on $n$, the case $n = 1$ being trivial. Assume the result is true for $n - 1$ and write

$$L = \begin{pmatrix} \alpha & 0 \\ y & N \end{pmatrix}, \qquad X = L^{-1} = \begin{pmatrix} \beta & 0 \\ z & M \end{pmatrix},$$

where $\alpha, \beta \in \mathbb{R}$, $y, z \in \mathbb{R}^{n-1}$, and $N, M \in \mathbb{R}^{(n-1)\times(n-1)}$ are lower triangular. Method 2 computes the first column of $X$ by solving $XL = I$ according to

$$\beta = \alpha^{-1}, \qquad z = -\beta M y.$$

In floating point arithmetic we obtain

$$\hat{\beta} = fl(\alpha^{-1}) = \alpha^{-1}(1 + \delta), \quad |\delta| \le u,$$

$$\hat{z} = fl(-\hat{\beta}\hat{M}y) = -\hat{\beta}\hat{M}y + \Delta(\hat{\beta}, \hat{M}, y).$$

Thus

$$\hat{\beta}\alpha - 1 = \delta, \qquad \hat{z}\alpha + \hat{M}y = -\delta\hat{M}y + \alpha\,\Delta(\hat{\beta}, \hat{M}, y).$$

This may be written as

$$|\hat{\beta}\alpha - 1| \le u \le c_n u\,|\hat{\beta}|\,|\alpha|, \qquad |\hat{z}\alpha + \hat{M}y| \le c_n u\,\bigl(|\hat{z}|\,|\alpha| + |\hat{M}|\,|y|\bigr).$$

By assumption, the corresponding inequality holds for the (2:n, 2:n) submatrices and so the result is proved.

Lemma 13.1 shows that Method 2 has a left residual analogue of the right residual bound (13.4) for Method 1.
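Lemma 13.1 can also be checked numerically. In the sketch below (our code; the triangular matrix is the transpose of the $A_n$ used in Figure 13.1, so it is lower triangular and very ill conditioned), Method 2's left residual satisfies a normwise version of (13.8) with a modest constant, even though the matrix is nearly singular:

```python
import numpy as np

def inv_method2(L):
    # Method 2 from the text (the two assignments fused into one line).
    n = L.shape[0]
    X = np.zeros((n, n))
    for j in range(n - 1, -1, -1):
        X[j, j] = 1.0 / L[j, j]
        X[j+1:, j] = -X[j, j] * (X[j+1:, j+1:] @ L[j+1:, j])
    return X

n, u = 30, 2.0 ** -53
V = np.vander(np.linspace(0.0, 1.0, n), increasing=True)
L = np.triu(np.linalg.qr(V)[1]).T   # lower triangular, very ill conditioned
X = inv_method2(L)
res = np.linalg.norm(X @ L - np.eye(n), np.inf)
bound = n * u * np.linalg.norm(np.abs(X) @ np.abs(L), np.inf)
# Normwise form of (13.8), allowing a generous constant c_n <= 100n.
assert res <= 100 * bound
```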
Since there is, in general, no reason to choose between a small right residual and a small left residual, our conclusion is that Methods 1 and 2 have equally good numerical stability properties. More generally, it can be shown that all three $i$, $j$, and $k$ inversion variants that can be derived from the equations $LX = I$ produce identical rounding errors under suitable implementations, and all satisfy the same right residual bound; likewise, the three variants corresponding to the equation $XL = I$ all satisfy the same left residual bound.
The LINPACK routine xTRDI uses a $k$ variant derived from $XL = I$; the LINPACK routines xGEDI and xPODI contain analogous code for inverting an upper triangular matrix (but the LINPACK Users' Guide [307, 1979, Chaps. 1 and 3] describes a different variant from the one used in the code).

13.2.2. Block Methods

Let the lower triangular matrix $L \in \mathbb{R}^{n \times n}$ be partitioned in block form as

$$L = \begin{pmatrix} L_{11} & & \\ L_{21} & L_{22} & \\ \vdots & & \ddots \\ L_{N1} & L_{N2} & \cdots & L_{NN} \end{pmatrix}, \tag{13.9}$$

where we place no restrictions on the block sizes, other than to require the diagonal blocks to be square.
The most natural block generalizations of Methods 1 and 2 are as follows. Here, we use the notation $L_{p:q,r:s}$ to denote the submatrix comprising the intersection of block rows $p$ to $q$ and block columns $r$ to $s$ of $L$.

Method 1B.
for j = 1:N
    X_jj = L_jj^(-1)    (by Method 1)
    X_{j+1:N,j} = -L_{j+1:N,j} X_jj
    Solve L_{j+1:N,j+1:N} X_{j+1:N,j} = X_{j+1:N,j}, by forward substitution.
end

Method 2B.
for j = N:-1:1
    X_jj = L_jj^(-1)    (by Method 2)
    X_{j+1:N,j} = X_{j+1:N,j+1:N} L_{j+1:N,j}
    X_{j+1:N,j} = -X_{j+1:N,j} X_jj
end

One can argue that Method 1B carries out the same arithmetic operations as Method 1, although possibly in a different order, and that it therefore satisfies the same error bound (13.4).
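A sketch of Method 1B in NumPy (our own code; equal block sizes nb are assumed for simplicity, the last block being allowed to be smaller, and np.linalg.solve stands in for the block forward substitution):

```python
import numpy as np

def inv_method1(L):
    # Unblocked Method 1, used here to invert the diagonal blocks.
    n = L.shape[0]
    X = np.zeros((n, n))
    for j in range(n):
        X[j, j] = 1.0 / L[j, j]
        if j < n - 1:
            X[j+1:, j] = np.linalg.solve(L[j+1:, j+1:],
                                         -X[j, j] * L[j+1:, j])
    return X

def inv_method1B(L, nb):
    """Method 1B: block-column generalization of Method 1, block size nb."""
    n = L.shape[0]
    X = np.zeros((n, n))
    for j in range(0, n, nb):
        e = min(j + nb, n)
        X[j:e, j:e] = inv_method1(L[j:e, j:e])   # (by Method 1)
        if e < n:
            T = -L[e:, j:e] @ X[j:e, j:e]
            X[e:, j:e] = np.linalg.solve(L[e:, e:], T)  # block substitution
    return X

rng = np.random.default_rng(2)
n = 12
L = np.tril(rng.standard_normal((n, n))) + 4.0 * np.eye(n)
assert np.allclose(inv_method1B(L, 5) @ L, np.eye(n))
```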
For completeness, we give a direct proof.

Lemma 13.2. The computed inverse $\hat{X}$ from Method 1B satisfies

$$|L\hat{X} - I| \le c_n u\, |L|\,|\hat{X}|. \tag{13.10}$$

Proof. Equating block columns in (13.10), we obtain the $N$ independent inequalities

$$|L\hat{X}_{1:N,j} - I_{1:N,j}| \le c_n u\, |L|\,|\hat{X}_{1:N,j}|, \quad j = 1{:}N. \tag{13.11}$$

It suffices to verify the inequality with $j = 1$. Write

$$L = \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}, \qquad X = \begin{pmatrix} X_{11} & 0 \\ X_{21} & X_{22} \end{pmatrix},$$

where $L_{11}$ and $X_{11}$ are square and $L_{11}$ is the (1,1) block in the partitioning of (13.9). $\hat{X}_{11}$ is computed by Method 1 and so, from (13.4),

$$|L_{11}\hat{X}_{11} - I| \le c_n u\, |L_{11}|\,|\hat{X}_{11}|. \tag{13.12}$$

$\hat{X}_{21}$ is computed by forming $T = -L_{21}\hat{X}_{11}$ and solving $L_{22}X_{21} = T$.
The computed $\hat{X}_{21}$ satisfies

$$L_{22}\hat{X}_{21} + L_{21}\hat{X}_{11} = \Delta(L_{21}, \hat{X}_{11}) + \Delta(L_{22}, \hat{X}_{21}).$$

Hence

$$|L_{21}\hat{X}_{11} + L_{22}\hat{X}_{21}| \le c_n u\,\bigl(|L_{21}|\,|\hat{X}_{11}| + |L_{22}|\,|\hat{X}_{21}|\bigr). \tag{13.13}$$

Together, inequalities (13.12) and (13.13) are equivalent to (13.11) with $j = 1$, as required.

We can attempt a similar analysis for Method 2B. With the same notation as above, $\hat{X}_{21}$ is computed as $\hat{X}_{21} = fl(-\hat{X}_{22}L_{21}\hat{X}_{11})$. Thus

$$\hat{X}_{21} = -\hat{X}_{22}L_{21}\hat{X}_{11} + \Delta(\hat{X}_{22}, L_{21}, \hat{X}_{11}). \tag{13.14}$$

To bound the left residual we have to postmultiply by $L_{11}$ and use the fact that $\hat{X}_{11}$ is computed by Method 2:

$$\hat{X}_{21}L_{11} = -\hat{X}_{22}L_{21}\bigl(I + \Delta(\hat{X}_{11}, L_{11})\bigr) + \Delta(\hat{X}_{22}, L_{21}, \hat{X}_{11})L_{11}.$$

This leads to a bound of the form

$$|\hat{X}_{21}L_{11} + \hat{X}_{22}L_{21}| \le c_n u\, |\hat{X}_{22}|\,|L_{21}|\,|\hat{X}_{11}|\,|L_{11}|,$$

which would be of the desired form in (13.8) were it not for the factor $|\hat{X}_{11}|\,|L_{11}|$. This analysis suggests that the left residual is not guaranteed to be small. Numerical experiments confirm that the left and right residuals can be large simultaneously for Method 2B, although examples are quite hard to find [322, 1992]; therefore the method must be regarded as unstable when the block size exceeds 1.

The reason for the instability is that there are two plausible block generalizations of Method 2 and we have chosen an unstable one that does not carry out the same arithmetic operations as Method 2.
If we perform a solve with $L_{jj}$ instead of multiplying by $X_{jj}$ we obtain the second variation, which is used by LAPACK's xTRTRI:

Method 2C.
for j = N:-1:1
    X_jj = L_jj^(-1)    (by Method 2)
    X_{j+1:N,j} = X_{j+1:N,j+1:N} L_{j+1:N,j}
    Solve X_{j+1:N,j} L_jj = -X_{j+1:N,j}, by back substitution.
end

For this method, the analogue of (13.14) is

$$\hat{X}_{21}L_{11} = -\hat{X}_{22}L_{21} + \Delta(\hat{X}_{22}, L_{21}) + \Delta(\hat{X}_{21}, L_{11}),$$

which yields

$$|\hat{X}_{21}L_{11} + \hat{X}_{22}L_{21}| \le c_n u\,\bigl(|\hat{X}_{22}|\,|L_{21}| + |\hat{X}_{21}|\,|L_{11}|\bigr).$$

Hence Method 2C enjoys a very satisfactory residual bound.

Lemma 13.3. The computed inverse $\hat{X}$ from Method 2C satisfies

$$|\hat{X}L - I| \le c_n u\, |\hat{X}|\,|L|.$$

In summary, block versions of Methods 1 and 2 are available that have the same residual bounds as the point methods. However, in general, there is no guarantee that stability properties remain unchanged when we convert a point method to block form, as shown by Method 2B.

In Table 13.2 we present some performance figures for inversion of a lower triangular matrix on a Cray 2.

Table 13.2. Mflop rates for inverting a triangular matrix on a Cray 2.

                                 n = 128   n = 256   n = 512   n = 1024
Unblocked:       Method 1            95       162       231        283
                 Method 2           114       211       289        330
                 k variant          114       157       178        191
Blocked:         Method 1B          125       246       348        405
(block size 64)  Method 2C          129       269       378        428
                 k variant          148       263       344        383
These clearly illustrate the possible gains in efficiency from using block methods, and also the advantage of Method 2 over Method 1. For comparison, the performance of a $k$ variant is also shown (both $k$ variants run at the same rate). The performance characteristics of the $i$ variants are similar to those of the $j$ variants, except that since they are row oriented rather than column oriented, they are liable to be slowed down by memory-bank conflicts, page thrashing, or cache missing.

13.3. Inverting a Full Matrix by LU Factorization

Next, we consider four methods for inverting a full matrix $A \in \mathbb{R}^{n \times n}$, given an LU factorization computed by GEPP.