We assume, without loss of generality, that there are no row interchanges. We write the computed LU factors as L̂ and Û. Recall that A + ∆A = L̂Û, with |∆A| ≤ c_n u|L̂||Û| (Theorem 9.3).

13.3.1. Method A

Perhaps the most frequently described method for computing X = A^{-1} is the following one.

Method A.
for j = 1:n
    Solve Ax_j = e_j
end

Compared with the methods to be described below, Method A has the disadvantages of requiring more temporary storage and of not having a convenient partitioned version. However, it is simple to analyse.
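Method A can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the book's code: it assumes an LU factorization without pivoting (the analysis above assumes no row interchanges), uses lists of lists rather than a matrix library, and the helper names are ours.

```python
# Sketch of Method A: invert A by solving A*x_j = e_j for each unit vector e_j.
# Assumes a pivot-free LU factorization exists; helper names are illustrative.

def lu(A):
    """Doolittle LU factorization A = L*U, L unit lower triangular."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]          # multiplier
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U

def solve_lu(L, U, b):
    """Solve L*U*x = b by forward then back substitution."""
    n = len(b)
    y = b[:]
    for i in range(n):                      # forward substitution: L*y = b
        for j in range(i):
            y[i] -= L[i][j] * y[j]
    x = y[:]
    for i in range(n - 1, -1, -1):          # back substitution: U*x = y
        for j in range(i + 1, n):
            x[i] -= U[i][j] * x[j]
        x[i] /= U[i][i]
    return x

def invert_method_a(A):
    """Method A: column j of the inverse solves A*x_j = e_j."""
    n = len(A)
    L, U = lu(A)
    cols = [solve_lu(L, U, [float(i == j) for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]  # columns -> matrix

A = [[4.0, 2.0], [6.0, 5.0]]
X = invert_method_a(A)
# A^{-1} = (1/8) * [[5, -2], [-6, 4]] = [[0.625, -0.25], [-0.75, 0.5]]
```

Note the extra temporary storage mentioned above: each column x_j is computed into a separate right-hand-side vector before being assembled into X.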
From Theorem 9.4 we have

    (A + ∆A_j)x̂_j = e_j,    |∆A_j| ≤ c_n u|L̂||Û|,    (13.15)

and so

    |AX̂ − I| ≤ c_n u|L̂||Û||X̂|.    (13.16)

This bound departs from the form (13.1) only in that |A| is replaced by its upper bound |L̂||Û| + O(u). The forward error bound corresponding to (13.16) is

    |X̂ − A^{-1}| ≤ c_n u|A^{-1}||L̂||Û||X̂|.    (13.17)

Note that (13.15) says that x̂_j is the jth column of the inverse of a matrix close to A, but it is a different perturbation ∆A_j for each column. It is not true that X̂ itself is the inverse of a matrix close to A, unless A is well conditioned.

13.3.2. Method B

Next, we consider the method used by LINPACK's xGEDI, LAPACK's xGETRI, and MATLAB's INV function.

Method B.
Compute U^{-1} and then solve for X the equation XL = U^{-1}.

To analyse this method we will assume that U^{-1} is computed by an analogue of Method 2 or 2C for upper triangular matrices that obtains the columns of U^{-1} in the order 1 to n. Then the computed inverse X̂_U ≈ U^{-1} will satisfy the residual bound

    |X̂_U Û − I| ≤ c_n u|X̂_U||Û|.

We also assume that the triangular solve from the right with L is done by back substitution.
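A hedged sketch of Method B in plain Python: form U^{-1} column by column (columns in order 1 to n, matching the assumption above), then solve X L = U^{-1} from the right, processing columns n down to 1. The helper names and the hard-coded pivot-free L, U factors are illustrative, not LINPACK's or LAPACK's code.

```python
# Sketch of Method B: X = A^{-1} from X*L = U^{-1}.
# Pure Python; L, U are assumed given by a pivot-free LU factorization.

def inv_upper(U):
    """Invert upper triangular U, computing columns in the order 1 to n."""
    n = len(U)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):
        X[j][j] = 1.0 / U[j][j]
        for i in range(j - 1, -1, -1):      # back substitution up column j
            s = sum(U[i][k] * X[k][j] for k in range(i + 1, j + 1))
            X[i][j] = -s / U[i][i]
    return X

def solve_xl(B, L):
    """Solve X*L = B for X, with L unit lower triangular.
    Columns of X are found in the order n down to 1 (a right-side solve)."""
    n = len(B)
    X = [row[:] for row in B]
    for j in range(n - 1, -1, -1):
        for k in range(j + 1, n):           # columns k > j are already final
            for i in range(n):
                X[i][j] -= X[i][k] * L[k][j]
    return X

# Factors of A = [[4, 2], [6, 5]] (assumed, for illustration):
L = [[1.0, 0.0], [1.5, 1.0]]
U = [[4.0, 2.0], [0.0, 2.0]]
X = solve_xl(inv_upper(U), L)   # X = U^{-1} L^{-1} = A^{-1}
# Expected: [[0.625, -0.25], [-0.75, 0.5]]
```

Unlike Method A, no n extra right-hand-side vectors are needed: X can overwrite U^{-1} column by column during the right-side solve.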
The computed X̂ therefore satisfies X̂L̂ = X̂_U + ∆(X̂, L̂), with |∆(X̂, L̂)| ≤ c_n u|X̂||L̂|, and so

    X̂A = X̂L̂Û − X̂∆A = X̂_U Û + ∆(X̂, L̂)Û − X̂∆A.

This leads to the residual bound

    |X̂A − I| ≤ c'_n u|X̂||L̂||Û|,    (13.18)

which is the left residual analogue of (13.16). From (13.18) we obtain the forward error bound

    |X̂ − A^{-1}| ≤ c'_n u|X̂||L̂||Û||A^{-1}|.

Note that Methods A and B are equivalent, in the sense that Method A solves for X the equation LUX = I while Method B solves XLU = I. Thus the two methods carry out analogous operations but in different orders. It follows that the methods must satisfy analogous residual bounds, and so (13.18) can be deduced from (13.16).

We mention in passing that the LINPACK manual states that for Method B a bound holds of the form ||AX − I|| ≤ d_n u||A|| ||X|| [307, 1979, p.
1.20]. This is incorrect, although counterexamples are rare; it is the left residual that is bounded this way, as follows from (13.18).

13.3.3. Method C

The next method that we consider is from Du Croz and Higham [322, 1992]. It solves the equation UXL = I, computing X a partial row and column at a time. To derive the method, partition

    U = [u11 u12; 0 U22],   X = [x11 x12; x21 X22],   L = [1 0; l21 L22],

where the (1,1) blocks are scalars, and assume that the trailing submatrix X22 is already known. Then the rest of X is computed according to

    x21 = −X22 l21,
    x12 = −u11^{-1} u12 X22,
    x11 = u11^{-1} − x12 l21.

The method can also be derived by forming the product X = U^{-1} × L^{-1} using the representation of L and U as a product of elementary matrices (and diagonal matrices in the case of U).
In detail the method is as follows.

Method C.
for k = n:−1:1
    X(k+1:n, k) = −X(k+1:n, k+1:n)L(k+1:n, k)
    X(k, k+1:n) = −(U(k, k+1:n)X(k+1:n, k+1:n))/U(k, k)
    X(k, k) = 1/U(k, k) − X(k, k+1:n)L(k+1:n, k)
end

The method can be implemented so that X overwrites L and U, with the aid of a work vector of length n (or a work array to hold a block row or column in the partitioned case). Because most of the work is performed by matrix–vector (or matrix–matrix) multiplication, Method C is likely to be the fastest of those considered in this section on many machines. (Some performance figures are given at the end of the section.)

A straightforward error analysis of Method C shows that the computed X̂ satisfies

    |ÛX̂L̂ − I| ≤ c_n u|Û||X̂||L̂|.    (13.19)

We will refer to ÛX̂L̂ − I as a "mixed residual". From (13.19) we can obtain bounds on the left and right residual that are weaker than those in (13.18) and (13.16) by a factor |Û^{-1}||Û| on the left or |L̂||L̂^{-1}| on the right, respectively. We also obtain from (13.19) the forward error bound

    |X̂ − A^{-1}| ≤ c_n u|Û^{-1}||Û||X̂||L̂||L̂^{-1}|,

which is (13.17) with |A^{-1}| replaced by its upper bound |Û^{-1}||L̂^{-1}| + O(u) and the factors reordered.

The LINPACK routine xSIDI uses a special case of Method C in conjunction with the diagonal pivoting method to invert a symmetric indefinite matrix; see Du Croz and Higham [322, 1992] for details.
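The k-loop of Method C transcribes directly into plain Python; the sketch below is illustrative (pivot-free factors, list-of-lists storage, our own helper name), and for clarity it writes into a fresh array X rather than overwriting L and U in place as a production implementation would.

```python
# Sketch of Method C: solve U*X*L = I, building X a partial row and column
# at a time, from the trailing corner outward. Illustrative, not Du Croz and
# Higham's code; L unit lower triangular, U upper triangular, no pivoting.

def invert_method_c(L, U):
    n = len(L)
    X = [[0.0] * n for _ in range(n)]
    for k in range(n - 1, -1, -1):
        # Column piece: x21 = -X22 * l21
        for i in range(k + 1, n):
            X[i][k] = -sum(X[i][j] * L[j][k] for j in range(k + 1, n))
        # Row piece: x12 = -(u12 * X22) / u11
        for j in range(k + 1, n):
            X[k][j] = -sum(U[k][i] * X[i][j]
                           for i in range(k + 1, n)) / U[k][k]
        # Diagonal: x11 = 1/u11 - x12 * l21
        X[k][k] = 1.0 / U[k][k] - sum(X[k][j] * L[j][k]
                                      for j in range(k + 1, n))
    return X

# Factors of A = [[4, 2], [6, 5]] (assumed, for illustration):
L = [[1.0, 0.0], [1.5, 1.0]]
U = [[4.0, 2.0], [0.0, 2.0]]
X = invert_method_c(L, U)
# Expected: A^{-1} = [[0.625, -0.25], [-0.75, 0.5]]
```

The inner sums are exactly the matrix–vector products that make the method fast in practice; a blocked version replaces them with matrix–matrix multiplications.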
Method D

The next method is based on another natural way to form A^{-1} and is used by LAPACK's xPOTRI, which inverts a symmetric positive definite matrix.

Method D.
Compute L^{-1} and U^{-1} and then form A^{-1} = U^{-1} × L^{-1}.

The advantage of this method is that no extra workspace is needed; U^{-1} and L^{-1} can overwrite U and L, and can then be overwritten by their product. However, Method D is significantly slower on some machines than Methods B or C, because it uses a smaller average vector length for vector operations.

To analyse Method D we will assume initially that L^{-1} is computed by Method 2 (or Method 2C) and, as for Method B above, that U^{-1} is computed by an analogue of Method 2 or 2C for upper triangular matrices. We have

    X̂ = X̂_U X̂_L + ∆(X̂_U, X̂_L),   |∆(X̂_U, X̂_L)| ≤ c_n u|X̂_U||X̂_L|.    (13.20)

Since A = L̂Û − ∆A,

    X̂A = X̂_U X̂_L L̂Û − X̂_U X̂_L ∆A + ∆(X̂_U, X̂_L)A.    (13.21)

Rewriting the first term of the right-hand side using X̂_L L̂ = I + ∆(X̂_L, L̂), and similarly for U, we obtain

    X̂A = I + ∆(X̂_U, Û) + X̂_U ∆(X̂_L, L̂)Û − X̂_U X̂_L ∆A + ∆(X̂_U, X̂_L)A,    (13.22)

and so

    |X̂A − I| ≤ c'_n u|X̂_U||X̂_L||L̂||Û| + O(u²).    (13.23)

This bound is weaker than (13.18), since |X̂| ≤ |X̂_U||X̂_L| + O(u).
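A minimal sketch of Method D in plain Python: invert the two triangular factors separately and multiply. As before, the factor values and helper names are illustrative assumptions, not xPOTRI's code, and for readability the sketch does not exploit the no-extra-workspace overwriting that is Method D's practical advantage.

```python
# Sketch of Method D: A^{-1} = U^{-1} * L^{-1} from separately inverted factors.
# Pure Python; L unit lower triangular, U upper triangular, no pivoting.

def inv_unit_lower(L):
    """Invert unit lower triangular L by forward substitution per column."""
    n = len(L)
    X = [[float(i == j) for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            X[i][j] = -sum(L[i][k] * X[k][j] for k in range(j, i))
    return X

def inv_upper(U):
    """Invert upper triangular U by back substitution per column."""
    n = len(U)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):
        X[j][j] = 1.0 / U[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(U[i][k] * X[k][j] for k in range(i + 1, j + 1))
            X[i][j] = -s / U[i][i]
    return X

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Factors of A = [[4, 2], [6, 5]] (assumed, for illustration):
L = [[1.0, 0.0], [1.5, 1.0]]
U = [[4.0, 2.0], [0.0, 2.0]]
X = matmul(inv_upper(U), inv_unit_lower(L))   # A^{-1}
# Expected: [[0.625, -0.25], [-0.75, 0.5]]
```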
Note, however, that the term ∆(X̂_U, X̂_L)A in the residual (13.22) is an unavoidable consequence of forming X̂_U X̂_L, and so the bound (13.23) is essentially the best possible.

The analysis above assumes that X̂_L and X̂_U both have small left residuals. If they both have small right residuals, as when they are computed using Method 1, then it is easy to see that a bound analogous to (13.23) holds for the right residual AX̂ − I. If X̂_L has a small left residual and X̂_U has a small right residual (or vice versa) then it does not seem possible to derive a bound of the form (13.23).
However, we have

    |X̂_L L̂ − I| = |L̂^{-1}(L̂X̂_L − I)L̂| ≤ |L̂^{-1}||L̂X̂_L − I||L̂|,    (13.24)

and since L̂ is unit lower triangular with |l_ij| ≤ 1, we have |(L̂^{-1})_ij| ≤ 2^{n−1}, which places a bound on how much the left and right residuals of X̂_L can differ. Furthermore, since the matrices L from GEPP tend to be well conditioned and since our numerical experience is that large residuals tend to occur only for ill-conditioned matrices, we would expect the left and right residuals of X̂_L almost always to be of similar size. We conclude that even in the "conflicting residuals" case, Method D will, in practice, usually satisfy (13.23) or its right residual counterpart, according to whether X̂_U has a small left or right residual respectively.
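The point of (13.24) can be seen numerically. In the sketch below (a contrived pure-Python example, not from the book), we take a well-conditioned unit lower triangular L, perturb its exact inverse by about 1e-8 to mimic a computed X_L, and observe that the left residual X_L L − I and the right residual L X_L − I come out of similar size.

```python
# Numerical illustration of (13.24): for well-conditioned unit lower
# triangular L, left and right residuals of an approximate inverse agree
# in magnitude. The perturbation injected below is contrived.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_abs_residual(P):
    """max |(P - I)_ij|."""
    n = len(P)
    return max(abs(P[i][j] - (i == j)) for i in range(n) for j in range(n))

L = [[1.0, 0.0, 0.0],
     [2.0, 1.0, 0.0],
     [3.0, 4.0, 1.0]]
# Exact inverse of L is [[1,0,0],[-2,1,0],[5,-4,1]]; perturb one entry:
XL = [[1.0, 0.0, 0.0],
      [-2.0, 1.0, 0.0],
      [5.0 + 1e-8, -4.0, 1.0]]

left = max_abs_residual(matmul(XL, L))    # size of XL*L - I
right = max_abs_residual(matmul(L, XL))   # size of L*XL - I
# Both residuals are about 1e-8 here, as (13.24) leads us to expect.
```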
Similar comments apply to Method B when U^{-1} is computed by a method yielding a small right residual.

These considerations are particularly pertinent when we consider Method D specialized to symmetric positive definite matrices and the Cholesky factorization A = R^T R. Now A^{-1} is obtained by computing X_R = R^{-1} and then forming A^{-1} = X_R X_R^T; this is the method used in the LINPACK routine xPODI [307, 1979, Chap. 3]. If X̂_R has a small right residual then X̂_R^T has a small left residual, so in this application we naturally encounter conflicting residuals. Fortunately, the symmetry and definiteness of the problem help us to obtain a satisfactory residual bound.
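The symmetric positive definite specialization can be sketched as follows. This is an illustrative pure-Python sketch of the A^{-1} = X_R X_R^T approach, with our own cholesky and triangular-inverse helpers; it is not LINPACK's xPODI code.

```python
# Sketch of Method D for symmetric positive definite A:
# factor A = R^T R, compute X_R = R^{-1}, form A^{-1} = X_R * X_R^T.
import math

def cholesky(A):
    """Upper triangular R with A = R^T * R."""
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = A[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            if i == j:
                R[i][i] = math.sqrt(s)
            else:
                R[i][j] = s / R[i][i]
    return R

def inv_upper(R):
    """Invert upper triangular R by back substitution per column."""
    n = len(R)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):
        X[j][j] = 1.0 / R[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(R[i][k] * X[k][j] for k in range(i + 1, j + 1))
            X[i][j] = -s / R[i][i]
    return X

def invert_spd(A):
    XR = inv_upper(cholesky(A))
    n = len(A)
    # A^{-1} = XR * XR^T (symmetric product of the triangular inverse)
    return [[sum(XR[i][k] * XR[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[4.0, 2.0], [2.0, 5.0]]
Ainv = invert_spd(A)
# Expected: (1/16) * [[5, -2], [-2, 4]] = [[0.3125, -0.125], [-0.125, 0.25]]
```

Only one triangular factor is ever stored, and the product X_R X_R^T is symmetric by construction, which is why symmetry helps with the residual analysis that follows.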
The analysis parallels the derivation of (13.23), so it suffices to show how to treat the term X̂_R X̂_R^T R̂^T R̂ (cf. (13.21)), where R̂ now denotes the computed Cholesky factor. Assuming R̂X̂_R = I + ∆(R̂, X̂_R), and using (13.24) with L replaced by R̂, we have

    X̂_R R̂ = I + R̂^{-1}∆(R̂, X̂_R)R̂

and

    X̂_R X̂_R^T R̂^T R̂ = X̂_R (I + ∆(R̂, X̂_R))^T R̂ = I + R̂^{-1}∆(R̂, X̂_R)R̂ + X̂_R ∆(R̂, X̂_R)^T R̂.

From the inequality ||∆(R̂, X̂_R)||_2 ≤ c_n u||R̂||_2 ||X̂_R||_2, it follows that both correction terms are bounded in norm by c_n u||R̂^{-1}||_2 ||X̂_R||_2 ||R̂||_2^2, and together with ||X̂_R||_2 ≤ ||R̂^{-1}||_2 + O(u) and ||A||_2 = ||R̂||_2^2 + O(u), overall we have a bound of the form

    ||X̂A − I||_2 ≤ c'_n u||X̂||_2 ||A||_2.

Since X̂ and A are symmetric the same bound holds for the right residual.

Table 13.3. Mflop rates for inverting a full matrix on a Cray 2.

                              n = 64   n = 128   n = 256   n = 512
Unblocked:      Method B        118      229       310       347
                Method C        125      235       314       351
                Method D         70      166       267       329
Blocked         Method B        142      259       353       406
(block size     Method C        144      264       363       415
64):            Method D         70      178       306       390

13.3.5. Summary

In terms of the error bounds, there is little to choose between Methods A, B, C, and D. Numerical results reported in [322, 1992] show good agreement with the bounds.
Therefore the choice of method can be based on other criteria, such as performance and the use of working storage. Table 13.3 gives some performance figures for a Cray 2, covering both blocked (partitioned) and unblocked forms of Methods B, C, and D.

On a historical note, Tables 13.4 and 13.5 give timings for matrix inversion on some early computing devices; times for two modern machines are given for comparison.
The inversion methods used for the timings on the early computers in Table 13.4 are not known, but are probably methods from this section or the next.

Table 13.4. Times (minutes and seconds) for inverting an n × n matrix. Source for DEUCE, Pegasus, and Mark 1 timings: [181, 1981].

      Pegasus      DEUCE                Manchester   HP 48G       Sun SPARC-
      (Ferranti)   (English Electric)   Mark 1       Calculator   station ELC
n     1956         1955                 1951         1993         1991
8     26s          20s                  —            4s           .004s
16    2m 37s       58s                  —            18s          .01s
24    7m 57s       3m 36s               8m           48s          .02s
32    17m 52s      7m 44s               16m          —            .04s

Table 13.5. Additional timings for inverting an n × n matrix.

Machine                                Year     n      Time        Reference
Aiken Relay Calculator                 1948     38     59½ hours   [764, 1948]
IBM 602 Calculating Punch              1949     10     8 hours     [1053, 1949]
SEAC (National Bureau of Standards)    1954     49     3 hours     [1004, 1954]
Datatron                               1957     80ᵃ    2½ hours    [753, 1957]
IBM 704                                1957    115ᵇ    19m 30s     [320, 1957]

ᵃ Block tridiagonal matrix, using an inversion method designed for such matrices.
ᵇ Symmetric positive definite matrix.

13.4. Gauss–Jordan Elimination

Whereas Gaussian elimination (GE) reduces a matrix to triangular form by elementary operations, Gauss–Jordan elimination (GJE) reduces it all the way to diagonal form.
GJE is usually presented as a method for matrix inversion, but it can also be regarded as a method for solving linear equations. We will take the method in its latter form, since it simplifies the error analysis. Error bounds for matrix inversion are obtained by taking unit vectors for the right-hand sides.

At the kth stage of GJE, all off-diagonal elements in the kth column are eliminated, instead of just those below the diagonal, as in GE. Since the elements in the lower triangle (including the pivots on the diagonal) are identical to those that occur in GE, Theorem 9.1 tells us that GJE succeeds if all the leading principal submatrices of A are nonsingular.
With no form of pivoting GJE is unstable in general, for the same reasons that GE is unstable. Partial and complete pivoting are easily incorporated.

Algorithm 13.4 (Gauss–Jordan elimination). This algorithm solves the linear system Ax = b, where A ∈ ℝ^{n×n} is nonsingular, by GJE with partial pivoting.

for k = 1:n
    Find r such that |a_rk| = max_{i≥k} |a_ik|.
    A(k, k:n) ↔ A(r, k:n), b(k) ↔ b(r)  % Swap rows k and r.
    row_ind = [1:k−1, k+1:n]  % Row indices of elements to eliminate.
    m = A(row_ind, k)/A(k, k)  % Multipliers.
    A(row_ind, k:n) = A(row_ind, k:n) − m*A(k, k:n)
    b(row_ind) = b(row_ind) − m*b(k)
end
x_i = b_i/a_ii, i = 1:n

Cost: n³ flops.

The numerical stability properties of GJE are rather subtle and error analysis is trickier than for GE.
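Algorithm 13.4 transcribes almost line for line into plain Python. The sketch below is illustrative (list-of-lists storage, a tiny test system of our own choosing); note the one structural difference from GE, marked in the comments: the elimination loop runs over all rows i ≠ k, not just i > k.

```python
# Sketch of Algorithm 13.4: Gauss-Jordan elimination with partial pivoting,
# solving A*x = b. Pure Python, for illustration only.

def gauss_jordan(A, b):
    n = len(b)
    A = [row[:] for row in A]   # work on copies
    b = b[:]
    for k in range(n):
        # Partial pivoting: find r >= k maximizing |a_rk|, swap rows k and r.
        r = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[r] = A[r], A[k]
        b[k], b[r] = b[r], b[k]
        # Eliminate a_ik for ALL i != k (GE would take only i > k).
        for i in range(n):
            if i != k:
                m = A[i][k] / A[k][k]          # multiplier
                for j in range(k, n):
                    A[i][j] -= m * A[k][j]
                b[i] -= m * b[k]
    # A is now diagonal; divide through to get x.
    return [b[i] / A[i][i] for i in range(n)]

# 2x + y = 3, x + 3y = 5  ->  x = 0.8, y = 1.4 (hypothetical test system)
x = gauss_jordan([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

Counting the eliminations over all n − 1 off-diagonal rows at every stage gives the n³ flop cost stated above, half as much again as GE plus substitution for a single right-hand side.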