Higham - Accuracy and Stability of Numerical Algorithms (523152), страница 66
Текст из файла (страница 66)
However, if A has distinct eigenvalues then the results in Theorems 7.5and 7.7 on diagonal scalings are applicable and enable us to determine (anapproximation to) the minimal condition number. Explicit expressions can begiven for the minimal 2-norm condition number for n = 2; see Young [1122,1971, §3.8].A trivial bound is ||Ak|| < ||A||k.
A sharper bound can be derived in termsof the numerical radiuswhich is the point of largest modulus in the field of values of A. It is not hardto show that ||A||2/2 < r(A) < ||A||2 [580, 1985, p. 331]. The (nontrivial)inequality r(Ak) < r(A)k [580, 1985, p. 333] leads to the boundIf A is diagonalizable then, from (17.1a), we have the bound(17.4)for any p-norm. (Since p(A) < ||A|| for any norm, we also have the lower boundp (A)k < ||Ak||p .) This bound is unsatisfactory for two reasons.
First, bychoosing A to have well-conditioned large eigenvalues and ill-conditioned smalleigenvalues we can make the bound arbitrarily pessimistic (see Problem 17.1).Second, it models norms of powers of convergent matrices as monotonicallydecreasing sequences, which is qualitatively incorrect if there is a large hump.The Jordan canonical form can also be used to bound the norms of thepowers of a defective matrix. If XJX-1 is the Jordan canonical form of δ- 1 Athen(17.5)for all δ > 0. This is a special case of a result of Ostrowski [812, 1973,Thin. 20.1] and the proof is straightforward: We can write δ - 1 A = X(δ-1 D +M)X –1, where D = diagand M is the off-diagonal part of the Jordan17.1 M ATRIX P OWERSINE XACT A RITHMETIC351form.
Then A = X(D + δM)X–1, and (17.5) follows by taking norms. Analternative way of writing this bound iswhere A = XJX–1 and D =Note that this is notthe same X as in (17.5): multiplying A by a scalar changes κ(X) when A isnot diagonalizable. Both bounds suffer from the same problems as the bound(17.4) for diagonalizable matrices.Another bound in terms of the Jordan canonical form (17.1) of A is givenby Gautschi [430, 1953]. For convergent matrices, it can be written in theform(17.6)where p =and c is a constant depending only on A (c is notdefined explicitly in [430, 1953]). The factor kp-1 makes this bound somewhatmore effective at predicting the shapes of the actual curve than (17.5), butagain c can be unsuitably large.Since the norm estimation problem is trivial for normal matrices, it isnatural to look for bounds that involve a measure of nonnormality.
Considerthe Schur decomposition Q*AQ = D+N, where N is strictly upper triangular,and let S represent the set of all such N. The nonnormality of A can bemeasured by Henrici’s departure from normality [516, 1962]For the Frobenius norm, Henrici shows that ||N||F is independent of the particular Schur form and thatLászló [690, 1994] has recently shown that ∆F(A) is within a constant factorof the distance from A to the nearest normal matrix:where v(A) = min{||E|| F : A + E is normal}. Henrici uses the departurefrom normality to derive the 2-norm bounds(17.7)352M ATRIX P OWERS.Empirical evidence suggests that the first bound in (17.7) can be very pessimistic. However, for normal matrices both the bounds are equalities.Another bound involving nonnormality is given by Golub and Van Loan [470,1989, Lem.
7.3.2]. They show that, in the above notation,for any θ > 0. This bound is an analogue of (17.5) with the Schur formreplacing the Jordan form. Again, there is equality when A is normal (if weset θ = 0).To compare bounds based on the Schur form with ones based on the Jordanform we need to compare ∆(A) with κ(X). If A is diagonalizable then [710,1969, Thin. 4]it can be shown by a 2 × 2 example that minX κ 2 (X) can exceed ∆F(A)/||A||Fby an arbitrary factor [201, 1993, §4.2.7], [190, 1996, §9.1.1].Another tool that can be used to bound the norms of powers is the pseudospectrum of a matrix, popularized by Trefethen [1017, 1992], [1018]. Theofis defined, for a givento be the setand it can also be represented, in terms of the resolvent (zI – A) -1, asAs Trefethen notes [1017, 1992], by using the Cauchy integral representationof Ak (which involves a contour integral of the resolvent) one can show that(17.8)where the(17.9)This bound is very similar in flavour to (17.5).
The difficulty is transferredfrom estimating κ(X) to choosingand estimatingBai, Demmel, and Gu [39, 1994] consider A with p (A) < 1 and manipulate the Cauchy integral representation of Ak in a slightly different way fromTrefethen to produce a bound in terms of the distance to the nearest unstablematrix,17.2 B OUNDSFORFINITE P RECISION A RITHMETIC353Their bound iswhere e < a m := (1+ 1/m) m +1 < 4. Note that d(A) < 1 when p (A) < 1, asis easily seen from the Schur decomposition. The distance d(A) is not easy tocompute.
One approach is a bisection technique of Byers [175, 1988].Finally, we mention that the Kreiss matrix theorem provides a good estimate of supk >0 ||Ak|| for a generalalbeit in terms of an expressionthat involves–the resolvent and is not easy to compute:where φ(A) = sup{(|z| – 1)||(zI – A)–1||2 : |z| > 1} and e = exp(1). Detailsand references are given by Wegert and Trefethen [1071, 1994].17.2. Bounds for Finite Precision ArithmeticThe formulae A · Ak or Ak · A can be implemented in several ways, corresponding to different loop orderings in each individual product, but as longas each product is formed using the standard formula (AB)ij =all these variations satisfy the same rounding error bounds.
We do notanalyse here the use of the binary powering technique, where, for example, A9 is formed as A((A2)2)2; alternate multiplication on the left and right(fl(A k ) = fl(Afl(A k-2 )A)); or fast matrix multiplication techniques such asStrassen’s method. None of these methods is equivalent to repeated multiplication in finite precision arithmetic.We suppose, without loss of generality, that the columns of Am are computed one at a time, the jth as fl(A(A(. . . (Aej) . .
.))), where ej is the jthunit vector. The error analysis for matrix–vector multiplication shows thatthe jth computed column of Am satisfies(17.10)where, for both real and complex matrices, we have (Problem 3.7)(17.11)It follows thatand so a sufficient condition for convergence of the computed powers is that(17.12)M ATRIX P OWERS354This result is useful in certain special cases: p(|A|) = p(A) if A is triangularor has a checkerboard sign pattern (since then |A| = DAD –1 where D =diag(±1)); if A is normal then p(|A| <(this bound being attainedfor a Hadamard matrix); and in Markov processes, where the aij are transitionprobabilities, |A| = A. However, in general p (|A|) can exceed p(A) by anarbitrary factor (see Problem 17.2).To obtain sharper and more informative results it is necessary to use moreinformation about the matrix.
In the following theorem we give a sufficientcondition, baaed on the Jordan canonical form, for the computed powers of amatrix to converge to zero. Although the Jordan form is usually eschewed bynumerical analysts because of its sensitivity to perturbations, it is convenientto work with in this application and leads to an informative result.Theorem 17.1 (Higham and Knight).
Letwith the Jordan form(17.1) have spectral radius p(A) < 1. A sufficient condition for fl(Am)as mis(17.13)where t = maxi ni .Proof. It is easy to see that if we can find a nonsingular matrix S suchthat(17.14)for all i, then the producttends to 0 as mIn the rest of the proof we construct such a matrix Sfor the ∆Ai in (17.10).Now consider the matrixIts ith diagonal block is of the formwhere the only nonzeros in N are 1s on the first superdiagonal, and soDefining S =we haveand(17.15)Now we setwhere 0 < θ < 1 and we determine θ so that(17.14) is satisfied, that is, so thatfor all i.
From (17.11)and (17.15) we have17.2 B OUNDSFORFINITE P RECISION A RITHMETIC355Therefore (17.14) is satisfied ifthat is, ifIf the integer t is greater than 1 then the function f ( θ ) = (1 – θ)t-1 θ hasa maximum on [0,1] at θ* = t–1 and f(θ * ) = (t – 1)–1(1 – t– 1 ) t satisfies(4(t – 1))-1 < f(θ*) < e-1. We conclude that for all integers 1 < t < n.is sufficient to ensure that (17.14) holds.If A is normal then ||A||2 = p(A) < 1, t = 1, and κ2 (X) = 1, so (17.13)can be written in the formwhere cn is a constant depending on n. This condition is also easily derivedby taking 2-norms in (17.10) and (17.11).We can show the sharpness of the condition in Theorem 17.1 by using theChebyshev spectral differentiation matrix Cndescribed by Trefethenand Trummer [1020, 1987].
The matrix Cn arises from degree n – 1 polynomialinterpolation of n arbitrary data values at n Chebyshev points, including aboundary condition at 1. It is nilpotent and is similar to a single Jordan blockof dimension n. We generate Cn in MATLAB using the routine chebspec fromthe Test Matrix Toolbox (see Appendix E). Figure 17.4 shows the norms ofthe powers of four variants of the Cn matrix.The powers of C 8 converge to zero, while the powers of 15C 8 diverge.Using a technique for estimating k2 (X) described in [565, 1995], we find that1.08 × 10–9, which is safely less than 1, so that Theorem 17.1predicts convergence. For 15C 8 we have2.7, so the theoremcorrectly does not predict convergence.Next, for the matrix A = C13 + 0.36I, whose powers diverge, we have13.05, and for A = C13 + 0.01I, whose powersconverge,0.01, so again the theorem is reasonablysharp.The plots reveal interesting scalloping patterns in the curves of the norms.For C8 and 15C8 the dips are every 8 powers, but the point of first dip andthe dipping intervals are altered by adding different multiples of the identitymatrix, as shown by the C13 examples.
Explaining this behaviour is an openproblem (see Problem 17.3).356M ATRIX P OWERSFigure 17.4. Computed powers of chebspec matrices.We saw in the last section that the powers of A can be bounded in termsof the pseudospectral radius. Can the pseudospectrum provide informationabout the behaviour of the computed powers? Figure 17.5 shows approximations to thefor the matrices used in Figure 17.4, wherethe (computed) eigenvalues are plotted as crosses “×”. We seethat the pseudospectrum lies inside the unit disc precisely when the powersconverge to zero.A heuristic argument based on (17.10) and (17.11) suggests that, if for randomly chosen perturbations ∆Ai with ||∆Ai || < cnu||A||, most of the eigenvalues of the perturbed matrices lie outside the unit disc, then we can expecta high percentage of the terms A + ∆Ai in (17.10) to have spectral radiusbigger than 1 and hence we can expect the product to diverge.