Higham, Accuracy and Stability of Numerical Algorithms
Peters and Wilkinson [828, 1975] state that “it is well known that Gauss–Jordan is stable” for a diagonally dominant matrix, but a proof does not seem to have been published.

Chapter 14
Condition Number Estimation

Most of LAPACK’s condition numbers and error bounds are based on estimated condition numbers . . . The price one pays for using an estimated rather than an exact condition number is occasional (but very rare) underestimates of the true error; years of experience attest to the reliability of our estimators, although examples where they badly underestimate the error can be constructed.
— E. ANDERSON et al., LAPACK Users’ Guide, Release 2.0 (1995)

The importance of the counter-examples is that they make clear that any effort toward proving that the algorithms always produce useful estimations is fruitless. It may be possible to prove that the algorithms produce useful estimations in certain situations, however, and this should be pursued. An effort simply to construct more complex algorithms is dangerous.
— A. K. CLINE and R. K. REW, A Set of Counter-Examples to Three Condition Number Estimators (1983)

Singularity is almost invariably a clue.
— SIR ARTHUR CONAN DOYLE, The Boscombe Valley Mystery (1892)

14.1. How to Estimate Componentwise Condition Numbers

When bounding the forward error of a computed solution to a linear system we would like to obtain the bound with an order of magnitude less work than is required to compute the solution.
For a dense $n \times n$ system, where the solution process usually requires $O(n^3)$ operations, we need to compute the bound in $O(n^2)$ operations. An estimate of the bound that is correct to within a factor 10 is usually acceptable, because it is the magnitude of the error that is of interest, not its precise value.

In the perturbation theory of Chapter 7 for linear equations we obtained perturbation bounds that involve the condition numbers $\kappa_{E,f}(A,x)$ and
$$\mathrm{cond}_{E,f}(A,x) = \frac{\|\,|A^{-1}|(f + E|x|)\,\|_\infty}{\|x\|_\infty}$$
and their variants.
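For concreteness, the variant $\mathrm{cond}(A,x) = \|\,|A^{-1}||A||x|\,\|_\infty/\|x\|_\infty$ (the case $E = |A|$, $f = 0$) can be evaluated exactly, at $O(n^3)$ cost, via an explicit inverse. A minimal NumPy sketch (the function name is ours, not from the text):

```python
import numpy as np

def cond_exact(A, x):
    """Evaluate cond(A, x) = || |A^{-1}| |A| |x| ||_inf / ||x||_inf exactly,
    the brute-force O(n^3) way, via an explicit inverse."""
    Ainv = np.linalg.inv(A)
    num = np.linalg.norm(np.abs(Ainv) @ (np.abs(A) @ np.abs(x)), np.inf)
    return num / np.linalg.norm(x, np.inf)

print(cond_exact(np.array([[3.0, 1.0], [1.0, 2.0]]), np.array([1.0, -1.0])))  # ≈ 2.6
```

The sections that follow show how to avoid the explicit inverse and obtain an estimate in $O(n^2)$ operations instead.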
To compute these condition numbers exactly we need to compute $A^{-1}$, which requires $O(n^3)$ operations, even if we are given a factorization of $A$. Is it possible to produce reliable estimates of both condition numbers in $O(n^2)$ operations? The answer is yes, but to see why we first need to rewrite the expression for $\mathrm{cond}_{E,f}(A,x)$. Consider the general expression $\|\,|A^{-1}|d\,\|_\infty$, where $d$ is a given nonnegative vector (thus, $d = f + E|x|$ for $\mathrm{cond}_{E,f}(A,x)$); note that the practical error bound (7.27) is of this form. Writing $D = \mathrm{diag}(d)$ and $e = [1,1,\dots,1]^T$, we have
$$\|\,|A^{-1}|d\,\|_\infty = \|\,|A^{-1}|De\,\|_\infty = \|\,|A^{-1}D|e\,\|_\infty = \|A^{-1}D\|_\infty. \tag{14.1}$$
Hence the problem is equivalent to that of estimating $\|A^{-1}D\|_\infty$. If we can estimate this quantity then we can estimate any of the condition numbers or perturbation bounds for a linear system.
There is an algorithm, described in §14.3, that produces reliable order-of-magnitude estimates of $\|B\|_1$, for an arbitrary $B$, by computing just a few matrix–vector products $Bx$ and $B^Ty$ for carefully selected vectors $x$ and $y$ and then approximating $\|B\|_1 \approx \|Bx\|_1/\|x\|_1$. If we assume that we have a factorization of $A$ (say, $PA = LU$ or $A = QR$), then we can form the required matrix–vector products for $B = A^{-1}D$ in $O(n^2)$ flops.
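With an LU factorization in hand, each product $Bx = A^{-1}(Dx)$ or $B^Ty = D(A^{-T}y)$ reduces to a diagonal scaling plus a pair of triangular solves. A SciPy sketch (the helper name is ours, not from the text):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def make_B_products(A, d):
    """Return functions applying B = A^{-1} D and B^T = D A^{-T},
    where D = diag(d), at O(n^2) cost per product once the
    one-time O(n^3) LU factorization of A has been computed."""
    lu_piv = lu_factor(A)  # PA = LU, computed once

    def B(x):
        return lu_solve(lu_piv, d * x)           # A^{-1} (D x)

    def BT(y):
        return d * lu_solve(lu_piv, y, trans=1)  # D (A^{-T} y)

    return B, BT
```

Each call to `B` or `BT` costs two triangular solves and a scaling, i.e., $O(n^2)$ flops, which is exactly what the estimator of §14.3 needs.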
Since $\|B\|_\infty = \|B^T\|_1$, it follows that we can use the algorithm to estimate $\|A^{-1}D\|_\infty$ in $O(n^2)$ flops.

Before presenting the 1-norm condition estimation algorithm, we describe a more general method that estimates $\|A\|_p$ for any $p \in [1,\infty]$. This more general method is of interest in its own right and provides insight into the special case $p = 1$.

14.2. The p-Norm Power Method

An iterative “power method” for computing $\|A\|_p$ was proposed by Boyd [139, 1974]. When $p = 2$ it reduces to the usual power method applied to $A^TA$. We use the notation $\mathrm{dual}_p(x)$ to denote any vector $y$ of unit $q$-norm such that equality holds in the Hölder inequality $x^Ty \le \|x\|_p\|y\|_q$ (this normalization is different from the one we used in (6.3), but is more convenient for our present purposes). Throughout this section, $p \ge 1$ and $q$ is defined by
Throughout this section, p > 1 and q is defined byp -1 + q-1 = 1.Algorithm 14.1 (p-norm power method). Givenandthis algorithm computes γ and x such that γ < ||A||p and ||Ax||p = γ||x||p .x = x0repeatyzif/||x0 ||p= Ax= AT dualp (y)||z||q < zTxγ = ||y||pquitendx = dualq (z)endCost: 4rmn flops (for r iterations).There are several ways to derive Algorithm 14.1. Perhaps the most naturalway is to examine the optimality conditions for maximization ofFirst, we note that the subdifferential (that is, the set of subgradients) of anarbitrary vector norm ||·|| is given by [378, 1987, p.
379]If x 0 thenx 0,from the Holder inequality, and so, ifIt can also be shown that if A has full rank,(14.2)292CONDITION NUMBER ESTIMATIONWe assume now that A has full rank, 1 < p <and xis easy to see that there is a unique vector dualp ( x ), soelement, that is, ||x||p is differentiable. Hence we have0. Then ithas just one(14.3)The first-order Kuhn–Tucker condition for a local maximum of F is thereforeSince dualq (dual p (x)) = x/||x||p if p1,this equation can be written(14.4)The power method is simply functional iteration applied to this transformedset of Kuhn–Tucker equations (the scale factoris irrelevant sinceF(ax) = F(x)).For the 1- and-norms, which are not everywhere differentiable, a different derivation can be given.
The problem can be phrased as one of maximizing the convex function $F(x) := \|Ax\|_p$ over the convex set $S := \{x : \|x\|_p \le 1\}$. The convexity of $F$ and $S$ ensures that, for any $u \in S$, at least one vector $g$ exists such that
$$F(v) \ge F(u) + g^T(v - u) \quad \text{for all } v \in S. \tag{14.5}$$
Vectors $g$ satisfying (14.5) are called subgradients of $F$ (see, for example, [378, 1987, p. 364]). Inequality (14.5) suggests the strategy of choosing a subgradient $g$ and moving from $u$ to a point $v \in S$ that maximizes $g^T(v - u)$, that is, a vector that maximizes $g^Tv$. Clearly, $v_* = \mathrm{dual}_q(g)$. Since, from (14.2), $g$ has the form $A^T\mathrm{dual}_p(Ax)$, this completes the derivation of the iteration.

For all values of $p$ the power method has the desirable property of generating an increasing sequence of norm approximations.

Lemma 14.2. In Algorithm 14.1, the vectors from the $k$th iteration satisfy
(i) $z_k^Tx_k = \|y_k\|_p$,
(ii) $\|y_k\|_p \le \|z_k\|_q \le \|y_{k+1}\|_p \le \|A\|_p$.
The first inequality in (ii) is strict if convergence is not obtained on the $k$th iteration.

Proof. Since $\|x_k\|_p = 1$ and $\mathrm{dual}_p(y_k)^Ty_k = \|y_k\|_p$, we have
$$z_k^Tx_k = \mathrm{dual}_p(y_k)^TAx_k = \mathrm{dual}_p(y_k)^Ty_k = \|y_k\|_p,$$
which gives (i). Then, by the Hölder inequality,
$$\|y_k\|_p = z_k^Tx_k \le \|z_k\|_q\|x_k\|_p = \|z_k\|_q,$$
and
$$\|z_k\|_q = z_k^T\mathrm{dual}_q(z_k) = z_k^Tx_{k+1} = \mathrm{dual}_p(y_k)^TAx_{k+1} \le \|Ax_{k+1}\|_p = \|y_{k+1}\|_p \le \|A\|_p.$$
For the last part, note that, in view of (i), the convergence test “$\|z_k\|_q \le z_k^Tx_k$” can be written as “$\|z_k\|_q \le \|y_k\|_p$”.

It is clear from Lemma 14.2 (or directly from the Hölder inequality) that the convergence test “$\|z\|_q \le z^Tx$” in Algorithm 14.1 is equivalent to “$\|z\|_q = z^Tx$” and, since $\|x\|_p = 1$, this is equivalent to $x = \mathrm{dual}_q(z)$.
Thus, although the convergence test compares two scalars, it is actually testing for equality in the vector equation (14.4).

The convergence properties of Algorithm 14.1 merit a careful description. First, in view of Lemma 14.2, the scalars $\gamma_k = \|y_k\|_p$ form an increasing and convergent sequence. This does not necessarily imply that Algorithm 14.1 converges, since the algorithm tests for convergence of the $x_k$, and these vectors could fail to converge. However, a subsequence of the $x_k$ must converge to a limit, $\bar{x}$ say. Boyd [139, 1974] shows that if $\bar{x}$ is a strong local maximum of $F$ with no zero components, then $x_k \to \bar{x}$ linearly.

If Algorithm 14.1 converges, it converges to a stationary point of $F(x)$ when $1 < p < \infty$. Thus, instead of the desired global maximum $\|A\|_p$, we may obtain only a local maximum or even a saddle point. When $p = 1$ or $\infty$, if the algorithm converges to a point at which $F$ is not differentiable, that point need not even be a stationary point.
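This behaviour is easy to observe experimentally. A minimal NumPy sketch of Algorithm 14.1 for $1 < p < \infty$, using the closed form $\mathrm{dual}_p(x) = \mathrm{sign}(x)|x|^{p-1}/\|x\|_p^{p-1}$ (the function names and the small rounding tolerance in the convergence test are ours):

```python
import numpy as np

def dual_vector(x, p):
    """Return y with ||y||_q = 1 and x^T y = ||x||_p (equality in the
    Hölder inequality), valid for 1 < p < inf; q is the conjugate exponent."""
    y = np.sign(x) * np.abs(x) ** (p - 1)
    return y / np.linalg.norm(x, p) ** (p - 1)

def pnorm_power(A, p, x0, maxit=100, tol=1e-12):
    """Power-method sketch in the spirit of Algorithm 14.1 (1 < p < inf).
    Returns (gamma, x) with gamma <= ||A||_p and ||A x||_p = gamma ||x||_p."""
    q = p / (p - 1)
    x = x0 / np.linalg.norm(x0, p)
    for _ in range(maxit):
        y = A @ x
        z = A.T @ dual_vector(y, p)
        if np.linalg.norm(z, q) <= z @ x + tol:  # convergence test of Alg. 14.1
            return np.linalg.norm(y, p), x
        x = dual_vector(z, q)                    # x = dual_q(z), unit p-norm
    return np.linalg.norm(A @ x, p), x
```

For $p = 2$ this is the classical power method applied to $A^TA$; the tolerance merely guards the exact-equality test against rounding error.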
On the other hand, for $p = 1$ or $\infty$, Algorithm 14.1 terminates in at most $n + 1$ iterations (assuming that when $\mathrm{dual}_p$ or $\mathrm{dual}_q$ is not unique an extreme point of the unit ball is taken), since the algorithm moves between the vertices $e_i$ of the unit ball in the 1-norm, increasing $F$ at each stage ($x = \pm e_i$ for $p = 1$, and $\mathrm{dual}_p(y) = \pm e_i$ for $p = \infty$). An example where $n$ iterations are required for $p = 1$ is given in Problem 14.2.

For two special types of matrix, more can be said about Algorithm 14.1. (1) If $A = xy^T$ (rank 1), the algorithm converges on the second step with $\gamma = \|A\|_p = \|x\|_p\|y\|_q$, whatever $x_0$. (2) Boyd [139, 1974] shows that if $A$ has nonnegative elements, $A^TA$ is irreducible, $1 < p < \infty$, and $x_0$ has positive elements, then the $x_k$ converge and $\gamma_k \to \|A\|_p$.

For values of $p$ strictly between 1 and $\infty$, the convergence behaviour of Algorithm 14.1 is typical of a linearly convergent method: exact convergence is not usually obtained in a finite number of steps and arbitrarily many steps can be required for convergence, as is well known for the 2-norm power method. Fortunately, there is a method for choosing a good starting vector that can be combined with Algorithm 14.1 to produce a reliable norm estimator; see the Notes and References and Problem 14.1.

We now turn our attention to the extreme values of $p$: 1 and $\infty$.

14.3. LAPACK 1-Norm Estimator

Algorithm 14.1 has two remarkable properties when $p = 1$: it almost always converges within four iterations (when $x_0 = [1,1,\dots,1]^T$, say) and it frequently yields $\|A\|_1$ exactly. This rapid, finite termination is also obtained for $p = \infty$ and is related to the fact that Algorithm 14.1 moves among the finite set of extreme points of the unit ball. Numerical experiments suggest that the accuracy is about the same for both norms but that slightly more iterations are required on average for $p = \infty$. Hence we will confine our attention to the 1-norm.

The 1-norm version of Algorithm 14.1 was derived independently of the general algorithm by Hager [492, 1984] and can be expressed as follows. The notation $\xi = \mathrm{sign}(y)$ means that $\xi_i = 1$ or $-1$ according as $y_i \ge 0$ or $y_i < 0$. We now specialize to square matrices.

Algorithm 14.3 (1-norm power method).
Given $A \in \mathbb{R}^{n \times n}$, this algorithm computes $\gamma$ and $x$ such that $\gamma \le \|A\|_1$ and $\|Ax\|_1 = \gamma\|x\|_1$.

    x = n^{-1} e
    repeat
        y = Ax
        xi = sign(y)
        z = A^T xi
        if ||z||_inf <= z^T x
            gamma = ||y||_1
            quit
        end
        x = e_j, where |z_j| = ||z||_inf (smallest such j)
    end

Numerical experiments show that the estimates produced by Algorithm 14.3 are frequently exact ($\gamma = \|A\|_1$), usually “acceptable” ($\gamma \ge \|A\|_1/10$), and sometimes poor ($\gamma < \|A\|_1/10$).

An important question for any norm or condition estimator is whether there exists a “counterexample”—a parametrized matrix for which the quotient “estimate divided by true norm” can be made arbitrarily small (or large, depending on whether the estimator produces a lower bound or an upper
A general class of counterexample for Algorithm 14.3 is given by the matriceswhere Ce = CTe = 0 (there are many possible choices for C ). For any suchmatrix, Algorithm 14.3 computes y = n – 1e, ξ = e, z = e, and hence thealgorithm terminates at the end of the first iteration withThe problem is that the algorithm stops at a local maximum that can differfrom the global one by an arbitrarily large factor.A more reliable and more robust algorithm is produced by the followingmodifications of Higham [537, 1988].Definition of estimate. To overcome most of the poor estimates, γ isredefined aswhereThe vector b is considered likely to “pick out” any large elements of A in thosecases where such elements fail to propagate through to y.Convergence test.