For further details of the MDS method see Dennis and Torczon [301, 1991] and Torczon [1008, 1989], [1009, 1991]. The MDS method requires at least 2n independent function evaluations per iteration, which makes it very suitable for parallel implementation. Generalizations of the MDS method that are even more suitable for parallel computation are described in [301, 1991] and [1010, 1992].
The MDS method requires O(n^2) elements of storage for the simplices, but this can be reduced to O(n) (at the cost of extra bookkeeping) if an appropriate choice of initial simplex is made [301, 1991]. Unlike most direct search methods, the MDS method possesses some convergence theory.
Torczon [1009, 1991] shows that if the level set of f at x0 is compact and f is continuously differentiable on this level set then a subsequence of the points x0(k) (where k denotes the iteration index) converges to a stationary point of f. Moreover, she gives an extension of this result that requires only continuity of f and guarantees convergence to either a stationary point of f or a point where f is not continuously differentiable.

Our implementation of the MDS method provides two possible starting simplices, both of which include the starting point x0: a regular one (all sides of equal length) and a right-angled one based on the coordinate axes, both as described by Torczon in [1008, 1989]. The scaling is such that each edge of the regular simplex, or each edge of the right-angled simplex that is joined to x0, has length max(||x0||∞, 1). Also as in [1008, 1989], the main termination test halts the computation when the relative size of the simplex is no larger than a tolerance tol, that is, when

    max_{1<=i<=n} ||v_i - v_0||_1 / max(1, ||v_0||_1) <= tol,        (24.3)

where v_0 is the best vertex of the current simplex and v_1, ..., v_n are the remaining vertices. Unless otherwise stated, we used tol = 10^-3 in (24.2) and (24.3) in all our experiments.
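For illustration, one possible realization of this stopping test in MATLAB is sketched below; the function name and the layout of the vertex array are illustrative only and are not those of the toolbox routines.

```matlab
function converged = small_simplex(V, tol)
%SMALL_SIMPLEX  One plausible realization of the stopping test (24.3).
%   V is n-by-(n+1); its columns are the simplex vertices, best vertex first.
v0 = V(:,1);
rel_size = max(sum(abs(V(:,2:end) - v0), 1)) / max(1, norm(v0, 1));
converged = (rel_size <= tol);
end
```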
The third method that we have used is the Nelder–Mead direct search method [787, 1965], [303, 1987], which also employs a simplex but which is fundamentally different from the MDS method. We omit a description since the method is described in textbooks (see, for example, Gill, Murray, and Wright [447, 1981, §4.2.2], or Press, Teukolsky, Vetterling, and Flannery [842, 1992, §10.4]). Our limited experiments with the Nelder–Mead method indicate that while it can sometimes outperform the MDS method, the MDS method is generally superior for our purposes. No convergence results of the form described above for the MDS method are known for the Nelder–Mead method.

For general results on the convergence of "pattern search" methods, see Torczon [1011, 1993]. The AD and MDS methods are pattern search methods, but the Nelder–Mead method is not.

It is interesting to note that the MDS method, the Nelder–Mead method, and our particular implementation of the AD method do not exploit the numerical values of f: their only use of f is to compare two function values to see which is the larger!

Our MATLAB implementations of the AD, MDS, and Nelder–Mead direct search methods are in the Test Matrix Toolbox, described in Appendix E.
24.3. Examples of Direct Search

In this section we give examples of the use of direct search to investigate the behaviour of numerical algorithms.

24.3.1. Condition Estimation

We have experimented with MATLAB implementations of two matrix condition number estimators. RCOND is the LINPACK estimator, described in §14.4, as implemented in the built-in MATLAB function rcond. LACON is the LAPACK condition estimator, as given in Algorithm 14.4 and implemented in MATLAB's condest.
Both estimators compute a lower bound for κ1(A) by estimating ||A^-1||_1 (||A||_1 is computed explicitly as the maximum column sum). To put the problem in the form of (24.1), we define x = vec(A) and

    f(x) = κ1(A) / est(A),

where est(A) <= κ1(A) is the condition estimate. We note that, since the algorithms underlying RCOND and LACON contain tests and branches, there are matrices A for which an arbitrarily small change in A can completely change the condition estimate; hence for both algorithms f has points of discontinuity.
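For illustration, a minimal MATLAB sketch of this objective for RCOND is given below. The experiments reported here drive f with the MDS and AD maximizers from the toolbox; the sketch instead maximizes f by applying the built-in Nelder–Mead minimizer fminsearch to -f, purely as a stand-in.

```matlab
% Sketch of the objective f(x) = kappa_1(A)/est(A) for RCOND.
n = 5;
kappa1 = @(A) norm(A, 1) * norm(inv(A), 1);      % exact 1-norm condition number
est    = @(A) 1 / rcond(A);                      % RCOND's lower bound for kappa_1(A)
f      = @(x) kappa1(reshape(x, n, n)) / est(reshape(x, n, n));

A0 = hilb(n);                                    % starting point: 5-by-5 Hilbert matrix
opts = optimset('MaxFunEvals', 1e4, 'MaxIter', 1e4);
x = fminsearch(@(x) -f(x), A0(:), opts);         % maximize f by minimizing -f
fprintf('underestimation ratio kappa_1(A)/est(A) = %.3g\n', f(x))
```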
We applied the MDS maximizer to RCOND starting at the 5 × 5 Hilbert matrix. After 67 iterations and 4026 function evaluations the maximizer had located a matrix for which f(x) = 226.9. We then started the Nelder–Mead method from this matrix; after another 4947 function evaluations it had reached a matrix for which

    κ1(A) = 3.38 × 10^5,    est(A) = 1.65 × 10^1.

This example is interesting because the matrix is well scaled, while the parametrized counterexamples of Cline and Rew [217, 1983] all become badly scaled when the parameter is chosen to make the condition estimate poor.

For LACON we took as starting matrix the 4 × 4 version of the n × n matrix with a_ij = cos((i - 1)(j - 1)π/(n - 1)) (this is a Chebyshev–Vandermonde matrix, as used in §21.3.3, and is orthog(n, -1) in the Test Matrix Toolbox). After 11 iterations and 1001 function evaluations the AD maximizer had determined a (well-scaled) matrix A for which

    κ1(A) = 2.94 × 10^5,    est(A) = 4.81.

With relatively little effort on our part (most of the effort was spent experimenting with different starting matrices), the maximizers have discovered examples where both condition estimators fail to achieve their objective of producing an estimate correct to within an order of magnitude.
The value of direct search maximization in this context is clear: it can readily demonstrate the fallibility of a condition estimator, a task that can be extremely difficult to accomplish using theoretical analysis or tests with random matrices. Moreover, the numerical examples obtained from direct search may provide a starting point for the construction of parametrized theoretical ones, or for the improvement of a condition estimation algorithm.

In addition to measuring the quality of a single algorithm, direct search can be used to compare two competing algorithms to investigate whether one algorithm performs uniformly better than the other. We applied the MDS maximizer to the function

    f(x) = estL(A) / estR(A),

where estL(A) and estR(A) are the condition estimates from LACON and RCOND, respectively.
If f(x) > 1 then LACON has produced a larger lower bound for κ1(A) than RCOND. Starting with a random 5 × 5 matrix the Nelder–Mead maximizer produced after 1788 function evaluations a matrix A for which estL(A) = κ1(A) and f(x) = 1675.4. With f defined as f(x) = estR(A)/estL(A), and starting with I4, after 6065 function evaluations the MDS maximizer produced a matrix for which f(x) = 120.8. This experiment shows that neither estimator is uniformly superior to the other. This conclusion would be onerous to reach by theoretical analysis of the algorithms.
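The comparison objective is equally easy to set up. The sketch below uses MATLAB's built-in condest and rcond as the LACON and RCOND estimators (as described above), and again substitutes fminsearch for the maximizers actually used in the experiments.

```matlab
% Ratio of the LACON estimate to the RCOND estimate; values > 1 mean
% LACON produced the larger (better) lower bound for kappa_1(A).
n = 5;
estL = @(A) condest(A);                  % LAPACK-style estimator (Algorithm 14.4)
estR = @(A) 1 / rcond(A);                % LINPACK-style estimator
f    = @(x) estL(reshape(x, n, n)) / estR(reshape(x, n, n));

x = fminsearch(@(x) -f(x), randn(n*n, 1));   % search from a random starting matrix
% Maximizing 1/f (that is, estR/estL) instead probes the opposite failure mode.
```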
24.3.2. Fast Matrix Inversion

We recall Strassen's inversion method from Problem 22.8: for a block 2 × 2 matrix

    A = [ A11  A12 ]
        [ A21  A22 ]

it uses the formulae

    P1 = A11^-1,     P2 = A21 P1,     P3 = P1 A12,
    P4 = A21 P3,     P5 = P4 - A22,   P6 = P5^-1,

    A^-1 = [ P1 - P3 P6 P2    P3 P6 ]
           [ P6 P2            -P6   ],

where each of the matrix products is formed using Strassen's fast matrix multiplication method. Strassen's inversion method is clearly unstable for general A, because the method breaks down if A11 is singular.
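The formulae translate directly into MATLAB. The illustrative function below (not a toolbox routine) performs one level of the recursion with conventional matrix products, and forms P1 and P6 with inv, which itself uses LU factorization with partial pivoting; it therefore corresponds to the "conventional multiplication" variant discussed below rather than the Strassen-multiplication implementation used in the experiments.

```matlab
function X = strassen_inv(A)
%STRASSEN_INV  One level of Strassen's block inversion formulae (sketch).
n = size(A, 1);  m = n/2;                    % assumes n is even
A11 = A(1:m, 1:m);      A12 = A(1:m, m+1:n);
A21 = A(m+1:n, 1:m);    A22 = A(m+1:n, m+1:n);
P1 = inv(A11);                               % breaks down if A11 is singular
P2 = A21*P1;
P3 = P1*A12;
P4 = A21*P3;
P5 = P4 - A22;                               % the negative of the Schur complement
P6 = inv(P5);
X  = [P1 - P3*P6*P2,  P3*P6;                 % blocks of A^{-1}
      P6*P2,          -P6];
end
```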
Indeed Strassen's inversion method has been implemented on a Cray-2 by Bailey and Ferguson [41, 1988] and tested for n <= 2048, and these authors observe empirically that the method has poor numerical stability. Direct search can be used to gain insight into the numerical stability. With x = vec(A) we define the stability measure

    f(x) = min( ||XA - I||, ||AX - I|| ) / ( ||A|| ||X|| ),        (24.4)

where X is the inverse of A computed using Strassen's inversion method. This definition of f is appropriate because, as shown in Chapter 13, for most conventional matrix inversion methods either the left residual XA - I or the right residual AX - I is guaranteed to have norm of order u||X|| ||A||. To treat Strassen's inversion method as favorably as possible we use just one level of recursion; thus P1 and P6 are computed using GEPP but the multiplications are done with Strassen's method. We applied the MDS maximizer, with tol = 10^-9 in (24.3), starting with the 4 × 4 Vandermonde matrix whose (i, j) element is ((j - 1)/3)^(i-1).
After 34 iterations the maximizer had converged with f = 0.838, which represents complete instability. The corresponding matrix A is well conditioned, with κ2(A) = 82.4. For comparison, the value of f when A is inverted using Strassen's method with conventional multiplication is f = 6.90 × 10^-2; this confirms that the instability is not due to the use of fast multiplication techniques but is inherent in the inversion formulae.

If A is a symmetric positive definite matrix then its leading principal submatrices are no more ill conditioned than the matrix itself, so we might expect Strassen's inversion method to be stable for such matrices. To investigate this possibility we carried out the same maximization as before, except we enforced positive definiteness as follows: when the maximizer generates a vector x = vec(B), A in (24.4) is defined as A = B^T B.
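A sketch of the corresponding objective is shown below, combining the strassen_inv sketch above with the residual measure (24.4); the use of the 1-norm is an assumed detail, and the flag spd selects the A = B'*B reparametrization used for the positive definite experiments.

```matlab
function r = strassen_resid(x, spd)
%STRASSEN_RESID  Scaled residual (24.4) for one-level Strassen inversion (sketch).
%   x = vec(B); if SPD is true the matrix inverted is A = B'*B, which keeps
%   the search within the symmetric positive (semi)definite matrices.
n = round(sqrt(numel(x)));
B = reshape(x, n, n);
if spd
    A = B'*B;
else
    A = B;
end
X = strassen_inv(A);                         % computed inverse (sketch above)
r = min(norm(X*A - eye(n), 1), norm(A*X - eye(n), 1)) / (norm(A, 1)*norm(X, 1));
end
```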
Starting with a 4 × 4 random matrix A with κ2(A) = 6.71 × 10^7, the maximization yielded the value f = 3.32 × 10^-8 after 15 iterations, and the corresponding value of f when conventional multiplication is used is f = 6.61 × 10^-11 (the "maximizing" matrix A has condition number κ2(A) = 3.58 × 10^9).

The conclusion from these experiments is that Strassen's inversion method cannot be guaranteed to produce a small left or right residual even when A is symmetric positive definite and conventional multiplication is used. Hence the method must be regarded as being fundamentally unstable.
24.3.3. Solving a Cubic

Explicit formulae can be obtained for the roots of a cubic equation using techniques associated with the 16th century mathematicians del Ferro, Cardano, Tartaglia, and Vieta [140, 1968], [332, 1990]. The following development is based on Birkhoff and Mac Lane [102, 1977, §5.5].

Any nondegenerate cubic equation can be put in the form x^3 + ax^2 + bx + c = 0 by dividing through by the leading coefficient. We will assume that the coefficients are real. The change of variable x = y - a/3 eliminates the quadratic term:

    y^3 + py + q = 0,    p = b - a^2/3,    q = c - ab/3 + 2a^3/27.

Then Vieta's substitution y = w - p/(3w) yields w^3 - p^3/(27w^3) + q = 0, and hence a quadratic equation in w^3:

    (w^3)^2 + q w^3 - p^3/27 = 0.
Hence

    w^3 = ( -q ± sqrt(q^2 + 4p^3/27) ) / 2.        (24.5)

For either choice of sign, the three cube roots for w yield the roots of the original cubic, on transforming back from w to y to x.

Are these formulae for solving a cubic numerically stable? Direct search provides an easy way to investigate. The variables are the three coefficients a, b, c, and for the objective function we take an approximation to the relative error of the computed roots ẑ_1, ẑ_2, ẑ_3. We compute the "exact" roots z using MATLAB's roots function (which uses the QR algorithm to compute the eigenvalues of the companion matrix) and then our relative error measure is

    min_Π max_i |ẑ_i - z_Π(i)| / |z_Π(i)|,

where we minimize over all six permutations Π.

First, we arbitrarily take the "+" square root in (24.5). With almost any starting vector of coefficients, the MDS maximizer rapidly identifies coefficients for which the computed roots are very inaccurate.
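For illustration, the MATLAB sketch below evaluates the formulae with the "+" square root in (24.5) and measures the error against the roots function; the componentwise error measure, minimized over the six orderings of the roots, is an assumed stand-in for the measure described above.

```matlab
function err = cubic_error(co)
%CUBIC_ERROR  Relative error of the Cardano/Vieta formulae for a real cubic.
%   co = [a b c] are the coefficients of x^3 + a*x^2 + b*x + c = 0.
a = co(1);  b = co(2);  c = co(3);
p = b - a^2/3;                                % depressed cubic y^3 + p*y + q = 0
q = c - a*b/3 + 2*a^3/27;
w3 = (-q + sqrt(q^2 + 4*p^3/27))/2;           % "+" square root in (24.5)
w  = w3^(1/3) * exp(2i*pi*(0:2)/3);           % the three cube roots of w3
                                              % (w3 = 0, i.e. p = q = 0, needs special handling)
y  = w - p./(3*w);                            % invert Vieta's substitution
zhat = y - a/3;                               % computed roots of the original cubic
z = roots([1 a b c]).';                       % "exact" roots via the companion matrix
P = perms(1:3);                               % all six orderings of the roots
err = inf;
for k = 1:size(P, 1)                          % componentwise relative error,
    err = min(err, max(abs(zhat - z(P(k,:))) ./ abs(z(P(k,:)))));  % minimized over orderings
end
end
```

This function can then be handed to a direct search maximizer over the coefficient vector [a b c] in the same way as the earlier objectives.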