Higham - Accuracy and Stability of Numerical Algorithms (523152), страница 83
Текст из файла (страница 83)
However, if the errors are measured relative to ||A|| ||B||, which is anatural quantity to use for comparison when employing matrix norms, thenthey are just as small as for conventional multiplication.It is straightforward to show that if the 3M method is implemented using Strassen’s method to form the real matrix products, then the computedcomplex productsatisfies the same bound (22.14) as for Strassen’s methoditself, but with an extra constant multiplier of 6 and with 4 added to theexpression inside the square brackets.22.3. Notes and ReferencesA good introduction to the construction of fast matrix multiplication methodsis provided by the papers of Pan [816, 1984] and Laderman, Pan, and Sha [684,1992].Harter [504, 1972] shows that Winograd’s formula (22.2) is the best of itskind, in a sense made precise in [504, 1972].How does one derive formulae such as those of Winograd and Strassen, orthat in the 3M method? Inspiration and ingenuity seem to be the key.
A fairlystraightforward, but still not obvious, derivation of Strassen’s method is givenby Yuval [1124, 1978]. Gustafson and Aluru [491, 1996] develop algorithms460FAST M ATRIX M ULTIPLICATIONthat systematically search for fast algorithms, taking advantage of a parallelcomputer. In an exhaustive search taking 21 hours of computation time ona 256 processor nCUBE 2, they were able to find 12 methods for multiplying2 complex numbers in 3 multiplications and 5 additions; they could not finda method with fewer additions, thus proving that such a method does notexist.
However, they estimate that a search for Strassen’s method on a 1024processor nCUBE 2 would take many centuries, even using aggressive pruningrules, so human ingenuity is not yet redundant!To obtain a useful implementation of Strassen’s method a number of issueshave to be addressed, including how to program the recursion, how best tohandle rectangular matrices of arbitrary dimension (since the basic methodis defined only for square matrices of dimension a power of 2), and how tocater for the extra storage required by the method.
These issues are discussedby Bailey [43, 1988], Bailey, Lee, and Simon [47, 1991], Fischer [374, 1974],Higham [544, 1990], Kreczmar [673, 1976], and[934, 1976], among others. Douglas, Heroux, Slishman, and Smith [317, 1994] present a portableFortran implementation of Winograd’s variant of Strassen’s method for realand complex matrices, with a level-3 BLAS interface; they take care to usea minimal amount of extra storage (about 2n 3/3 elements of extra storage isrequired when multiplying n x n matrices).Higham [544, 1990] shows how Strassen’s method can be used to producealgorithms for all the level-3 BLAS operations that are asymptotically fasterthan the conventional algorithms.
Most of the standard algorithms in numerical linear algebra remain stable (in an appropriately weakened sense) whenfast level-3 BLAS are used. See, for example, Chapter 12, $18.4, and Problems11.4 and 13.2.Knight [664, 1995] shows how to choose the recursion threshold to minimizethe operation count of Strassen’s method for rectangular matrices. He alsoshows how to use Strassen’s method to compute the QR factorization of anm x n matrix in O(mn1.838 ) operations instead of the usual O(mn2 ).Bailey, Lee, and Simon [47, 1991] substituted their Strassen’s method codefor a conventionally coded BLAS3 subroutine SGEMM and tested LAPACK’SLU factorization subroutine SGETRF on a Cray Y-MP.
They obtained speedimprovements for matrix dimensions 1024 and larger.The Fortran 90 standard includes an intrinsic function MATMUL that returnsthe product of its two matrix arguments. The standard does not specify whichmethod is to be used for the multiplication. An IBM compiler supports the useof Winograd’s variant of Strassen’s method, via an optional third argumentto MATMUL (an extension to Fortran 90) [318, 1994],Brent was the first to point out the possible instability of Winograd’smethod [143, 1970].
He presented a full error analysis (including Theorem 22. 1) and showed that scaling ensures stability.An error analysis of Strassen’s method was given by Brent in 1970 inPROBLEMS461an unpublished technical report that has not been widely cited [142, 1970].Section 22.2.2 is based on Higham [544, 1990].According to Knuth, the 3M formula was suggested by P. Ungar in 1963[668, 1981, p. 647].
It is analogous to a formula of Karatsuba and Ofman [643,1963] for squaring a 2n-digit number using three squarings of n-digit numbers. That three multiplications (or divisions) are necessary for evaluatingthe product of two complex numbers was proved by Winograd [1106, 1971].Section 22.2.4 is based on Higham [552, 1992].The answer to the question “What method should we use to multiplycomplex matrices?” depends on the desired accuracy and speed. In a Fortranenvironment an important factor affecting the speed is the relative efficiencyof real and complex arithmetic, which depends on the compiler and the computer (complex arithmetic is automatically converted by the compiler intoreal arithmetic).
For a discussion and some statistics see [552, 1992].The efficiency of Winograd’s method is very machine dependent. Bjørstad,Marine, Sørevik, and Vajteršic [122, 1992] found the method useful on theMasPar MP-1 parallel machine, on which floating point multiplication takesabout three times as long as floating point addition at 64-bit precision. Theyalso implemented Strassen’s method on the MP-1 (using Winograd’s methodat the bottom level of recursion) and obtained significant speedups over conventional multiplication for dimensions as small as 256.As noted in 22.1, Strassen [962, 1969] gave not only a method for multiplying n x n matrices inoperations, but also a method for invertingan n x n matrix with the same asymptotic cost.
The method is described inProblem 22.8. For more on Strassen’s inversion method see $24.3.2, Baileyand Ferguson [41, 1988], and Bane, Hansen, and Higham [51, 1993].Problems22.1. (Knight [664, 1995]) Suppose we have a method for multiplying n x nmatrices inoperations, where 2 < α < 3. Show that if A is m x n andoperations,B is n x p then the product AB can be formed inwhere nl = min(m, n, p) and n2 and n3 are the other two dimensions.22.2.
Work out the operation count for Winograd’s method applied to n x nmatrices.22.3. Let S n (n0 ) denote the operation count (additions plus multiplications)for Strassen’s method applied to n x n matrices, with recursion down to thelevel of no x no matrices. Assume that n and no are powers of 2. For large n,estimate Sn (8)/ Sn(n) and Sn(1)/S n (8) and explain the significance of theseratios (use (22.5)).22.4.
(Knight [664, 1995]) Suppose that Strassen’s method is used to multiply462FAST M ATRIX M ULTIPLICATIONan m x n matrix by an n x p matrix, where m = a2 j, n = b2 j, p = c2 j, andthat conventional multiplication is used once any dimension is 2 r or less. Showthat the operation count is α7 j + β4 j, whereShow that by setting r = 0 and a = 1 a special case of the result of Problem 22.1 is obtained.22.5. Compare and contrast Winograd’s inner product formula for n = 2with the imaginary part of the 3M formula (22.8).22.6. Prove the error bound described at the end oftion of the 3M method and Strassen’s method.22.2.4 for the combina-22.7. Two fast ways to multiply complex matrices are (a) to apply themethod to the original matrices and to use Strassen’s method to formthree real matrix products, and (b) to use Strassen’s method with themethod applied at the bottom level of recursion.
Investigate the merits oftwo approaches with respect to speed and storage.3Mthe3Mthe22.8. Strassen [962, 1969] gives a method for inverting an n x n matrix inoperations. Assume that n is even and writeThe inversion method is based on the following formulae:The matrix multiplications are done by Strassen’s method and the inversionsdetermining P1 and P6 are done by recursive invocations of the method itself.(a) Verify these formulae, using a block LU factorization of A, and show thatthey permit the claimedcomplexity.
(b) Show that if A is uppertriangular, Strassen’s method is equivalent to (the unstable) Method 2B of13.2.2.(For a numerical investigation into the stability of Strassen’s inversionmethod, see 24.3.2.)PROBLEMS46322.9. Find the inverse of the block upper triangular matrixDeduce that matrix multiplication can be reduced to matrix inversion.22.10. (RESEARCH PROBLEM) Carry out extensive numerical experiments totest the accuracy of Strassen’s method and Winograd’s variant (cf. the resultsat the end of 22.2.2).Previous Home NextChapter 23The Fast Fourier Transform andApplicationsOnce the [FFT] method was establishedit became clear that it had a long and interesting prehistorygoing back as far as Gauss.But until the advent of computing machinesit was a solution looking for a problem.Fourier Analysis (1988)Life as we know it would be very different without the FFT.— CHARLES F.