Furthermore, in a sum of positive terms that vary widely in magnitude the decreasing ordering may not allow the smaller terms to contribute to the sum (which is why the harmonic sum $\sum_{i=1}^n 1/i$ "converges" in floating point arithmetic as $n \to \infty$). However, consider the example with n = 4 and

    x = [1, M, 2M, -3M],                                            (4.5)

where M is a floating point number so large that fl(1 + M) = M (thus $M \ge u^{-1}$). The three orderings considered so far produce the following results:

    Increasing:  $\widehat{S}_n$ = fl(1 + M + 2M - 3M) = 0,
    Psum:        $\widehat{S}_n$ = fl(1 + M - 3M + 2M) = 0,
    Decreasing:  $\widehat{S}_n$ = fl(-3M + 2M + M + 1) = 1.

Thus the decreasing ordering sustains no rounding errors and produces the exact answer, while both the increasing and Psum orderings yield computed sums with relative error 1. The reason why the decreasing ordering performs so well in this example is that it adds the "1" after the inevitable heavy cancellation has taken place, rather than before, and so retains the important information in this term.
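The example is easy to reproduce in IEEE double precision, where $u = 2^{-53}$ and $M = 2^{53}$ satisfies fl(1 + M) = M. The following Python sketch (ours, not the book's) evaluates the three orderings by plain recursive summation:

    M = 2.0 ** 53                    # fl(1 + M) = M in IEEE double precision

    def recursive_sum(x):
        s = 0.0
        for xi in x:
            s = s + xi               # recursive summation, left to right
        return s

    print(recursive_sum([1.0, M, 2*M, -3*M]))    # increasing: 0.0
    print(recursive_sum([1.0, M, -3*M, 2*M]))    # Psum:       0.0
    print(recursive_sum([-3*M, 2*M, M, 1.0]))    # decreasing: 1.0 (exact)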
If we evaluate the term $\mu = \sum_{i=1}^{n-1} |T_i|$ in the error bound (4.3) for example (4.5) we find

    Increasing: $\mu = 4M$,    Psum: $\mu = 3M$,    Decreasing: $\mu = M + 1$,

so (4.3) "predicts" that the decreasing ordering will produce the most accurate answer, but the bound it provides is extremely pessimistic since there are no rounding errors in this instance. Extrapolating from this example, we conclude that the decreasing ordering is likely to yield greater accuracy than the increasing or Psum orderings whenever there is heavy cancellation in the sum, that is, whenever $|\sum_{i=1}^n x_i| \ll \sum_{i=1}^n |x_i|$.

Turning to the insertion method, a good explanation of the insertion strategy is that it attempts to minimize, one at a time, the terms $|T_1|, |T_2|, \ldots$ in the error bound (4.3). Indeed, if the $x_i$ are all nonnegative the insertion method minimizes this bound over all instances of Algorithm 4.1.

Finally, we note that a stronger form of the bound (4.4) holds for pairwise summation; a sketch of the method is given below.
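For concreteness, here is a minimal recursive sketch of pairwise summation in Python (our illustration, not the book's code):

    def pairwise_sum(x):
        # Sum by halving: each addend takes part in about log2(n) additions.
        n = len(x)
        if n == 1:
            return x[0]
        m = n // 2
        return pairwise_sum(x[:m]) + pairwise_sum(x[m:])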
The stronger bound can be deduced from (4.3) or derived directly, as follows. Assume for simplicity that $n = 2^r$. Unlike in recursive summation, each addend takes part in the same number of additions, $\log_2 n$. Therefore we have a relation of the form

    $\widehat{S}_n = \sum_{i=1}^{n} x_i \prod_{k=1}^{\log_2 n} (1 + \delta_k^{(i)}), \qquad |\delta_k^{(i)}| \le u,$

which leads to the bound

    $|E_n| \le \frac{\log_2 n \, u}{1 - \log_2 n \, u} \sum_{i=1}^{n} |x_i|.$                        (4.6)

Since it is proportional to $\log_2 n$ rather than n, this is a smaller bound than (4.4), which is the best bound of this form that holds in general for Algorithm 4.1.

4.3. Compensated Summation

We have left to last the compensated summation method, which is recursive summation with a correction term cleverly designed to diminish the rounding errors.
Compensated summation is worth considering whenever an accurate sum is required and computations are already taking place at the highest precision supported by the hardware or the programming language in use. In 1951 Gill [449, 1951] noticed that the rounding error in the sum of two numbers could be estimated by subtracting one of the numbers from the sum, and he made use of this estimate in a Runge-Kutta code in a program library for the EDSAC computer. Gill's estimate is valid for fixed point arithmetic only.
Kahan [625, 1965] and Møller [777, 1965] both extended the idea to floating point arithmetic. Møller shows how to estimate a + b - fl(a + b) in chopped arithmetic, while Kahan uses a slightly simpler estimate to derive the compensated summation method for computing $\sum_{i=1}^n x_i$.

The estimate used by Kahan is perhaps best explained with the aid of a diagram. Let a and b be floating point numbers with $|a| \ge |b|$, let $\widehat{s} = fl(a + b)$, and consider Figure 4.1, which uses boxes to represent the mantissas of a and b.

[Figure 4.1. Recovering the rounding error.]
The figure suggests that if we evaluate

    $\widehat{e} = fl((a - \widehat{s}) + b)$

in floating point arithmetic, in the order indicated by the parentheses, then the computed $\widehat{e}$ will be a good estimate of the error $(a + b) - \widehat{s}$. In fact, for rounded floating point arithmetic in base 2, we have

    $a + b = \widehat{s} + \widehat{e},$                                    (4.7)

that is, the computed $\widehat{e}$ represents the error exactly. This result (which does not hold for all bases) is proved by Dekker [275, 1971, Thm. 4.7], Knuth [668, 1981, Thm. C, p. 221], and Linnainmaa [703, 1974, Thm. 3]. Note that there is no point in computing $fl(\widehat{s} + \widehat{e})$, since $\widehat{s}$ is already the best floating point representation of a + b!
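As a quick check of (4.7), the following Python lines (ours, assuming IEEE double precision with round to nearest) recover the rounding error of a single addition:

    a = 1.0
    b = 1e-16                # so small that fl(a + b) = a
    s = a + b                # s = fl(a + b); here s == 1.0
    e = (a - s) + b          # evaluate in this order; e == 1e-16
    # a + b = s + e holds exactly: e is the part of b lost in forming s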
Kahan's compensated summation method employs the correction $\widehat{e}$ on every step of a recursive summation. After each partial sum is formed, the correction is computed and immediately added to the next term $x_i$ before that term is added to the partial sum. Thus the idea is to capture the rounding errors and feed them back into the summation. The method may be written as follows.

Algorithm 4.2 (compensated summation). Given floating point numbers $x_1, \ldots, x_n$ this algorithm forms the sum $s = \sum_{i=1}^n x_i$ by compensated summation.

    s = 0; e = 0
    for i = 1:n
        temp = s
        y = x_i + e
        s = temp + y
        e = (temp - s) + y   % Evaluate in the order shown.
    end
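A direct Python transcription of Algorithm 4.2 (the function name and the test below are ours):

    def compensated_sum(x):
        s = 0.0
        e = 0.0
        for xi in x:
            temp = s
            y = xi + e           # add the previous correction to the new term
            s = temp + y
            e = (temp - s) + y   # rounding error of temp + y, as in (4.7)
        return s

    # With ten million copies of 0.1 the compensated sum matches the true
    # sum to full double precision; the plain sum loses about six digits.
    x = [0.1] * 10**7
    print(compensated_sum(x), sum(x))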
The compensated summation method has two weaknesses: $\widehat{e}$ is not necessarily the exact correction, since (4.7) is based on the assumption that $|a| \ge |b|$, and the addition $y = x_i + e$ is not performed exactly. Nevertheless, the use of the corrections brings a benefit in the form of an improved error bound. Knuth [668, 1981, Ex. 19, pp. 229, 572-573] shows that the computed sum satisfies

    $\widehat{S}_n = \sum_{i=1}^{n} (1 + \mu_i) x_i, \qquad |\mu_i| \le 2u + O(nu^2),$                 (4.8)

which is an almost ideal backward error result (a more detailed version of Knuth's proof is given by Goldberg [457, 1991]).

In [627, 1972] and [628, 1973] Kahan describes a variation of compensated summation in which the final sum is also corrected (thus "s = s + e" is appended to the algorithm above). Kahan states in [627, 1972] and proves in [628, 1973] that (4.8) holds with the stronger bound $|\mu_i| \le 2u + O((n - i + 1)u^2)$. The proofs of (4.8) given by Knuth and Kahan are similar; they use the model (2.4) with a subtle induction and some intricate algebraic manipulation.

The forward error bound corresponding to (4.8) is

    $|E_n| \le (2u + O(nu^2)) \sum_{i=1}^{n} |x_i|.$                          (4.9)

As long as $nu \le 1$, the constant in this bound is independent of n, and so the bound is a significant improvement over the bounds (4.4) for recursive summation and (4.6) for pairwise summation.
Note, however, that if $\sum_{i=1}^n |x_i| \gg |\sum_{i=1}^n x_i|$, compensated summation is not guaranteed to yield a small relative error.

Another version of compensated summation has been investigated by several authors: Jankowski, Smoktunowicz, and Woźniakowski [609, 1983], Jankowski and Woźniakowski [611, 1985], Kiełbasiński [655, 1973], Neumaier [788, 1974], and Nickel [797, 1970]. Here, instead of immediately feeding each correction back into the summation, the corrections are accumulated separately by recursive summation and then the global correction is added to the computed sum. For this version of compensated summation Kiełbasiński [655, 1973] and Neumaier [788, 1974] show that

    $\widehat{S}_n = \sum_{i=1}^{n} (1 + \mu_i) x_i, \qquad |\mu_i| \le 2u + n^2 u^2,$                 (4.10)

provided $nu \le 0.1$; this is weaker than (4.8) in that the second-order term has an extra factor n. If $n^2 u \le 0.1$ then in (4.10), $|\mu_i| \le 2.1u$.
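A Python sketch of this variant (our illustration, not code from the cited papers; we add an operand test so that the error recovery (4.7) applies to each addition):

    def compensated_sum_deferred(x):
        s = 0.0
        c = 0.0                          # corrections, accumulated separately
        for xi in x:
            t = s + xi
            if abs(s) >= abs(xi):        # order the operands so (4.7) applies
                c = c + ((s - t) + xi)   # rounding error of s + xi
            else:
                c = c + ((xi - t) + s)
            s = t
        return s + c                     # add the global correction at the end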
Jankowski, Smoktunowicz, and Woźniakowski [609, 1983] show that, by using a divide and conquer implementation of compensated summation, the range of n for which $|\mu_i| \le cu$ holds in (4.10) can be extended, at the cost of a slight increase in the size of the constant c.

Neither the correction formula (4.7) nor the result (4.8) for compensated summation holds under the no-guard-digit model of floating point arithmetic. Indeed, Kahan [634, 1990] constructs an example where compensated summation fails to achieve (4.9) on certain Cray machines, but he states that such failure is extremely rare. In [627, 1972] and [628, 1973] Kahan gives a modification of the compensated summation algorithm in which the assignment "e = (temp - s) + y" is replaced by

    f = 0
    if sign(temp) = sign(y), f = (0.46*s - s) + s, end
    e = ((temp - f) - (s - f)) + y

Kahan shows in [628, 1973] that the modified algorithm achieves (4.8) "on all North American machines with floating hardware" and explains that "The mysterious constant 0.46, which could perhaps be any number between 0.25 and 0.50, and the fact that the proof requires a consideration of known machine designs, indicate that this algorithm is not an advance in computer science."

Viten'ko [1056, 1968] shows that under the no-guard-digit model (2.6) the summation method with the optimal error bound (in a certain sense defined in [1056, 1968]) is pairwise summation.
This does not contradict Kahan's result, because Kahan uses properties of the floating point arithmetic beyond those in the no-guard-digit model.

A good illustration of the benefits of compensated summation is provided by Euler's method for the ordinary differential equation initial value problem y' = f(x, y), y(a) given, which generates an approximate solution according to $y_{k+1} = y_k + hf_k$, $y_0 = y(a)$. We solved the equation y' = -y with y(0) = 1 over [0, 1] using n steps of Euler's method (nh = 1), with n ranging from 10 to $10^8$. With compensated summation we replace the statements x = x + h, y = y + h*f(x, y) by (with the initialization cx = 0, cy = 0)

    dx = h + cx
    new_x = x + dx
    cx = (x - new_x) + dx
    x = new_x
    dy = h*f(x, y) + cy
    new_y = y + dy
    cy = (y - new_y) + dy
    y = new_y

Figure 4.2 shows the errors $e_n = |y(1) - \widehat{y}_n|$, where $\widehat{y}_n$ is the computed approximation to y(1).
The computations were done in Fortran 90 in single precision arithmetic on a Sun SPARCstation ($u \approx 6 \times 10^{-8}$). Since Euler's method has global error of order h, the error curve on the plot should be approximately a straight line. For the standard implementation of Euler's method the errors $e_n$ start to increase steadily beyond n = 20,000 because of the influence of rounding errors.
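The experiment is straightforward to repeat. The sketch below (ours, not the book's; it assumes NumPy is available and mimics single precision with numpy.float32) computes $e_n$ for one value of n with and without compensation:

    import math
    import numpy as np

    def euler_y1(n, compensated):
        # Integrate y' = -y, y(0) = 1 over [0, 1] with n Euler steps,
        # keeping all arithmetic in single precision (numpy.float32).
        f32 = np.float32
        h = f32(1.0) / f32(n)
        x, y = f32(0.0), f32(1.0)
        cx, cy = f32(0.0), f32(0.0)       # corrections for x and y
        for _ in range(n):
            if compensated:
                dx = h + cx
                new_x = x + dx
                cx = (x - new_x) + dx
                x = new_x
                dy = h * (-y) + cy        # f(x, y) = -y
                new_y = y + dy
                cy = (y - new_y) + dy
                y = new_y
            else:
                x = x + h
                y = y + h * (-y)
        return float(y)

    # e_n = |y(1) - y_hat|; compensation keeps the error on the O(h) line
    # well beyond n = 20,000, where the standard version starts to drift.
    for comp in (False, True):
        print(comp, abs(math.exp(-1.0) - euler_y1(100000, comp)))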