Higham, Accuracy and Stability of Numerical Algorithms (excerpt)
PRINCIPLES OF FINITE PRECISION COMPUTATION

In fact, for any x, the calculator computes, in place of f(x) = x, the step function

    f̂(x) = 0,  0 ≤ x < 1,
    f̂(x) = 1,  x ≥ 1.

The calculator is producing a completely inaccurate approximation to f(x) in just 120 operations on nonnegative numbers. How can this happen?

The positive numbers x representable on the HP 48G satisfy 10^-499 < x < 9.999...× 10^499. If we define r(x) = x^(1/2^60) then, for any machine number x > 1,

    1 < r(x) < (10^500)^(2^-60) ≈ 1 + 10^-15,

which rounds to 1, since the HP 48G works to about 12 decimal digits. Thus for x > 1, the repeated square roots reduce x to 1.0, which the squarings leave unchanged.

For 0 < x < 1 we have x ≤ a := 0.99...9 (twelve 9s, that is, a = 1 - 10^-12) on a 12-digit calculator, so we would expect the square root to satisfy

    sqrt(x) ≤ sqrt(a) ≈ 1 - (1/2)×10^-12.

This upper bound rounds to the 12 significant digit number 0.99...9 = a. Hence after the 60 square roots we have on the calculator a number x ≤ 0.99...9. The 60 squarings are represented by s(x) = x^(2^60), and

    s(x) ≤ s(a) = (1 - 10^-12)^(2^60) ≈ e^(-2^60 × 10^-12) ≈ 10^(-5.0×10^5).

Because it is smaller than the smallest positive representable number, this result is set to zero on the calculator--a process known as underflow. (The converse situation, in which a result exceeds the largest representable number, is called overflow.)

The conclusion is that there is nothing wrong with the calculator. This innocuous-looking calculation simply exhausts the precision and range of a machine with 12 digits of precision and a 3-digit exponent.
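The same collapse occurs in IEEE double precision arithmetic, whose roughly 16 digits and much wider exponent range change the details (the squarings no longer underflow all the way to zero) but not the outcome. A minimal sketch, written for this note (the function name is ours, not the book's):

```python
import math

def sqrt_then_square(x, n=60):
    """Take n repeated square roots of x, then square the result n times."""
    for _ in range(n):
        x = math.sqrt(x)   # for x > 1, the iterates round down to exactly 1.0
    for _ in range(n):
        x = x * x          # for x < 1, the tiny deficit from 1 doubles each time
    return x

print(sqrt_then_square(2.0))   # exactly 1.0: the identity map has become a constant
print(sqrt_then_square(0.5))   # a number around 10^-56 instead of 0.5
```

On the HP 48G's 12-digit, 3-digit-exponent arithmetic the second value underflows all the way to zero, giving the step function described above.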
1.12.3. An Infinite Sum

It is well known that sum_{k=1}^∞ 1/k^2 = π^2/6 = 1.6449 3406 6848... . Suppose we were not aware of this identity and wished to approximate the sum numerically. The most obvious strategy is to evaluate the sum for increasing k until the computed sum does not change. In Fortran 90 in single precision this yields the value 1.6447 2532, which is first attained at k = 4096. This agrees with the exact infinite sum to just four significant digits out of a possible nine.

The explanation for the poor accuracy is that we are summing the numbers from largest to smallest, and the small numbers are unable to contribute to the sum.
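The stagnation is easy to reproduce outside Fortran 90. The sketch below is ours rather than the book's: it simulates single precision in Python by rounding every operation to IEEE single via the struct module (the helper f32 and the 10^6-term cutoff are our choices):

```python
import math
import struct

def f32(v):
    """Round a Python float to the nearest IEEE single precision value."""
    return struct.unpack('f', struct.pack('f', v))[0]

# Largest to smallest: add terms 1/k^2 until the single precision sum stagnates.
s, k = 0.0, 1
while True:
    t = f32(s + f32(1.0 / (k * k)))
    if t == s:                 # the new term drops off the 24-bit significand
        break
    s, k = t, k + 1

# Smallest to largest over a fixed number of terms (10^6 here, for speed).
n = 10**6
r = 0.0
for j in range(n, 0, -1):
    r = f32(r + f32(1.0 / (j * j)))

exact = math.pi**2 / 6         # 1.6449 3406 6848...
print(k, s, r)                 # stagnation near k = 4096; reversed order does much better
```

With only 10^6 terms the reversed sum is limited by the discarded tail (about 10^-6) rather than by rounding; the book's 10^9-term run is correct to eight significant digits.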
For k = 4096 we are forming s + 4096^-2 = s + 2^-24, where s ≈ 1.6. Single precision corresponds to a 24-bit mantissa, so the term we are adding to s "drops off the end" of the computer word, as do all successive terms.

The simplest cure for this inaccuracy is to sum in the opposite order: from smallest to largest. Unfortunately, this requires knowledge of how many terms to take before the summation begins. With 10^9 terms we obtain the computed sum 1.6449 3406, which is correct to eight significant digits. For much more on summation, see Chapter 4.

1.13. Increasing the Precision

When the only source of errors is rounding, a common technique for estimating the accuracy of an answer is to recompute it at a higher precision and to see how many digits of the original and the (presumably) more accurate answer agree.
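As an illustration of the technique (our own example, not one from the text), evaluate g(x) = (1 - cos x)/x^2, which suffers heavy cancellation for small x, in two precisions and compare:

```python
import math
import struct

def f32(v):
    """Round a double to the nearest IEEE single precision value."""
    return struct.unpack('f', struct.pack('f', v))[0]

def g_double(x):
    """(1 - cos x)/x^2 in double precision."""
    return (1.0 - math.cos(x)) / (x * x)

def g_single(x):
    """The same formula with every operation rounded to single precision."""
    x = f32(x)
    c = f32(math.cos(x))
    return f32(f32(1.0 - c) / f32(x * x))

x = 1e-4
print(g_single(x), g_double(x))   # 0.0 versus roughly 0.5
```

The two results disagree in every digit, which warns (correctly) that the lower precision value has no correct digits: the true value is 1/2 - x^2/24 + ... ≈ 0.5.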
We would intuitively expect any desired accuracy to be achievable by computing at a high enough precision. This is certainly the case for algorithms possessing an error bound proportional to the precision, which includes all the algorithms described in the subsequent chapters of this book. However, since an error bound is not necessarily attained, there is no guarantee that a result computed in t digit precision will be more accurate than one computed in s digit precision, for a given t > s; in particular, for a very ill conditioned problem both results could have no correct digits.

For illustration, consider the system Ax = b, where A is the inverse of the 5×5 Hilbert matrix and b_i = (-1)^i i. (For details of the matrices used in this experiment see Chapter 26.) We solved the system in varying precisions with unit roundoffs u = 2^-t, t = 15:40, corresponding to about 4 to 12 decimal places of accuracy.
(This was accomplished in MATLAB by using the function chop from the Test Matrix Toolbox to round the result of every arithmetic operation to t bits; see Appendix E.) The algorithm used was Gaussian elimination (without pivoting), which is perfectly stable for this symmetric positive definite matrix. The upper plot of Figure 1.3 shows t against the relative errors ||x - x̂||/||x|| and the relative residuals ||b - Ax̂||/(||A|| ||x̂||). The lower plot of Figure 1.3 gives corresponding results for A = P_5 + 5I, where P_5 is the Pascal matrix of order 5.
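The rounding step that chop performs is easy to imitate in other languages. The helper below is our Python stand-in for the idea (not the Test Matrix Toolbox code): it keeps t significand bits, so the simulated arithmetic has unit roundoff about u = 2^-t.

```python
import math

def chop(x, t):
    """Round x to t bits of significand (round to nearest)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                      # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2.0**t) / 2.0**t, e)

# Simulated low precision: pass the result of every operation through chop.
t = 12
a = chop(1.0 / 3.0, t)
b = chop(2.0 / 7.0, t)
s = chop(a + b, t)                            # a 12-bit addition
print(a, b, s)
```

Wrapping every operation of Gaussian elimination this way reproduces experiments like the one in the text, with t playing the role of the precision on the x axis of Figure 1.3.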
Figure 1.3. Forward errors ||x - x̂||/||x|| and relative residuals ||b - Ax̂||/(||A|| ||x̂||) versus precision t = -log2 u on the x axis.

The condition numbers κ(A) are 1.62×10^2 for the inverse Hilbert matrix and 9.55×10^5 for the shifted Pascal matrix. In both cases the general trend is that increasing the precision decreases the residual and relative error, but the behaviour is not monotonic. The reason for the pronounced oscillating behaviour of the relative error (but not the residual) for the inverse Hilbert matrix is not clear.

An example in which increasing the precision by several bits does not improve the accuracy is the evaluation of

    y = x + a sin(bx),    a = 10^-8,    b = 2^24.    (1.8)

Figure 1.4 plots t versus the absolute error, for precisions u = 2^-t, t = 10:40. Since a sin(bx) ≈ -8.55×10^-9, for t less than about 20 the error is dominated by the error in representing x = 1/7.
For 22 ≤ t ≤ 31 the accuracy is (exactly) constant! The plateau over this range of t is caused by a fortuitous rounding error in the addition: in the binary representation of the exact answer the 23rd to 32nd digits are 1s, and in the range of t of interest the final rounding produces a number with a 1 in the 22nd bit and zeros beyond, yielding an unexpectedly small error that affects only bits from the 33rd onwards.

Figure 1.4. Absolute error versus precision, t = -log2 u.

A more contrived example in which increasing the precision has no beneficial effect on the accuracy is the following evaluation of z = f(x):

    y = abs(3(x - 0.5) - 0.5)/25
    if y = 0
        z = 1
    else
        z = e^y    % Store to inhibit extended precision evaluation.
        z = (z - 1)/y
    end

In exact arithmetic, z = f(2/3) = 1, but in Fortran 90 on a Sun SPARCstation and on a 486DX workstation, ẑ = fl(f(2/3)) = 0.0 in both single and double precision arithmetic. A further example is provided by the "innocuous calculation" of §1.12.2, in which a step function is computed in place of f(x) = x for a wide range of precisions.

It is worth stressing that how precision is increased can greatly affect the results obtained. Increasing the precision without preserving important properties such as monotonicity of rounding can vitiate an otherwise reliable algorithm. Increasing the precision without maintaining a correct relationship among the precisions in different parts of an algorithm can also be harmful to the accuracy.
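The contrived example is easy to check in IEEE double precision; here is a direct Python transcription of the pseudocode above (our check, not the book's Fortran 90 run):

```python
import math

def f(x):
    y = abs(3.0 * (x - 0.5) - 0.5) / 25.0
    if y == 0.0:
        return 1.0
    z = math.exp(y)          # for the tiny y produced at x = 2/3, exp(y) rounds to 1.0
    return (z - 1.0) / y     # ... so the numerator, and hence the result, is 0

print(f(2.0 / 3.0))          # 0.0, although the exact answer is 1
```

The rounding error in forming 2/3 makes y a tiny nonzero number (about 4×10^-18), so the y = 0 branch is missed, exp(y) rounds to exactly 1, and the quotient is 0 however many working digits are carried.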
1.14. Cancellation of Rounding Errors

It is not unusual for rounding errors to cancel in stable algorithms, with the result that the final computed answer is much more accurate than the intermediate quantities. This phenomenon is not universally appreciated, perhaps because we tend to look at the intermediate numbers in an algorithm only when something is wrong, not when the computed answer is satisfactory. We describe two examples.
The first is a very short and rather unusual computation, while the second involves a well-known algorithm for computing a standard matrix decomposition.

1.14.1. Computing (e^x - 1)/x

Consider the function f(x) = (e^x - 1)/x = 1 + x/2! + x^2/3! + ..., which arises in various applications. The obvious way to evaluate f is via the algorithm

    % Algorithm 1.
    if x = 0
        f = 1
    else
        f = (e^x - 1)/x
    end

This algorithm suffers severe cancellation for |x| << 1, causing it to produce an inaccurate answer (0 instead of 1, if x is small enough). Here is an alternative:

    % Algorithm 2.
    y = e^x
    if y = 1
        f = 1
    else
        f = (y - 1)/log y
    end

At first sight this algorithm seems perverse, since it evaluates both exp and log instead of just exp. Some results computed in MATLAB are shown in Table 1.2.
All the results for Algorithm 2 are correct in all the significant figures shown, except for x = 10^-15, when the last digit should be 1. On the other hand, Algorithm 1 returns answers that become less and less accurate as x decreases.

To gain insight we look at the numbers in a particular computation with x = 9 × 10^-8 and u = 2^-24 ≈ 6 × 10^-8, for which the correct answer is 1.00000005 to the significant digits shown. For Algorithm 1 we obtain a completely inaccurate result, as expected, while Algorithm 2 produces a result correct in all but the last digit. Comparing with the quantities that would be obtained by Algorithm 2 in exact arithmetic (correct to the significant digits shown), we see that Algorithm 2 obtains very inaccurate values of e^x - 1 and log e^x, but the ratio of the two quantities it computes is very accurate.

Table 1.2. Computed values of (e^x - 1)/x from Algorithms 1 and 2.

        x        Algorithm 1                  Algorithm 2
    10^-5    1.000005000006965            1.000005000016667
    10^-6    1.000000499962184            1.000000500000167
    10^-7    1.000000049433680            1.000000050000002
    10^-8    9.999999939225290 × 10^-1    1.000000005000000
    10^-9    1.000000082740371            1.000000000500000
    10^-10   1.000000082740371            1.000000000050000
    10^-11   1.000000082740371            1.000000000005000
    10^-12   1.000088900582341            1.000000000000500
    10^-13   9.992007221626408 × 10^-1    1.000000000000050
    10^-14   9.992007221626408 × 10^-1    1.000000000000005
    10^-15   1.110223024625156            1.000000000000000
    10^-16   0                            1.000000000000000
Conclusion: errors cancel in the division in Algorithm 2.

A short error analysis explains this striking cancellation of errors. We assume that the exp and log functions are both computed with a relative error not exceeding the unit roundoff u. The algorithm first computes ŷ = e^x(1 + δ), |δ| ≤ u. If ŷ = 1 then e^x(1 + δ) = 1, so

    x = -log(1 + δ) = -δ + δ^2/2 - δ^3/3 + ...,    |δ| ≤ u,

which implies that the correctly rounded value of f(x) = 1 + x/2 + x^2/6 + ... is 1, and so f has been evaluated correctly, to the working precision. If ŷ ≠ 1 then, using (1.1),^7

    f̂ = ((ŷ - 1)(1 + δ1)) / ((log ŷ)(1 + δ2)) × (1 + δ3),    |δi| ≤ u, i = 1:3.    (1.9)

^7 The analysis from this point on assumes the use of a guard digit in subtraction (see §2.4); without a guard digit Algorithm 2 is not highly accurate.
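Both algorithms are immediate to transcribe; this Python sketch (ours; math.expm1 supplies an accurate reference value) reproduces the behaviour recorded in Table 1.2 in IEEE double precision:

```python
import math

def alg1(x):
    """Algorithm 1: direct evaluation, with severe cancellation in e^x - 1."""
    return 1.0 if x == 0.0 else (math.exp(x) - 1.0) / x

def alg2(x):
    """Algorithm 2: the rounding errors in y - 1 and log y cancel in the division."""
    y = math.exp(x)
    return 1.0 if y == 1.0 else (y - 1.0) / math.log(y)

x = 1e-12
reference = math.expm1(x) / x      # 1 + x/2 + x^2/6 + ... = 1.0000000000005
print(alg1(x))                     # about 1.000088900582341: only a few correct digits
print(alg2(x))                     # about 1.0000000000005: correct to working precision
```

In practice one would simply call expm1, which libraries provide precisely because of this cancellation; Algorithm 2 matters when no such function is available.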