A square root is computed with a small relative error and the same is true of the subtraction that follows. Consequently, the bigger root x_1 ≈ -b is computed accurately by the quadratic formula. In the computation of the smaller root, there is cancellation when the square root term is subtracted from -b/2. The subtraction itself is done exactly, but the error already present in (b/2)^2 - c becomes important in a relative sense. In the extreme case the formula results in zero as an approximation to x_2.

For this particular task a reformulation of the problem avoids the difficulty. The expression

(x - x_1)(x - x_2) = x^2 - (x_1 + x_2)x + x_1 x_2 = x^2 + bx + c

shows that x_1 x_2 = c. As we have seen, the bigger root x_1 can be computed accurately using the quadratic formula and then

x_2 = c / x_1

provides an accurate value for x_2.
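A minimal sketch of this reformulation in Python (not from the book; the coefficient values below are chosen only for illustration):

```python
import math

def roots_of_monic_quadratic(b, c):
    """Roots of x**2 + b*x + c = 0, assuming both roots are real (b*b >= 4*c)."""
    d = math.sqrt(b * b / 4.0 - c)
    # The bigger root adds quantities of like sign, so there is no cancellation.
    x1 = -b / 2.0 - d if b >= 0.0 else -b / 2.0 + d
    # The smaller root comes from the product relation x1 * x2 = c.
    x2 = c / x1
    return x1, x2

b, c = 1.0e9, 1.0
naive = -b / 2.0 + math.sqrt(b * b / 4.0 - c)   # quadratic formula with cancellation
x1, x2 = roots_of_monic_quadratic(b, c)
print(naive)   # 0.0: the extreme case described above
print(x2)      # -1e-09, an accurate value for the smaller root
```

With b = 1.0e9 the quantity (b/2)^2 is so much larger than c that c is lost entirely when (b/2)^2 - c is rounded, so the naive subtraction cancels to zero, while c/x_1 retains full accuracy.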
Example 1.16. Alternating series. As we observed earlier, it is important to know when enough terms have been taken from a series to approximate the limit to a desired accuracy. Alternating series are attractive in this regard. Suppose a_0 > a_1 > ··· > a_n > a_{n+1} > ··· > 0 and a_n → 0 as n → ∞. Then the alternating series

a_0 - a_1 + a_2 - a_3 + ···

converges to a limit S and the error of the partial sum

S_n = a_0 - a_1 + ··· + (-1)^n a_n

satisfies |S - S_n| < a_{n+1}. To see a specific example, consider the evaluation of sin x by its Maclaurin series

sin x = x - x^3/3! + x^5/5! - x^7/7! + ···.

Although this series converges quickly for any given x, there are numerical difficulties when |x| is large. If, say, x = 10, the terms a_m = 10^{2m+1}/(2m+1)! grow before they decay, the largest being a_4 = 10^9/9! ≈ 2756. Clearly there are some fairly large terms here that must cancel out to yield a result sin 10 that has magnitude at most 1.
The terms a_m are the result of some computation that here can be obtained with small relative error. However, if a_m is large compared to the sum S, a small relative error in a_m will not be small compared to S and S will not be computed accurately. We programmed the evaluation of this series in a straightforward way, being careful to compute, say, x^9/9! as

(x/1)(x/2)(x/3) ··· (x/9)

so as to avoid unnecessarily large quantities. Using single precision standard IEEE arithmetic we added terms until the partial sums stopped changing. This produced the value -0.544040 while the exact value should be sin 10 = -0.544021. Although the series converges quickly for all x, some intermediate terms become large when |x| is large.
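A sketch of such a computation in Python, using numpy's float32 to imitate IEEE single precision (an illustration, not the book's program):

```python
import numpy as np

x = np.float32(10.0)
term = x                          # first term of the series: x
s = term
m = 0
while True:
    m += 1
    # Build each term from the previous one, so that x**(2m+1) and (2m+1)!
    # are never formed separately.
    term = term * (-(x * x)) / np.float32((2 * m) * (2 * m + 1))
    new_s = np.float32(s + term)
    if new_s == s:                # partial sums stopped changing
        break
    s = new_s

print(s)   # roughly -0.54404, while sin(10) = -0.5440211...
```

The largest terms are near 2756 in size, so rounding errors of about a unit roundoff (roughly 6 × 10^-8 in single precision) in those terms can contribute absolute errors of order 10^-4 to a sum whose true value is only about 0.544, which accounts for the loss of the last few digits.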
Indeed, we got an overflow due to the small exponent range in IEEE single precision arithmetic when we tried x = 100. Clearly floating point arithmetic does not free us from all concerns about scaling.

Series are often used as a way of evaluating functions. If the desired function value is small and if some terms in the series are comparatively large, then there must be cancellation and we must expect that inaccuracies in the computation of the terms will cause the function value to be inaccurate in a relative sense.

We have seen examples showing that the sum of several numbers depends on the order in which they are added. Is there a “good” order? We now derive a rule of thumb that can be quite useful.
We can form a_1 + a_2 + ··· + a_N by the algorithm used in Example 1.13. The first computed partial sum is

S_2 = fl(a_1 + a_2) = (a_1 + a_2)(1 + δ_2),

where |δ_2| < u. It is a little special. The general case is represented by the next computed partial sum, which is

S_3 = fl(S_2 + a_3) = (S_2 + a_3)(1 + δ_3) = [(a_1 + a_2)(1 + δ_2) + a_3](1 + δ_3),

where |δ_3| < u. To gain insight, we approximate this expression by dropping terms involving the products of small factors so that

S_3 ≈ a_1(1 + δ_2 + δ_3) + a_2(1 + δ_2 + δ_3) + a_3(1 + δ_3).

Continuing in this manner we find that

S_N ≈ a_1(1 + δ_2 + ··· + δ_N) + a_2(1 + δ_2 + ··· + δ_N) + a_3(1 + δ_3 + ··· + δ_N) + ··· + a_N(1 + δ_N).

According to this approximation, the error made when a_k is added to S_k might grow, but its effect in S_N will be no bigger than (N - k + 1)u|a_k|. This suggests that to reduce the total error, the terms should be added in order of increasing magnitude.
A careful bound on the error of repeated summation leads to the same rule of thumb. Adding in order of increasing magnitude is usually a good order, but not necessarily the best order (because of the complex ways that the individual errors can interact). Much mathematical software makes use of this device to enhance the accuracy of the computations. The approximate error can be bounded by

|S_N - (a_1 + a_2 + ··· + a_N)| ≲ u(N - 1)(|a_1| + |a_2| + ··· + |a_N|).

Here we use the symbol ≲ to mean “less than or equal to a quantity that is approximately.” (The “less than” is not sharp here.) Further manipulation provides an approximate bound on the magnitude of the sum’s relative error:

|S_N - (a_1 + ··· + a_N)| / |a_1 + ··· + a_N| ≲ u(N - 1)(|a_1| + ··· + |a_N|) / |a_1 + ··· + a_N|.

The dangerous situation is when |a_1| + ··· + |a_N| is much larger than |a_1 + ··· + a_N|, which is when cancellation takes place.
An important consequence is that if all the terms have the same sign, the sum will be computed accurately in a relative sense, provided only that the number of terms is not too large for the precision available. For a convergent series

S = a_0 + a_1 + a_2 + ···

it is necessary that the terms a_m tend to zero as m increases. Rather than sum in the natural order m = 0, 1, ..., it would often be better to work out mathematically how many terms N are needed to approximate S to the desired accuracy and then calculate S_N in the reverse order a_N, a_{N-1}, ....
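A sketch in Python of the effect of summation order, again using numpy's float32 for single precision (the series with terms 1/m^2 is chosen only for illustration and is not from the text):

```python
import numpy as np

N = 10**6
a = 1.0 / np.arange(1, N + 1, dtype=np.float32) ** 2      # terms 1/m**2, decreasing

def running_sum(terms):
    """Add the terms one at a time, in the order given, in single precision."""
    s = np.float32(0.0)
    for t in terms:
        s = np.float32(s + t)
    return s

print(running_sum(a))          # natural order: largest terms first
print(running_sum(a[::-1]))    # increasing magnitude: typically several digits better
print(np.pi ** 2 / 6)          # limit of the infinite series, about 1.6449341
```

In the natural order the partial sum quickly reaches about 1.64, after which the remaining tiny terms are largely rounded away one by one; added smallest first, they accumulate into something significant before the large terms are included.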
Example 1.17. There are two ways of interpreting errors that are important in numerical analysis. So far we have been considering a forward error analysis. This corresponds to bounding the errors in the answer by bounding at each stage the errors that might arise and their effects. To be specific, recall the expression for the error of summing three numbers:

fl(S_3) = [(x_1 + x_2)(1 + δ_2) + x_3](1 + δ_3).

A forward error analysis might bound the absolute error by

|fl(S_3) - (x_1 + x_2 + x_3)| ≤ (|x_1| + |x_2|)(2u + u^2) + |x_3|u.

(This is a sharp version of the approximate bound given earlier.) A backward error analysis views the computed result as the result computed in exact arithmetic of a problem with somewhat different data.
Let us reinterpret the expression for fl(S_3) in this light. It is seen that

fl(S_3) = y_1 + y_2 + y_3,

where

y_1 = x_1(1 + δ_2)(1 + δ_3), y_2 = x_2(1 + δ_2)(1 + δ_3), y_3 = x_3(1 + δ_3).

In the backward error analysis view, the computed sum is the exact sum of terms y_k that are each close in a relative sense to the given data x_k. An algorithm that is stable in the sense of backward error analysis provides the exact solution of a problem with data close to that of the original problem. As to whether the two solutions are close, that is a matter of the conditioning of the problem. A virtue of this way of viewing errors is that it separates the roles of the stability of the algorithm and the condition of the problem. Backward error analysis is particularly attractive when the input data are of limited accuracy, as, for example, when the data are measured or computed. It may well happen that a stable algorithm provides the exact solution to a problem with data that cannot be distinguished from the given data because of their limited accuracy. We really cannot ask more of the numerical scheme in such a situation, but again we must emphasize that how close the solution is to that corresponding to the given data depends on the conditioning of the problem.
We shall return to this matter in the next chapter.

A numerical example will help make the point. For x_1 = 0.12 × 10^2, x_2 = 0.34 × 10^1, x_3 = -0.15 × 10^2, the true value of the sum is S_3 = 0.40 × 10^0. When evaluated in two-digit decimal chopped arithmetic, fl(S_3) = 0.00 × 10^0, a very inaccurate result. Nevertheless, with y_1 = 0.116 × 10^2, y_2 = x_2, and y_3 = x_3, we have fl(S_3) = y_1 + y_2 + y_3. The computed result is the exact sum of numbers close to the original data. Indeed, two of the numbers are the same as the original data and the remaining one differs by less than a unit of roundoff.
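The same computation can be imitated in Python with a small helper that chops every intermediate result to two significant decimal digits (a sketch for illustration; the helper is not part of the book):

```python
import math

def chop(x, digits=2):
    """Chop x to the given number of significant decimal digits (round toward zero)."""
    if x == 0.0:
        return 0.0
    e = math.floor(math.log10(abs(x))) + 1      # position of the leading digit
    scale = 10.0 ** (digits - e)
    return math.trunc(x * scale) / scale

def chopped_sum(terms, digits=2):
    """Left-to-right summation, chopping the result of every addition."""
    s = chop(terms[0], digits)
    for t in terms[1:]:
        s = chop(s + chop(t, digits), digits)
    return s

x = [12.0, 3.4, -15.0]
print(chopped_sum(x))             # 0.0, i.e., fl(S_3) = 0.00 x 10**0 as in the text
print(0.12e2 + 0.34e1 - 0.15e2)   # the true sum, 0.4 up to double-precision rounding
# Backward interpretation: the computed 0.0 is the exact sum 11.6 + 3.4 - 15.0,
# whose data lie within one unit of roundoff of the originals.
```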
For most of the numerical tasks in this book it is not necessary to worry greatly about the effects of finite precision arithmetic. Two exceptions are the subject of the remaining examples. The first is the computation of a root of a continuous function f(x). Naturally we would like to compute a number z for which f(z) is as close to zero as possible in the precision available. Routines for this purpose ask the user to specify a desired accuracy. Even if the user does not request a very accurate root, the routine may “accidentally” produce a number z for which f(z) is very small. Because it is usually not expensive to solve this kind of problem, it is quite reasonable for a user to ask for all the accuracy possible.
One way or the other, we must ask what happens when x is very close to a root z. An underflow is possible since f(x) tends to zero as x tends to z. If this does not happen, it is usually found that the value of f(x) fluctuates erratically as x approaches z. Because of this we must devise algorithms that will behave sensibly when the computed value f(x) does not have even the correct sign for x near z. An example will show how the details of evaluation of f(x) are important when x is near a root.
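One quick way to see this erratic behavior is to evaluate a polynomial near a double root in low precision. The sketch below (in Python with numpy's float32; it is an illustration, not the book's example, which follows) evaluates f(x) = x^2 - 2x + 1 = (x - 1)^2 near x = 1:

```python
import numpy as np

# Evaluate f(x) = x**2 - 2x + 1 in single precision at points near the root x = 1
# and compare with the true value (x - 1)**2 computed in double precision.
for k in range(-5, 6):
    x = np.float32(1.0) + np.float32(k) * np.float32(1.0e-4)
    f = x * x - np.float32(2.0) * x + np.float32(1.0)
    true = (float(x) - 1.0) ** 2
    print(f"x = {float(x):.6f}   computed f = {float(f): .3e}   true f = {true: .3e}")
```

So close to the root, the true values of f are smaller than the rounding errors committed in forming x*x and 2*x, so the computed values carry little relative accuracy; some may even come out as exactly zero or with the wrong sign.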
Example 1.18. Let f(x) = x^2 - 2x + 1 be evaluated at x = 1.018 with three-digit chopped arithmetic and -100 < e < 100. The exact answer is f(1.018) = 0.324 × 10^-3. Because the coefficients of f are small integers, no error arises when they are represented as floating point numbers.