However, x is not a floating point number in this arithmetic and there is an error when fl(x) = 0.101 × 10^1 is formed. Several algorithms are possible that arise in different ways of writing f(x):

    f(x) = [(x^2) - (2x)] + 1
         = x(x - 2) + 1
         = (x - 1)^2.

Worked out in this arithmetic, all of the results have large relative errors. This should not be too surprising since the problem is poorly conditioned (see Exercise 1.5).

Figure 1.2 is a plot of 281 values of the function f(x) = (x exp(x) - 1)^3 for arguments near x = 0.567. Single precision IEEE arithmetic was used for this calculation, and the cubed term in the function was expanded out to generate more roundoff. In exact arithmetic f(x) vanishes at only one point α near 0.567, a point that satisfies α = exp(-α). However, it is clear from the figure that the floating point version is not nearly so well behaved near this α.

Figure 1.2  Floating point evaluation of f(x) = x^3 exp(3x) - 3x^2 exp(2x) + 3x exp(x) - 1.
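The computation behind Figure 1.2 is easy to repeat. The following sketch (Python with NumPy; the sampling window around 0.567 and the use of np.float32 to force single precision are our choices, not taken from the text) evaluates both the expanded and the compact forms of f so that the difference in roundoff behavior can be seen:

    import numpy as np

    # 281 single precision arguments near the root alpha of x*exp(x) = 1.
    x = np.linspace(0.5665, 0.5675, 281, dtype=np.float32)
    ex = np.exp(x)

    # Expanded form x^3 e^{3x} - 3 x^2 e^{2x} + 3 x e^x - 1: the O(1) terms
    # nearly cancel, so the roundoff in each term dominates the tiny result.
    f_expanded = x**3 * ex**3 - 3 * x**2 * ex**2 + 3 * x * ex - np.float32(1)

    # Compact form (x e^x - 1)^3: far smoother near the root.
    f_compact = (x * ex - np.float32(1))**3

    for i in range(0, 281, 56):
        print("x = %.6f   expanded = % .3e   compact = % .3e"
              % (x[i], f_expanded[i], f_compact[i]))

The expanded form wanders at roughly the level of a few units of roundoff in the O(1) terms, while the compact form tracks the true, tiny values of f near the root.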
In Chapter 2 we discuss the numerical solution of a system of linear equations. In contrast to the solution of nonlinear equations, codes based on the method there try to compute an answer as accurately as possible in the precision available. A difficulty with precision arises when we try to assess the accuracy of the result.
Example 1.19. Residual calculation.

The simplest system of linear equations is ax = b. The quality of an approximate solution z can be measured by how well it satisfies the equation. The discrepancy is called its residual:

    r = b - az.

If z is a very good solution, its residual r is small and there is cancellation when the residual is formed in floating point arithmetic. Defining δ by fl(az) = az(1 + δ), where |δ| ≤ u, the computed residual is approximately

    b - az(1 + δ) = r - δaz.

The computed residual differs from the true residual by a quantity that can be as large as |az|u ≈ |b|u. When r is small because z is a good solution and |b| happens to be large, the computed residual may have few, if any, correct digits (although the relative residual |r/b| is fine). When z is a good solution, it is generally necessary to use double precision to obtain its residual to single precision accuracy.
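The effect is easy to see in code. In this sketch (Python with NumPy assumed; the numbers are illustrative and are not taken from the text), the residual of a good single precision solution is formed once entirely in single precision and once in double precision:

    import numpy as np

    # A good single precision "solution" z of a*x = b.
    a = np.float32(3.0)
    b = np.float32(1.0)
    z = b / a

    r_single = b - a * z                                        # residual in single precision
    r_double = np.float64(b) - np.float64(a) * np.float64(z)    # same z, double precision

    print("single precision residual:", r_single)
    print("double precision residual:", r_double)

Here the double precision value is essentially the true residual of the stored z, while the single precision value comes out as zero: it has no correct digits at all.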
EXERCISES

1.8 Suppose that z = 0.180 × 10^2 is an approximate solution of ax = b for a = 0.111 × 10^0, b = 0.200 × 10^1. Use three-digit decimal chopped arithmetic to compute the residual r = b - az. Compute the residual in double precision and in exact arithmetic. Discuss the results.

1.9 For α = 0.8717 and β = 0.8719 calculate the midpoint of the interval [α, β] using the formula (α + β)/2. First use four-digit decimal chopped arithmetic, then four-digit decimal rounded arithmetic. How reasonable are the answers? Find another formula for the midpoint and use four-digit decimal (rounded or chopped) arithmetic to calculate the midpoint of [0.8717, 0.8719]. Is your formula better or worse?

1.10 In the model arithmetic, a single operation is carried out with a small relative error. Unfortunately the same is not true of complex arithmetic. To see this, let z = a + ib and w = c + id. By definition, zw = (ac - bd) + i(ad + bc). Show how the real part, ac - bd, of the product zw might be computed with a large relative error even though all individual calculations are done with a small relative error.

1.11 An approximation S to e^x can be computed by using the Taylor series for the exponential function:

    S := 1
    P := 1
    for k = 1, 2, ...
    begin
        P := xP/k
        S := S + P
    end k.

The loop can be stopped when S = S + P to machine precision.

(a) Try this algorithm with x = -10 using single precision arithmetic. What was k when you stopped? What is the relative error in the resulting approximation? Does this appear to be a good way to compute e^(-10) to full machine precision?

(b) Repeat (a) with x = +10.

(c) Why are the results so much more reasonable for (b)?

(d) What would be a computationally safe way to compute e^(-10)?

1.12 Many problems in astrodynamics can be approximated by the motion of one body about another under the influence of gravity, for example, the motion of a satellite about the earth. This is a useful approximation because by a combination of analytical and numerical techniques, these two body problems can be solved easily. When a better approximation is desired, for example, when we need to account for the effect of the moon or sun on the satellite, it is natural to compute it as a correction to the orbit of the two body problem. This is the basis of Encke's method; for details see Section 9.3 of [2]. A fundamental issue is to calculate accurately the small correction to the orbit. This is reduced to the accurate calculation of a function f(q) for small q > 0. Explain why f(q) cannot be evaluated accurately in finite precision arithmetic when q is small. In the explanation you should assume that y^(-3/2) can be evaluated with a relative error that is bounded by a small multiple of the unit roundoff. Use the binomial series to expand f(q) in powers of q and explain why this series is a better way to evaluate f(q) when q is small.

1.13 Let a regular polygon of N sides be inscribed in a unit circle. If L_N denotes the length of one side, the circumference of the polygon, N × L_N, approximates the circumference of the circle, 2π; hence π ≈ N L_N / 2 for large N. Using Pythagoras' theorem it is easy to relate L_{2N} to L_N:

    L_{2N} = sqrt(2 - sqrt(4 - L_N^2)).

Starting with L_4 = sqrt(2) for a square, approximate π by means of this recurrence. Explain why a straightforward implementation of the recurrence in floating point arithmetic does not yield an accurate value for π. (Keep in mind that L_N → 0 as N → ∞.) Show that the recurrence can be rearranged as

    L_{2N} = L_N / sqrt(2 + sqrt(4 - L_N^2)),

and demonstrate that this form works better.

1.14 A study of the viscous decay of a line vortex leads to an expression for the velocity

    v(r, t) = (Γ / (2πr)) [1 - exp(-r^2 / (4νt))]

at a distance r from the origin at time t > 0. Here Γ is the initial circulation and ν > 0 is the kinematic viscosity. For some purposes the behavior of the velocity at distances r << sqrt(4νt) is of particular interest. Why is the form given for the velocity numerically unsatisfactory for such distances? Assuming that you have available a function for the accurate computation of sinh(x), manipulate the expression into one that can be evaluated in a more accurate way for very small r.

1.3 CASE STUDY 1

Now let us look at a couple of examples that illustrate points made in this chapter. The first considers the evaluation of a special function.
The second illustrates the fact that practical computation often requires tools from several chapters of this book. Filon's method for approximating finite Fourier integrals will be developed in Chapter 3 and applied in Chapter 5. An aspect of the method that we take up here is the accurate computation of coefficients for the method.

The representation of the hyperbolic cosine function in terms of exponentials,

    x = cosh(y) = (exp(y) + exp(-y)) / 2,

makes it easy to verify that for x > 1,

    cosh^(-1)(x) = -ln(x - sqrt(x^2 - 1)).

Let us consider the evaluation of this expression for cosh^(-1)(x) in floating point arithmetic when x >> 1. An approximation made earlier in this chapter will help us to understand better what it is that we want to compute. After approximating

    sqrt(x^2 - 1) = x sqrt(1 - 1/x^2) ≈ x (1 - 1/(2x^2)) = x - 1/(2x),

we find that

    cosh^(-1)(x) = -ln(x - sqrt(x^2 - 1)) ≈ -ln(1/(2x)) = ln(2x).

The first difficulty we encounter in the evaluation is that when x is very large, x^2 overflows. This overflow is unnecessary because the argument we are trying to compute is on scale. If x is large, but not so large that x^2 overflows, the effect of the 1 "falls off the end" in the subtraction, meaning that fl(x^2 - 1) = fl(x^2). This subtraction is carried out with a small relative error, and the same is true of less extreme cases, but there is a loss of information when numbers are of greatly different size. The square root is obtained with a small relative error. The information lost in the subtraction is needed at the next step because there is severe cancellation. Indeed, for large x, we might end up computing x - x = 0 as the argument of the logarithm, which would be disastrous.

How might we reformulate the task to avoid the difficulties just noted? A little calculation shows that

    x - sqrt(x^2 - 1) = 1 / (x + sqrt(x^2 - 1)),

a form that avoids cancellation. The preliminary analysis we did to gain insight suggests a better way of handling the rest of the argument:

    x + sqrt(x^2 - 1) = x (1 + sqrt(1 - (1/x)^2)).

Notice that here we form (1/x)^2 instead of 1/x^2. This rearrangement exchanges a possible overflow when forming x^2 for a harmless underflow, harmless, that is, if the system sets an underflow to zero and continues on. We see now that the expression

    cosh^(-1)(x) = ln(x + sqrt(x^2 - 1)) = ln(x (1 + sqrt(1 - (1/x)^2)))

avoids all the difficulties of the original expression for cosh^(-1)(x). Indeed, it is clear that for large x, evaluation of this expression in floating point arithmetic will lead to an approximation of ln(2x), as it should.
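Here is a minimal sketch of the rearranged evaluation in Python (the function name and test values are ours; for x of moderate size one would simply call a library routine such as math.acosh):

    import math

    def acosh_large(x):
        # cosh^{-1}(x) for x >> 1, evaluated as ln(x) + ln(1 + sqrt(1 - (1/x)^2)).
        # No intermediate quantity can overflow; (1/x)^2 may underflow harmlessly
        # to zero, in which case the result is ln(2x), as the analysis predicts.
        t = 1.0 / x
        return math.log(x) + math.log1p(math.sqrt(1.0 - t * t))

    for x in (2.0, 1.0e10, 1.0e200):
        print(x, acosh_large(x), math.acosh(x))

Note that forming x**2 - 1.0 directly would overflow already at x = 1.0e200, while the rearranged expression remains on scale.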
This rearrangement exchanges apossible overflow when forming x2 for a harmless underflow, harmless, that is, if thesystem sets an underflow to zero and continues on. We see now that the expressionavoids all the difficulties of the original expression for cosh-1 (x). Indeed, it is clearthat for large x, evaluation of this expression in floating point arithmetic will lead to anapproximation of ln(2x), as it should.For our second example we consider Filon’s method for approximating finiteFourier integrals, which is developed in Chapter 3:Here θ = ωh andThe details of the terms Ce and C0 do not concern us here.
There is a similar formula for integrals with the sine function in place of the cosine function that involvesthe same coefficients α, β, γ. It is shown in Case Study 3 that the absolute error of3this approximation is bounded by a constant times h . To get an accurate integral, itmight be necessary to use a small h, meaning that θ is small, but the expressions forthe coefficients are unsatisfactory in this case. Each suffers from cancellation in thenumerator, and the resulting error is amplified by the division by the small quantity θ3.To see the cancellation more clearly, let us approximate the sine and cosine terms in,say, a by the leading terms in their Taylor series, sin(θ)θ and cos(θ) 1, to getObviously there is perfect cancellation of leading terms in the numerator.
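The coefficient formulas themselves are deferred to Chapter 3. As an illustration of the difficulty, the sketch below uses the standard form of Filon's α coefficient, α(θ) = (θ^2 + θ sin θ cos θ - 2 sin^2 θ)/θ^3, together with the leading terms of its Taylor series; both the closed form and the series coefficients are assumptions here, not taken from the text, but they exhibit exactly the cancellation just described.

    import math

    def alpha_direct(theta):
        # Assumed standard form of Filon's alpha coefficient; each term in the
        # numerator is O(theta^2), but the numerator itself is only O(theta^6).
        s, c = math.sin(theta), math.cos(theta)
        return (theta**2 + theta * s * c - 2.0 * s**2) / theta**3

    def alpha_series(theta):
        # Leading terms of the (assumed) expansion about theta = 0:
        # alpha = 2 theta^3/45 - 2 theta^5/315 + 2 theta^7/4725 - ...
        t2 = theta * theta
        return theta * t2 * (2.0/45.0 - t2 * (2.0/315.0 - t2 * (2.0/4725.0)))

    for theta in (1.0e-1, 1.0e-3, 1.0e-5):
        print("theta = %.0e   direct = % .6e   series = % .6e"
              % (theta, alpha_direct(theta), alpha_series(theta)))

As θ decreases, the directly evaluated coefficient loses digit after digit to cancellation and is eventually pure roundoff, while the series form remains accurate; a practical evaluation switches from the closed form to a series of this kind when θ is small.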