It also contains a good chapter on rounding error analysis and an interesting collection of exercises. R. W. Hamming has said of this book, “Nobody should ever have to know that much about floating-point arithmetic. But I’m afraid sometimes you might” [833, 1988]. Although Sterbenz’s book is now dated in some respects, it remains a useful reference.

A third important reference on floating point arithmetic is Knuth’s Seminumerical Algorithms [668, 1981, §4.2], from his Art of Computer Programming series. Knuth’s lucid presentation includes historical comments and challenging exercises (with solutions).

The first analysis of floating point arithmetic was given by Samelson and Bauer [890, 1953]. Later in the same decade Carr [187, 1959] gave a detailed discussion of error bounds for the basic arithmetic operations.

An up-to-date and very readable reference on floating point arithmetic is the survey paper by Goldberg [457, 1991], which includes a detailed discussion of IEEE arithmetic.
A less mathematical, more hardware-oriented discussion can be found in the appendix “Computer Arithmetic” written by Goldberg that appears in the book on computer architecture by Hennessy and Patterson [515, 1990].

A fascinating historical perspective on the development of computer floating point arithmetics, including background to the development of the IEEE standard, can be found in the textbook by Patterson and Hennessy [822, 1994, §4.11].

The idea of representing floating point numbers in the form (2.1) is found, for example, in the work of Forsythe [393, 1969], Matula [740, 1970], and Dekker [275, 1971].

An alternative definition of fl(x) is the nearest y ∈ G satisfying |y| ≤ |x|.
This operation is called chopping, and does not satisfy our definition of rounding. Chopped arithmetic is used in the IBM/370 floating point system.

The difference between chopping and rounding (to nearest) is highlighted by a discrepancy in the index of the Vancouver Stock Exchange in the early 1980s [852, 1983]. The exchange established an index in January 1982, with the initial value of 1000.
By November 1983 the index had been hitting lows in the 520s, despite the exchange apparently performing well. The index was recorded to three decimal places and it was discovered that the computer program calculating the index was chopping instead of rounding to produce the final value. Since the index was recalculated thousands of times a day, each time with a nonpositive final error, the bias introduced by chopping became significant.
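The size of this effect is easy to sketch in Python with the decimal module, where ROUND_DOWN plays the role of chopping and ROUND_HALF_EVEN of rounding to nearest; the update sizes and count below are hypothetical illustrations, not the exchange's actual data:

    # Toy model: apply identical zero-mean updates to an index kept to three
    # decimal places, once with chopping (ROUND_DOWN), once with rounding.
    from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN
    import random

    QUANTUM = Decimal("0.001")              # keep three decimal places

    def run(mode, n_updates=10000):
        random.seed(0)                      # same hypothetical moves for both runs
        index = Decimal("1000.000")
        for _ in range(n_updates):
            move = Decimal(str(round(random.uniform(-0.05, 0.05), 6)))
            index = (index + move).quantize(QUANTUM, rounding=mode)
        return index

    print("chopped:", run(ROUND_DOWN))       # ends several units below the rounded result
    print("rounded:", run(ROUND_HALF_EVEN))  # stays close to the exact sum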
Upon recalculation with rounding the index almost doubled!

When there is a tie in rounding, two possible strategies are to round to the number with an even last digit and to round to the one with an odd last digit. Both are stable forms of rounding in the sense that

    fl((((x + y) - y) + y) - y) = fl((x + y) - y),

as shown by Reiser and Knuth [869, 1975], [668, 1981, p. 222]. For other rules, such as round away from zero, repeated subtraction and addition of the same number can yield an increasing sequence, a phenomenon known as drift. For bases 2 and 10 rounding to even is preferred to rounding to odd.
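Both behaviours can be reproduced with Python's decimal module, which supports several tie-breaking rules; the sketch below works to three significant decimal digits with values chosen, purely for illustration, so that ties arise, and uses ROUND_HALF_UP as a stand-in for round away from zero:

    # fl((((x+y)-y)+y)-y) = fl((x+y)-y) holds under round to even, but repeated
    # +y, -y drifts upward when ties are rounded away from zero.
    from decimal import Decimal, getcontext, ROUND_HALF_EVEN, ROUND_HALF_UP

    getcontext().prec = 3                       # three significant digits
    x, y = Decimal("1.00"), Decimal("0.005")

    for mode in (ROUND_HALF_EVEN, ROUND_HALF_UP):
        getcontext().rounding = mode
        once  = (x + y) - y                     # fl((x+y)-y)
        twice = (((x + y) - y) + y) - y         # fl((((x+y)-y)+y)-y)
        print(mode, once, twice)
    # ROUND_HALF_EVEN: 0.995 0.995  (stable)
    # ROUND_HALF_UP:   1.01 1.02    (drift)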
After rounding to even a subsequent rounding to one less place does not involve a tie. Thus we have the rounding sequence 2.445, 2.44, 2.4 with round to even, but 2.445, 2.45, 2.5 with round to odd. For base 2, round to even causes computations to produce integers more often [640, 1979] as a consequence of producing a zero least significant bit. Rounding to even in the case of ties seems to have first been suggested by Scarborough in the first edition (1930) of [897, 1950].

Predictions based on the growth in the size of mathematical models solved as the memory and speed of computers increase suggest that floating point arithmetic with unit roundoff u ≈ 10^-32 will be needed for some applications on future supercomputers [48, 1989].

The model (2.4) does not fully describe any floating point arithmetic. It is merely a tool for error analysis, one that has been remarkably successful in view of our current understanding of the numerical behaviour of algorithms. There have been various attempts to devise formal models of floating point arithmetic, by specifying sets of axioms in terms of which error analysis can be performed.
Some attempts are discussed in §25.7.4. No model yet proposed has been truly successful. Priest [844, 1992] conjectures that the task of “encapsulating all that we wish to know about floating point arithmetic in a single set of axioms” is impossible, and he gives some motivation for this conjecture.

Under the model (2.4), floating point arithmetic is not associative with respect to any of the four basic operations: (a ⊙ b) ⊙ c ≠ a ⊙ (b ⊙ c) in general, for op = +, -, *, /, where a ⊙ b := fl(a op b). Nevertheless, floating point arithmetic enjoys some algebraic structure, and it is possible to carry out error analysis in the “algebra”. Fortunately, it was recognized by Wilkinson and others in the 1950s that this laboured approach is unnecessarily complicated, and that it is much better to work with the exact equations satisfied by the computed quantities.
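A two-line illustration of the non-associativity just described, using IEEE double precision (ordinary Python floats), with values chosen only to make the discrepancy obvious:

    # The two groupings of the same sum give different computed results.
    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)    # 1.0: a + b is exact, so the 1.0 survives
    print(a + (b + c))    # 0.0: b + c rounds back to -1e16, which cancels a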
As Parlett [821, 1990] notes, though, “There have appeared a number of ponderous tomes that do manage to abstract the computer’s numbers into a formal structure and burden us with more jargon.”

A draft proposal of IEEE Standard 754 is defined and described in [599, 1981]. That article, together with others in the same issue of the journal Computer, provides a very readable description of IEEE arithmetic. In particular, an excellent discussion of gradual underflow is given by Coonen [243, 1981]. A draft proposal of IEEE Standard 854 is presented, with discussion, in [225, 1984].

W. M. Kahan of the University of California at Berkeley received the 1989 ACM Turing Award for his contributions to computer architecture and numerical analysis, and in particular for his work on IEEE floating point arithmetic standards 754 and 854.

An interesting examination of the implications of the IEEE standard for high-level languages such as Fortran is given by Fateman [365, 1982].
Topics discussed include trap handling and how to exploit NaNs. For an overview of hardware implementations of IEEE arithmetic, and software support for it, see Cody [223, 1988].

Producing a fast and correct implementation of IEEE arithmetic is a difficult task. Correctness is especially important for a microprocessor (as opposed to a software) implementation, because of the logistical difficulties of correcting errors when they are found.
In late 1994, much publicity was generated by the discovery of a flaw in the floating point divide instruction of Intel’s Pentium chip. Because of some missing entries in a lookup table on the chip, the FPDIV instruction could give as few as four correct significant decimal digits for double precision floating point arguments with certain special bit patterns [916, 1994].
The flaw had been discovered by Intel in the summer of 1994 during ongoing testing of the Pentium processor, but it had not been publicly announced. In October 1994, a mathematician doing research into prime numbers independently discovered the flaw and reported it to the user community. Largely because of the way in which Intel responded to the discovery of the flaw, the story was reported in national newspapers (e.g., the New York Times [727, 1994]) and generated voluminous discussion on Internet newsgroups (notably comp.sys.intel). Intel corrected the bug in 1994 and, several weeks after the bug was first reported, offered to replace faulty chips.
For a very readable account of the Pentium FPDIV bug story, see Moler [772, 1995]. To emphasize that bugs in implementations of floating point arithmetic are far from rare, we mention that the Calculator application in Microsoft Windows 3.1 evaluates fl(2.01 - 2.00) = 0.0.

Computer chip designs can be tested in two main ways: by software simulations and by applying formal verification techniques. Formal verification aims to prove mathematically that the chip design is correct, and this approach is now being used by Intel and other chip manufacturers [452, 1995]. The implementation of IEEE arithmetic for the Inmos T800 transputer in the 1980s was done with the help of formal methods.
The IEEE standard was translated into the set-theoretic specification language Z, and then Occam procedures were written that were proved to adhere to the specifications. For details, see Barrett [69, 1989] or, for a more informal overview, Shepherd and Wilson [917, 1989].

The floating point operation op (op = +, -, *, or /) is monotonic if fl(a op b) ≤ fl(c op d) whenever a, b, c, and d are floating point numbers for which a op b ≤ c op d and neither fl(a op b) nor fl(c op d) overflows.
IEEE arithmetic is monotonic, as is any correctly rounded arithmetic. Monotonic arithmetic is important in the bisection algorithm for finding the eigenvalues of a symmetric tridiagonal matrix; see Demmel, Dhillon, and Ren [289, 1994], who give rigorous correctness proofs of some bisection implementations in floating point arithmetic. Ferguson and Brightman [371, 1991] derive conditions that ensure that an approximation to a monotonic function preserves the monotonicity on a set of floating point numbers.

On computers of the 1950s (fixed point) multiplication was slower than (fixed point) addition by up to an order of magnitude [693, 1980, Apps. 2, 3]. For most modern computers it is a rule of thumb that a floating point addition and multiplication take about the same amount of time, while a floating point division is 2-10 times slower, and a square root operation (in hardware) is 1-2 times slower than a division.

Some computers have the ability to perform a floating point multiplication followed by an addition or subtraction, x * y + z or x * y - z, as though it were a single floating point operation. For example, the IBM RISC System/6000 has a fused multiply-add (FMA) operation that forms x * y + z with just a single rounding error, at the end, the multiplication and addition being performed at twice the precision of the operands (and by overlapping additions and multiplications the RS/6000 can perform a sequence of FMAs in one cycle each) [596, 1993].
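The effect of the single rounding error can be seen with math.fma, available in Python only from version 3.13 onwards; the operands below are chosen, purely for illustration, so that the exact product 1 - 2^-58 is not representable in double precision:

    # An FMA commits one rounding error in x*y + z; separate operations commit two.
    import math

    x = 1.0 + 2.0**-29
    y = 1.0 - 2.0**-29          # exact product x*y = 1 - 2**-58
    z = -1.0

    print(x*y + z)              # 0.0: x*y first rounds to 1.0, losing the 2**-58
    print(math.fma(x, y, z))    # -2**-58: exact product, one rounding at the end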