Press, Teukolsky, Vetterling, Flannery - Numerical Recipes in C (523184), page 66
So, because such generators are known to exist, we can leave to the philosophers the problem of defining them.

A pragmatic point of view, then, is that randomness is in the eye of the beholder (or programmer). What is random enough for one application may not be random enough for another.
Still, one is not entirely adrift in a sea of incommensurable applications programs: There is a certain list of statistical tests, some sensible and some merely enshrined by history, which on the whole will do a very good job of ferreting out any correlations that are likely to be detected by an applications program (in this case, yours). Good random number generators ought to pass all of these tests; or at least the user had better be aware of any that they fail, so that he or she will be able to judge whether they are relevant to the case at hand.

As for references on this subject, the one to turn to first is Knuth [1].
Then try [2]. Only a few of the standard books on numerical methods [3-4] treat topics relating to random numbers.

CITED REFERENCES AND FURTHER READING:

Knuth, D.E. 1981, Seminumerical Algorithms, 2nd ed., vol. 2 of The Art of Computer Programming (Reading, MA: Addison-Wesley), Chapter 3, especially §3.5. [1]

Bratley, P., Fox, B.L., and Schrage, E.L. 1983, A Guide to Simulation (New York: Springer-Verlag). [2]

Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), Chapter 11. [3]

Forsythe, G.E., Malcolm, M.A., and Moler, C.B. 1977, Computer Methods for Mathematical Computations (Englewood Cliffs, NJ: Prentice-Hall), Chapter 10. [4]

7.1 Uniform Deviates

Uniform deviates are just random numbers that lie within a specified range (typically 0 to 1), with any one number in the range just as likely as any other. They are, in other words, what you probably think “random numbers” are.
However, we want to distinguish uniform deviates from other sorts of random numbers, for example numbers drawn from a normal (Gaussian) distribution of specified mean and standard deviation. These other sorts of deviates are almost always generated by performing appropriate operations on one or more uniform deviates, as we will see in subsequent sections. So, a reliable source of random uniform deviates, the subject of this section, is an essential building block for any sort of stochastic modeling or Monte Carlo computer work.

System-Supplied Random Number Generators

Most C implementations have, lurking within, a pair of library routines for initializing, and then generating, “random numbers.” In ANSI C, the synopsis is:

    #include <stdlib.h>
    #define RAND_MAX ...

    void srand(unsigned seed);
    int rand(void);

You initialize the random number generator by invoking srand(seed) with some arbitrary seed.
Each initializing value will typically result in a different random sequence, or at least a different starting point in some one enormously long sequence. The same initializing value of seed will always return the same random sequence, however.

You obtain successive random numbers in the sequence by successive calls to rand(). That function returns an integer that is typically in the range 0 to the largest representable positive value of type int (inclusive).
Usually, as in ANSI C, this largest value is available as RAND_MAX, but sometimes you have to figure it out for yourself. If you want a random float value between 0.0 (inclusive) and 1.0 (exclusive), you get it by an expression like

    x = rand()/(RAND_MAX+1.0);

Now our first, and perhaps most important, lesson in this chapter is: be very, very suspicious of a system-supplied rand() that resembles the one just described. If all scientific papers whose results are in doubt because of bad rand()s were to disappear from library shelves, there would be a gap on each shelf about as big as your fist.
System-supplied rand()s are almost always linear congruential generators, which generate a sequence of integers $I_1, I_2, I_3, \ldots$, each between 0 and $m - 1$ (e.g., RAND_MAX) by the recurrence relation

$$I_{j+1} = a I_j + c \pmod{m} \tag{7.1.1}$$

Here m is called the modulus, and a and c are positive integers called the multiplier and the increment respectively. The recurrence (7.1.1) will eventually repeat itself, with a period that is obviously no greater than m.
If m, a, and c are properly chosen, then the period will be of maximal length, i.e., of length m. In that case, all possible integers between 0 and $m - 1$ occur at some point, so any initial “seed” choice of $I_0$ is as good as any other: the sequence just takes off from that point.

Although this general framework is powerful enough to provide quite decent random numbers, its implementation in many, if not most, ANSI C libraries is quite flawed; quite a number of implementations are in the category “totally botched.” Blame should be apportioned about equally between the ANSI C committee and the implementors.
The typical problems are these: First, since the ANSI standard specifies that rand() return a value of type int (which is only a two-byte quantity on many machines), RAND_MAX is often not very large. The ANSI C standard requires only that it be at least 32767. This can be disastrous in many circumstances: for a Monte Carlo integration (§7.6 and §7.8), you might well want to evaluate $10^6$ different points, but actually be evaluating the same 32767 points 30 times each, not at all the same thing! You should categorically reject any library random number routine with a two-byte returned value.

Second, the ANSI committee’s published rationale includes the following mischievous passage: “The committee decided that an implementation should be allowed to provide a rand function which generates the best random sequence possible in that implementation, and therefore mandated no standard algorithm.
It recognized the value, however, of being able to generate the same pseudo-random sequence in different implementations, and so it has published an example. . . . [emphasis added]” The “example” is

    unsigned long next=1;

    int rand(void) /* NOT RECOMMENDED (see text) */
    {
        next = next*1103515245 + 12345;
        return (unsigned int)(next/65536) % 32768;
    }

    void srand(unsigned int seed)
    {
        next=seed;
    }

This corresponds to equation (7.1.1) with a = 1103515245, c = 12345, and $m = 2^{32}$ (since arithmetic done on unsigned long quantities is guaranteed to return the correct low-order bits).
These are not particularly good choices for a and c, though they are not gross embarrassments by themselves. The real botches occur when implementors, taking the committee’s statement above as license, try to “improve” on the published example. For example, one popular 32-bit PC-compatible compiler provides a long generator that uses the above congruence, but swaps the high-order and low-order 16 bits of the returned value. Somebody probably thought that this extra flourish added randomness; in fact it ruins the generator.
While these kinds of blunders can, of course, be fixed, there remains a fundamental flaw in simple linear congruential generators, which we now discuss.

The linear congruential method has the advantage of being very fast, requiring only a few operations per call, hence its almost universal use. It has the disadvantage that it is not free of sequential correlation on successive calls.
If k random numbers at a time are used to plot points in k dimensional space (with each coordinate between 0 and 1), then the points will not tend to “fill up” the k-dimensional space, but rather will lie on (k − 1)-dimensional “planes.” There will be at most about $m^{1/k}$ such planes. If the constants m, a, and c are not very carefully chosen, there will be many fewer than that. If m is as bad as 32768, then the number of planes on which triples of points lie in three-dimensional space will be no greater than about the cube root of 32768, or 32. Even if m is close to the machine’s largest representable integer, e.g., $\sim 2^{32}$, the number of planes on which triples of points lie in three-dimensional space is usually no greater than about the cube root of $2^{32}$, about 1600. You might well be focusing attention on a physical process that occurs in a small fraction of the total volume, so that the discreteness of the planes can be very pronounced.

Even worse, you might be using a generator whose choices of m, a, and c have been botched.