A fitting procedure should provide (i) parameters, (ii) error estimates on the parameters, and (iii) a statistical measure of goodness-of-fit. When the third item suggests that the model is an unlikely match to the data, then items (i) and (ii) are probably worthless. Unfortunately, many practitioners of parameter estimation never proceed beyond item (i). They deem a fit acceptable if a graph of data and model “looks good.” This approach is known as chi-by-eye. Luckily, its practitioners get what they deserve.

CITED REFERENCES AND FURTHER READING:

Bevington, P.R. 1969, Data Reduction and Error Analysis for the Physical Sciences (New York: McGraw-Hill).

Brownlee, K.A. 1965, Statistical Theory and Methodology, 2nd ed. (New York: Wiley).

Martin, B.R. 1971, Statistics for Physicists (New York: Academic Press).

von Mises, R. 1964, Mathematical Theory of Probability and Statistics (New York: Academic Press), Chapter X.

Korn, G.A., and Korn, T.M. 1968, Mathematical Handbook for Scientists and Engineers, 2nd ed. (New York: McGraw-Hill), Chapters 18–19.

15.1 Least Squares as a Maximum Likelihood Estimator

Suppose that we are fitting N data points (x_i, y_i), i = 1, ..., N, to a model that has M adjustable parameters a_j, j = 1, ..., M. The model predicts a functional relationship between the measured independent and dependent variables,

\[
  y(x) = y(x;\,a_1 \ldots a_M)  \tag{15.1.1}
\]

where the dependence on the parameters is indicated explicitly on the right-hand side.

What, exactly, do we want to minimize to get fitted values for the a_j's? The first thing that comes to mind is the familiar least-squares fit,

\[
  \text{minimize over } a_1 \ldots a_M:\quad \sum_{i=1}^{N}\bigl[y_i - y(x_i;\,a_1 \ldots a_M)\bigr]^2  \tag{15.1.2}
\]
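As a concrete illustration of evaluating the quantity in (15.1.2), here is a minimal sketch in C. It is not a routine from this book: the two-parameter straight-line model y(x; a, b) = a + bx and the data values are invented for the example.

    #include <stdio.h>

    /* Hypothetical straight-line model y(x; a, b) = a + b*x, illustration only. */
    static double model(double x, double a, double b)
    {
        return a + b*x;
    }

    /* The quantity minimized in (15.1.2): the sum of squared residuals. */
    static double sumsq(double x[], double y[], int n, double a, double b)
    {
        int i;
        double r, s = 0.0;
        for (i = 0; i < n; i++) {
            r = y[i] - model(x[i], a, b);
            s += r*r;
        }
        return s;
    }

    int main(void)
    {
        double x[4] = {0.0, 1.0, 2.0, 3.0};   /* invented data */
        double y[4] = {1.1, 2.9, 5.2, 6.8};
        printf("sum of squares at a=1, b=2: %g\n", sumsq(x, y, 4, 1.0, 2.0));
        return 0;
    }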
But where does this come from? What general principles is it based on? The answer to these questions takes us into the subject of maximum likelihood estimators.

Given a particular data set of x_i's and y_i's, we have the intuitive feeling that some parameter sets a_1 ... a_M are very unlikely — those for which the model function y(x) looks nothing like the data — while others may be very likely — those that closely resemble the data. How can we quantify this intuitive feeling? How can we select fitted parameters that are “most likely” to be correct? It is not meaningful to ask the question, “What is the probability that a particular set of fitted parameters a_1 ... a_M is correct?” The reason is that there is no statistical universe of models from which the parameters are drawn. There is just one model, the correct one, and a statistical universe of data sets that are drawn from it!

That being the case, we can, however, turn the question around, and ask, “Given a particular set of parameters, what is the probability that this data set could have occurred?” If the y_i's take on continuous values, the probability will always be zero unless we add the phrase, “...plus or minus some fixed Δy on each data point.” So let's always take this phrase as understood. If the probability of obtaining the data set is infinitesimally small, then we can conclude that the parameters under consideration are “unlikely” to be right. Conversely, our intuition tells us that the data set should not be too improbable for the correct choice of parameters.

In other words, we identify the probability of the data given the parameters (which is a mathematically computable number) as the likelihood of the parameters given the data. This identification is entirely based on intuition. It has no formal mathematical basis in and of itself; as we already remarked, statistics is not a branch of mathematics!
Once we make this intuitive identification, however, it is only a small further step to decide to fit for the parameters a_1 ... a_M precisely by finding those values that maximize the likelihood defined in the above way. This form of parameter estimation is maximum likelihood estimation.

We are now ready to make the connection to (15.1.2). Suppose that each data point y_i has a measurement error that is independently random and distributed as a normal (Gaussian) distribution around the “true” model y(x). And suppose that the standard deviations σ of these normal distributions are the same for all points. Then the probability of the data set is the product of the probabilities of each point,

\[
  P \propto \prod_{i=1}^{N} \left\{ \exp\left[-\frac{1}{2}\left(\frac{y_i - y(x_i)}{\sigma}\right)^{2}\right] \Delta y \right\}  \tag{15.1.3}
\]

Notice that there is a factor Δy in each term in the product. Maximizing (15.1.3) is equivalent to maximizing its logarithm, or minimizing the negative of its logarithm, namely,

\[
  \left[\sum_{i=1}^{N} \frac{[y_i - y(x_i)]^{2}}{2\sigma^{2}}\right] - N\log\Delta y  \tag{15.1.4}
\]

Since N, σ, and Δy are all constants, minimizing this equation is equivalent to minimizing (15.1.2).

What we see is that least-squares fitting is a maximum likelihood estimation of the fitted parameters if the measurement errors are independent and normally distributed with constant standard deviation.
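The equivalence is easy to verify numerically. The following sketch (again illustrative only; the data, the constant σ, and Δy are invented) scans the slope of the toy straight-line model and confirms that (15.1.2) and the negative log-likelihood (15.1.4) attain their minima at the same parameter value:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double x[4] = {0.0, 1.0, 2.0, 3.0};  /* invented data */
        double y[4] = {1.1, 2.9, 5.2, 6.8};
        double sigma = 0.2, dy = 0.01;       /* assumed constant sigma and Delta-y */
        double b, s, r, nll;
        double bestls = 1e30, bestll = 1e30, argls = 0.0, argll = 0.0;
        int i;

        for (b = 1.0; b <= 3.0; b += 0.001) { /* scan the slope; intercept fixed at 1.0 */
            s = 0.0;
            for (i = 0; i < 4; i++) {
                r = y[i] - (1.0 + b*x[i]);
                s += r*r;                     /* equation (15.1.2) */
            }
            nll = s/(2.0*sigma*sigma) - 4.0*log(dy);  /* equation (15.1.4), N = 4 */
            if (s < bestls)  { bestls = s;  argls = b; }
            if (nll < bestll) { bestll = nll; argll = b; }
        }
        printf("minimizing (15.1.2) gives b = %.3f\n", argls);
        printf("minimizing (15.1.4) gives b = %.3f\n", argll);  /* the same b */
        return 0;
    }

Since (15.1.4) is a positive multiple of (15.1.2) plus a constant, the two scans necessarily pick out the same b.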
Notice that we made no assumption about the linearity or nonlinearity of the model y(x; a_1 ...) in its parameters a_1 ... a_M. Just below, we will relax our assumption of constant standard deviations and obtain the very similar formulas for what is called “chi-square fitting” or “weighted least-squares fitting.” First, however, let us discuss further our very stringent assumption of a normal distribution.

For a hundred years or so, mathematical statisticians have been in love with the fact that the probability distribution of the sum of a very large number of very small random deviations almost always converges to a normal distribution. (For precise statements of this central limit theorem, consult [1] or other standard works on mathematical statistics.) This infatuation tended to focus interest away from the fact that, for real data, the normal distribution is often rather poorly realized, if it is realized at all. We are often taught, rather casually, that, on average, measurements will fall within ±σ of the true value 68 percent of the time, within ±2σ 95 percent of the time, and within ±3σ 99.7 percent of the time. Extending this, one would expect a measurement to be off by ±20σ only one time out of 2 × 10^88. We all know that “glitches” are much more likely than that!
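These figures are easy to check: the two-sided tail probability of a Gaussian beyond ±kσ is erfc(k/√2), and erfc is available in the C99 math library. A quick sketch:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double k[4] = {1.0, 2.0, 3.0, 20.0};
        int i;
        for (i = 0; i < 4; i++)
            /* probability that a Gaussian deviate falls outside +-k sigma */
            printf("k = %4.1f   P(|deviation| > k sigma) = %.3g\n",
                   k[i], erfc(k[i]/sqrt(2.0)));
        return 0;
    }

For k = 20 this prints roughly 5.5 × 10^-89, i.e., about one time out of 2 × 10^88, as quoted above.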
In some instances, the deviations from a normal distribution are easy to understand and quantify. For example, in measurements obtained by counting events, the measurement errors are usually distributed as a Poisson distribution, whose cumulative probability function was already discussed in §6.2. When the number of counts going into one data point is large, the Poisson distribution converges towards a Gaussian. However, the convergence is not uniform when measured in fractional accuracy. The more standard deviations out on the tail of the distribution, the larger the number of counts must be before a value close to the Gaussian is realized. The sign of the effect is always the same: The Gaussian predicts that “tail” events are much less likely than they actually (by Poisson) are. This causes such events, when they occur, to skew a least-squares fit much more than they ought.
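The effect can be made concrete with a short sketch that sums the exact Poisson upper tail P(n ≥ k) term by term and compares it with the tail of a Gaussian of the same mean and variance (the mean count μ = 100 is invented for the illustration, and no continuity correction is attempted):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double mu = 100.0;                  /* invented mean count for one data point */
        double pmf, cdf, ptail, gtail;
        int k, n;

        for (k = 110; k <= 150; k += 10) {
            pmf = exp(-mu);                 /* Poisson pmf at n = 0 */
            cdf = pmf;
            for (n = 1; n < k; n++) {
                pmf *= mu/n;                /* recurrence p(n) = p(n-1)*mu/n */
                cdf += pmf;
            }
            ptail = 1.0 - cdf;              /* P(n >= k), exact Poisson tail */
            gtail = 0.5*erfc((k - mu)/sqrt(2.0*mu));  /* Gaussian, same mean, variance */
            printf("k = %3d   Poisson tail %.3g   Gaussian tail %.3g\n",
                   k, ptail, gtail);
        }
        return 0;
    }

The farther k is out on the tail, the larger the ratio of the Poisson to the Gaussian tail probability, which is exactly the skewing effect described above.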
Other times, the deviations from a normal distribution are not so easy to understand in detail. Experimental points are occasionally just way off.

Chi-Square Fitting

We considered the chi-square statistic once before, in §14.3. Here it arises in a slightly different context.

If each data point (x_i, y_i) has its own, known standard deviation σ_i, then equation (15.1.3) is modified only by putting a subscript i on the symbol σ. That subscript also propagates docilely into (15.1.4), so that the maximum likelihood estimate of the model parameters is obtained by minimizing the quantity

\[
  \chi^2 \equiv \sum_{i=1}^{N}\left(\frac{y_i - y(x_i;\,a_1 \ldots a_M)}{\sigma_i}\right)^{2}  \tag{15.1.5}
\]

called the “chi-square.”
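Evaluating (15.1.5) for given model predictions is straightforward in C. In this sketch (not one of this book's routines) the model values are assumed to be precomputed into ymod[], sig[] holds the known per-point standard deviations, and the numbers are invented:

    #include <stdio.h>

    /* Evaluate chi-square, equation (15.1.5). ymod[i] holds y(x_i; a_1...a_M)
       already evaluated; sig[i] is the known standard deviation of point i. */
    static double chisq(double y[], double ymod[], double sig[], int n)
    {
        int i;
        double d, sum = 0.0;
        for (i = 0; i < n; i++) {
            d = (y[i] - ymod[i])/sig[i];
            sum += d*d;
        }
        return sum;
    }

    int main(void)
    {
        double y[3]    = {1.1, 2.1, 2.8};   /* invented measurements */
        double ymod[3] = {1.0, 2.0, 3.0};   /* invented model values */
        double sig[3]  = {0.1, 0.1, 0.2};   /* per-point standard deviations */
        printf("chi-square = %g\n", chisq(y, ymod, sig, 3));
        return 0;
    }

With the numbers above, each point misses by exactly one standard deviation, so the sketch prints chi-square = 3, about one per data point.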