In Section 6.4 we pose and solve the important problem of normalization factors determined from a least-squares fit. Logarithmic transformations of data are very common in science and engineering, and Section 6.5 shows how such a transformation biases the fitted parameters and how such bias may often be removed.
Finally, in Project 6 (Section 6.6) there is a program for straight-line least squares, based on the formulas in Section 6.3, as well as suggestions for using the program. We round out the chapter with references on least-squares analysis.

Background to this chapter, with emphasis on the underlying statistics and probability ideas, is found in the books by Barlow, by Siegel, and by Snell. Required reading for all those who would use (or misuse) statistics is the book by Jaffe and Spirer.
The monograph by Draper and Smith on applied regression analysis has several interesting examples and interpretations. Many basic techniques for error analysis in the physical sciences are described in the books by Taylor and by Lichten, as well as in the practical guide by Lyons. For topics in advanced physics, especially high-energy physics, the book by Lyons is suitable.

6.1 INTRODUCTION TO THE LEAST-SQUARES CRITERION

If we give up the condition that the fitted function must pass through each data point, then we must decide on objective criteria for a best fit. Let the fitting function be Y(x). Suppose that our best-fit criterion were to minimize the (signed) difference between the y data, yj, and the fitting values Yj = Y(xj).

Exercise 6.1
Prove that if the signed difference between data and fitting function is to be minimized, then the best-fit function is just Yj = ȳ, the average of the y data, the same value at all points.
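As a small numerical illustration of the result in Exercise 6.1 (a sketch, not the proof the exercise asks for), the Python fragment below shows that the constant fitting value for which the summed signed differences vanish is just the average of the y data; the data values are invented for illustration only.

```python
# Illustration for Exercise 6.1 (a sketch, not a proof): the constant c
# that makes the summed signed difference sum(y_j - c) vanish is the
# average of the y data.  The data values are invented for illustration.
import numpy as np

y = np.array([3.1, 2.7, 3.6, 2.9, 3.3])      # hypothetical y data

# Solving sum(y - c) = 0 for c gives c = mean(y).
c = np.sum(y) / len(y)
print("signed-difference fit :", c)
print("sum of signed residues:", np.sum(y - c))   # ~ 0, to rounding
print("average of y data     :", y.mean())
```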
The average of a set of data values is not, however, usually a good representative of the behavior of the data. For example, the annual average temperature in the town of Yakutsk on the Lena River in Siberia is a comfortable -10°C, but with a range from -40°C to +20°C.

Thus, given that signed differences are usually inappropriate, how about absolute values of differences? Unfortunately, this fitting criterion is fraught with difficult mathematical analysis, since the fitting methods are based on derivatives and the absolute value function is singular at the origin, which is the point that a perfect fit would reach.
We therefore turn to a reasonable alternative, namely minimizing sums of differences squared.

Minimization of a sum of squares of differences between data and fitting function is called a least-squares fit. The method was developed by Gauss and Legendre in the early nineteenth century in their analysis of astronomical orbits from imprecise observations. Gauss also related the least-squares principle to the theory of probability, as we now outline in the context of maximum-likelihood analysis.

Maximum likelihood and least squares

We first introduce the Gaussian probability distribution, which is often called the normal distribution in statistics.
Under quite general and plausible conditions (see, for example, Snell's book) the probability that the value of variable v is obtained if it has true average value μ is P(v), given by

P(v) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{(v-\mu)^2}{2\sigma^2}\right]    (6.1)

Here σ is the standard deviation of v, with σ² the mean-square deviation of the v values from μ. Also, P(v) is normalized so that the total probability over all v values is unity. Strictly speaking, P is a probability density. In accord with common practice, we will use the term probability.
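A minimal numerical check of (6.1) can be made by direct quadrature: the density integrates to unity and its mean-square deviation from μ is σ². The values of mu and sigma below are arbitrary choices for the sketch.

```python
# Numerical check of the Gaussian distribution (6.1): unit total
# probability and second moment sigma**2.  mu and sigma are arbitrary.
import numpy as np

def gaussian(v, mu, sigma):
    """Gaussian probability density P(v) of equation (6.1)."""
    return np.exp(-(v - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

mu, sigma = 1.0, 0.5
v = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 20001)
P = gaussian(v, mu, sigma)

print("total probability:", np.trapz(P, v))                  # ~ 1.0
print("second moment    :", np.trapz((v - mu) ** 2 * P, v))  # ~ sigma**2 = 0.25
```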
One way to characterize a probability distribution, P(v), is to specify its moments, which are the values of the positive integer powers of v, weighted by P(v), then averaged over all values of v. Thus, for a continuous distribution (such as the Gaussian) the nth moment about the mean, Mn, is given by

M_n = \int_{-\infty}^{\infty} v^n P(v)\, dv    (6.2)

in which the origin of v has been moved to coincide with its average value.
Notice that if the probability distribution is symmetric, which is quite common, then all the moments for odd values of n must be zero; you don't need to attempt the integration to see this. For the even moments, there are often clever methods of calculating them, as suggested in Exercise 6.2(c). There are several interesting and relevant properties of the Gaussian distribution, which you may like to derive in the following exercise.

Exercise 6.2
(a) In order to calculate the mean-square deviation (the second moment) of the Gaussian, consider the value of v²P(v) integrated over v from -∞ to ∞.
By using the method of parametric integration (described in many calculus books), show that this integral is σ², as claimed below (6.1).
(b) Show that the Full Width at Half Maximum (FWHM) of the Gaussian distribution, the range of v over which it goes from half maximum through maximum to half maximum again, is

\mathrm{FWHM} = 2\sigma\sqrt{2\ln 2} \approx 2.355\,\sigma

(c) Use the technique of parametric integration to derive a formula for the even moments of the Gaussian distribution, namely the integrals of v^{2n} P(v) with n a positive integer.
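The following Python sketch is a numerical cross-check of Exercise 6.2, not the analytic derivation the exercise asks for: it locates the FWHM of a unit-σ Gaussian directly from the sampled density and estimates the low even moments by quadrature.

```python
# Numerical cross-check of Exercise 6.2 for a Gaussian with sigma = 1
# (matching Figure 6.1): FWHM = 2*sigma*sqrt(2 ln 2), and the even
# moments M_2 = sigma**2, M_4 = 3*sigma**4, M_6 = 15*sigma**6.
import numpy as np

sigma = 1.0
v = np.linspace(-10 * sigma, 10 * sigma, 200001)
P = np.exp(-v ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# FWHM: width of the region where P is at least half its peak value.
half = P >= 0.5 * P.max()
print("numerical FWHM :", v[half][-1] - v[half][0])   # ~ 2.355
print("2*sqrt(2 ln 2) :", 2 * np.sqrt(2 * np.log(2)))

# Even moments M_{2n} by quadrature.
for n in (1, 2, 3):
    print("M_%d =" % (2 * n), np.trapz(v ** (2 * n) * P, v))
```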
The Gaussian distribution, with its FWHM, is illustrated in Figure 6.1.

FIGURE 6.1 The Gaussian probability distribution with unity total probability and unity standard deviation.

Now that we have some understanding of a probability distribution that will be associated with errors in data, let us apply it to relate random errors in x and y data to the derivation of a fitting criterion. Suppose that at the jth data point a measurement gives values xj and yj with standard deviations σxj and σyj, respectively.
We also suppose that their average values over many measurements would be Xj and Yj respectively, just the fitting values of x and y that we would like to know. Thus, we assume that there are no systematic errors and that the data, if free of their random errors, could be exactly described by the fitting function. In the real world, it is seldom true that either assumption is likely to be strictly justifiable. By assuming further that uncertainties in x and y measurements are independent, the probability of obtaining such values is obtained as Pj given by

P_j = \frac{1}{2\pi\,\sigma_{xj}\sigma_{yj}} \exp\!\left[-\frac{(x_j - X_j)^2}{2\sigma_{xj}^2}\right] \exp\!\left[-\frac{(y_j - Y_j)^2}{2\sigma_{yj}^2}\right]    (6.3)

What is the likelihood, L, that n independent measurements, j = 1, 2, ..., n, would produce the data actually obtained? This likelihood is just

L = \prod_{j=1}^{n} P_j    (6.4)

Exercise 6.3
Show that maximizing the likelihood function L in (6.4) for Gaussian distributions of xj and yj values is equivalent to minimizing the χ² function, given by

\chi^2 = \sum_{j=1}^{n} \left[ \frac{(x_j - X_j)^2}{\sigma_{xj}^2} + \frac{(y_j - Y_j)^2}{\sigma_{yj}^2} \right]    (6.5)

if it is assumed that the standard deviations are not fitting parameters.
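As a concrete sketch of how the χ² sum (6.5) is evaluated for data with independent Gaussian errors in both x and y, consider the Python fragment below. The data, standard deviations, and fitting values Xj, Yj are all invented for illustration; in practice the fitting values would come from the model being fitted.

```python
# Sketch of the chi-squared sum (6.5) for data with independent errors
# in both x and y.  All numbers here are invented for illustration.
import numpy as np

x  = np.array([1.0, 2.1, 2.9, 4.2])      # measured x_j
y  = np.array([2.1, 3.9, 6.2, 7.8])      # measured y_j
sx = np.array([0.1, 0.1, 0.2, 0.1])      # sigma_xj
sy = np.array([0.2, 0.3, 0.3, 0.2])      # sigma_yj

X = np.array([1.0, 2.0, 3.0, 4.0])       # fitting values X_j (from a model)
Y = np.array([2.0, 4.0, 6.0, 8.0])       # fitting values Y_j (from a model)

chi2 = np.sum((x - X) ** 2 / sx ** 2 + (y - Y) ** 2 / sy ** 2)
print("chi-squared =", chi2)
```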
Thus, we have derived the least-squares criterion, minimization of the sum of squares in (6.5), from the assumptions of independent measurements from point to point and independent Gaussian-distributed random errors at each point. Note that both x and y variables may have errors, so that we do not make any distinction between independent and dependent variables. This is a common situation in the life sciences and in astronomy, where pairs of observations are associated. In physics and engineering one can often control x relatively well (xj ≈ Xj), so that it becomes the independent variable, and χ² in (6.5) collapses to the y-dependent terms. For generality, we retain the full form (6.5) in the following. The relation between probability distributions and random errors ("stochastic processes") is developed in Chapter 10 of Solomon's book.

One disadvantage of a least-squares best-fit criterion arises from data with relatively small errors that, if not included in the fit, would lie far from the best-fit function value.
When included in the fitting, such "outliers" may have a very strong influence on the fitting function because they contribute as the square of their distance from it. Although this is as it should be for independent random Gaussian errors that are correctly estimated, real data and their errors may not satisfy all the conditions discussed above that are necessary to relate χ² to the maximum-likelihood criterion.
Therefore, the presence of outliers is usually considered quite problematic in least-squares fitting.

Least squares and the objective function

Although the maximum-likelihood model for normally distributed errors leads to a least-squares condition when the weights are chosen as the inverses of the data variances, this is not the only justification for a least-squares criterion. For example, the Gaussian error distribution is not appropriate when the errors arise from counting statistics. Then one has Poisson error distributions (Chapter 3 in Barlow, Chapter 6 in Solomon), for which an estimate of the standard deviation for a mean value of y counts is √y. I have discussed elsewhere (Thompson, 1992) Poisson statistics in the context of finding least-squares normalization factors.

In many experiments one adjusts the measurement procedure so that the statistical uncertainties are proportional to the measured values, that is, one has constant percentage errors.
Under this condition the analysis often becomes tractable, as we find for the analysis of parameter bias in logarithmic transformations in Section 6.5.

The function to be minimized is therefore usually generalized from χ² in (6.5) to the objective function, φ, defined by

\phi = \sum_{j=1}^{n} \left[ w_{xj}(x_j - X_j)^2 + w_{yj}(y_j - Y_j)^2 \right]    (6.6)

in which wxj and wyj are weights assigned to the jth x and y points, respectively.
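Schematically, (6.6) might be evaluated as in the Python sketch below, assuming the notation used above; the numbers are illustrative only. Choosing the weights as inverse error variances reproduces χ² of (6.5), while for Poisson-distributed counts one would instead take wyj proportional to 1/yj.

```python
# Sketch of the objective function (6.6) with arbitrary weights.
# Inverse-variance weights reproduce chi-squared (6.5); any other
# weighting still defines a least-squares fit, but its minimum value
# loses the statistical meaning of chi-squared.  Numbers are invented.
import numpy as np

def objective(x, y, X, Y, wx, wy):
    """Weighted sum of squared x and y deviations, equation (6.6)."""
    return np.sum(wx * (x - X) ** 2 + wy * (y - Y) ** 2)

x  = np.array([1.0, 2.1, 2.9, 4.2]);  X = np.array([1.0, 2.0, 3.0, 4.0])
y  = np.array([2.1, 3.9, 6.2, 7.8]);  Y = np.array([2.0, 4.0, 6.0, 8.0])
sx = np.array([0.1, 0.1, 0.2, 0.1]);  sy = np.array([0.2, 0.3, 0.3, 0.2])

# Maximum-likelihood weights: inverse variances, so phi = chi-squared.
print("phi (= chi2) :", objective(x, y, X, Y, 1 / sx ** 2, 1 / sy ** 2))

# Equal weights: still a least-squares criterion, different minimum.
print("equal weights:", objective(x, y, X, Y, 1.0, 1.0))
```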
The maximum-likelihood formula (6.5) is recovered if the weights are equal to the inverses of the squares of the standard deviations. Only then is φ = χ², the function occurring in statistics to which confidence limits (C.L.) can be assigned. If weights other than inverses of error variances are used, then any overall scaling factor applied to them or to φ does not affect the values of Xj and Yj at which the minimum will occur. The statistical significance of the value of the objective function depends strongly on how relative weights are assigned, as well as on their overall magnitude. This fact is often ignored by scientists and engineers, especially by physicists.

Because we have gone beyond the probability origin for least-squares fitting, alternative motivations for using a least-squares criterion are worth discussing, as we do in the next section.

6.2 ORTHOGONAL FUNCTIONS AND LINEAR LEAST SQUARES

The general least-squares condition, minimization of the objective function defined by (6.6), serves as the starting point for least-squares fitting using common functions.
Of greatest practical importance are orthogonal functions. They also connect least-squares fitting with the Fourier expansions discussed in Chapters 9 and 10. In this section we introduce orthogonal functions, then show their connection to least-squares fitting.

What are orthogonal functions?

Orthogonal functions are sets of functions such that, with weight factor wj and summation over the n observation points labeled by j, any two functions in the set are constrained by

\sum_{j=1}^{n} w_j\, \phi_K(x_j)\, \phi_L^{*}(x_j) = 0, \qquad K \neq L    (6.7)

For example, in Fourier expansions the φK(x) are the complex-exponential functions e^{iKx}, or cos(Kx) and sin(Kx), which can all be made orthogonal over the xj range if the xj are equally spaced and if the weights are all equal.
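A quick numerical check of this orthogonality, in the spirit of (6.7), is sketched below in Python for the complex exponentials: with n equally spaced points over one period and equal weights, the weighted sum over the points vanishes (up to rounding) whenever K ≠ L. The value n = 16 and the K, L pairs are arbitrary choices for the sketch.

```python
# Numerical check of discrete orthogonality, as in (6.7), for the
# complex exponentials exp(iKx) on n equally spaced points with equal
# weights: the weighted sum vanishes for K != L (up to rounding).
import numpy as np

n = 16
x = 2.0 * np.pi * np.arange(n) / n        # equally spaced x_j over one period
w = np.ones(n)                            # equal weights w_j

def overlap(K, L):
    """Weighted sum of phi_K(x_j) times the conjugate of phi_L(x_j)."""
    return np.sum(w * np.exp(1j * K * x) * np.conj(np.exp(1j * L * x)))

print("K=2, L=5 :", abs(overlap(2, 5)))   # ~ 0: orthogonal
print("K=3, L=3 :", abs(overlap(3, 3)))   # = n: the normalization sum
```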