Note also that in this error model the exponent B (which is often of primary interest) is unbiased. The bias in A can be estimated by expanding the logarithm in (6.60) in a Maclaurin series, then evaluating the expectation values term by term. The unbiased value, A, can be estimated from the extracted biased value $A_b$ in (6.60) by solving for A to obtain

$A = A_b \, e^{L_b(P)}$    (6.61)

where the bias term, $L_b(P)$, is given by

$L_b(P) = \sigma^2/2 + S(P)$    (6.62)

with the sum that depends upon the error distribution being given by

$S(P) = \sum_{m=3}^{\infty} (-1)^m \sigma^m E_m / m$    (6.63)

where $E_m$ denotes the mth moment of the distribution P.
The first term in the Maclaurin series vanishes because P is to have zero mean, while the second term contributes the $\sigma^2/2$ in (6.62), since P is to have unity standard deviation (second moment about the mean).

Exercise 6.22
Work through the steps in the derivation of the bias expressions (6.54) through (6.63). ■

Thus we have obtained for proportional errors a prescription for removing the bias induced by a logarithmic transformation of data.
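As a concrete illustration, here is a minimal sketch (my code, not the book's) of the correction (6.61), with the bias exponent truncated after the fourth moment for Gaussian errors ($E_3 = 0$, $E_4 = 3$); the function name unbiasedA is my choice:

#include <math.h>

/* Sketch: remove the logarithmic-transformation bias from the
   biased pre-exponent Ab, per (6.61)-(6.63), keeping moments
   through m = 4 for a Gaussian error distribution. */
double unbiasedA(double Ab, double sigma)
{
double Lb = sigma*sigma/2 + 3*pow(sigma,4)/4; /* truncated (6.62) */
return Ab*exp(Lb); /* apply (6.61) */
}

For $\sigma = 0.1$ this corrects $A_b$ upward by about 0.5%, consistent with Exercise 6.24 below.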
We now show that this bias is only weakly dependent on the probability distribution function satisfied by the errors.

Dependence of bias on error distribution

In the bias term, $L_b(P)$ in (6.62), the sum, S(P), depends upon the error distribution, P. For many common probability distributions, only the even moments are nonzero, so then all the terms in the sum in (6.63) are strictly positive and higher moments must necessarily increase the bias.

Consider the commonly assumed Gaussian probability distribution $P = P_G$, discussed in Section 6.1, which is also called the normal distribution. Its third moment vanishes because of its symmetry about the mean value. The fourth moment of the Gaussian gives

$L_b(P_G) \approx \sigma^2/2 + 3\sigma^4/4$    (6.64)

Another convenient and commonly used, but slightly unrealistic, distribution is the uniform probability distribution, $P = P_u$, shown in Figure 6.7.

Exercise 6.23
Show that the height and width of the standardized uniform probability distribution are as indicated in Figure 6.7. ■
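One way to verify the figure's dimensions and to obtain the fourth moment needed below (my working, with the half-width denoted a): the standardized uniform distribution has density $P_u(v) = 1/(2a)$ for $|v| \le a$, which is automatically normalized, so the unit-variance condition gives

$E_2 = a^2/3 = 1 \;\Rightarrow\; a = \sqrt{3}$,

that is, a width of $2\sqrt{3} \approx 3.46$ and a height of $1/(2\sqrt{3}) \approx 0.29$. The fourth moment is then $E_4 = a^4/5 = 9/5$.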
FIGURE 6.7 The uniform probability distribution, $P_u(0,1)$, also called the rectangular distribution. The distribution shown has zero mean, unity normalization, and unity standard deviation.

For the uniform distribution $P = P_u$, one obtains (similarly to the Gaussian distribution) the exponent of the bias factor given by

$L_b(P_u) \approx \sigma^2/2 + 9\sigma^4/20$    (6.65)

through the fourth moment of the distribution. For a given standard deviation the fourth moment contributes $6\sigma^4/20$ (30% of $\sigma^4$) more for $P_G$ than for $P_u$, primarily because the wings of the Gaussian distribution include larger values of $v_j$ for a given $\sigma$. Other error models can be incorporated in the estimates of the pre-exponential for proportional errors if their probability distributions can be calculated.

Exercise 6.24
(a) Verify the correctness of the second terms of the bias factors $L_b$ in (6.64) and (6.65) for the Gaussian and uniform distributions, respectively, by calculating the corresponding fourth moments of their probability distributions.
(b) As numerical examples of the bias induced in the pre-exponential A by a logarithmic transformation, show that A needs to be corrected upward by 0.5% for data with 10% standard deviation ($\sigma = 0.1$) and upward by about 5% for data with 30% random errors ($\sigma = 0.3$). ■
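A quick numerical check of part (b) (my arithmetic, using (6.64)): for $\sigma = 0.1$, $L_b(P_G) = 0.005 + 0.000075 \approx 0.0051$, so $e^{L_b} \approx 1.005$, a 0.5% upward correction; for $\sigma = 0.3$, $L_b \approx 0.045 + 0.0061 \approx 0.051$, so $e^{L_b} \approx 1.052$, about 5%.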
Mathematically punctilious readers should object to the analysis of the Gaussian distribution, because the argument of the ln function in the error model may become negative, even though this is very improbable if $\sigma$ is small. (A negative argument requires $v_j < -1/\sigma$; for $\sigma = 0.2$ this is the Gaussian tail below $-5$, whose probability is about $3 \times 10^{-7}$.) Therefore, in the analytical treatment for the Gaussian, (6.63) represents an asymptotic series which eventually diverges. In the Monte Carlo simulation suggested in Exercise 6.25, if the sample size is very large the chance of getting a negative argument increases, and the program should include a trap for this condition.

Formally, we can circumvent the problem with the logarithmic transformation for a Gaussian distribution of errors by defining suitably truncated distributions whose low-order moments are nearly the same as those of the complete distributions, so that there are no practical consequences for the bias estimates. For a uniform distribution, the problem arises only if $\sigma > 1/\sqrt{3} \approx 0.58$ (see Figure 6.7), which would usually be considered too large an error to justify anything more than a cursory fit to the data.

The simple corrections given by (6.61) and (6.62) are worth making if the assumption of proportional random errors, (6.57), is realistic.
It is also reassuring that the exponent B is unbiased under this assumption. For any other error model the logarithmic transformation induces biases in both the exponent B and the pre-exponent A which cannot easily be corrected. As a final remark, data taken with a small number of samples, typically fewer than fifty, will have errors dominated by the statistics of the data, which overwhelm any effects of the transformation bias that we have just investigated.

As a way of confirming the analysis of the bias in taking logarithms for analyzing exponential behavior, try a Monte Carlo simulation of a Gaussian random-error distribution, as suggested in the following exercise.

Exercise 6.25
(a) Use a computer random-number generator to provide a sample of, say, 10,000 values from a Gaussian distribution, and force this sample to have zero mean and unity standard deviation, as the above analysis uses.
Then sort the sample into 100 bins.
(b) Choose $\sigma = 0.2$ (an average 20% error for each y datum), then make from the sampled Gaussian in (a) the distribution of $\ln(1 + \sigma v_j)$ as appears in (6.60). The Monte Carlo estimate of the bias is just the negative of the mean value (expectation value, E) of this distribution. How good is the agreement with our analytic estimate, $L_b(P_G)$, obtained from (6.64)? ■
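Here is a minimal sketch of such a simulation (my code, not the book's; the Box-Muller sampler and the trap counter are my choices, and it computes the sample mean directly rather than histogramming into 100 bins). It includes the trap for nonpositive ln arguments mentioned earlier:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define NSAMP 10000
#define PI 3.14159265358979

main()
/* Monte Carlo check of the bias estimate Lb(PG); Exercise 6.25 */
{
double v[NSAMP], sigma = 0.2;
double mean = 0, var = 0, bias = 0, u1, u2;
int j, ntrap = 0;

for ( j = 0; j < NSAMP; j += 2 ) /* Box-Muller: two normals per pair */
   {
   u1 = (rand()+1.0)/(RAND_MAX+1.0);  u2 = (rand()+1.0)/(RAND_MAX+1.0);
   v[j] = sqrt(-2*log(u1))*cos(2*PI*u2);
   v[j+1] = sqrt(-2*log(u1))*sin(2*PI*u2);
   }
for ( j = 0; j < NSAMP; j++ )  mean += v[j];
mean = mean/NSAMP;
for ( j = 0; j < NSAMP; j++ ) /* force zero mean, then unit std dev */
   { v[j] = v[j]-mean;  var += v[j]*v[j]; }
var = var/NSAMP;
for ( j = 0; j < NSAMP; j++ )  v[j] = v[j]/sqrt(var);
for ( j = 0; j < NSAMP; j++ )
   {
   if ( 1+sigma*v[j] <= 0 ) { ntrap++; continue; } /* the trap */
   bias += log(1+sigma*v[j]);
   }
bias = -bias/(NSAMP-ntrap); /* minus the mean of ln(1+sigma*v) */
printf("Monte Carlo bias %g, analytic estimate %g, trapped %i\n",
       bias, sigma*sigma/2+3*pow(sigma,4)/4, ntrap);
}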
In Sections 6.4 and 6.5 we explored two common and useful aspects of fitting imprecise data, to wit, normalization factors and logarithmic transformations. In the next section we return to straight-line least squares with errors in both variables, emphasizing the numerics of this problem.

6.6 PROJECT 6: PROGRAM FOR STRAIGHT-LINE LEAST-SQUARES FITS

Our aim in this section is to implement a program that makes a straight-line least-squares fit to data in which there are errors in both variables, following the algorithm developed in Section 6.3.
We use the Independent Diagonal Weighting Model with Constant ratio of x to y weights from point to point, called IDWMC, as illustrated in Figure 6.2. This model for the weights is often quite realistic, and it also allows the simple formulas (6.36) or (6.37) to be used to estimate the slope of the best-fit line in terms of the data, the weights at each datum, and the ratio of x weights to y weights, $\lambda$.

Organization of Straight-Line Least Squares

The organization and implementation of Straight-Line Least Squares are quite straightforward, so we describe the features of the code, then we present it.

The overall structure of Straight-Line Least Squares is very simple. A while loop is used to input the number of data points, n.
A value of n = 0 is used to terminate execution, while a value larger than the array bounds, MAX - 1 (since the [0] element is not used), causes an error message (!!). A value of n < 2 is also diagnosed as an error because the problem is then underdetermined, since one data point cannot determine two parameters.

If the input n value is acceptable, the parameter for the ratio of x weights to y weights, $\lambda$, is input, followed by n triples of x values, y values, and y weights. Then the program invokes function LeastSquares, which returns the intercept (a1), the slope (a2), the y-fit contribution to the minimum objective function (oymin), and the correlation coefficient between x and y values (rho).

The C function LeastSquares is a direct coding of formulas in Section 6.3. The discussion in Section 6.3 of the properties of the least-squares slopes and Figure 6.4 should convince you that once $\lambda$ becomes small its value is not important. Try it in the program and see. The quadratic equation for the slope has two forms of solution, (6.36) and (6.37), used selectively in order to avoid errors from subtractive cancellation.
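In the notation of the program below (my transcription, inferred from the code rather than quoted from Section 6.3), with $q = \lambda s_{xx} - s_{yy}$ built from the weighted bilinear sums, the two algebraically equivalent forms of the slope are

$a_2 = \dfrac{2\lambda s_{xy}}{q + \sqrt{q^2 + 4\lambda s_{xy}^2}}$ for $q > 0$,  and  $a_2 = \dfrac{-q + \sqrt{q^2 + 4\lambda s_{xy}^2}}{2 s_{xy}}$ for $q \le 0$,

so in each branch the square root is added to a nonnegative quantity and no significant digits are lost to subtractive cancellation.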
PROGRAM 6.2 Straight-line least squares with errors in both variables, but with a constant ratio of x weights to y weights.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define MAX 101

main()
/* Straight-Line Least Squares */
/* Weighted fit with errors in both variables */
{
double xlist[MAX], ylist[MAX], wght[MAX]; /* data & weights */
double a1,a2,oymin,rho; /* intercept,slope,objective,correlation */
double lambda; /* ratio of x weights to y weights */
double xin, yin, win; /* temporaries for input */
int j, n; /* data counter, number of data sets */
void LeastSquares();

printf("Straight Line Least Squares\n");
n = 2;
while ( n > 0 )
   {
   printf("\nInput n (n=0 to end): ");  scanf("%i",&n);
   if ( n <= 0 )
      {
      printf("\nEnd Straight Line Least Squares");  exit(0);
      }
   if ( n > MAX-1 )  printf("\n !! More than %i data sets\n",MAX-1);
   else
      {
      if ( n < 2 )  printf("\n !! Fewer than 2 data sets\n");
      else
         {
         printf("Input ratio of x weights to y weights, lambda: ");
         scanf("%lf",&lambda);
         printf("Input %i triples of x, y, y weight:\n",n);
         for ( j = 1; j <= n; j++ )
            {
            scanf("%lf%lf%lf",&xin,&yin,&win);
            xlist[j] = xin;  ylist[j] = yin;  wght[j] = win;
            }
         LeastSquares(xlist,ylist,wght,lambda,n,&a1,&a2,&oymin,&rho);
         printf("Intercept a1 = %g and slope a2 = %g\n",a1,a2);
         printf("Minimized y objective function = %g\n",oymin);
         printf("Correlation coefficient = %g\n",rho);
         }
      }
   } /* end while n loop */
}

void LeastSquares(xlist,ylist,wght,lambda,n,a1,a2,oymin,rho)
/* Straight-line weighted least squares */
double xlist[],ylist[],wght[];
double lambda;
int n;
double *a1,*a2,*oymin,*rho;
{
double sw,sx,sy,xav,yav,xk,yk,sxx,sxy,syy,quad;
int j;

sw = 0;  sx = 0;  sy = 0;
/* Weighted averages of x and y */
for ( j = 1; j <= n; j++ )
   { /* sum weights, then sum weighted x and y values */
   sw = sw+wght[j];
   sx = sx+xlist[j]*wght[j];  sy = sy+ylist[j]*wght[j];
   }
xav = sx/sw;  yav = sy/sw;
/* Make weighted bilinear sums */
sxx = 0;  sxy = 0;  syy = 0;
for ( j = 1; j <= n; j++ )
   {
   xk = xlist[j]-xav;  yk = ylist[j]-yav;
   sxx = sxx+xk*xk*wght[j];
   sxy = sxy+xk*yk*wght[j];
   syy = syy+yk*yk*wght[j];
   }
quad = sxx*lambda-syy; /* Stable formulas for slope */
if ( quad > 0 )
   *a2 = 2*sxy*lambda/(sqrt(quad*quad+4*sxy*sxy*lambda)+quad);
else
   *a2 = (-quad+sqrt(quad*quad+4*sxy*sxy*lambda))/(2*sxy);
*a1 = yav-(*a2)*xav; /* intercept */
*oymin = syy-(*a2)*sxy; /* minimized y objective function */
*rho = sxy/sqrt(sxx*syy); /* correlation coefficient */
}

Testing and using the least-squares program

Now that you know from the preceding subsection how Straight-Line Least Squares is supposed to work, try adapting it and running it in your computing environment.

Exercise 6.26
(a) Implement the program Straight-Line Least Squares for your computer.
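For a quick correctness check of your implementation (my example, not from the text), feed the program three points that lie exactly on the line y = 1 + 2x, with unit y weights and $\lambda = 1$: (0, 1), (1, 3), (2, 5). The weighted sums are then $s_w = 3$, $\bar{x} = 1$, $\bar{y} = 3$, $s_{xx} = 2$, $s_{xy} = 4$, $s_{yy} = 8$, so quad $= \lambda s_{xx} - s_{yy} = -6 < 0$ and the second stable formula gives $a_2 = (6 + \sqrt{36 + 64})/8 = 2$. The program should therefore report intercept $a_1 = 3 - 2 \times 1 = 1$, a minimized y objective function of $8 - 2 \times 4 = 0$, and correlation coefficient $\rho = 4/\sqrt{16} = 1$, as expected for perfectly collinear data.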