Our final example of a Student’s t test is the case of paired samples. Here we imagine that much of the variance in both samples is due to effects that are point-by-point identical in the two samples. For example, we might have two job candidates who have each been rated by the same ten members of a hiring committee. We want to know if the means of the ten scores differ significantly. We first try ttest above, and obtain a value of prob that is not especially significant (e.g., > 0.05). But perhaps the significance is being washed out by the tendency of some committee members always to give high scores, others always to give low scores, which increases the apparent variance and thus decreases the significance of any difference in the means. We thus try the paired-sample formulas,

$$\mathrm{Cov}(x_A, x_B) \equiv \frac{1}{N-1}\sum_{i=1}^{N}\left(x_{Ai} - \overline{x_A}\right)\left(x_{Bi} - \overline{x_B}\right)$$

$$s_D = \left[\frac{\mathrm{Var}(x_A) + \mathrm{Var}(x_B) - 2\,\mathrm{Cov}(x_A, x_B)}{N}\right]^{1/2}$$

$$t = \frac{\overline{x_A} - \overline{x_B}}{s_D} \qquad (14.2.7)$$

where $N$ is the number in each sample (the number of pairs). Notice that it is important that a particular value of $i$ label the corresponding points in each sample, that is, the ones that are paired. The significance of the t statistic in (14.2.7) is evaluated for $N - 1$ degrees of freedom.

The routine is

    #include <math.h>

    void tptest(float data1[], float data2[], unsigned long n, float *t,
        float *prob)
    /* Given the paired arrays data1[1..n] and data2[1..n], this routine returns Student's t for
       paired data as t, and its significance as prob, small values of prob indicating a
       significant difference of means. */
    {
        void avevar(float data[], unsigned long n, float *ave, float *var);
        float betai(float a, float b, float x);
        unsigned long j;
        float var1,var2,ave1,ave2,sd,df,cov=0.0;

        avevar(data1,n,&ave1,&var1);
        avevar(data2,n,&ave2,&var2);
        for (j=1;j<=n;j++)                  /* Covariance of the paired samples. */
            cov += (data1[j]-ave1)*(data2[j]-ave2);
        cov /= df=n-1;
        sd=sqrt((var1+var2-2.0*cov)/n);     /* s_D from the paired-sample formulas. */
        *t=(ave1-ave2)/sd;
        *prob=betai(0.5*df,0.5,df/(df+(*t)*(*t)));
    }

Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5)
Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software.
Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books, diskettes, or CDROMs visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to trade@cup.cam.ac.uk (outside North America).

F-Test for Significantly Different Variances

The F-test tests the hypothesis that two samples have different variances by trying to reject the null hypothesis that their variances are actually consistent. The statistic $F$ is the ratio of one variance to the other, so values either $\gg 1$ or $\ll 1$ will indicate very significant differences. The distribution of $F$ in the null case is given in equation (6.4.11), which is evaluated using the routine betai. In the most common case, we are willing to disprove the null hypothesis (of equal variances) by either very large or very small values of $F$, so the correct significance is two-tailed, the sum of two incomplete beta functions. It turns out, by equation (6.4.3), that the two tails are always equal; we need compute only one, and double it. Occasionally, when the null hypothesis is strongly viable, the identity of the two tails can become confused, giving an indicated probability greater than one. Changing the probability to two minus itself correctly exchanges the tails. These considerations and equation (6.4.3) give the routine

    void ftest(float data1[], unsigned long n1, float data2[], unsigned long n2,
        float *f, float *prob)
    /* Given the arrays data1[1..n1] and data2[1..n2], this routine returns the value of f, and
       its significance as prob. Small values of prob indicate that the two arrays have
       significantly different variances. */
    {
        void avevar(float data[], unsigned long n, float *ave, float *var);
        float betai(float a, float b, float x);
        float var1,var2,ave1,ave2,df1,df2;

        avevar(data1,n1,&ave1,&var1);
        avevar(data2,n2,&ave2,&var2);
        if (var1 > var2) {   /* Make F the ratio of the larger variance to the smaller one. */
            *f=var1/var2;
            df1=n1-1;
            df2=n2-1;
        } else {
            *f=var2/var1;
            df1=n2-1;
            df2=n1-1;
        }
        *prob = 2.0*betai(0.5*df2,0.5*df1,df2/(df2+df1*(*f)));
        if (*prob > 1.0) *prob=2.0-*prob;   /* The tails were exchanged; fold back. */
    }

CITED REFERENCES AND FURTHER READING:
von Mises, R. 1964, Mathematical Theory of Probability and Statistics (New York: Academic Press), Chapter IX(B).
Norusis, M.J. 1982, SPSS Introductory Guide: Basic Statistics and Operations; and 1985, SPSS-X Advanced Statistics Guide (New York: McGraw-Hill).
14.3 Are Two Distributions Different?

Given two sets of data, we can generalize the questions asked in the previous section and ask the single question: Are the two sets drawn from the same distribution function, or from different distribution functions? Equivalently, in proper statistical language, “Can we disprove, to a certain required level of significance, the null hypothesis that two data sets are drawn from the same population distribution function?” Disproving the null hypothesis in effect proves that the data sets are from different distributions. Failing to disprove the null hypothesis, on the other hand, only shows that the data sets can be consistent with a single distribution function. One can never prove that two data sets come from a single distribution, since (e.g.) no practical amount of data can distinguish between two distributions which differ only by one part in $10^{10}$.

Proving that two distributions are different, or showing that they are consistent, is a task that comes up all the time in many areas of research: Are the visible stars distributed uniformly in the sky? (That is, is the distribution of stars as a function of declination, i.e., position in the sky, the same as the distribution of sky area as a function of declination?) Are educational patterns the same in Brooklyn as in the Bronx? (That is, are the distributions of people as a function of last-grade-attended the same?) Do two brands of fluorescent lights have the same distribution of burnout times? Is the incidence of chicken pox the same for first-born, second-born, third-born children, etc.?

These four examples illustrate the four combinations arising from two different dichotomies: (1) The data are either continuous or binned. (2) Either we wish to compare one data set to a known distribution, or we wish to compare two equally unknown data sets. The data sets on fluorescent lights and on stars are continuous, since we can be given lists of individual burnout times or of stellar positions. The data sets on chicken pox and educational level are binned, since we are given tables of numbers of events in discrete categories: first-born, second-born, etc.; or 6th Grade, 7th Grade, etc. Stars and chicken pox, on the other hand, share the property that the null hypothesis is a known distribution (distribution of area in the sky, or incidence of chicken pox in the general population). Fluorescent lights and educational level involve the comparison of two equally unknown data sets (the two brands, or Brooklyn and the Bronx).

One can always turn continuous data into binned data, by grouping the events into specified ranges of the continuous variable(s): declinations between 0 and 10 degrees, 10 and 20, 20 and 30, etc. Binning involves a loss of information, however. Also, there is often considerable arbitrariness as to how the bins should be chosen. Along with many other investigators, we prefer to avoid unnecessary binning of data.

The accepted test for differences between binned distributions is the chi-square test. For continuous data as a function of a single variable, the most generally accepted test is the Kolmogorov-Smirnov test. We consider each in turn.

Chi-Square Test

Suppose that $N_i$ is the number of events observed in the $i$th bin, and that $n_i$ is the number expected according to some known distribution. Note that the $N_i$’s are