14.7 Do Two-Dimensional Distributions Differ?

We here discuss a useful generalization of the K–S test (§14.3) to two-dimensional distributions. This generalization is due to Fasano and Franceschini [1], a variant on an earlier idea due to Peacock [2].

In a two-dimensional distribution, each data point is characterized by an (x, y) pair of values.
An example near to our hearts is that each of the 19 neutrinos that were detected from Supernova 1987A is characterized by a time $t_i$ and by an energy $E_i$ (see [3]). We might wish to know whether these measured pairs $(t_i, E_i)$, $i = 1, \ldots, 19$ are consistent with a theoretical model that predicts neutrino flux as a function of both time and energy, that is, a two-dimensional probability distribution in the (x, y) [here, (t, E)] plane. That would be a one-sample test. Or, given two sets of neutrino detections, from two comparable detectors, we might want to know whether they are compatible with each other, a two-sample test.

In the spirit of the tried-and-true, one-dimensional K–S test, we want to range over the (x, y) plane in search of some kind of maximum cumulative difference between two two-dimensional distributions. Unfortunately, cumulative probability distribution is not well-defined in more than one dimension! Peacock’s insight was that a good surrogate is the integrated probability in each of four natural quadrants around a given point $(x_i, y_i)$, namely the total probabilities (or fraction of data) in $(x > x_i, y > y_i)$, $(x < x_i, y > y_i)$, $(x < x_i, y < y_i)$, $(x > x_i, y < y_i)$.
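The four quadrant fractions just described can be computed by a single pass over the data. The following is a minimal sketch, not the book's routine: the name `quadrant_fractions`, the 0-based arrays, the use of `double`, and the rule that points lying exactly on a dividing line fall into the non-strict branches are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fractions of the n points (xs[i], ys[i]) that lie in each of the four
   quadrants around the origin (x, y):
     fa: x_i > x, y_i > y   (upper right)
     fb: x_i <= x, y_i > y  (upper left)
     fc: x_i <= x, y_i <= y (lower left)
     fd: x_i > x, y_i <= y  (lower right)
   Assigning ties to the "<=" branches is an assumed convention. */
void quadrant_fractions(double x, double y,
                        const double xs[], const double ys[], size_t n,
                        double *fa, double *fb, double *fc, double *fd)
{
    size_t na = 0, nb = 0, nc = 0, nd = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (ys[i] > y) {
            if (xs[i] > x) na++; else nb++;
        } else {
            if (xs[i] > x) nd++; else nc++;
        }
    }
    *fa = (double)na / (double)n;
    *fb = (double)nb / (double)n;
    *fc = (double)nc / (double)n;
    *fd = (double)nd / (double)n;
}
```

Because every point lands in exactly one branch, the four fractions always sum to one. For a model distribution, the same four numbers would come from integrating the model density over each quadrant instead of counting points.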
The two-dimensional K–S statistic D is now taken to be the maximum difference (ranging both over data points and over quadrants) of the corresponding integrated probabilities. When comparing two data sets, the value of D may depend on which data set is ranged over. In that case, define an effective D as the average of the two values obtained. If you are confused at this point about the exact definition of D, don’t fret; the accompanying computer routines amount to a precise algorithmic definition.

Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books, diskettes, or CDROMs visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to trade@cup.cam.ac.uk (outside North America).

[Figure 14.7.1: scatter plot of the two samples, with axes running from −3 to 3. The four quadrant labels read .12 | .56, .65 | .26, .11 | .09, and .12 | .09.]

Figure 14.7.1. Two-dimensional distributions of 65 triangles and 35 squares. The two-dimensional K–S test finds that point one of whose quadrants (shown by dotted lines) maximizes the difference between fraction of triangles and fraction of squares. Then, equation (14.7.1) indicates whether the difference is statistically significant, i.e., whether the triangles and squares must have different underlying distributions.

Figure 14.7.1 gives a feeling for what is going on. The 65 triangles and 35 squares seem to have somewhat different distributions in the plane. The dotted lines are centered on the triangle that maximizes the D statistic; the maximum occurs in the upper-left quadrant. That quadrant contains only 0.12 of all the triangles, but it contains 0.56 of all the squares.
The value of D is thus 0.44. Is this statistically significant?

Even for fixed sample sizes, it is unfortunately not rigorously true that the distribution of D in the null hypothesis is independent of the shape of the two-dimensional distribution. In this respect the two-dimensional K–S test is not as natural as its one-dimensional parent. However, extensive Monte Carlo integrations have shown that the distribution of the two-dimensional D is very nearly identical for even quite different distributions, as long as they have the same coefficient of correlation r, defined in the usual way by equation (14.5.1). In their paper, Fasano and Franceschini tabulate Monte Carlo results for (what amounts to) the distribution of D as a function of (of course) D, sample size N, and coefficient of correlation r. Analyzing their results, one finds that the significance levels for the two-dimensional K–S test can be summarized by the simple, though approximate, formulas,

$$\text{Probability}(D > \text{observed}) = Q_{KS}\left(\frac{\sqrt{N}\,D}{1 + \sqrt{1 - r^2}\,\left(0.25 - 0.75/\sqrt{N}\right)}\right) \qquad (14.7.1)$$

for the one-sample case, and the same for the two-sample case, but with

$$N = \frac{N_1 N_2}{N_1 + N_2}. \qquad (14.7.2)$$

    #include <math.h>
    #include "nrutil.h"

    void ks2d1s(float x1[], float y1[], unsigned long n1,
        void (*quadvl)(float, float, float *, float *, float *, float *),
        float *d1, float *prob)
    /* Two-dimensional Kolmogorov-Smirnov test of one sample against a model.
       Given the x and y coordinates of n1 data points in arrays x1[1..n1] and
       y1[1..n1], and given a user-supplied function quadvl that exemplifies the
       model, this routine returns the two-dimensional K-S statistic as d1, and
       its significance level as prob. Small values of prob show that the sample
       is significantly different from the model. Note that the test is slightly
       distribution-dependent, so prob is only an estimate. */
    {
        void pearsn(float x[], float y[], unsigned long n, float *r, float *prob,
            float *z);
        float probks(float alam);
        void quadct(float x, float y, float xx[], float yy[], unsigned long nn,
            float *fa, float *fb, float *fc, float *fd);
        unsigned long j;
        float dum,dumm,fa,fb,fc,fd,ga,gb,gc,gd,r1,rr,sqen;

        *d1=0.0;
        for (j=1;j<=n1;j++) {    /* Loop over the data points. */
            quadct(x1[j],y1[j],x1,y1,n1,&fa,&fb,&fc,&fd);
            (*quadvl)(x1[j],y1[j],&ga,&gb,&gc,&gd);
            *d1=FMAX(*d1,fabs(fa-ga));
            *d1=FMAX(*d1,fabs(fb-gb));
            *d1=FMAX(*d1,fabs(fc-gc));
            /* The source listing breaks off here. The remaining lines are a
               reconstruction from the declarations above and equation (14.7.1). */
            *d1=FMAX(*d1,fabs(fd-gd));
        }
        pearsn(x1,y1,n1,&r1,&dum,&dumm);    /* Linear correlation coefficient r1. */
        sqen=sqrt((double)n1);
        rr=sqrt(1.0-r1*r1);
        *prob=probks(*d1*sqen/(1.0+rr*(0.25-0.75/sqen)));    /* Equation (14.7.1). */
    }

The above formulas are accurate enough when $N \gtrsim 20$, and when the indicated probability (significance level) is less than (more significant than) 0.20 or so. When the indicated probability is > 0.20, its value may not be accurate, but the implication that the data and model (or two data sets) are not significantly different is certainly correct.
Notice that in the limit of r → 1 (perfect correlation), equations (14.7.1) and (14.7.2) reduce to equations (14.3.9) and (14.3.10): The two-dimensional data lie on a perfect straight line, and the two-dimensional K–S test becomes a one-dimensional K–S test.

The significance level for the data in Figure 14.7.1, by the way, is about 0.001. This establishes to a near-certainty that the triangles and squares were drawn from different distributions. (As in fact they were.)

Of course, if you do not want to rely on the Monte Carlo experiments embodied in equation (14.7.1), you can do your own: Generate a lot of synthetic data sets from your model, each one with the same number of points as the real data set.
Compute D for each synthetic data set, using the accompanying computer routines (but ignoring their calculated probabilities), and count what fraction of the time these synthetic D’s exceed the D from the real data. That fraction is your significance.

One disadvantage of the two-dimensional tests, by comparison with their one-dimensional progenitors, is that the two-dimensional tests require of order $N^2$ operations: Two nested loops of order N take the place of an N log N sort.
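The do-it-yourself Monte Carlo procedure described above reduces to a counting loop. The sketch below shows only that loop, under stated assumptions. The callback type `synthetic_d_fn`, the function `mc_significance`, and the `replay` helper are hypothetical names. Generating a synthetic data set and computing its D is left to the caller-supplied callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: generate one synthetic data set from the
   model (same number of points as the real data) and return its D. */
typedef double (*synthetic_d_fn)(void *ctx);

/* Fraction of trials whose synthetic D exceeds the observed D. */
double mc_significance(double d_obs, size_t n_trials,
                       synthetic_d_fn draw_d, void *ctx)
{
    size_t exceed = 0;
    assert(n_trials > 0 && draw_d != NULL);
    for (size_t i = 0; i < n_trials; i++) {
        if (draw_d(ctx) > d_obs) exceed++;
    }
    return (double)exceed / (double)n_trials;
}

/* Stand-in callback for demonstration only: it replays precomputed D values
   instead of simulating data. A real callback would draw points from the
   model and run the two-dimensional K-S routine on them. */
struct replay { const double *d; size_t n, next; };

double replay_d(void *ctx)
{
    struct replay *r = (struct replay *)ctx;
    double v = r->d[r->next];
    r->next = (r->next + 1) % r->n;
    return v;
}
```

The resolution of the estimate is limited by the number of trials: with `n_trials` synthetic data sets, the smallest nonzero significance that can be reported is `1.0 / n_trials`.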















