c14-6 (779581), страница 2
Текст из файла (страница 2)
The external routines crank (below) and sort2 (§8.2) are used. A small valueof either probd or probrs indicates a significant correlation (rs positive) or anticorrelation(rs negative).{float betai(float a, float b, float x);void crank(unsigned long n, float w[], float *s);float erfcc(float x);void sort2(unsigned long n, float arr[], float brr[]);unsigned long j;float vard,t,sg,sf,fac,en3n,en,df,aved,*wksp1,*wksp2;wksp1=vector(1,n);wksp2=vector(1,n);for (j=1;j<=n;j++) {wksp1[j]=data1[j];wksp2[j]=data2[j];}sort2(n,wksp1,wksp2);Sort each of the data arrays, and convert thePentries toranks.
The values sf and sg return the sums (fk3 −fk )crank(n,wksp1,&sf);P 3sort2(n,wksp2,wksp1);and (gm − gm ), respectively.crank(n,wksp2,&sg);*d=0.0;for (j=1;j<=n;j++)Sum the squared difference of ranks.*d += SQR(wksp1[j]-wksp2[j]);Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5)Copyright (C) 1988-1992 by Cambridge University Press.Programs Copyright (C) 1988-1992 by Numerical Recipes Software.Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machinereadable files (including this one) to any servercomputer, is strictly prohibited.
To order Numerical Recipes books,diskettes, or CDROMsvisit website http://www.nr.com or call 1-800-872-7423 (North America only),or send email to trade@cup.cam.ac.uk (outside North America).D=642Chapter 14.Statistical Description of DataExpectation value of D,and variance of D givenumber of standard deviations and significance.Rank correlation coefficient,and its t value,give its significance.}void crank(unsigned long n, float w[], float *s)Given a sorted array w[1..n], replaces the elements by their rank, including midranking of ties,and returns as s the sum of f 3 − f , where f is the number of elements in each tie.{unsigned long j=1,ji,jt;float t,rank;*s=0.0;while (j < n) {if (w[j+1] != w[j]) {Not a tie.w[j]=j;++j;} else {A tie:for (jt=j+1;jt<=n && w[jt]==w[j];jt++);How far does it go?rank=0.5*(j+jt-1);This is the mean rank of the tie,for (ji=j;ji<=(jt-1);ji++) w[ji]=rank;so enter it into all the tiedt=jt-j;entries,*s += t*t*t-t;and update s.j=jt;}}if (j == n) w[n]=n;If the last element was not tied, this is its rank.}Kendall’s TauKendall’s τ is even more nonparametric than Spearman’s rs or D.
Instead ofusing the numerical difference of ranks, it uses only the relative ordering of ranks:higher in rank, lower in rank, or the same in rank. But in that case we don’t evenhave to rank the data! Ranks will be higher, lower, or the same if and only ifthe values are larger, smaller, or equal, respectively. On balance, we prefer rs asbeing the more straightforward nonparametric test, but both statistics are in generaluse. In fact, τ and rs are very strongly correlated and, in most applications, areeffectively the same test.To define τ , we start with the N data points (xi , yi ).
Now consider all1N(N− 1) pairs of data points, where a data point cannot be paired with itself,2and where the points in either order count as one pair. We call a pair concordantSample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5)Copyright (C) 1988-1992 by Cambridge University Press.Programs Copyright (C) 1988-1992 by Numerical Recipes Software.Permission is granted for internet users to make one paper copy for their own personal use.
Further reproduction, or any copying of machinereadable files (including this one) to any servercomputer, is strictly prohibited. To order Numerical Recipes books,diskettes, or CDROMsvisit website http://www.nr.com or call 1-800-872-7423 (North America only),or send email to trade@cup.cam.ac.uk (outside North America).en=n;en3n=en*en*en-en;aved=en3n/6.0-(sf+sg)/12.0;fac=(1.0-sf/en3n)*(1.0-sg/en3n);vard=((en-1.0)*en*en*SQR(en+1.0)/36.0)*fac;*zd=(*d-aved)/sqrt(vard);*probd=erfcc(fabs(*zd)/1.4142136);*rs=(1.0-(6.0/en3n)*(*d+(sf+sg)/12.0))/sqrt(fac);fac=(*rs+1.0)*(1.0-(*rs));if (fac > 0.0) {t=(*rs)*sqrt((en-2.0)/fac);df=en-2.0;*probrs=betai(0.5*df,0.5,df/(df+t*t));} else*probrs=0.0;free_vector(wksp2,1,n);free_vector(wksp1,1,n);64314.6 Nonparametric or Rank Correlationconcordant − discordant√τ= √concordant + discordant + extra-y concordant + discordant + extra-x(14.6.8)You can easily convince yourself that this must lie between 1 and −1, and that ittakes on the extreme values only for complete rank agreement or complete rankreversal, respectively.More important, Kendall has worked out, from the combinatorics, the approximate distribution of τ in the null hypothesis of no association between x and y.In this case τ is approximately normally distributed, with zero expectation valueand a variance ofVar(τ ) =4N + 109N (N − 1)(14.6.9)The following program proceeds according to the above description, andtherefore loops over all pairs of data points.
Beware: This is an O(N 2 ) algorithm,unlike the algorithm for rs , whose dominant sort operations are of order N log N . Ifyou are routinely computing Kendall’s τ for data sets of more than a few thousandpoints, you may be in for some serious computing. If, however, you are willing tobin your data into a moderate number of bins, then read on.#include <math.h>void kendl1(float data1[], float data2[], unsigned long n, float *tau,float *z, float *prob)Given data arrays data1[1..n] and data2[1..n], this program returns Kendall’s τ as tau,its number of standard deviations from zero as z, and its two-sided significance level as prob.Small values of prob indicate a significant correlation (tau positive) or anticorrelation (taunegative).{float erfcc(float x);unsigned long n2=0,n1=0,k,j;long is=0;float svar,aa,a2,a1;for (j=1;j<n;j++) {for (k=(j+1);k<=n;k++) {a1=data1[j]-data1[k];a2=data2[j]-data2[k];aa=a1*a2;if (aa) {++n1;Loop over first member of pair,and second member.Neither array has a tie.Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5)Copyright (C) 1988-1992 by Cambridge University Press.Programs Copyright (C) 1988-1992 by Numerical Recipes Software.Permission is granted for internet users to make one paper copy for their own personal use.
Further reproduction, or any copying of machinereadable files (including this one) to any servercomputer, is strictly prohibited. To order Numerical Recipes books,diskettes, or CDROMsvisit website http://www.nr.com or call 1-800-872-7423 (North America only),or send email to trade@cup.cam.ac.uk (outside North America).if the relative ordering of the ranks of the two x’s (or for that matter the two x’sthemselves) is the same as the relative ordering of the ranks of the two y’s (or forthat matter the two y’s themselves). We call a pair discordant if the relative orderingof the ranks of the two x’s is opposite from the relative ordering of the ranks of thetwo y’s. If there is a tie in either the ranks of the two x’s or the ranks of the twoy’s, then we don’t call the pair either concordant or discordant.
If the tie is in thex’s, we will call the pair an “extra y pair.” If the tie is in the y’s, we will call thepair an “extra x pair.” If the tie is in both the x’s and the y’s, we don’t call the pairanything at all. Are you still with us?Kendall’s τ is now the following simple combination of these various counts:644Chapter 14.Statistical Description of Data++n2;aa > 0.0 ? ++is : --is;} else {if (a1) ++n1;if (a2) ++n2;}One or both arrays have ties.An “extra x” event.An “extra y” event.Equation (14.6.8).Equation (14.6.9).Significance.}Sometimes it happens that there are only a few possible values each for x andy.
In that case, the data can be recorded as a contingency table (see §14.4) that givesthe number of data points for each contingency of x and y.Spearman’s rank-order correlation coefficient is not a very natural statisticunder these circumstances, since it assigns to each x and y bin a not-very-meaningfulmidrank value and then totals up vast numbers of identical rank differences. Kendall’stau, on the other hand, with its simple counting, remains quite natural. Furthermore,its O(N 2 ) algorithm is no longer a problem, since we can arrange for it to loop overpairs of contingency table entries (each containing many data points) instead of overpairs of data points. This is implemented in the program that follows.Note that Kendall’s tau can be applied only to contingency tables where bothvariables are ordinal, i.e., well-ordered, and that it looks specifically for monotoniccorrelations, not for arbitrary associations.
These two properties make it less generalthan the methods of §14.4, which applied to nominal, i.e., unordered, variables andarbitrary associations.Comparing kendl1 above with kendl2 below, you will see that we have“floated” a number of variables. This is because the number of events in acontingency table might be sufficiently large as to cause overflows in some of theinteger arithmetic, while the number of individual data points in a list could notpossibly be that large [for an O(N 2 ) routine!].#include <math.h>void kendl2(float **tab, int i, int j, float *tau, float *z, float *prob)Given a two-dimensional table tab[1..i][1..j], such that tab[k][l] contains the numberof events falling in bin k of one variable and bin l of another, this program returns Kendall’s τas tau, its number of standard deviations from zero as z, and its two-sided significance level asprob.















