Chapter 14. Statistical Description of Data

14.5 Linear Correlation

We next turn to measures of association between variables that are ordinal or continuous, rather than nominal. Most widely used is the linear correlation coefficient. For pairs of quantities $(x_i, y_i)$, $i = 1, \ldots, N$, the linear correlation coefficient $r$ (also called the product-moment correlation coefficient, or Pearson's $r$) is given by the formula

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} \tag{14.5.1}$$

where, as usual, $\bar{x}$ is the mean of the $x_i$'s and $\bar{y}$ is the mean of the $y_i$'s.

The value of r lies between −1 and 1, inclusive. It takes on a value of 1, termed "complete positive correlation," when the data points lie on a perfect straight line with positive slope, with x and y increasing together. The value 1 holds independent of the magnitude of the slope. If the data points lie on a perfect straight line with negative slope, y decreasing as x increases, then r has the value −1; this is called "complete negative correlation." A value of r near zero indicates that the variables x and y are uncorrelated.

When a correlation is known to be significant, r is one conventional way of summarizing its strength. In fact, the value of r can be translated into a statement about what residuals (root mean square deviations) are to be expected if the data are fitted to a straight line by the least-squares method (see §15.2, especially equations 15.2.13–15.2.14). Unfortunately, r is a rather poor statistic for deciding whether an observed correlation is statistically significant, and/or whether one observed correlation is significantly stronger than another. The reason is that r is ignorant of the individual distributions of x and y, so there is no universal way to compute its distribution in the case of the null hypothesis.

About the only general statement that can be made is this: If the null hypothesis is that x and y are uncorrelated, and if the distributions for x and y each have enough convergent moments ("tails" die off sufficiently rapidly), and if N is large (typically > 500), then r is distributed approximately normally, with a mean of zero and a standard deviation of $1/\sqrt{N}$. In that case, the (double-sided) significance of the correlation, that is, the probability that |r| should be larger than its observed value in the null hypothesis, is

$$\operatorname{erfc}\!\left(\frac{|r|\sqrt{N}}{\sqrt{2}}\right) \tag{14.5.2}$$

where erfc(x) is the complementary error function, equation (6.2.8), computed by the routines erffc or erfcc of §6.2. A small value of (14.5.2) indicates that the two distributions are significantly correlated. (See expression 14.5.9 below for a more accurate test.)

Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books, diskettes, or CDROMs visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to trade@cup.cam.ac.uk (outside North America).
Most statistics books try to go beyond (14.5.2) and give additional statistical tests that can be made using r. In almost all cases, however, these tests are valid only for a very special class of hypotheses, namely that the distributions of x and y jointly form a binormal or two-dimensional Gaussian distribution around their mean values, with joint probability density

$$p(x,y)\,dx\,dy = \text{const.} \times \exp\!\left[-\tfrac{1}{2}\left(a_{11}x^2 - 2a_{12}xy + a_{22}y^2\right)\right] dx\,dy \tag{14.5.3}$$

where $a_{11}$, $a_{12}$, and $a_{22}$ are arbitrary constants. For this distribution r has the value

$$r = \frac{a_{12}}{\sqrt{a_{11}a_{22}}} \tag{14.5.4}$$

There are occasions when (14.5.3) may be known to be a good model of the data. There may be other occasions when we are willing to take (14.5.3) as at least a rough and ready guess, since many two-dimensional distributions do resemble a binormal distribution, at least not too far out on their tails. In either situation, we can use (14.5.3) to go beyond (14.5.2) in any of several directions:

First, we can allow for the possibility that the number N of data points is not large. Here, it turns out that the statistic

$$t = r\sqrt{\frac{N-2}{1-r^2}} \tag{14.5.5}$$

is distributed in the null case (of no correlation) like Student's t-distribution with ν = N − 2 degrees of freedom, whose two-sided significance level is given by 1 − A(t|ν) (equation 6.4.7). As N becomes large, this significance and (14.5.2) become asymptotically the same, so that one never does worse by using (14.5.5), even if the binormal assumption is not well substantiated.

Second, when N is only moderately large (≥ 10), we can compare whether the difference of two significantly nonzero r's, e.g., from different experiments, is itself significant. In other words, we can quantify whether a change in some control variable significantly alters an existing correlation between two other variables. This is done by using Fisher's z-transformation to associate each measured r with a corresponding z,

$$z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) \tag{14.5.6}$$

Then, each z is approximately normally distributed with a mean value

$$\bar{z} = \frac{1}{2}\left[\ln\left(\frac{1+r_{\text{true}}}{1-r_{\text{true}}}\right) + \frac{r_{\text{true}}}{N-1}\right] \tag{14.5.7}$$

where $r_{\text{true}}$ is the actual or population value of the correlation coefficient, and with a standard deviation

$$\sigma(z) \approx \frac{1}{\sqrt{N-3}} \tag{14.5.8}$$

Equations (14.5.7) and (14.5.8), when they are valid, give several useful statistical tests. For example, the significance level at which a measured value of r differs from some hypothesized value $r_{\text{true}}$ is given by

$$\operatorname{erfc}\!\left(\frac{|z-\bar{z}|\sqrt{N-3}}{\sqrt{2}}\right) \tag{14.5.9}$$

where z and $\bar{z}$ are given by (14.5.6) and (14.5.7), with small values of (14.5.9) indicating a significant difference. (Setting $\bar{z} = 0$ makes expression 14.5.9 a more accurate replacement for expression 14.5.2 above.) Similarly, the significance of a difference between two measured correlation coefficients $r_1$ and $r_2$ is

$$\operatorname{erfc}\!\left(\frac{|z_1-z_2|}{\sqrt{2}\,\sqrt{\dfrac{1}{N_1-3}+\dfrac{1}{N_2-3}}}\right) \tag{14.5.10}$$

where $z_1$ and $z_2$ are obtained from $r_1$ and $r_2$ using (14.5.6), and where $N_1$ and $N_2$ are, respectively, the number of data points in the measurement of $r_1$ and $r_2$.

All of the significances above are two-sided.
If you wish to disprove the null hypothesis in favor of a one-sided hypothesis, such as that $r_1 > r_2$ (where the sense of the inequality was decided a priori), then (i) if your measured $r_1$ and $r_2$ have the wrong sense, you have failed to demonstrate your one-sided hypothesis, but (ii) if they have the right ordering, you can multiply the significances given above by 0.5, which makes them more significant.

But keep in mind: These interpretations of the r statistic can be completely meaningless if the joint probability distribution of your variables x and y is too different from a binormal distribution.

#include <math.h>
#define TINY 1.0e-20    /* Will regularize the unusual case of complete correlation. */

void pearsn(float x[], float y[], unsigned long n, float *r, float *prob,
    float *z)
/* Given two arrays x[1..n] and y[1..n], this routine computes their correlation coefficient
r (returned as r), the significance level at which the null hypothesis of zero correlation is
disproved (prob whose small value indicates a significant correlation), and Fisher's z (returned
as z), whose value can be used in further statistical tests as described above. */
{
    float betai(float a, float b, float x);
    float erfcc(float x);
    unsigned long j;
    float yt,xt,t,df;
    float syy=0.0,sxy=0.0,sxx=0.0,ay=0.0,ax=0.0;

    for (j=1;j<=n;j++) {            /* Find the means. */
        ax += x[j];
        ay += y[j];
    }
    ax /= n;
    ay /= n;
    for (j=1;j<=n;j++) {            /* Compute the correlation coefficient. */
        xt=x[j]-ax;
        yt=y[j]-ay;
        sxx += xt*xt;
        syy += yt*yt;
        sxy += xt*yt;
    }
    *r=sxy/(sqrt(sxx*syy)+TINY);
    *z=0.5*log((1.0+(*r)+TINY)/(1.0-(*r)+TINY));    /* Fisher's z transformation. */
    df=n-2;
    t=(*r)*sqrt(df/((1.0-(*r)+TINY)*(1.0+(*r)+TINY)));  /* Equation (14.5.5). */
    *prob=betai(0.5*df,0.5,df/(df+t*t));            /* Student's t probability. */
    /* *prob=erfcc(fabs((*z)*sqrt(n-1.0))/1.4142136); */
    /* For large n, this easier computation of prob, using the short routine erfcc,
       would give approximately the same value. */
}
CITED REFERENCES AND FURTHER READING:

Dunn, O.J., and Clark, V.A. 1974, Applied Statistics: Analysis of Variance and Regression (New York: Wiley).
Hoel, P.G. 1971, Introduction to Mathematical Statistics, 4th ed. (New York: Wiley), Chapter 7.
von Mises, R. 1964, Mathematical Theory of Probability and Statistics (New York: Academic Press), Chapters IX(A) and IX(B).
Korn, G.A., and Korn, T.M. 1968, Mathematical Handbook for Scientists and Engineers, 2nd ed. (New York: McGraw-Hill), §19.7.
Norusis, M.J. 1982, SPSS Introductory Guide: Basic Statistics and Operations; and 1985, SPSS-X Advanced Statistics Guide (New York: McGraw-Hill).

14.6 Nonparametric or Rank Correlation

It is precisely the uncertainty in interpreting the significance of the linear correlation coefficient r that leads us to the important concepts of nonparametric or rank correlation. As before, we are given N pairs of measurements $(x_i, y_i)$. Before, difficulties arose because we did not necessarily know the probability distribution function from which the $x_i$'s or $y_i$'s were drawn.

The key concept of nonparametric correlation is this: If we replace the value of each $x_i$ by the value of its rank among all the other $x_i$'s in the sample, that is, 1, 2, 3, . . .