File No. 779578: c14-3 (Numerical Recipes in C), StudIzba, 2017-12-27


Chapter 14. Statistical Description of Data

14.3 Are Two Distributions Different?

Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books, diskettes, or CDROMs visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to trade@cup.cam.ac.uk (outside North America).

Given two sets of data, we can generalize the questions asked in the previous section and ask the single question: Are the two sets drawn from the same distribution function, or from different distribution functions? Equivalently, in proper statistical language, “Can we disprove, to a certain required level of significance, the null hypothesis that two data sets are drawn from the same population distribution function?” Disproving the null hypothesis in effect proves that the data sets are from different distributions. Failing to disprove the null hypothesis, on the other hand, only shows that the data sets can be consistent with a single distribution function. One can never prove that two data sets come from a single distribution, since (e.g.) no practical amount of data can distinguish between two distributions which differ only by one part in $10^{10}$.

Proving that two distributions are different, or showing that they are consistent, is a task that comes up all the time in many areas of research: Are the visible stars distributed uniformly in the sky? (That is, is the distribution of stars as a function of declination, their position in the sky, the same as the distribution of sky area as a function of declination?) Are educational patterns the same in Brooklyn as in the Bronx? (That is, are the distributions of people as a function of last-grade-attended the same?) Do two brands of fluorescent lights have the same distribution of burn-out times? Is the incidence of chicken pox the same for first-born, second-born, third-born children, etc.?

These four examples illustrate the four combinations arising from two different dichotomies: (1) The data are either continuous or binned. (2) Either we wish to compare one data set to a known distribution, or we wish to compare two equally unknown data sets. The data sets on fluorescent lights and on stars are continuous, since we can be given lists of individual burnout times or of stellar positions. The data sets on chicken pox and educational level are binned, since we are given tables of numbers of events in discrete categories: first-born, second-born, etc.; or 6th Grade, 7th Grade, etc. Stars and chicken pox, on the other hand, share the property that the null hypothesis is a known distribution (distribution of area in the sky, or incidence of chicken pox in the general population). Fluorescent lights and educational level involve the comparison of two equally unknown data sets (the two brands, or Brooklyn and the Bronx).

One can always turn continuous data into binned data, by grouping the events into specified ranges of the continuous variable(s): declinations between 0 and 10 degrees, 10 and 20, 20 and 30, etc. Binning involves a loss of information, however. Also, there is often considerable arbitrariness as to how the bins should be chosen. Along with many other investigators, we prefer to avoid unnecessary binning of data.

The accepted test for differences between binned distributions is the chi-square test. For continuous data as a function of a single variable, the most generally accepted test is the Kolmogorov-Smirnov test. We consider each in turn.

Chi-Square Test

Suppose that $N_i$ is the number of events observed in the $i$th bin, and that $n_i$ is the number expected according to some known distribution. Note that the $N_i$'s are integers, while the $n_i$'s may not be. Then the chi-square statistic is

$$\chi^2 = \sum_i \frac{(N_i - n_i)^2}{n_i} \tag{14.3.1}$$

where the sum is over all bins. A large value of $\chi^2$ indicates that the null hypothesis (that the $N_i$'s are drawn from the population represented by the $n_i$'s) is rather unlikely.

Any term $j$ in (14.3.1) with $n_j = N_j = 0$ should be omitted from the sum. A term with $n_j = 0$, $N_j \neq 0$ gives an infinite $\chi^2$, as it should, since in this case the $N_i$'s cannot possibly be drawn from the $n_i$'s!

The chi-square probability function $Q(\chi^2|\nu)$ is an incomplete gamma function, and was already discussed in §6.2 (see equation 6.2.18). Strictly speaking $Q(\chi^2|\nu)$ is the probability that the sum of the squares of $\nu$ random normal variables of unit variance (and zero mean) will be greater than $\chi^2$. The terms in the sum (14.3.1) are not individually normal. However, if either the number of bins is large ($\gg 1$), or the number of events in each bin is large ($\gg 1$), then the chi-square probability function is a good approximation to the distribution of (14.3.1) in the case of the null hypothesis. Its use to estimate the significance of the chi-square test is standard.

The appropriate value of $\nu$, the number of degrees of freedom, bears some additional discussion. If the data are collected with the model $n_i$'s fixed (that is, not later renormalized to fit the total observed number of events $\sum N_i$), then $\nu$ equals the number of bins $N_B$. (Note that this is not the total number of events!) Much more commonly, the $n_i$'s are normalized after the fact so that their sum equals the sum of the $N_i$'s. In this case the correct value for $\nu$ is $N_B - 1$, and the model is said to have one constraint (knstrn=1 in the program below). If the model that gives the $n_i$'s has additional free parameters that were adjusted after the fact to agree with the data, then each of these additional “fitted” parameters decreases $\nu$ (and increases knstrn) by one additional unit.

We have, then, the following program:

```c
void chsone(float bins[], float ebins[], int nbins, int knstrn, float *df,
    float *chsq, float *prob)
/* Given the array bins[1..nbins] containing the observed numbers of events, and an array
ebins[1..nbins] containing the expected numbers of events, and given the number of
constraints knstrn (normally one), this routine returns (trivially) the number of degrees
of freedom df, and (nontrivially) the chi-square chsq and the significance prob. A small
value of prob indicates a significant difference between the distributions bins and ebins.
Note that bins and ebins are both float arrays, although bins will normally contain
integer values. */
{
    float gammq(float a, float x);
    void nrerror(char error_text[]);
    int j;
    float temp;

    *df=nbins-knstrn;
    *chsq=0.0;
    for (j=1;j<=nbins;j++) {
        if (ebins[j] <= 0.0) nrerror("Bad expected number in chsone");
        temp=bins[j]-ebins[j];
        *chsq += temp*temp/ebins[j];
    }
    *prob=gammq(0.5*(*df),0.5*(*chsq));    /* Chi-square probability function. See §6.2. */
}
```

Next we consider the case of comparing two binned data sets.

Let $R_i$ be the number of events in bin $i$ for the first data set, $S_i$ the number of events in the same bin $i$ for the second data set. Then the chi-square statistic is

$$\chi^2 = \sum_i \frac{(R_i - S_i)^2}{R_i + S_i} \tag{14.3.2}$$

Comparing (14.3.2) to (14.3.1), you should note that the denominator of (14.3.2) is not just the average of $R_i$ and $S_i$ (which would be an estimator of $n_i$ in 14.3.1). Rather, it is twice the average, the sum. The reason is that each term in a chi-square sum is supposed to approximate the square of a normally distributed quantity with unit variance. The variance of the difference of two normal quantities is the sum of their individual variances, not the average.

If the data were collected in such a way that the sum of the $R_i$'s is necessarily equal to the sum of $S_i$'s, then the number of degrees of freedom is equal to one less than the number of bins, $N_B - 1$ (that is, knstrn = 1), the usual case. If this requirement were absent, then the number of degrees of freedom would be $N_B$. Example: A birdwatcher wants to know whether the distribution of sighted birds as a function of species is the same this year as last. Each bin corresponds to one species. If the birdwatcher takes his data to be the first 1000 birds that he saw in each year, then the number of degrees of freedom is $N_B - 1$. If he takes his data to be all the birds he saw on a random sample of days, the same days in each year, then the number of degrees of freedom is $N_B$ (knstrn = 0). In this latter case, note that he is also testing whether the birds were more numerous overall in one year or the other: That is the extra degree of freedom. Of course, any additional constraints on the data set lower the number of degrees of freedom (i.e., increase knstrn to more positive values) in accordance with their number.

The program is

```c
void chstwo(float bins1[], float bins2[], int nbins, int knstrn, float *df,
    float *chsq, float *prob)
/* Given the arrays bins1[1..nbins] and bins2[1..nbins], containing two sets of binned
data, and given the number of constraints knstrn (normally 1 or 0), this routine returns
the number of degrees of freedom df, the chi-square chsq, and the significance prob.
A small value of prob indicates a significant difference between the distributions bins1
and bins2. Note that bins1 and bins2 are both float arrays, although they will normally
contain integer values. */
{
    float gammq(float a, float x);
    int j;
    float temp;

    *df=nbins-knstrn;
    *chsq=0.0;
    for (j=1;j<=nbins;j++)
        if (bins1[j] == 0.0 && bins2[j] == 0.0)
            --(*df);    /* No data means one less degree of freedom. */
        else {
            temp=bins1[j]-bins2[j];
            *chsq += temp*temp/(bins1[j]+bins2[j]);
        }
    *prob=gammq(0.5*(*df),0.5*(*chsq));    /* Chi-square probability function. See §6.2. */
}
```
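As an illustration of how the two statistics and their degrees of freedom are accumulated, here is a minimal sketch in C. It is not the book's routine: the names chisq_one and chisq_two are hypothetical, the arrays are zero-based rather than [1..nbins], a negative return value stands in for the nrerror call, and the significance is left out because it needs gammq from §6.2, which is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical helper: one-sample chi-square statistic, equation (14.3.1).
   observed[] and expected[] are zero-based arrays of length nbins.
   Returns -1.0 if any expected count is not positive (an assumption made
   for this sketch; chsone reports that case through nrerror instead). */
double chisq_one(const double observed[], const double expected[], size_t nbins)
{
    double chsq = 0.0;
    for (size_t j = 0; j < nbins; j++) {
        if (expected[j] <= 0.0) return -1.0;
        double diff = observed[j] - expected[j];
        chsq += diff * diff / expected[j];
    }
    return chsq;
}

/* Hypothetical helper: two-sample chi-square statistic, equation (14.3.2).
   Starts from df = nbins - knstrn and, as in chstwo, removes one degree of
   freedom for every bin that is empty in both data sets. */
double chisq_two(const double bins1[], const double bins2[], size_t nbins,
                 int knstrn, int *df)
{
    double chsq = 0.0;
    *df = (int)nbins - knstrn;
    for (size_t j = 0; j < nbins; j++) {
        if (bins1[j] == 0.0 && bins2[j] == 0.0) {
            --(*df);                       /* no data in this bin */
        } else {
            double diff = bins1[j] - bins2[j];
            chsq += diff * diff / (bins1[j] + bins2[j]);
        }
    }
    return chsq;
}
```

For example, observed counts {10, 20, 30} against expected counts {20, 20, 20} give a chi-square of 5 + 0 + 5 = 10. Because the expected counts here have the same total as the observed ones, knstrn = 1 would apply and the number of degrees of freedom would be 2.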

Equation (14.3.2) and the routine chstwo both apply to the case where the total number of data points is the same in the two binned sets. For unequal numbers of data points, the formula analogous to (14.3.2) is

$$\chi^2 = \sum_i \frac{\left(\sqrt{S/R}\,R_i - \sqrt{R/S}\,S_i\right)^2}{R_i + S_i} \tag{14.3.3}$$

where

$$R \equiv \sum_i R_i \qquad S \equiv \sum_i S_i \tag{14.3.4}$$

are the respective numbers of data points. It is straightforward to make the corresponding change in chstwo.

Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov (or K–S) test is applicable to unbinned distributions that are functions of a single independent variable, that is, to data sets where each data point can be associated with a single number (lifetime of each lightbulb when it burns out, or declination of each star).
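Returning briefly to equation (14.3.3): one possible form of the change to chstwo is sketched below. The function name chisq_two_unequal, the zero-based arrays, and the negative return value for an empty data set are assumptions made for illustration, and the significance call to gammq is again omitted. The sketch uses the identity $\left(\sqrt{S/R}\,R_i - \sqrt{R/S}\,S_i\right)^2 = (S R_i - R S_i)^2/(RS)$, so no square roots are needed. The choice of knstrn is left to the caller.

```c
#include <stddef.h>

/* Hypothetical variant of the two-sample statistic for unequal totals,
   equation (14.3.3). bins1[] and bins2[] are zero-based arrays of length
   nbins. Returns -1.0 if either data set is empty (an assumption of this
   sketch, not behavior taken from chstwo). */
double chisq_two_unequal(const double bins1[], const double bins2[],
                         size_t nbins, int knstrn, int *df)
{
    double R = 0.0, S = 0.0, chsq = 0.0;

    /* Totals R and S of equation (14.3.4). */
    for (size_t j = 0; j < nbins; j++) {
        R += bins1[j];
        S += bins2[j];
    }
    *df = (int)nbins - knstrn;
    if (R <= 0.0 || S <= 0.0) return -1.0;

    for (size_t j = 0; j < nbins; j++) {
        if (bins1[j] == 0.0 && bins2[j] == 0.0) {
            --(*df);                       /* no data in this bin */
        } else {
            /* (sqrt(S/R)*R_i - sqrt(R/S)*S_i)^2 == (S*R_i - R*S_i)^2 / (R*S) */
            double cross = S * bins1[j] - R * bins2[j];
            chsq += cross * cross / (R * S) / (bins1[j] + bins2[j]);
        }
    }
    return chsq;
}
```

When R = S the scale factors cancel and the result agrees with (14.3.2). Two data sets with the same proportions in every bin, such as {20, 40} and {10, 20}, give a statistic of zero even though their totals differ.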

