Hartl, Jones - Genetics: Principles and Analysis - 1998 (522927), page 40
In the example using flower color, the genetic hypothesis implies that the genotypes in the cross purple × white could be symbolized as Pp × pp. The possible progeny genotypes are either Pp or pp.

2. Use the rules of probability to make explicit predictions of the types and proportions of progeny that should be observed if the genetic hypothesis is true. Convert the proportions to numbers of progeny (percentages are not allowed in a $\chi^2$ test). If the hypothesis about the flower-color cross is true, then we should expect the progeny genotypes Pp and pp to occur in a ratio of 1 : 1.
Because the hypothesis is that Pp flowers are purple and pp flowers are white, we expect the phenotypes of the progeny to be purple or white in the ratio 1 : 1. Among 20 progeny, the expected numbers are 10 purple and 10 white.

3. For each class of progeny in turn, subtract the expected number from the observed number. Square this difference and divide the result by the expected number.
In our example, the calculation for the purple progeny is $(14 - 10)^2/10 = 1.6$, and that for the white progeny is $(6 - 10)^2/10 = 1.6$.

4. Sum the numbers calculated in step 3 for all classes of progeny. The summation is the value of $\chi^2$ for these data. The sum for the purple and white classes of progeny is 1.6 + 1.6 = 3.2, and this is the value of $\chi^2$ for the experiment, calculated on the assumption that our genetic hypothesis is correct.

In symbols, the calculation of $\chi^2$ can be represented by the expression

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

in which $\Sigma$ means the summation over all the classes of progeny.
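Steps 3 and 4 can be traced in a few lines of Python. This is only an illustrative sketch: the function name `chi_square` and the list layout are assumptions made here, not part of the text, and the observed counts (14 purple, 6 white) are those of the flower-color example.

```python
def chi_square(observed, expected):
    """Sum (observed - expected)^2 / expected over all classes of progeny."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must list the same classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Flower-color cross: 14 purple and 6 white observed, 10 and 10 expected.
flower_chi2 = chi_square([14, 6], [10, 10])
print(round(flower_chi2, 2))
```

The function takes counts of progeny, not proportions or percentages, in keeping with step 2.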
Note that $\chi^2$ is calculated using the observed and expected numbers, not the proportions, ratios, or percentages. Using something other than the actual numbers is the most common beginner's mistake in applying the $\chi^2$ method. The $\chi^2$ value is reasonable as a measure of goodness of fit, because the closer the observed numbers are to the expected numbers, the smaller the value of $\chi^2$. A value of $\chi^2 = 0$ means that the observed numbers fit the expected numbers perfectly.

As another example of the calculation of $\chi^2$, suppose that the progeny of an F1 × F1 cross includes two contrasting phenotypes observed in the numbers 99 and 45.

Table 3.2 Calculation of $\chi^2$ for a monohybrid ratio

Phenotype (class)   Observed number   Expected number   Deviation from expected   (Deviation)^2 / expected number
Wildtype            99                108               -9                        0.75
Mutant              45                36                +9                        2.25
Total               144               144                                         $\chi^2$ = 3.00
In this case the genetic hypothesis might be that the trait is determined by a pair of alleles of a single gene, in which case the expected ratio of dominant : recessive phenotypes among the F2 progeny is 3 : 1. Considering the data, the question is whether the observed ratio of 99 : 45 is in satisfactory agreement with the expected 3 : 1. Calculation of the value of $\chi^2$ is illustrated in Table 3.2. The total number of progeny is 99 + 45 = 144. The expected numbers in the two classes, on the basis of the genetic hypothesis that the true ratio is 3 : 1, are calculated as (3/4) × 144 = 108 and (1/4) × 144 = 36. Because there are two classes of data, there are two terms in the $\chi^2$ calculation:

$$\chi^2 = \frac{(99 - 108)^2}{108} + \frac{(45 - 36)^2}{36} = 0.75 + 2.25 = 3.00$$

Once the $\chi^2$ value has been calculated, the next step is to interpret whether this value represents a good fit or a bad fit to the expected numbers.
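The arithmetic behind Table 3.2 can be retraced as a short Python sketch. The helper name `expected_from_ratio` is hypothetical, and exact fractions are used only so that the table entries come out without rounding noise.

```python
from fractions import Fraction


def expected_from_ratio(ratio, total):
    """Split `total` progeny among the classes in proportion to `ratio`."""
    parts = sum(ratio)
    return [Fraction(r, parts) * total for r in ratio]


observed = [99, 45]  # wildtype, mutant
expected = expected_from_ratio([3, 1], sum(observed))

# One term per class: (observed - expected)^2 / expected.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(terms)

print([int(e) for e in expected])  # expected numbers of wildtype and mutant
print([float(t) for t in terms])   # contribution of each class
print(float(chi2))                 # chi-square for the whole table
```

Deriving the expected numbers from the observed total is what guarantees that the two columns of the table add up to the same number of progeny.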
This assessment of fit is done with the aid of the graphs in Figure 3.20. The x-axis gives the $\chi^2$ values that reflect goodness of fit, and the y-axis gives the probability P that a worse fit (or one equally bad) would be obtained by chance, assuming that the genetic hypothesis is true. If the genetic hypothesis is true, then the observed numbers should be reasonably close to the expected numbers. Suppose that the observed $\chi^2$ is so large that the probability of a fit as bad or worse is very small. Then the observed results do not fit the theoretical expectations. This means that the genetic hypothesis used to calculate the expected numbers of progeny must be rejected, because the observed numbers of progeny deviate too much from the expected numbers.

In practice, the critical values of P are conventionally chosen as 0.05 (the 5 percent level) and 0.01 (the 1 percent level).
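For a test with only two classes of progeny, as in both examples above, the probability P need not be read off a graph. Under that assumption (one degree of freedom), the tail probability of the chi-square distribution has the closed form erfc(sqrt(chi-square / 2)). That closed form is a standard statistical result rather than something stated in the text, and the function name `p_value_one_df` is an assumption of this sketch.

```python
import math


def p_value_one_df(chi2):
    """Probability of a chi-square value at least this large by chance.

    Assumption: exactly two classes of data (one degree of freedom).
    """
    if chi2 < 0:
        raise ValueError("chi-square cannot be negative")
    return math.erfc(math.sqrt(chi2 / 2.0))


for label, chi2 in [("flower color", 3.2), ("Table 3.2", 3.0)]:
    p = p_value_one_df(chi2)
    print(f"{label}: chi2 = {chi2}, P = {p:.3f}, "
          f"below 0.05: {p < 0.05}, below 0.01: {p < 0.01}")
```

Under this assumption both example values give P between 0.05 and 0.10, so neither falls below the conventional critical values. Tests with more than two classes need the general chi-square distribution, which this sketch does not cover.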
For P values ranging from 0.01 to 0.05, the probability that chance alone would lead to a fit as bad or worse is between 1 in 20 and 1 in 100 experiments. This is the purple region in Figure 3.20; if the P value falls in this range, the correctness of the genetic hypothesis is considered very doubtful. The result is said to be significant at the 5 percent level. For P values smaller than 0.01, the probability that chance alone would lead to a fit as bad or worse is less than 1 in 100 experiments. This is the green region in Figure 3.20; in this case, the result is said to be highly significant at the 1 percent level, and the genetic hypothesis is rejected outright. If the terminology of statistical significance seems backward, it is because the term "significant" refers to the magnitude of the deviation between the observed and the expected numbers; in a result that is statistically significant, there is a large ("significant") difference between what is observed and what is expected.

Figure 3.20 Graphs for interpreting goodness of fit to genetic predictions using the chi-square test. For any calculated value of $\chi^2$ along the x-axis, the y-axis gives the probability P that chance alone would produce a fit as bad as or worse than that actually observed, when the genetic predictions are correct. Tests with P in the purple region (less than 5 percent) or in the green region (less than 1 percent) are regarded as statistically significant and normally require rejection of the genetic hypothesis leading to the prediction. Each $\chi^2$ test has a number of degrees of freedom associated with it. In the tests illustrated in this chapter, the number of degrees of freedom equals the number of classes in the data minus 1.

Connection: The Case Against Mendel's Gardener
Ronald Aylmer Fisher 1936
University College, London, England
Has Mendel's Work Been Rediscovered?

R. A. Fisher, one of the founders of modern statistics, was also interested in genetics. He gave Mendel's data a thorough going over and made an "abominable discovery." Fisher's unpleasant discovery was that some of Mendel's experiments yielded a better fit to the wrong expected values than they did to the right expected values. At issue are two series of experiments consisting of progeny tests in which F2 plants with the dominant phenotype were self-fertilized and their progeny examined for segregation to ascertain whether each parent was heterozygous or homozygous. In the first series of experiments, Mendel explicitly states that he cultivated 10 seeds from each plant. What Mendel did not realize, apparently, is that inferring the genotype of the parent on the basis of the phenotypes of 10 progeny introduces a slight bias. The reason is shown in the accompanying illustration. Because a fraction $(3/4)^{10}$ of all progenies from a heterozygous parent will not exhibit segregation, purely as a result of chance, this proportion of Aa parents gets misclassified as AA. The expected proportion of "apparent" AA plants is $(1/3) + (2/3)(3/4)^{10}$ and that of Aa plants is $(2/3)[1 - (3/4)^{10}]$, for a ratio of 0.37 : 0.63. In the first series of experiments, among 600 plants tested, Mendel reports a ratio of 0.335 : 0.665, which is in better agreement with the incorrect expectation of 0.33 : 0.67 than with 0.37 : 0.63. In the second series of experiments, among 473 progeny, Mendel reports a ratio of 0.32 : 0.68, which is again in better agreement with 0.33 : 0.67 than with 0.37 : 0.63. This is the "abominable discovery." The reported data differ highly significantly from the true expectation. How could this be? Fisher suggested that Mendel may have been deceived by an overzealous assistant. Mendel did have a gardener who tended the fruit orchards, a man described as untrustworthy and excessively fond of alcohol, and Mendel was also assisted in his pea experiments by two fellow monks. Another possibility, also suggested by Fisher, is that in the second series of experiments, Mendel cultivated more than 10 seeds from each plant. (Mendel does not specify how many seeds were tested from each plant in the second series.) If he cultivated 15 seeds per plant, rather than 10, then the data are no longer statistically significant and the insinuation of data tampering evaporates.

In connection with these tests of homozygosity by examining ten offspring formed by self-fertilization, it is disconcerting to find that the proportion of plants misclassified by this test is not inappreciable. Between 5 and 6 percent of the heterozygous plants will be classified as homozygous. . . . Now among 600 plants tested by Mendel 201 were classified as homozygous and 399 as heterozygous. . . . The deviation [from the true expected values of 222 and 378] is one to be taken seriously. . . . A deviation as fortunate as Mendel's is to be expected once in twenty-nine trials. . . . [In the second series of experiments], a total deviation of the magnitude observed, and in the right direction, is only to be expected once in 444 trials; there is therefore a serious discrepancy. . . . If we could suppose that larger progenies, say fifteen plants, were grown on this occasion, the greater part of the discrepancy would be removed. . . . Such an explanation, however, could not explain the discrepancy observed in the first group of experiments, in which the procedure is specified, without the occurrence of a coincidence of considerable improbability. . . . The reconstruction [of Mendel's experiments] gives no doubt whatever that his report is to be taken entirely literally, and that his experiments were carried out in just the way and much in the order that they are recounted. The detailed reconstruction of his programme on this assumption leads to no discrepancy whatsoever.

To use Figure 3.20 to determine the P value corresponding to a calculated $\chi^2$, we need the number of degrees of freedom of the particular $\chi^2$ test. For the type of $\chi^2$ test illustrated in Table 3.2, the number of degrees of freedom equals the number of classes of data minus 1. Table 3.2 contains two classes of data (wildtype and mutant), so the number of degrees of freedom is 2 - 1 = 1. The reason for subtracting 1 is that, in calculating the expected numbers of progeny, we make sure that the total number of progeny is the same as that actually observed. For this reason, one of the classes of data is not really "free" to contain any number we might specify; because the expected number in one class must be adjusted to make the total come out correctly, one "degree of freedom" is lost. Analogous $\chi^2$ tests with three classes of data have 2 degrees of freedom, and those with four classes of data have 3 degrees of freedom.

Once we have decided the appropriate number of degrees of freedom, we can interpret the $\chi^2$ value in Table 3.2. Refer to