(In the one-dimensional case this was just taking the reciprocal of the element C11.)

• The equation for the elliptical boundary of your desired confidence region in the ν-dimensional subspace of interest is

        ∆ = δa^T · [Cproj]^(−1) · δa        (15.6.7)

where δa is the ν-dimensional vector of parameters of interest.

Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use.
Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books, diskettes, or CDROMs visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to trade@cup.cam.ac.uk (outside North America).

698 Chapter 15. Modeling of Data

[Figure: confidence ellipse in the (a1, a2) plane; the principal-axis vectors V(1) and V(2) have semi-axis lengths 1/w1 and 1/w2 on the ∆χ2 = 1 contour.]

Figure 15.6.5. Relation of the confidence region ellipse ∆χ2 = 1 to quantities computed by singular value decomposition.
The vectors V(i) are unit vectors along the principal axes of the confidence region. The semi-axes have lengths equal to the reciprocals of the singular values wi. If the axes are all scaled by some constant factor α, ∆χ2 is scaled by the factor α^2.

If you are confused at this point, you may find it helpful to compare Figure 15.6.4 and the accompanying table, considering the case M = 2 with ν = 1 and ν = 2. You should be able to verify the following statements: (i) The horizontal band between C and C′ contains 99 percent of the probability distribution, so it is a confidence limit on a2 alone at this level of confidence. (ii) Ditto the band between B and B′ at the 90 percent confidence level. (iii) The dashed ellipse, labeled by ∆χ2 = 2.30, contains 68.3 percent of the probability distribution, so it is a confidence region for a1 and a2 jointly, at this level of confidence.

Confidence Limits from Singular Value Decomposition

When you have obtained your χ2 fit by singular value decomposition (§15.4), the information about the fit's formal errors comes packaged in a somewhat different, but generally more convenient, form.
The columns of the matrix V are an orthonormal set of M vectors that are the principal axes of the ∆χ2 = constant ellipsoids. We denote the columns as V(1) . . . V(M). The lengths of those axes are inversely proportional to the corresponding singular values w1 . . . wM; see Figure 15.6.5. The boundaries of the ellipsoids are thus given by

        ∆χ2 = w1^2 (V(1) · δa)^2 + · · · + wM^2 (V(M) · δa)^2        (15.6.8)

which is the justification for writing equation (15.4.18) above. Keep in mind that it is much easier to plot an ellipsoid given a list of its vector principal axes than given its matrix quadratic form!

The formula for the covariance matrix [C] in terms of the columns V(i) is

        [C] = Σ_{i=1}^{M} (1/wi^2) V(i) ⊗ V(i)        (15.6.9)

or, in components,
        Cjk = Σ_{i=1}^{M} (1/wi^2) Vji Vki        (15.6.10)

CITED REFERENCES AND FURTHER READING:

Efron, B. 1982, The Jackknife, the Bootstrap, and Other Resampling Plans (Philadelphia: S.I.A.M.). [1]
Efron, B., and Tibshirani, R. 1986, Statistical Science, vol. 1, pp. 54–77. [2]
Avni, Y. 1976, Astrophysical Journal, vol. 210, pp. 642–646. [3]
Lampton, M., Margon, M., and Bowyer, S. 1976, Astrophysical Journal, vol. 208, pp. 177–190.
Brownlee, K.A. 1965, Statistical Theory and Methodology, 2nd ed. (New York: Wiley).
Martin, B.R. 1971, Statistics for Physicists (New York: Academic Press).

15.7 Robust Estimation

The concept of robustness has been mentioned in passing several times already. In §14.1 we noted that the median was a more robust estimator of central value than the mean; in §14.6 it was mentioned that rank correlation is more robust than linear correlation.
The concept of outlier points as exceptions to a Gaussian model for experimental error was discussed in §15.1.

The term "robust" was coined in statistics by G.E.P. Box in 1953. Various definitions of greater or lesser mathematical rigor are possible for the term, but in general, referring to a statistical estimator, it means "insensitive to small departures from the idealized assumptions for which the estimator is optimized." [1,2] The word "small" can have two different interpretations, both important: either fractionally small departures for all data points, or else fractionally large departures for a small number of data points. It is the latter interpretation, leading to the notion of outlier points, that is generally the most stressful for statistical procedures.

Statisticians have developed various sorts of robust statistical estimators.
Many, if not most, can be grouped in one of three categories.

M-estimates follow from maximum-likelihood arguments very much as equations (15.1.5) and (15.1.7) followed from equation (15.1.3). M-estimates are usually the most relevant class for model-fitting, that is, estimation of parameters. We therefore consider these estimates in some detail below.

L-estimates are "linear combinations of order statistics." These are most applicable to estimations of central value and central tendency, though they can occasionally be applied to some problems in estimation of parameters. Two "typical" L-estimates will give you the general idea. They are (i) the median, and (ii) Tukey's trimean, defined as the weighted average of the first, second, and third quartile points in a distribution, with weights 1/4, 1/2, and 1/4, respectively.

R-estimates are estimates based on rank tests.
For example, the equality or inequality of two distributions can be estimated by the Wilcoxon test of computing the mean rank of one distribution in a combined sample of both distributions. The Kolmogorov-Smirnov statistic (equation 14.3.6) and the Spearman rank-order














