The Elements of Statistical Learning: Data Mining, Inference, and Prediction (page 2)
John Kimmel was supportive, patient and helpful at every phase; MaryAnn Brickner and Frank Ganz headed a superb production team at Springer. Trevor Hastie would like to thank the statistics department at the University of Cape Town for their hospitality during the final stages of this book. We gratefully acknowledge NSF and NIH for their support of this work. Finally, we would like to thank our families and our parents for their love and support.

Trevor Hastie
Robert Tibshirani
Jerome Friedman

Stanford, California
May 2001

The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions ....
    –Ian Hacking

Contents

Preface to the Second Edition
Preface to the First Edition

1 Introduction

2 Overview of Supervised Learning
  2.1 Introduction
  2.2 Variable Types and Terminology
  2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
      2.3.1 Linear Models and Least Squares
      2.3.2 Nearest-Neighbor Methods
      2.3.3 From Least Squares to Nearest Neighbors
  2.4 Statistical Decision Theory
  2.5 Local Methods in High Dimensions
  2.6 Statistical Models, Supervised Learning and Function Approximation
      2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)
      2.6.2 Supervised Learning
      2.6.3 Function Approximation
  2.7 Structured Regression Models
      2.7.1 Difficulty of the Problem
  2.8 Classes of Restricted Estimators
      2.8.1 Roughness Penalty and Bayesian Methods
      2.8.2 Kernel Methods and Local Regression
      2.8.3 Basis Functions and Dictionary Methods
  2.9 Model Selection and the Bias–Variance Tradeoff
  Bibliographic Notes
  Exercises

3 Linear Methods for Regression
  3.1 Introduction
  3.2 Linear Regression Models and Least Squares
      3.2.1 Example: Prostate Cancer
      3.2.2 The Gauss–Markov Theorem
      3.2.3 Multiple Regression from Simple Univariate Regression
      3.2.4 Multiple Outputs
  3.3 Subset Selection
      3.3.1 Best-Subset Selection
      3.3.2 Forward- and Backward-Stepwise Selection
      3.3.3 Forward-Stagewise Regression
      3.3.4 Prostate Cancer Data Example (Continued)
  3.4 Shrinkage Methods
      3.4.1 Ridge Regression
      3.4.2 The Lasso
      3.4.3 Discussion: Subset Selection, Ridge Regression and the Lasso
      3.4.4 Least Angle Regression
  3.5 Methods Using Derived Input Directions
      3.5.1 Principal Components Regression
      3.5.2 Partial Least Squares
  3.6 Discussion: A Comparison of the Selection and Shrinkage Methods
  3.7 Multiple Outcome Shrinkage and Selection
  3.8 More on the Lasso and Related Path Algorithms
      3.8.1 Incremental Forward Stagewise Regression
      3.8.2 Piecewise-Linear Path Algorithms
      3.8.3 The Dantzig Selector
      3.8.4 The Grouped Lasso
      3.8.5 Further Properties of the Lasso
      3.8.6 Pathwise Coordinate Optimization
  3.9 Computational Considerations
  Bibliographic Notes
  Exercises

4 Linear Methods for Classification
  4.1 Introduction
  4.2 Linear Regression of an Indicator Matrix
  4.3 Linear Discriminant Analysis
      4.3.1 Regularized Discriminant Analysis
      4.3.2 Computations for LDA
      4.3.3 Reduced-Rank Linear Discriminant Analysis
  4.4 Logistic Regression
      4.4.1 Fitting Logistic Regression Models
      4.4.2 Example: South African Heart Disease
      4.4.3 Quadratic Approximations and Inference
      4.4.4 L1 Regularized Logistic Regression
      4.4.5 Logistic Regression or LDA?
  4.5 Separating Hyperplanes
      4.5.1 Rosenblatt's Perceptron Learning Algorithm
      4.5.2 Optimal Separating Hyperplanes
  Bibliographic Notes
  Exercises

5 Basis Expansions and Regularization
  5.1 Introduction
  5.2 Piecewise Polynomials and Splines
      5.2.1 Natural Cubic Splines
      5.2.2 Example: South African Heart Disease (Continued)
      5.2.3 Example: Phoneme Recognition
  5.3 Filtering and Feature Extraction
  5.4 Smoothing Splines
      5.4.1 Degrees of Freedom and Smoother Matrices
  5.5 Automatic Selection of the Smoothing Parameters
      5.5.1 Fixing the Degrees of Freedom
      5.5.2 The Bias–Variance Tradeoff
  5.6 Nonparametric Logistic Regression
  5.7 Multidimensional Splines
  5.8 Regularization and Reproducing Kernel Hilbert Spaces
      5.8.1 Spaces of Functions Generated by Kernels
      5.8.2 Examples of RKHS
  5.9 Wavelet Smoothing
      5.9.1 Wavelet Bases and the Wavelet Transform
      5.9.2 Adaptive Wavelet Filtering
  Bibliographic Notes
  Exercises
  Appendix: Computational Considerations for Splines
      Appendix: B-splines
      Appendix: Computations for Smoothing Splines

6 Kernel Smoothing Methods
  6.1 One-Dimensional Kernel Smoothers
      6.1.1 Local Linear Regression
      6.1.2 Local Polynomial Regression
  6.2 Selecting the Width of the Kernel
  6.3 Local Regression in IR^p
  6.4 Structured Local Regression Models in IR^p
      6.4.1 Structured Kernels
      6.4.2 Structured Regression Functions
  6.5 Local Likelihood and Other Models
  6.6 Kernel Density Estimation and Classification
      6.6.1 Kernel Density Estimation
      6.6.2 Kernel Density Classification
      6.6.3 The Naive Bayes Classifier
  6.7 Radial Basis Functions and Kernels
  6.8 Mixture Models for Density Estimation and Classification
  6.9 Computational Considerations
  Bibliographic Notes
  Exercises

7 Model Assessment and Selection
  7.1 Introduction
  7.2 Bias, Variance and Model Complexity
  7.3 The Bias–Variance Decomposition
      7.3.1 Example: Bias–Variance Tradeoff
  7.4 Optimism of the Training Error Rate
  7.5 Estimates of In-Sample Prediction Error
  7.6 The Effective Number of Parameters
  7.7 The Bayesian Approach and BIC
  7.8 Minimum Description Length
  7.9 Vapnik–Chervonenkis Dimension
      7.9.1 Example (Continued)
  7.10 Cross-Validation
      7.10.1 K-Fold Cross-Validation
      7.10.2 The Wrong and Right Way to Do Cross-validation
      7.10.3 Does Cross-Validation Really Work?
  7.11 Bootstrap Methods
      7.11.1 Example (Continued)
  7.12 Conditional or Expected Test Error?
  Bibliographic Notes
  Exercises

8 Model Inference and Averaging
  8.1 Introduction
  8.2 The Bootstrap and Maximum Likelihood Methods
      8.2.1 A Smoothing Example
      8.2.2 Maximum Likelihood Inference
      8.2.3 Bootstrap versus Maximum Likelihood
  8.3 Bayesian Methods
  8.4 Relationship Between the Bootstrap and Bayesian Inference
  8.5 The EM Algorithm
      8.5.1 Two-Component Mixture Model
      8.5.2 The EM Algorithm in General
      8.5.3 EM as a Maximization–Maximization Procedure
  8.6 MCMC for Sampling from the Posterior