The Elements of Statistical Learning: Data Mining, Inference, and Prediction (811377), page 3
Text from the file (page 3)
   8.7   Bagging
         8.7.1   Example: Trees with Simulated Data
   8.8   Model Averaging and Stacking
   8.9   Stochastic Search: Bumping
   Bibliographic Notes
   Exercises

9  Additive Models, Trees, and Related Methods
   9.1   Generalized Additive Models
         9.1.1   Fitting Additive Models
         9.1.2   Example: Additive Logistic Regression
         9.1.3   Summary
   9.2   Tree-Based Methods
         9.2.1   Background
         9.2.2   Regression Trees
         9.2.3   Classification Trees
         9.2.4   Other Issues
         9.2.5   Spam Example (Continued)
   9.3   PRIM: Bump Hunting
         9.3.1   Spam Example (Continued)
   9.4   MARS: Multivariate Adaptive Regression Splines
         9.4.1   Spam Example (Continued)
         9.4.2   Example (Simulated Data)
         9.4.3   Other Issues
   9.5   Hierarchical Mixtures of Experts
   9.6   Missing Data
   9.7   Computational Considerations
   Bibliographic Notes
   Exercises

10 Boosting and Additive Trees
   10.1   Boosting Methods
          10.1.1   Outline of This Chapter
   10.2   Boosting Fits an Additive Model
   10.3   Forward Stagewise Additive Modeling
   10.4   Exponential Loss and AdaBoost
   10.5   Why Exponential Loss?
   10.6   Loss Functions and Robustness
   10.7   "Off-the-Shelf" Procedures for Data Mining
   10.8   Example: Spam Data
   10.9   Boosting Trees
   10.10  Numerical Optimization via Gradient Boosting
          10.10.1  Steepest Descent
          10.10.2  Gradient Boosting
          10.10.3  Implementations of Gradient Boosting
   10.11  Right-Sized Trees for Boosting
   10.12  Regularization
          10.12.1  Shrinkage
          10.12.2  Subsampling
   10.13  Interpretation
          10.13.1  Relative Importance of Predictor Variables
          10.13.2  Partial Dependence Plots
   10.14  Illustrations
          10.14.1  California Housing
          10.14.2  New Zealand Fish
          10.14.3  Demographics Data
   Bibliographic Notes
   Exercises

11 Neural Networks
   11.1   Introduction
   11.2   Projection Pursuit Regression
   11.3   Neural Networks
   11.4   Fitting Neural Networks
   11.5   Some Issues in Training Neural Networks
          11.5.1   Starting Values
          11.5.2   Overfitting
          11.5.3   Scaling of the Inputs
          11.5.4   Number of Hidden Units and Layers
          11.5.5   Multiple Minima
   11.6   Example: Simulated Data
   11.7   Example: ZIP Code Data
   11.8   Discussion
   11.9   Bayesian Neural Nets and the NIPS 2003 Challenge
          11.9.1   Bayes, Boosting and Bagging
          11.9.2   Performance Comparisons
   11.10  Computational Considerations
   Bibliographic Notes
   Exercises

12 Support Vector Machines and Flexible Discriminants
   12.1   Introduction
   12.2   The Support Vector Classifier
          12.2.1   Computing the Support Vector Classifier
          12.2.2   Mixture Example (Continued)
   12.3   Support Vector Machines and Kernels
          12.3.1   Computing the SVM for Classification
          12.3.2   The SVM as a Penalization Method
          12.3.3   Function Estimation and Reproducing Kernels
          12.3.4   SVMs and the Curse of Dimensionality
          12.3.5   A Path Algorithm for the SVM Classifier
          12.3.6   Support Vector Machines for Regression
          12.3.7   Regression and Kernels
          12.3.8   Discussion
   12.4   Generalizing Linear Discriminant Analysis
   12.5   Flexible Discriminant Analysis
          12.5.1   Computing the FDA Estimates
   12.6   Penalized Discriminant Analysis
   12.7   Mixture Discriminant Analysis
          12.7.1   Example: Waveform Data
   Bibliographic Notes
   Exercises

13 Prototype Methods and Nearest-Neighbors
   13.1   Introduction
   13.2   Prototype Methods
          13.2.1   K-means Clustering
          13.2.2   Learning Vector Quantization
          13.2.3   Gaussian Mixtures
   13.3   k-Nearest-Neighbor Classifiers
          13.3.1   Example: A Comparative Study
          13.3.2   Example: k-Nearest-Neighbors and Image Scene Classification
          13.3.3   Invariant Metrics and Tangent Distance
   13.4   Adaptive Nearest-Neighbor Methods
          13.4.1   Example
          13.4.2   Global Dimension Reduction for Nearest-Neighbors
   13.5   Computational Considerations
   Bibliographic Notes
   Exercises

14 Unsupervised Learning
   14.1   Introduction
   14.2   Association Rules
          14.2.1   Market Basket Analysis
          14.2.2   The Apriori Algorithm
          14.2.3   Example: Market Basket Analysis
          14.2.4   Unsupervised as Supervised Learning
          14.2.5   Generalized Association Rules
          14.2.6   Choice of Supervised Learning Method
          14.2.7   Example: Market Basket Analysis (Continued)
   14.3   Cluster Analysis
          14.3.1   Proximity Matrices
          14.3.2   Dissimilarities Based on Attributes
          14.3.3   Object Dissimilarity
          14.3.4   Clustering Algorithms
          14.3.5   Combinatorial Algorithms
          14.3.6   K-means
          14.3.7   Gaussian Mixtures as Soft K-means Clustering
          14.3.8   Example: Human Tumor Microarray Data
          14.3.9   Vector Quantization
          14.3.10  K-medoids
          14.3.11  Practical Issues
          14.3.12  Hierarchical Clustering
   14.4   Self-Organizing Maps
   14.5   Principal Components, Curves and Surfaces
          14.5.1   Principal Components
          14.5.2   Principal Curves and Surfaces
          14.5.3   Spectral Clustering
          14.5.4   Kernel Principal Components
          14.5.5   Sparse Principal Components
   14.6   Non-negative Matrix Factorization
          14.6.1   Archetypal Analysis
   14.7   Independent Component Analysis and Exploratory Projection Pursuit
          14.7.1   Latent Variables and Factor Analysis
          14.7.2   Independent Component Analysis
          14.7.3   Exploratory Projection Pursuit
          14.7.4   A Direct Approach to ICA
   14.8   Multidimensional Scaling
   14.9   Nonlinear Dimension Reduction and Local Multidimensional Scaling
   14.10  The Google PageRank Algorithm
   Bibliographic Notes
   Exercises

15 Random Forests
   15.1   Introduction