Bishop C.M. Pattern Recognition and Machine Learning (2006), page 3
Contents

  6.2 Constructing Kernels
  6.3 Radial Basis Function Networks
    6.3.1 Nadaraya-Watson model
  6.4 Gaussian Processes
    6.4.1 Linear regression revisited
    6.4.2 Gaussian processes for regression
    6.4.3 Learning the hyperparameters
    6.4.4 Automatic relevance determination
    6.4.5 Gaussian processes for classification
    6.4.6 Laplace approximation
    6.4.7 Connection to neural networks
  Exercises

7 Sparse Kernel Machines
  7.1 Maximum Margin Classifiers
    7.1.1 Overlapping class distributions
    7.1.2 Relation to logistic regression
    7.1.3 Multiclass SVMs
    7.1.4 SVMs for regression
    7.1.5 Computational learning theory
  7.2 Relevance Vector Machines
    7.2.1 RVM for regression
    7.2.2 Analysis of sparsity
    7.2.3 RVM for classification
  Exercises

8 Graphical Models
  8.1 Bayesian Networks
    8.1.1 Example: Polynomial regression
    8.1.2 Generative models
    8.1.3 Discrete variables
    8.1.4 Linear-Gaussian models
  8.2 Conditional Independence
    8.2.1 Three example graphs
    8.2.2 D-separation
  8.3 Markov Random Fields
    8.3.1 Conditional independence properties
    8.3.2 Factorization properties
    8.3.3 Illustration: Image de-noising
    8.3.4 Relation to directed graphs
  8.4 Inference in Graphical Models
    8.4.1 Inference on a chain
    8.4.2 Trees
    8.4.3 Factor graphs
    8.4.4 The sum-product algorithm
    8.4.5 The max-sum algorithm
    8.4.6 Exact inference in general graphs
    8.4.7 Loopy belief propagation
    8.4.8 Learning the graph structure
  Exercises

9 Mixture Models and EM
  9.1 K-means Clustering
    9.1.1 Image segmentation and compression
  9.2 Mixtures of Gaussians
    9.2.1 Maximum likelihood
    9.2.2 EM for Gaussian mixtures
  9.3 An Alternative View of EM
    9.3.1 Gaussian mixtures revisited
    9.3.2 Relation to K-means
    9.3.3 Mixtures of Bernoulli distributions
    9.3.4 EM for Bayesian linear regression
  9.4 The EM Algorithm in General
  Exercises

10 Approximate Inference
  10.1 Variational Inference
    10.1.1 Factorized distributions
    10.1.2 Properties of factorized approximations
    10.1.3 Example: The univariate Gaussian
    10.1.4 Model comparison
  10.2 Illustration: Variational Mixture of Gaussians
    10.2.1 Variational distribution
    10.2.2 Variational lower bound
    10.2.3 Predictive density
    10.2.4 Determining the number of components
    10.2.5 Induced factorizations
  10.3 Variational Linear Regression
    10.3.1 Variational distribution
    10.3.2 Predictive distribution
    10.3.3 Lower bound
  10.4 Exponential Family Distributions
    10.4.1 Variational message passing
  10.5 Local Variational Methods
  10.6 Variational Logistic Regression
    10.6.1 Variational posterior distribution
    10.6.2 Optimizing the variational parameters
    10.6.3 Inference of hyperparameters
  10.7 Expectation Propagation
    10.7.1 Example: The clutter problem
    10.7.2 Expectation propagation on graphs
  Exercises

11 Sampling Methods
  11.1 Basic Sampling Algorithms
    11.1.1 Standard distributions
    11.1.2 Rejection sampling
    11.1.3 Adaptive rejection sampling
    11.1.4 Importance sampling
    11.1.5 Sampling-importance-resampling
    11.1.6 Sampling and the EM algorithm
  11.2 Markov Chain Monte Carlo
    11.2.1 Markov chains
    11.2.2 The Metropolis-Hastings algorithm
  11.3 Gibbs Sampling
  11.4 Slice Sampling
  11.5 The Hybrid Monte Carlo Algorithm
    11.5.1 Dynamical systems
    11.5.2 Hybrid Monte Carlo
  11.6 Estimating the Partition Function
  Exercises

12 Continuous Latent Variables
  12.1 Principal Component Analysis
    12.1.1 Maximum variance formulation
    12.1.2 Minimum-error formulation
    12.1.3 Applications of PCA
    12.1.4 PCA for high-dimensional data
  12.2 Probabilistic PCA
    12.2.1 Maximum likelihood PCA
    12.2.2 EM algorithm for PCA
    12.2.3 Bayesian PCA
    12.2.4 Factor analysis
  12.3 Kernel PCA
  12.4 Nonlinear Latent Variable Models
    12.4.1 Independent component analysis
    12.4.2 Autoassociative neural networks
    12.4.3 Modelling nonlinear manifolds
  Exercises

13 Sequential Data
  13.1 Markov Models
  13.2 Hidden Markov Models
    13.2.1 Maximum likelihood for the HMM
    13.2.2 The forward-backward algorithm
    13.2.3 The sum-product algorithm for the HMM
    13.2.4 Scaling factors
    13.2.5 The Viterbi algorithm
    13.2.6 Extensions of the hidden Markov model
  13.3 Linear Dynamical Systems
    13.3.1 Inference in LDS
    13.3.2 Learning in LDS
    13.3.3 Extensions of LDS
    13.3.4 Particle filters
  Exercises

14 Combining Models
  14.1 Bayesian Model Averaging
  14.2 Committees
  14.3 Boosting
    14.3.1 Minimizing exponential error
    14.3.2 Error functions for boosting
  14.4 Tree-based Models
  14.5 Conditional Mixture Models
    14.5.1 Mixtures of linear regression models
    14.5.2 Mixtures of logistic models
    14.5.3 Mixtures of experts
  Exercises

Appendix A  Data Sets
Appendix B  Probability Distributions
Appendix C  Properties of Matrices
Appendix D  Calculus of Variations
Appendix E  Lagrange Multipliers
References
Index

1 Introduction

The problem of searching for patterns in data is a fundamental one and has a long and successful history.