It shares many properties with the one-dimensional cubic smoothing spline:

• as $\lambda \to 0$, the solution approaches an interpolating function [the one with smallest penalty (5.38)];

• as $\lambda \to \infty$, the solution approaches the least squares plane;

• for intermediate values of $\lambda$, the solution can be represented as a linear expansion of basis functions, whose coefficients are obtained by a form of generalized ridge regression.

The solution has the form

$$ f(x) = \beta_0 + \beta^T x + \sum_{j=1}^{N} \alpha_j h_j(x), \qquad (5.39) $$

where $h_j(x) = \|x - x_j\|^2 \log \|x - x_j\|$.
These $h_j$ are examples of radial basis functions, which are discussed in more detail in the next section. The coefficients are found by plugging (5.39) into (5.37), which reduces to a finite-dimensional penalized least squares problem. For the penalty to be finite, the coefficients $\alpha_j$ have to satisfy a set of linear constraints; see Exercise 5.14.

Thin-plate splines are defined more generally for arbitrary dimension $d$, for which an appropriately more general $J$ is used.

There are a number of hybrid approaches that are popular in practice, both for computational and conceptual simplicity. Unlike one-dimensional smoothing splines, the computational complexity for thin-plate splines is $O(N^3)$, since there is not in general any sparse structure that can be exploited.
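To make this concrete, here is a minimal numpy sketch of the dense fit, assuming the standard linear system for this penalized criterion; the side condition $T^T\alpha = 0$ plays the role of the linear constraints of Exercise 5.14, and the function names are illustrative rather than from the text.

```python
import numpy as np

def tps_basis(X, knots):
    # Radial basis of (5.39): h_j(x) = ||x - x_j||^2 log ||x - x_j||
    r = np.linalg.norm(X[:, None, :] - knots[None, :, :], axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0, r**2 * np.log(r), 0.0)

def fit_tps(X, y, lam):
    # Penalized least squares for (5.39): solve the dense system
    #   (E + lam*I) alpha + T beta = y,   T^T alpha = 0,
    # where T = [1, x] spans the null space of the penalty.
    # The (N+d+1)-dimensional solve is the source of the O(N^3) cost.
    N = X.shape[0]
    E = tps_basis(X, X)                       # N x N
    T = np.column_stack([np.ones(N), X])      # N x (d+1)
    M = T.shape[1]
    A = np.block([[E + lam * np.eye(N), T],
                  [T.T, np.zeros((M, M))]])
    sol = np.linalg.solve(A, np.concatenate([y, np.zeros(M)]))
    alpha, beta = sol[:N], sol[N:]
    return lambda Xnew: (tps_basis(Xnew, X) @ alpha
                         + np.column_stack([np.ones(len(Xnew)), Xnew]) @ beta)
```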
However, as with univariate smoothing splines, we can get away with substantially less than the $N$ knots prescribed by the solution (5.39).

[FIGURE 5.12. A thin-plate spline fit to the heart disease data, displayed as a contour plot. The response is systolic blood pressure, modeled as a function of age and obesity. The data points are indicated, as well as the lattice of points used as knots. Care should be taken to use knots from the lattice inside the convex hull of the data (red), and ignore those outside (green).]

In practice, it is usually sufficient to work with a lattice of knots covering the domain. The penalty is computed for the reduced expansion just as before. Using $K$ knots reduces the computations to $O(NK^2 + K^3)$.
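A sketch of the reduced-knot fit, reusing tps_basis from the sketch above. Absorbing the knot-level constraint by reparametrizing $\alpha = Z\gamma$ follows the common thin-plate regression spline construction; that device, and the lattice shown in the comments, are our assumptions, not prescriptions from the text.

```python
def fit_tps_knots(X, y, knots, lam):
    # Reduced expansion on K knots: O(N K^2 + K^3) instead of O(N^3).
    N, K = X.shape[0], knots.shape[0]
    E = tps_basis(X, knots)                     # N x K basis evaluations
    Omega = tps_basis(knots, knots)             # K x K penalty matrix
    Tk = np.column_stack([np.ones(K), knots])
    Z = np.linalg.svd(Tk)[0][:, Tk.shape[1]:]   # basis of {a : Tk'a = 0}
    T = np.column_stack([np.ones(N), X])
    B = np.hstack([T, E @ Z])                   # design in (beta, gamma)
    m = T.shape[1]
    P = np.zeros((B.shape[1],) * 2)
    P[m:, m:] = Z.T @ Omega @ Z                 # bending energy in gamma
    theta = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
    beta, alpha = theta[:m], Z @ theta[m:]
    return lambda Xn: (tps_basis(Xn, knots) @ alpha
                       + np.column_stack([np.ones(len(Xn)), Xn]) @ beta)

# e.g., a 6 x 6 lattice of knots covering the range of two features:
#   gx = np.linspace(X[:, 0].min(), X[:, 0].max(), 6)
#   gy = np.linspace(X[:, 1].min(), X[:, 1].max(), 6)
#   knots = np.array([(a, b) for a in gx for b in gy])
```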
Figure 5.12 shows the result of fitting a thin-plate spline to some heart disease risk factors, representing the surface as a contour plot. Indicated are the locations of the input features, as well as the knots used in the fit. Note that $\lambda$ was specified via $\mathrm{df}_\lambda = \mathrm{trace}(S_\lambda) = 15$.

More generally one can represent a function $f$ on $\mathbb{R}^d$ as an expansion in any arbitrarily large collection of basis functions, and control the complexity by applying a regularizer such as (5.38). For example, we could construct a basis by forming the tensor products of all pairs of univariate smoothing-spline basis functions as in (5.35), using, for example, the univariate B-splines recommended in Section 5.9.2 as ingredients.
This leads to an exponential growth in basis functions as the dimension increases, and typically we have to reduce the number of functions per coordinate accordingly.

The additive spline models discussed in Chapter 9 are a restricted class of multidimensional splines. They can be represented in this general formulation as well; that is, there exists a penalty $J[f]$ that guarantees that the solution has the form $f(X) = \alpha + f_1(X_1) + \cdots + f_d(X_d)$ and that each of the functions $f_j$ is a univariate spline. In this case the penalty is somewhat degenerate, and it is more natural to assume that $f$ is additive, and then simply impose an additional penalty on each of the component functions:

$$ J[f] = J(f_1 + f_2 + \cdots + f_d) = \sum_{j=1}^{d} \int f_j''(t_j)^2 \, dt_j. \qquad (5.40) $$

These are naturally extended to ANOVA spline decompositions,

$$ f(X) = \alpha + \sum_j f_j(X_j) + \sum_{j<k} f_{jk}(X_j, X_k) + \cdots, \qquad (5.41) $$

where each of the components is a spline of the required dimension.
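As a rough illustration of fitting the additive form penalized by (5.40), here is a backfitting sketch; backfitting itself is developed in Chapter 9, scipy's FITPACK smoothing spline stands in for the univariate smoother, and ties within a coordinate are assumed away.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def backfit_additive(X, y, s=None, n_iter=20):
    # Fit f(X) = alpha + f_1(X_1) + ... + f_d(X_d) by cycling through
    # coordinates and smoothing the partial residual against each one.
    # Assumes no ties within a coordinate, since FITPACK requires
    # strictly increasing x values.
    N, d = X.shape
    alpha = y.mean()
    f = np.zeros((N, d))                        # fitted values of each f_j
    for _ in range(n_iter):
        for j in range(d):
            r = y - alpha - f.sum(axis=1) + f[:, j]   # partial residual
            o = np.argsort(X[:, j])
            f[:, j] = UnivariateSpline(X[o, j], r[o], s=s)(X[:, j])
            f[:, j] -= f[:, j].mean()           # center for identifiability
    return alpha, f                             # in-sample component fits
```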
There are many choices to be made:

• The maximum order of interaction—we have shown up to order 2 above.

• Which terms to include—not all main effects and interactions are necessarily needed.

• What representation to use—some choices are:

  – regression splines with a relatively small number of basis functions per coordinate, and their tensor products for interactions;

  – a complete basis as in smoothing splines, and include appropriate regularizers for each term in the expansion.

In many cases when the number of potential dimensions (features) is large, automatic methods are more desirable.
The MARS and MART procedures (Chapters 9 and 10, respectively) both fall into this category.

5.8 Regularization and Reproducing Kernel Hilbert Spaces

In this section we cast splines into the larger context of regularization methods and reproducing kernel Hilbert spaces. This section is quite technical and can be skipped by the disinterested or intimidated reader.

A general class of regularization problems has the form

$$ \min_{f \in \mathcal{H}} \left[ \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda J(f) \right] \qquad (5.42) $$

where $L(y, f(x))$ is a loss function, $J(f)$ is a penalty functional, and $\mathcal{H}$ is a space of functions on which $J(f)$ is defined.
Girosi et al. (1995) describe quite general penalty functionals of the form

$$ J(f) = \int_{\mathbb{R}^d} \frac{|\tilde f(s)|^2}{\tilde G(s)} \, ds, \qquad (5.43) $$

where $\tilde f$ denotes the Fourier transform of $f$, and $\tilde G$ is some positive function that falls off to zero as $\|s\| \to \infty$. The idea is that $1/\tilde G$ increases the penalty for high-frequency components of $f$.
Under some additional assumptions they show that the solutions have the form

$$ f(X) = \sum_{k=1}^{K} \alpha_k \phi_k(X) + \sum_{i=1}^{N} \theta_i G(X - x_i), \qquad (5.44) $$

where the $\phi_k$ span the null space of the penalty functional $J$, and $G$ is the inverse Fourier transform of $\tilde G$. Smoothing splines and thin-plate splines fall into this framework. The remarkable feature of this solution is that while the criterion (5.42) is defined over an infinite-dimensional space, the solution is finite-dimensional. In the next sections we look at some specific examples.

5.8.1 Spaces of Functions Generated by Kernels

An important subclass of problems of the form (5.42) is generated by a positive definite kernel $K(x, y)$, and the corresponding space of functions $\mathcal{H}_K$ is called a reproducing kernel Hilbert space (RKHS). The penalty functional $J$ is defined in terms of the kernel as well.
We give a brief and simplified introduction to this class of models, adapted from Wahba (1990) and Girosi et al. (1995), and nicely summarized in Evgeniou et al. (2000).

Let $x, y \in \mathbb{R}^p$. We consider the space of functions generated by the linear span of $\{K(\cdot, y),\ y \in \mathbb{R}^p\}$; i.e., arbitrary linear combinations of the form $f(x) = \sum_m \alpha_m K(x, y_m)$, where each kernel term is viewed as a function of the first argument, and indexed by the second. Suppose that $K$ has an eigen-expansion

$$ K(x, y) = \sum_{i=1}^{\infty} \gamma_i \phi_i(x) \phi_i(y) \qquad (5.45) $$

with $\gamma_i \ge 0$, $\sum_{i=1}^{\infty} \gamma_i^2 < \infty$. Elements of $\mathcal{H}_K$ have an expansion in terms of these eigen-functions,

$$ f(x) = \sum_{i=1}^{\infty} c_i \phi_i(x), \qquad (5.46) $$

with the constraint that

$$ \|f\|_{\mathcal{H}_K}^2 \stackrel{\mathrm{def}}{=} \sum_{i=1}^{\infty} c_i^2/\gamma_i < \infty, \qquad (5.47) $$

where $\|f\|_{\mathcal{H}_K}$ is the norm induced by $K$.
The penalty functional in (5.42) for the space $\mathcal{H}_K$ is defined to be the squared norm $J(f) = \|f\|_{\mathcal{H}_K}^2$. The quantity $J(f)$ can be interpreted as a generalized ridge penalty, where functions with large eigenvalues in the expansion (5.45) get penalized less, and vice versa.

Rewriting (5.42) we have

$$ \min_{f \in \mathcal{H}_K} \left[ \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda \|f\|_{\mathcal{H}_K}^2 \right] \qquad (5.48) $$

or equivalently

$$ \min_{\{c_j\}_1^{\infty}} \sum_{i=1}^{N} L\Big(y_i, \sum_{j=1}^{\infty} c_j \phi_j(x_i)\Big) + \lambda \sum_{j=1}^{\infty} c_j^2/\gamma_j. \qquad (5.49) $$

It can be shown (Wahba, 1990; see also Exercise 5.15) that the solution to (5.48) is finite-dimensional, and has the form

$$ f(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i). \qquad (5.50) $$

The basis function $h_i(x) = K(x, x_i)$ (as a function of the first argument) is known as the representer of evaluation at $x_i$ in $\mathcal{H}_K$, since for $f \in \mathcal{H}_K$, it is easily seen that $\langle K(\cdot, x_i), f \rangle_{\mathcal{H}_K} = f(x_i)$.
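For squared-error loss the criterion (5.48) has a closed-form solution, which makes the representer form (5.50) concrete. A minimal sketch follows; the radial kernel and its gamma parameter are illustrative choices on our part, not the text's.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K(x, y) = exp(-gamma ||x - y||^2), a positive definite kernel
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def kernel_ridge(X, y, lam, kernel=rbf_kernel):
    # Squared-error case of (5.48): by (5.50), f(x) = sum_i alpha_i K(x, x_i),
    # with alpha = (K + lam*I)^{-1} y from the finite-dimensional normal
    # equations in alpha.
    K = kernel(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Xnew: kernel(Xnew, X) @ alpha
```

With other losses in (5.48), the same finite-dimensional representation (5.50) holds, but the coefficients $\alpha$ must be found by iterative optimization rather than a single linear solve.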