The Elements of Statistical Learning: Data Mining, Inference, and Prediction
, 3. Except in special cases, we would typically prefer the third panel, which is also piecewise linear, but restricted to be continuous at the two knots. These continuity restrictions lead to linear constraints on the parameters; for example, $f(\xi_1^-) = f(\xi_1^+)$ implies that $\beta_1 + \xi_1\beta_4 = \beta_2 + \xi_1\beta_5$. In this case, since there are two restrictions, we expect to lose two of the six parameters, leaving four free parameters.

A more direct way to proceed in this case is to use a basis that incorporates the constraints:

$$h_1(X) = 1, \quad h_2(X) = X, \quad h_3(X) = (X - \xi_1)_+, \quad h_4(X) = (X - \xi_2)_+,$$

where $t_+$ denotes the positive part.
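A minimal sketch of fitting with the constrained basis $h_1, \dots, h_4$ by least squares follows. It assumes NumPy, hypothetical knots $\xi_1 = 1/3$ and $\xi_2 = 2/3$, and synthetic data (not the data of Figure 5.1). Each truncated function is itself continuous, so any fit in this basis is automatically continuous at the knots, and no explicit constraints are needed.

```python
import numpy as np

def positive_part(t):
    """Elementwise t_+ = max(t, 0)."""
    return np.maximum(t, 0.0)

def continuous_linear_basis(x, xi1, xi2):
    """Columns h1..h4 = 1, X, (X - xi1)_+, (X - xi2)_+."""
    x = np.asarray(x, dtype=float)
    return np.column_stack(
        [np.ones_like(x), x, positive_part(x - xi1), positive_part(x - xi2)]
    )

# Synthetic data (illustration only): a noisy sine curve on [0, 1].
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 50))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

xi1, xi2 = 1 / 3, 2 / 3  # hypothetical knot positions
H = continuous_linear_basis(x, xi1, xi2)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # ordinary least squares

def f_hat(x_new):
    """Evaluate the fitted continuous piecewise-linear function."""
    return continuous_linear_basis(x_new, xi1, xi2) @ beta

eps = 1e-9
for xi in (xi1, xi2):
    gap = f_hat([xi + eps])[0] - f_hat([xi - eps])[0]
    print(f"knot {xi:.3f}: left/right gap {gap:.2e}")  # O(eps), so no jump
print("slope change at xi1:", beta[2])
```

The coefficient on $h_3$ is the change in slope at $\xi_1$ (and likewise $h_4$ at $\xi_2$), so the four coefficients are exactly the four free parameters counted above.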
The function $h_3$ is shown in the lower right panel of Figure 5.1.

142 5. Basis Expansions and Regularization

[Figure 5.1: four panels titled "Piecewise Constant", "Piecewise Linear", "Continuous Piecewise Linear" and "Piecewise-linear Basis Function" (the last showing $(X - \xi_1)_+$), each marking the knots $\xi_1$ and $\xi_2$.]

FIGURE 5.1. The top left panel shows a piecewise constant function fit to some artificial data. The broken vertical lines indicate the positions of the two knots $\xi_1$ and $\xi_2$. The blue curve represents the true function, from which the data were generated with Gaussian noise. The remaining two panels show piecewise linear functions fit to the same data: the top right unrestricted, and the lower left restricted to be continuous at the knots. The lower right panel shows a piecewise-linear basis function, $h_3(X) = (X - \xi_1)_+$, continuous at $\xi_1$. The black points indicate the sample evaluations $h_3(x_i)$, $i = 1, \dots, N$.

5.2 Piecewise Polynomials and Splines 143

[Figure 5.2: "Piecewise Cubic Polynomials", four panels titled "Discontinuous", "Continuous", "Continuous First Derivative" and "Continuous Second Derivative", each marking the knots $\xi_1$ and $\xi_2$.]

FIGURE 5.2. A series of piecewise-cubic polynomials, with increasing orders of continuity.

We often prefer smoother functions, and these can be achieved by increasing the order of the local polynomial. Figure 5.2 shows a series of piecewise-cubic polynomials fit to the same data, with increasing orders of continuity at the knots. The function in the lower right panel is continuous, and has continuous first and second derivatives at the knots. It is known as a cubic spline. Enforcing one more order of continuity would lead to a global cubic polynomial. It is not hard to show (Exercise 5.1) that the following basis represents a cubic spline with knots at $\xi_1$ and $\xi_2$:

$$h_1(X) = 1, \quad h_2(X) = X, \quad h_3(X) = X^2, \quad h_4(X) = X^3, \quad h_5(X) = (X - \xi_1)_+^3, \quad h_6(X) = (X - \xi_2)_+^3. \tag{5.3}$$

There are six basis functions corresponding to a six-dimensional linear space of functions.
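The following sketch evaluates the basis (5.3) and its derivatives analytically to show that it has the claimed continuity. It assumes NumPy; the helper name `cubic_spline_basis`, the knot values and the random coefficients are hypothetical. For arbitrary coefficients, the function and its first two derivatives agree on both sides of $\xi_1$. The third derivative jumps by $6\beta_5$, where $\beta_5$ is the coefficient on $h_5$.

```python
import numpy as np
from math import factorial

def cubic_spline_basis(x, knots, deriv=0):
    """Truncated-power cubic spline basis (5.3), or its `deriv`-th derivative.

    Columns: 1, X, X^2, X^3, then (X - xi)_+^3 for each knot xi.
    """
    if not 0 <= deriv <= 3:
        raise ValueError("deriv must be 0, 1, 2 or 3")
    x = np.asarray(x, dtype=float)
    cols = []
    for p in range(4):  # global cubic part X^0..X^3
        if p < deriv:
            cols.append(np.zeros_like(x))
        else:
            cols.append(factorial(p) / factorial(p - deriv) * x ** (p - deriv))
    for xi in knots:  # one truncated cubic per knot
        c = factorial(3) / factorial(3 - deriv)
        if deriv == 3:
            # third derivative of (X - xi)_+^3 is 6 for X > xi, else 0
            cols.append(c * (x > xi).astype(float))
        else:
            cols.append(c * np.maximum(x - xi, 0.0) ** (3 - deriv))
    return np.column_stack(cols)

knots = [1 / 3, 2 / 3]  # hypothetical knot positions
rng = np.random.default_rng(1)
beta = rng.normal(size=6)  # arbitrary coefficients on h1..h6 (0-based indices)
h, xi1 = 1e-7, knots[0]

diffs = []  # right-minus-left values of f, f', f'', f''' at xi1
for d in range(4):
    left = (cubic_spline_basis([xi1 - h], knots, d) @ beta)[0]
    right = (cubic_spline_basis([xi1 + h], knots, d) @ beta)[0]
    diffs.append(right - left)
print(diffs)
# diffs[0..2] are O(h): f, f' and f'' are continuous at xi1.
# diffs[3] equals 6 * beta[4]: f''' jumps by 6 times the coefficient on h5.

x_grid = np.linspace(0.0, 1.0, 50)
print(np.linalg.matrix_rank(cubic_spline_basis(x_grid, knots)))  # six independent columns
```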
A quick check confirms the parameter count: (3 regions) × (4 parameters per region) − (2 knots) × (3 constraints per knot) = 6.

More generally, an order-$M$ spline with knots $\xi_j$, $j = 1, \dots, K$, is a piecewise polynomial of order $M$, and has continuous derivatives up to order $M - 2$.
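The quick check extends to $K$ knots by the same counting argument (a derivation added here). An order-$M$ spline has $K + 1$ polynomial pieces with $M$ coefficients each. Continuity of the function and of its first $M - 2$ derivatives imposes $M - 1$ linear constraints at each knot, so the dimension is

$$(K + 1)M - K(M - 1) = M + K.$$

For $M = 4$ and $K = 2$ this gives the six dimensions of (5.3). For $M = 2$ and $K = 2$ it gives the four free parameters of the continuous piecewise-linear fit. It also equals the number of functions in the general truncated-power basis.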
A cubic spline has $M = 4$. In fact, the piecewise-constant function in Figure 5.1 is an order-1 spline, while the continuous piecewise-linear function is an order-2 spline. Likewise, the general form for the truncated-power basis set would be

$$h_j(X) = X^{j-1}, \quad j = 1, \dots, M,$$
$$h_{M+\ell}(X) = (X - \xi_\ell)_+^{M-1}, \quad \ell = 1, \dots, K.$$

It is claimed that cubic splines are the lowest-order spline for which the knot-discontinuity is not visible to the human eye. There is seldom any good reason to go beyond cubic splines, unless one is interested in smooth derivatives.
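The general truncated-power basis can be written for any order $M$ and list of knots, as in this minimal NumPy sketch (the function name is hypothetical). It reads $(X - \xi_\ell)_+^0$ as the indicator of $X > \xi_\ell$, a convention assumed here. Under that convention, $M = 1$ spans the piecewise constants of Figure 5.1 and $M = 2$ reproduces the continuous piecewise-linear basis $h_1, \dots, h_4$.

```python
import numpy as np

def truncated_power_basis(x, knots, M):
    """Order-M truncated-power spline basis as an N x (M + K) matrix.

    h_j(X) = X^(j-1) for j = 1..M, and h_{M+l}(X) = (X - xi_l)_+^(M-1).
    For M = 1 the truncated terms are step functions; this sketch uses
    the convention (X - xi)_+^0 = 1 if X > xi else 0.
    """
    x = np.asarray(x, dtype=float)
    cols = [x ** j for j in range(M)]  # 1, X, ..., X^(M-1)
    for xi in knots:
        if M == 1:
            cols.append((x > xi).astype(float))
        else:
            cols.append(np.maximum(x - xi, 0.0) ** (M - 1))
    return np.column_stack(cols)

x = np.linspace(0.0, 1.0, 101)
knots = [1 / 3, 2 / 3]  # hypothetical knots
for M in (1, 2, 4):  # piecewise constant, linear, cubic
    B = truncated_power_basis(x, knots, M)
    print(M, B.shape, np.linalg.matrix_rank(B))
```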
In practice the most widely used orders are $M = 1$, 2 and 4.

These fixed-knot splines are also known as regression splines. One needs to select the order of the spline, the number of knots and their placement. One simple approach is to parameterize a family of splines by the number of basis functions or degrees of freedom, and have the observations $x_i$ determine the positions of the knots. For example, the expression bs(x,df=7) in R generates a basis matrix of cubic-spline functions evaluated at the $N$ observations in x, with the $7 - 3 = 4$ interior knots¹ at the appropriate percentiles of x (the 20th, 40th, 60th and 80th). One can be more explicit, however; bs(x, degree=1, knots = c(0.2, 0.4, 0.6)) generates a basis for linear splines, with three interior knots, and returns an $N \times 4$ matrix.

Since the space of spline functions of a particular order and knot sequence is a vector space, there are many equivalent bases for representing them (just as there are for ordinary polynomials). While the truncated power basis is conceptually simple, it is not too attractive numerically: powers of large numbers can lead to severe rounding problems.
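The knot-placement recipe described for bs(x, df=7) can be mimicked in a few lines. This is a hypothetical NumPy sketch, not R's implementation. R's bs() returns B-spline columns, whereas this sketch uses truncated powers, though both span the same space of cubic splines once an intercept is added back. It places df − 3 interior knots at equally spaced percentiles of x and drops the constant column, as the footnote describes. The last lines illustrate the rounding problem. The same construction on inputs shifted to roughly [1000, 1001] gives a far worse conditioned design.

```python
import numpy as np

def cubic_regression_spline(x, df=7):
    """Hypothetical helper: cubic truncated-power basis with `df` columns.

    Places df - 3 interior knots at equally spaced percentiles of x and
    omits the constant column (an intercept is fit separately).
    """
    x = np.asarray(x, dtype=float)
    n_knots = df - 3
    probs = np.arange(1, n_knots + 1) / (n_knots + 1)  # 0.2, 0.4, 0.6, 0.8 for df=7
    knots = np.quantile(x, probs)
    cols = [x, x ** 2, x ** 3]
    cols += [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols), knots

def add_intercept(B):
    """Prepend a column of ones to a basis matrix."""
    return np.column_stack([np.ones(B.shape[0]), B])

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)  # synthetic inputs
B, knots = cubic_regression_spline(x, df=7)
print(B.shape, knots)

# X^3 is about 1e9 after the shift while the truncated columns stay below 1,
# so the shifted design matrix is much closer to singular.
B_shift, _ = cubic_regression_spline(x + 1000.0, df=7)
cond_unit = np.linalg.cond(add_intercept(B))
cond_shift = np.linalg.cond(add_intercept(B_shift))
print(f"condition numbers: {cond_unit:.1e} (unit) vs {cond_shift:.1e} (shifted)")
```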
The B-spline basis, described in the Appendix to this chapter, allows for efficient computations even when the number of knots $K$ is large.

¹ A cubic spline with four knots is eight-dimensional. The bs() function omits by default the constant term in the basis, since terms like this are typically included with other terms in the model.

5.2.1 Natural Cubic Splines

We know that the behavior of polynomials fit to data tends to be erratic near the boundaries, and extrapolation can be dangerous. These problems are exacerbated with splines. The polynomials fit beyond the boundary knots behave even more wildly than the corresponding global polynomials in that region. This can be conveniently summarized in terms of the pointwise variance of spline functions fit by least squares (see the example in the next section for details on these variance calculations). Figure 5.3 compares the pointwise variances for a variety of different models. The explosion of the variance near the boundaries is clear, and inevitably is worst for cubic splines.

[Figure 5.3: "Pointwise Variances" plotted against X for four models: Global Linear; Global Cubic Polynomial; Cubic Spline - 2 knots; Natural Cubic Spline - 6 knots.]

FIGURE 5.3. Pointwise variance curves for four different models, with X consisting of 50 points drawn at random from $U[0, 1]$, and an assumed error model with constant variance. The linear and cubic polynomial fits have two and four degrees of freedom, respectively, while the cubic spline and natural cubic spline each have six degrees of freedom. The cubic spline has two knots at 0.33 and 0.66, while the natural spline has boundary knots at 0.1 and 0.9, and four interior knots uniformly spaced between them.

A natural cubic spline imposes additional constraints, namely that the function is linear beyond the boundary knots. This frees up four degrees of freedom (two constraints in each of the two boundary regions), which can be spent more profitably by sprinkling more knots in the interior region. This tradeoff is illustrated in terms of variance in Figure 5.3. There will be a price paid in bias near the boundaries, but assuming the function is linear near the boundaries (where we have less information anyway) is often considered reasonable.

A natural cubic spline with $K$ knots is represented by $K$ basis functions. One can start from a basis for cubic splines, and derive the reduced basis by imposing the boundary constraints.
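To see where the four constraints come from, write a cubic spline with $K$ knots in the truncated-power basis. Let $\beta_0, \dots, \beta_3$ be the coefficients on $1, X, X^2, X^3$ and $\theta_k$ the coefficient on $(X - \xi_k)_+^3$ (notation introduced here for this derivation):

$$f(X) = \sum_{j=0}^{3} \beta_j X^j + \sum_{k=1}^{K} \theta_k (X - \xi_k)_+^3.$$

For $X < \xi_1$ every truncated term vanishes, so linearity there requires $\beta_2 = \beta_3 = 0$. For $X \ge \xi_K$ every truncated term is active. Expanding $(X - \xi_k)^3 = X^3 - 3\xi_k X^2 + 3\xi_k^2 X - \xi_k^3$ gives $X^3$ and $X^2$ coefficients of $\beta_3 + \sum_k \theta_k$ and $\beta_2 - 3\sum_k \theta_k \xi_k$. Linearity beyond $\xi_K$ therefore also requires

$$\sum_{k=1}^{K} \theta_k = 0, \qquad \sum_{k=1}^{K} \theta_k \xi_k = 0.$$

These four linear constraints reduce the $K + 4$ parameters of the cubic spline to $K$.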
For example, starting from the truncated power series basis described in Section 5.2, we arrive at (Exercise 5.4):

$$N_1(X) = 1, \quad N_2(X) = X, \quad N_{k+2}(X) = d_k(X) - d_{K-1}(X), \tag{5.4}$$

where

$$d_k(X) = \frac{(X - \xi_k)_+^3 - (X - \xi_K)_+^3}{\xi_K - \xi_k}. \tag{5.5}$$

Here $k = 1, \dots, K - 2$ in (5.4), giving $K$ basis functions in all. Each of these basis functions can be seen to have zero second and third derivatives for $X \ge \xi_K$.

5.2.2 Example: South African Heart Disease (Continued)

In Section 4.4.2 we fit linear logistic regression models to the South African heart disease data.
Here we explore nonlinearities in the functions using natural splines. The functional form of the model is

$$\operatorname{logit}[\Pr(\text{chd} \mid X)] = \theta_0 + h_1(X_1)^T\theta_1 + h_2(X_2)^T\theta_2 + \cdots + h_p(X_p)^T\theta_p, \tag{5.6}$$

where each $\theta_j$ is a vector of coefficients multiplying its associated vector of natural spline basis functions $h_j$.

We use a natural spline basis of four functions for each term in the model. For example, with $X_1$ representing sbp, $h_1(X_1)$ is a basis consisting of four basis functions.
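A minimal NumPy sketch of the natural cubic spline basis (5.4) and (5.5) follows, together with one way to get four columns per term for a design like (5.6). The function names, the knot placement and the synthetic stand-ins for sbp and a second predictor are all hypothetical, and no real data is used. With $K = 5$ knots the basis has five functions $N_1, \dots, N_5$. Dropping the constant $N_1$ leaves four columns, on the assumption that the intercept $\theta_0$ plays its role. The final check confirms that a combination of the columns is linear beyond $\xi_K$.

```python
import numpy as np

def natural_cubic_spline_basis(x, knots):
    """Natural cubic spline basis N_1..N_K from (5.4)-(5.5), as an N x K matrix."""
    x = np.asarray(x, dtype=float)
    xi = np.sort(np.asarray(knots, dtype=float))
    K = len(xi)
    if K < 2:
        raise ValueError("need at least two knots")

    def cube_pos(t):
        return np.maximum(t, 0.0) ** 3

    def d(k):  # d_k(X) with 0-based k, so d(K - 2) is the book's d_{K-1}
        return (cube_pos(x - xi[k]) - cube_pos(x - xi[K - 1])) / (xi[K - 1] - xi[k])

    cols = [np.ones_like(x), x]  # N_1, N_2
    cols += [d(k) - d(K - 2) for k in range(K - 2)]  # N_3..N_K
    return np.column_stack(cols)

def term_basis(v):
    """Four columns for one additive term (hypothetical knot placement, K = 5)."""
    knots = np.quantile(v, [0.0, 0.25, 0.5, 0.75, 1.0])
    return natural_cubic_spline_basis(v, knots)[:, 1:]  # drop N_1; theta_0 supplies it

rng = np.random.default_rng(0)
sbp = rng.uniform(100.0, 200.0, 300)  # synthetic stand-in, not the real data
x2 = rng.uniform(0.0, 1.0, 300)  # a second hypothetical predictor
h1, h2 = term_basis(sbp), term_basis(x2)
design = np.column_stack([np.ones(len(sbp)), h1, h2])  # columns for theta_0, theta_1, theta_2
print(h1.shape, design.shape)

# Beyond the last knot every N_k is linear, so second differences on an
# evenly spaced grid vanish up to rounding.
knots1 = np.quantile(sbp, [0.0, 0.25, 0.5, 0.75, 1.0])
grid = np.linspace(knots1[-1], knots1[-1] + 50.0, 11)
f = natural_cubic_spline_basis(grid, knots1) @ rng.normal(size=5)
print(np.abs(np.diff(f, 2)).max())
```

Fitting the coefficients of (5.6) would then be an ordinary logistic regression on the columns of `design`.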