Conte, de Boor - Elementary Numerical Analysis: An Algorithmic Approach
For m = 0, 1, 2, . . . , do until satisfied:
      u := ∇F(x(m))
      If u = 0, then STOP.
      Else, determine the minimum t* > 0 closest to 0 of the function
            g(t) = F(x(m) - tu)
      x(m+1) := x(m) - t*u

Example 5.2 Given the guess x(0) = [1, -1]T for the local minimum (4/3, 0) of the function

      F(x1, x2) = x1³ + x2³ - 2x1² + 3x2² - 8

of Example 5.1, we find

      ∇F(x(0)) = [3·1² - 4·1, 3·(-1)² + 6·(-1)]T = [-1, -3]T

Thus, in the first step of steepest descent, we look for a minimum of the function

      g(t) = F(1 + t, -1 + 3t) = (1 + t)³ + (-1 + 3t)³ - 2(1 + t)² + 3(-1 + 3t)² - 8

Setting g´(t) = 0 gives the equation

      0 = 3(1 + t)² + 3·3(3t - 1)² - 4(1 + t) + 3·2(3t - 1)·3 = 84t² + 2t - 10

which has the two solutions t = (-1 ± 29)/84 (using the quadratic formula). We choose the positive root, t* = 1/3, since we intend to walk from x(0) in the direction of -∇F(x(0)). This gives

      x(1) = x(0) - t*∇F(x(0)) = [1 + 1/3, -1 + 3·(1/3)]T = [4/3, 0]T

the minimum itself.
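Since the whole example reduces to one gradient evaluation and one quadratic equation, it is easy to check by machine. The short sketch below (written in modern free-form Fortran rather than the FORTRAN of the book's listings, and not itself a program from the book) carries out just this first step: it evaluates ∇F at x(0), solves g´(t) = 84t² + 2t - 10 = 0 by the quadratic formula, and takes the step with the positive root.

! Sketch: the first steepest-descent step of Example 5.2.
! F(x1,x2) = x1**3 + x2**3 - 2*x1**2 + 3*x2**2 - 8  (the function of Example 5.1)
program example52
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x(2), u(2), a, b, c, disc, tstar

  x = [1.0_dp, -1.0_dp]                       ! the guess x(0)
  u = [3.0_dp*x(1)**2 - 4.0_dp*x(1), &        ! u = grad F(x(0)) = [-1, -3]
       3.0_dp*x(2)**2 + 6.0_dp*x(2)]

  ! g(t) = F(x - t*u);  g'(t) = 84 t**2 + 2 t - 10 for this x and u
  a = 84.0_dp;  b = 2.0_dp;  c = -10.0_dp
  disc = sqrt(b*b - 4.0_dp*a*c)
  tstar = (-b + disc)/(2.0_dp*a)              ! the positive root, t* = 1/3

  x = x - tstar*u                             ! x(1) = [4/3, 0], the minimum
  print '(a,f10.6,a,2f10.6)', 't* =', tstar, '   x(1) =', x
end program example52

It should print t* = 0.333333 and x(1) = (1.333333, 0.000000), in agreement with the hand calculation above, up to roundoff.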
It is clear that the method of steepest descent guarantees a decrease in function value from step to step, i.e., F(x(m+1)) < F(x(m)) (unless ∇F(x(m)) = 0). This fact can be made the basis for a convergence proof of the method (under the assumption that ||x(m)|| < constant for all m). But it is easy to give examples which show that the method may converge very slowly.

Example 5.3 The function F(x) = x1² + αx2², with α > 0, has a global minimum at x = 0. Its gradient is linear,

      ∇F(x) = [2x1, 2αx2]T

We could therefore determine at once its unique critical point from the linear system

      2x1 = 0
      2αx2 = 0

But, let us use steepest descent instead, to make a point. This requires us to determine the minimum of the function

      g(t) = F(x - t∇F(x)) = F(x1(1 - 2t), x2(1 - 2αt))

Setting g´(t) = 0 gives the equation

      0 = 2(x1(1 - 2t))(-2x1) + α·2(x2(1 - 2αt))(-2αx2)

whose solution is

      t* = (x1² + α²x2²) / (2(x1² + α³x2²))

Hence, if x = [x1, x2]T is our current guess, then

      [x1(1 - 2t*), x2(1 - 2αt*)]T

is our next guess. Now take x in the specific form c[α, ±1]T. Then t* = 1/(α + 1), and the next guess becomes

      c ((α - 1)/(α + 1)) [α, ∓1]T

i.e., the error is reduced by the factor (α - 1)/(α + 1). For example, for α = 100 and x(0) = [1, 0.01]T, we get, after 100 steps of steepest descent, the point

      (99/101)^100 [1, 0.01]T ≈ 0.135 [1, 0.01]T

which is still less than 7/8 of the way from the first guess to the solution.
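The slow, zigzag progress promised by the factor (α - 1)/(α + 1) is easy to observe numerically. The sketch below (again modern Fortran, not from the book; the number of steps and the printing are arbitrary choices) iterates the closed-form optimal step t* derived above for α = 100, starting from x(0) = [1, 0.01]T; every printed error ratio should equal 99/101 ≈ 0.9802.

! Sketch: steepest descent on F(x) = x1**2 + alpha*x2**2 (Example 5.3),
! using the closed-form optimal step t* derived in the text.
program example53
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: alpha = 100.0_dp
  real(dp) :: x(2), t, errold, errnew
  integer :: m

  x = [1.0_dp, 0.01_dp]                        ! x(0) = c*[alpha, 1] with c = 0.01
  errold = sqrt(sum(x**2))                     ! distance to the minimum at x = 0
  do m = 1, 100
     t = (x(1)**2 + alpha**2*x(2)**2) /   &    ! the optimal step t* for this F
         (2.0_dp*(x(1)**2 + alpha**3*x(2)**2))
     x = [x(1)*(1.0_dp - 2.0_dp*t), x(2)*(1.0_dp - 2.0_dp*alpha*t)]
     errnew = sqrt(sum(x**2))
     if (m <= 3 .or. m == 100) print '(a,i4,a,f10.6)', &
        'step', m, ':  error ratio =', errnew/errold
     errold = errnew
  end do
  print '(a,2f12.8)', 'after 100 steps  x =', x
end program example53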
In Fig. 5.2, we have shown part of the steepest descent iteration for Example 5.2. To understand this figure one needs to realize the following two points: (i) Since (d/dt) F(x + tu) = ∇F(x + tu)T u by Theorem 1.8, the gradient of F at the minimum x - t*∇F(x) of F in the negative gradient direction is perpendicular to that direction, that is,

      ∇F(x - t*∇F(x))T ∇F(x) = 0

Hence successive steps of steepest descent are taken in mutually perpendicular directions. (ii) A function F(x1, x2) of two variables is often described by its level or contour lines in the (x1, x2)-plane, i.e., by the curves

      F(x1, x2) = const

Figure 5.2 The method of steepest descent may shuffle ineffectually back and forth when searching for a minimum in a narrow valley.
Such lines are shown in Fig. 5.2. They give information about gradient direction, since the gradient at a point is necessarily perpendicular to the level line through that point (Exercise 5.1-3).

As the example shows, choice of the direction of steepest descent may be a good tactic, but it is often bad strategy. Today one uses more sophisticated descent methods, in which x(m+1) is found from x(m) in the form

      x(m+1) = x(m) + tm u(m)                                        (5.4)

Here, u(m) is a descent direction, i.e., ∇F(x(m))T u(m) < 0, and tm is found by a line search, i.e., by approximately minimizing the function

      g(t) = F(x(m) + tu(m))

If the gradient of F is available, then this line search reduces to finding an appropriate zero of the function

      g´(t) = ∇F(x(m) + tu(m))T u(m)

and the methods of Chap. 3 may be applied.
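To make this reduction to zero-finding concrete, here is a sketch of such a line search using the secant method of Chap. 3 (the routine is my own illustration, not one of the book's programs; the starting guesses and tolerances are arbitrary choices). It reuses the function of Example 5.1 and the descent direction of Example 5.2, so the computed step should again come out near t = 1/3.

! Sketch: line search for (5.4) as zero-finding, phi(t) = gradF(x + t*u) . u,
! carried out with the secant method.  F and the data are just for illustration.
program line_search_secant
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x(2), u(2), t0, t1, t2, p0, p1
  integer :: k

  x = [1.0_dp, -1.0_dp]
  u = -gradF(x)                     ! a descent direction: the negative gradient
  t0 = 0.0_dp;  t1 = 1.0_dp         ! two starting guesses for the secant method
  p0 = phi(t0);  p1 = phi(t1)
  do k = 1, 20
     if (abs(p1 - p0) <= tiny(1.0_dp)) exit
     t2 = t1 - p1*(t1 - t0)/(p1 - p0)         ! secant step
     t0 = t1;  p0 = p1
     t1 = t2;  p1 = phi(t1)
     if (abs(t1 - t0) < 1.0e-10_dp) exit
  end do
  print '(a,f12.8,a,2f12.8)', 't_m =', t1, '   new x =', x + t1*u

contains
  function gradF(y) result(g)       ! gradient of F(x1,x2) = x1**3+x2**3-2*x1**2+3*x2**2-8
    real(dp), intent(in) :: y(2)
    real(dp) :: g(2)
    g = [3.0_dp*y(1)**2 - 4.0_dp*y(1), 3.0_dp*y(2)**2 + 6.0_dp*y(2)]
  end function gradF
  real(dp) function phi(t)          ! phi(t) = g'(t) = gradF(x + t*u) . u
    real(dp), intent(in) :: t
    phi = dot_product(gradF(x + t*u), u)
  end function phi
end program line_search_secant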
One should keep in mind, though, that the accuracy with which this zero is determined should depend on how close one is to the minimum of F(x).

If the gradient of F is not available (or is thought to be too expensive to evaluate), then it has been proposed to use quadratic interpolation in some form. The following is typical.

Algorithm 5.2: Line search by quadratic interpolation Given a function g(t) with g´(0) < 0, a positive number tmax, and a positive tolerance ε.

1. s1 := 0
2. Choose s2, s3 so that 0 < s2 < s3 < tmax and g[s1, s2] < 0
3. IF s2 = s3 = tmax, then tm := tmax and EXIT
   ELSE consider the parabola p2(t) which agrees with g(t) at s1, s2, s3
4. IF g[s1, s2, s3] ≤ 0, hence p2(t) has no minimum,
   then s3 := tmax and GO TO 3
   ELSE calculate the minimum s of p2(t), i.e.,
         s := (s1 + s2 - g[s1, s2]/g[s1, s2, s3])/2
5. IF s > tmax, then (s1, s2, s3) := (s2, s3, tmax) and GO TO 3
   ELSE
   5.1. IF |g(s) - min{g(s1), g(s2), g(s3)}| < ε or |g(s) - p2(s)| < ε,
        then tm := s and EXIT
        ELSE select a new ordered three-point sequence (s1, s2, s3) from the
        four-point set {s1, s2, s3, s}, in such a way that either
        g[s1, s2] < 0 < g[s2, s3] or, if that is not possible, so that the
        largest of g(s1), g(s2), g(s3) is as small as possible, and GO TO 4

On EXIT, tm is taken to be an approximation to the minimum of g(t) on the interval [0, tmax].

Note that EXIT at step 5.1 is no guarantee that the tm so found is "close" to a minimum of g(t); see Exercise 5.1-5. When Algorithm 5.2 is used as part of a multivariate minimization algorithm, it is usually started with s1 = s2 = 0 [since g´(0) = ∇F(x(m))T u(m) is usually available] and s3 = tmax = 1, and step 5.1 is simplified to "tm := s and EXIT". This can be shown to be all right provided the search direction u(m) is chosen so that x(m) + u(m) is the local minimum of a quadratic which approximates F near x(m).
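A sketch of this simplified use of Algorithm 5.2 (one parabolic fit through g(0), g´(0), and g(tmax), then exit) follows; the two fallbacks marked "my choice" in the comments are not part of the algorithm as stated, but guard against a fit without a minimum or a step past tmax.

! Sketch: the simplified form of Algorithm 5.2 used inside a descent method,
! started with s1 = s2 = 0, s3 = tmax, and exiting after one parabolic fit.
program simplified_line_search
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: tm
  ! data for the g(t) of Example 5.2:  g(0) = -7, g'(0) = -10, g(1) = 12, tmax = 1
  tm = quad_step(-7.0_dp, -10.0_dp, 12.0_dp, 1.0_dp)
  print '(a,f10.6)', 'one parabolic fit gives t_m =', tm
contains
  real(dp) function quad_step(g0, slope0, gt, tmax) result(s)
    ! g0 = g(0), slope0 = g'(0) < 0, gt = g(tmax); step 5.1 reduced to "tm := s, EXIT"
    real(dp), intent(in) :: g0, slope0, gt, tmax
    real(dp) :: dd2                           ! second divided difference g[0, 0, tmax]
    dd2 = ((gt - g0)/tmax - slope0)/tmax
    if (dd2 <= 0.0_dp) then                   ! step 4: the parabola has no minimum;
       s = tmax                               ! fall back to the full step (my choice)
    else
       s = -slope0/(2.0_dp*dd2)               ! minimum of the interpolating parabola
       s = min(s, tmax)                       ! keep the step inside [0, tmax] (my choice)
    end if
  end function quad_step
end program simplified_line_search

With the data of Example 5.2 (g(0) = -7, g´(0) = -10, g(1) = 12), the single fit returns tm ≈ 0.172 while the true minimizer of g is 1/3; as noted above, the shortcut is only justified when the search direction already comes from a good quadratic model of F.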
We have made the point that optimization gives rise to systems of equations, namely systems of the special form

      ∇F(x) = 0

Conversely, an arbitrary system

      f(x) = 0

of n equations in n unknowns can be solved in principle by optimization, since, e.g., every minimum of the function

      F(x) = f1(x)² + f2(x)² + · · · + fn(x)²                                    (5.5)

with value 0 is a solution of the equation f(x) = 0 and vice versa. For this specific function F,

      ∂F/∂xj (x) = 2 Σi fi(x) ∂fi/∂xj (x),    j = 1, . . . , n

or, in matrix form,

      ∇F(x) = 2 f´(x)T f(x)                                                      (5.6)

with

      f´(x) = [∂fi/∂xj (x)]    (i, j = 1, . . . , n)

the Jacobian matrix of the vector-valued function f.

EXERCISES

5.1-1 Find all critical points of the function

      F(x1, x2) = · · ·

by sketching the curves ∂F/∂x1 = 0 and ∂F/∂x2 = 0. Then classify them into maxima, minima, and saddle points using the gradient directions in their neighborhood.

5.1-2 Use steepest descent and ascent to find the minima and maxima of the function of Exercise 5.1-1 correct to within 10⁻⁶.

5.1-3 Let u be the tangent direction to a level line F(x1, x2) = const at a point x = [x1, x2]T. Use Theorem 1.8 to prove that ∇F(x)T u = 0.

5.1-4 Write a FORTRAN subroutine for carrying out Algorithm 5.2, then use it to solve Exercise 5.1-2 above. (Note: To find a maximum of the function F is the same as finding a minimum of the function -F.)
5.1-5 (S. R. Robinson [34]) Let h(t) be a smooth function on [a, b] with h´´(t) > 0 and h(a) = h(b).
(a) Prove that h(t) has a unique minimum t* in [a, b].
(b) Consider finding t* by picking some interval [α, β] containing t* and then applying Algorithm 5.2 to the input g(t) = h(α + t), tmax = β - α, some ε > 0, and the initial choice (0, tmax/2, tmax) for (s1, s2, s3). The resulting estimate for t* then depends on α, β, and ε. Prove: If, for all such α, β, we get t* exactly, then h(t) must be a parabola. [Hint: Choose α, β so that h(α) = h(β).]
(c) Conclude that Algorithm 5.2 may entirely fail to provide a good estimate for the minimum of g (even if ε is very small), unless g is close to a parabola.

5.1-6 Least-squares approximation A common computational task requires the determination of parameters a1, . . . , ak so that the model y = R(x; a1, . . . , ak) fits measurements (xi, yi), i = 1, . . . , N, as well as possible, i.e., so that

      ||ε||2

is as small as possible, where ε is the N-vector with entries

      εi = yi - R(xi; a1, . . . , ak),    i = 1, . . . , N
(a) Assuming that R depends smoothly on the parameter vector a = [a1, a2, . . . , ak]T, show that the choice a* which minimizes ||ε||2 must satisfy the so-called normal equations

      Aε = 0

with the k × N matrix A given by

      A(j, i) = ∂R(xi; a)/∂aj    (j = 1, . . . , k;  i = 1, . . . , N)

evaluated at a = a*.
(b) Determine the particular numbers a1, a2 in the model

      y = · · ·

which fits best in the above sense the following observations:

      xi |    1     2     3     4     5     6     7     8     9    10
      yi | 1.48  1.10  0.81  0.61  0.45  0.33  0.24  0.18  0.13  0.10

*5.2 NEWTON'S METHOD

When solving one equation

      f(ξ) = 0

in one unknown ξ in Chap. 3, we derived Newton's method by (i) using Taylor's expansion

      f(x + h) = f(x) + f´(x)h + O(h²)

for f at the point x, and then (ii), ignoring the higher-order term O(h²), solving the "linearized" equation

      0 = f(x) + f´(x)h

instead of the full equation 0 = f(x + h) for h, getting h = -f(x)/f´(x) and thereby the "improved" approximation

      x - f(x)/f´(x)

Now that we are trying to determine an n-vector ξ satisfying the system of n equations, we proceed in exactly the same way.
From Theorem 1.9, we know that the ith component function fi of the vector-valued function f satisfies

      fi(x + h) = fi(x) + Σj (∂fi/∂xj)(x) hj + O(||h||²)

in case fi has continuous first and second partial derivatives. Thus

      f(x + h) = f(x) + f´(x)h + O(||h||²)                                       (5.7)

with the matrix f´(x) called the Jacobian matrix for f at x and given by

      f´(x) = [∂fi/∂xj (x)]    (i, j = 1, . . . , n)

Again we ignore the higher-order term O(||h||²) and solve the "linearized" equation

      0 = f(x) + f´(x)h

instead of the full equation 0 = f(x + h) for the correction h, getting the solution

      h = -f´(x)⁻¹f(x)

provided the Jacobian f´(x) is invertible. In this way, we obtain the new approximation

      x - f´(x)⁻¹f(x)

to ξ.
This is the basic step of Newton's method for a system. The Newton equation

      f´(x)h = -f(x)

for the correction h to x is, of course, a linear system, and is solved by the direct methods described in Chap. 4.

Algorithm 5.3: Newton's method for a system Given the system f(ξ) = 0 of n equations in n unknowns, with f a vector-valued function having smooth components, and a first guess x(0) for a solution ξ of the system.

      For m = 0, 1, 2, . . . , until satisfied, do:
            x(m+1) := x(m) - f´(x(m))⁻¹ f(x(m))

It can be shown that Newton's method converges to ξ provided x(0) is close enough to ξ and provided the Jacobian f´ of f is continuous and f´(ξ) is invertible.
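As a closing illustration (the test system and all numerical choices are mine, not the book's), the sketch below carries out Algorithm 5.3 on the 2 × 2 system f1(x) = x1² + x2² - 4 = 0, f2(x) = x1x2 - 1 = 0. Since n = 2, the Newton equation f´(x)h = -f(x) is solved by Cramer's rule instead of the elimination methods of Chap. 4.

! Sketch: Algorithm 5.3 (Newton's method for a system) on a small test problem
! of my own choosing:  f1 = x1**2 + x2**2 - 4 = 0,  f2 = x1*x2 - 1 = 0.
program newton_system
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x(2), f(2), J(2,2), h(2), det
  integer :: m

  x = [2.0_dp, 0.5_dp]                        ! first guess x(0)
  do m = 1, 10
     f = [x(1)**2 + x(2)**2 - 4.0_dp, x(1)*x(2) - 1.0_dp]
     J = reshape([2.0_dp*x(1), x(2),       &  ! Jacobian f'(x), stored by columns
                  2.0_dp*x(2), x(1)], [2,2])
     det = J(1,1)*J(2,2) - J(1,2)*J(2,1)
     h = [ -(f(1)*J(2,2) - f(2)*J(1,2)),   &  ! solve J h = -f by Cramer's rule
           -(J(1,1)*f(2) - J(2,1)*f(1)) ] / det
     x = x + h                                ! x(m+1) := x(m) + h
     print '(a,i2,a,2f14.10)', 'm =', m, '   x =', x
     if (maxval(abs(h)) < 1.0e-12_dp) exit
  end do
end program newton_system

Starting from x(0) = [2, 0.5]T, the iterates settle in a few steps to x ≈ [1.9319, 0.5176]T, the solution with x1 = sqrt(2 + sqrt(3)), reflecting the fast local convergence just described.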