This new point then replaces the oldest of the three previous points and the process is repeated until convergence. This process is illustrated in Fig. 6.3. Successive parabolic interpolation is riskier than golden section search, since it does not necessarily maintain a bracketing interval in which the solution is known to lie, but asymptotically it converges superlinearly with convergence rate r ≈ 1.324.
[Figure 6.3: Successive parabolic iteration for minimizing a function. The parabola through x_{k-2}, x_{k-1}, and x_k has its minimum at x_{k+1}.]

Example 6.3 Successive Parabolic Interpolation. We illustrate successive parabolic interpolation by using it to minimize the function of Example 6.2,

    f(x) = 0.5 − x e^{−x^2}.

We evaluate the function at three points, say, x0 = 0, x1 = 0.6, and x2 = 1.2, obtaining f(x0) = 0.5, f(x1) = 0.081, f(x2) = 0.216.
We fit a parabola to these three points and take its minimizer, x3 = 0.754, to be the next approximation to the solution. We then discard x0 and repeat the process with the three remaining points. The first iteration is depicted in Fig. 6.4, and the full sequence of iterations is given next.

      xk      f(xk)
    0.000     0.500
    0.600     0.081
    1.200     0.216
    0.754     0.073
    0.721     0.071
    0.692     0.071
    0.707     0.071

[Figure 6.4: First iteration of successive parabolic iteration for example problem.]
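The iteration is straightforward to code. Below is a minimal Python sketch (ours, not the book's; it has no safeguards, so it can fail if the interpolating parabola degenerates). Applied to the three starting points above it reproduces the iterates in the table to about three digits:

    import math

    def f(x):
        # Objective from Examples 6.2 and 6.3.
        return 0.5 - x * math.exp(-x**2)

    def parabola_min(a, b, c, fa, fb, fc):
        # Vertex (minimizer) of the parabola interpolating (a,fa), (b,fb), (c,fc).
        p = (b - a)**2 * (fb - fc) - (b - c)**2 * (fb - fa)
        q = (b - a) * (fb - fc) - (b - c) * (fb - fa)
        return b - 0.5 * p / q

    xs = [0.0, 0.6, 1.2]              # three starting points
    for _ in range(4):
        a, b, c = xs[-3:]             # keep only the three most recent points
        xs.append(parabola_min(a, b, c, f(a), f(b), f(c)))
        print(f"{xs[-1]:.3f}   {f(xs[-1]):.3f}")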
6.2.3 Newton's Method

A local quadratic approximation to the objective function is useful because the minimum of a quadratic is easy to compute. Another way to obtain a local quadratic approximation is to use a truncated Taylor series expansion,

    f(x + h) ≈ f(x) + f'(x) h + (f''(x)/2) h^2.

By differentiation, we find that the minimum of this quadratic function of h is given by h = −f'(x)/f''(x).
This result suggests the iteration scheme

    x_{k+1} = x_k − f'(x_k)/f''(x_k),

which is simply Newton's method for solving the nonlinear equation f'(x) = 0. As usual, Newton's method for finding a minimum normally has a quadratic convergence rate. Unless it is started near the desired solution, however, Newton's method may fail to converge, or it may converge to a maximum or to an inflection point of the function.

Example 6.4 Newton's Method.
We illustrate Newton's method by using it to minimize the function of Example 6.2,

    f(x) = 0.5 − x e^{−x^2}.

The first and second derivatives of f are given by

    f'(x) = (2x^2 − 1) e^{−x^2}   and   f''(x) = 2x (3 − 2x^2) e^{−x^2},

so the Newton iteration for finding a zero of f' is given by

    x_{k+1} = x_k − (2x_k^2 − 1) / (2x_k (3 − 2x_k^2)).

Using a starting guess of x0 = 1, we get the sequence of iterates shown next.

      xk      f(xk)
    1.000     0.132
    0.500     0.111
    0.700     0.071
    0.707     0.071
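In code, the iteration is only a few lines (a minimal sketch, with the derivatives hard-coded and no convergence test); starting from x0 = 1 it reproduces the iterates above:

    import math

    def f(x):
        return 0.5 - x * math.exp(-x**2)

    def df(x):                        # f'(x) = (2x^2 - 1) e^{-x^2}
        return (2 * x**2 - 1) * math.exp(-x**2)

    def d2f(x):                       # f''(x) = 2x (3 - 2x^2) e^{-x^2}
        return 2 * x * (3 - 2 * x**2) * math.exp(-x**2)

    x = 1.0                           # starting guess x0
    for _ in range(3):
        x -= df(x) / d2f(x)           # Newton step for f'(x) = 0
        print(f"{x:.3f}   {f(x):.3f}")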
6.2.4 Safeguarded Methods

As with solving nonlinear equations in one dimension, slow-but-sure and fast-but-risky optimization methods can be combined to provide both safety and efficiency. A bracketing interval, in which the solution is known to lie, is maintained, so that if the fast method generates an iterate that would lie outside the interval, then the safe method can be used to reduce the length of the bracketing interval before trying the fast method again, with a better chance of producing a reliable result.
Most library routines for one-dimensional optimization are based on such a hybrid approach. One popular combination, which requires no derivatives of the objective function, is golden section search and successive parabolic interpolation.
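For example (our illustration; the book does not name a specific package), SciPy's scalar minimizer implements Brent's method, which is precisely such a hybrid of golden section search and successive parabolic interpolation. The bracketing triple below is the one used in Example 6.3:

    import math
    from scipy.optimize import minimize_scalar

    f = lambda x: 0.5 - x * math.exp(-x**2)      # objective from Example 6.2

    # method='brent' combines golden section search with successive parabolic
    # interpolation; the bracket is a triple (a, b, c) with f(b) < f(a), f(c).
    res = minimize_scalar(f, bracket=(0.0, 0.6, 1.2), method='brent')
    print(res.x, res.fun)                        # about 0.7071 and 0.0711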
6.3 Multidimensional Unconstrained Optimization

We turn now to multidimensional unconstrained optimization, which has a number of features in common with both one-dimensional optimization and the solution of systems of nonlinear equations in n dimensions.

6.3.1 Direct Search Methods

Recall that golden section search for one-dimensional optimization makes no use of the objective function values other than to compare them. Direct search methods for multidimensional optimization share this property, although they do not retain the convergence guarantee of golden section search. Perhaps the best known of these is the method of Nelder and Mead. For minimizing a function f of n variables, the method begins with a set of n + 1 starting points, forming a simplex in R^n, at which f is evaluated. A move is then made to a new point along a straight line from the worst current point through the centroid of all of the points. The new point then replaces the worst point, and the process is repeated.
The algorithm involves several parameters that determine how far to move along the line and how much to expand or contract the simplex, depending on whether or not the search is successful. Such direct search methods can be attractive for a nonsmooth objective function, for which few other methods are applicable, and they are sometimes fairly effective when n is small, but they tend to be quite expensive when n is larger than two or three.
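As a usage illustration (ours; the book describes the algorithm itself, not a particular implementation), the method of Nelder and Mead is available in SciPy as a derivative-free option of scipy.optimize.minimize:

    import numpy as np
    from scipy.optimize import minimize

    def f(x):
        # The quadratic minimized in Example 6.5 below.
        return 0.5 * x[0]**2 + 2.5 * x[1]**2

    res = minimize(f, x0=np.array([5.0, 1.0]), method='Nelder-Mead')
    print(res.x, res.nfev)    # minimizer near the origin; nfev = function evaluations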
6.3.2 Steepest Descent Method

As expected, greater use of the objective function and its derivatives leads to faster methods. Let f: R^n → R be a real-valued function of n real variables. Recall that the gradient of f evaluated at x, denoted by ∇f(x), is a vector-valued function whose ith component is the partial derivative of f with respect to x_i, ∂f(x)/∂x_i. From calculus, we know that at a given point x where the gradient vector is nonzero, the negative gradient, −∇f(x), points downhill toward lower values of the function f. In fact, −∇f(x) is locally the direction of steepest descent for f in the sense that the value of the function decreases more rapidly along the direction of the negative gradient than along any other direction. This fact leads to one of the oldest methods for multidimensional optimization, the steepest descent method.
Starting from some initial guess x0, each successive approximate solution is given by

    x_{k+1} = x_k − α_k ∇f(x_k),

where α_k is a line search parameter that determines how far to go in the given direction. Given a direction of descent, such as the negative gradient, determination of an appropriate value for the line search parameter α_k at each iteration is a one-dimensional minimization problem,

    min_α f(x_k − α ∇f(x_k)),

that can be solved by the methods discussed in Section 6.2. The steepest descent method is very reliable in that it can always make progress provided the gradient is nonzero.
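In code the basic method is a short loop. The following sketch (ours, not the book's; it assumes SciPy is available for the scalar line search) solves the one-dimensional subproblem numerically at each step:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def steepest_descent(f, grad, x0, steps=10):
        # Basic steepest descent: fixed number of steps, no convergence test.
        x = np.asarray(x0, dtype=float)
        for _ in range(steps):
            g = grad(x)
            phi = lambda alpha: f(x - alpha * g)   # one-dimensional subproblem
            alpha = minimize_scalar(phi).x         # line search along -g
            x = x - alpha * g
        return x

    # Applied to the function of Example 6.5 below:
    # steepest_descent(lambda x: 0.5*x[0]**2 + 2.5*x[1]**2,
    #                  lambda x: np.array([x[0], 5.0*x[1]]), [5.0, 1.0])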
But as the following example demonstrates, the method is rather myopic in its view of the behavior of the function, and the resulting iterates can zigzag back and forth, making very slow progress toward a solution. In general, the convergence rate of steepest descent is only linear, with a constant factor that can be arbitrarily close to 1.

Example 6.5 Steepest Descent. We illustrate the steepest descent method by using it to minimize the function

    f(x) = 0.5 x_1^2 + 2.5 x_2^2,

whose gradient is given by

    ∇f(x) = [x_1  5x_2]^T.

If we take x0 = [5  1]^T as starting point, the gradient is ∇f(x0) = [5  5]^T. We next perform a line search along the negative gradient direction, i.e., we solve

    min_α f(x0 − α ∇f(x0)).

One-dimensional minimization of f as a function of α along the line gives α0 = 1/3, so that the next approximation is x1 = [3.333  −0.667]^T. (For this quadratic the line search can in fact be done in closed form: with g0 = ∇f(x0) and Hessian A = diag(1, 5), the minimizing step is α0 = g0^T g0 / (g0^T A g0) = 50/150 = 1/3.) We then evaluate the gradient at this new point to determine the next search direction and repeat the process.
The resulting sequence of iterations is shown numerically in the following table and graphically in Fig. 6.5, where the ellipses represent level curves, or contours, on which the function f has a constant value. The gradient direction at any given point is always normal to the level curve passing through that point. Note that the minimum along a given search direction occurs when the gradient at the new point is orthogonal to the search direction.
The sequence of iterates given by steepest descent is converging slowly toward the solution, which for this problem is at the origin, where the minimum function value is zero.

          xk              f(xk)          ∇f(xk)
    5.000    1.000       15.000      5.000    5.000
    3.333   −0.667        6.667      3.333   −3.333
    2.222    0.444        2.963      2.222    2.222
    1.481   −0.296        1.317      1.481   −1.481
    0.988    0.198        0.585      0.988    0.988
    0.658   −0.132        0.260      0.658   −0.658
    0.439    0.088        0.116      0.439    0.439
    0.293   −0.059        0.051      0.293   −0.293
    0.195    0.039        0.023      0.195    0.195
    0.130   −0.026        0.010      0.130   −0.130

[Figure 6.5: the steepest descent iterates plotted against the elliptical level curves of f.]
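The table is easy to reproduce with a short script (ours, not the book's). Because the objective is quadratic, f(x) = 0.5 x^T A x with A = diag(1, 5), the exact line-search step has the closed form α = g^T g / (g^T A g) noted in Example 6.5:

    import numpy as np

    A = np.diag([1.0, 5.0])               # Hessian: f(x) = 0.5 x^T A x
    f = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x                # gradient [x1, 5*x2]

    x = np.array([5.0, 1.0])              # starting point x0
    print(f"{x[0]:7.3f} {x[1]:8.3f}   {f(x):7.3f}")
    for _ in range(9):
        g = grad(x)
        alpha = (g @ g) / (g @ A @ g)     # exact line search for a quadratic
        x = x - alpha * g
        print(f"{x[0]:7.3f} {x[1]:8.3f}   {f(x):7.3f}")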