Figure 6.5: Convergence of steepest descent.

6.3.3 Newton's Method

A broader view of the function can be obtained by a local quadratic approximation, which, as we have seen, is equivalent to Newton's method.
In the case of multidimensional optimization, we seek a zero of the gradient. Thus, the iteration scheme for Newton's method has the form

$$x_{k+1} = x_k - H_f^{-1}(x_k)\,\nabla f(x_k),$$

where $H_f(x)$ is the Hessian matrix of second partial derivatives of $f$,

$$\{H_f(x)\}_{ij} = \frac{\partial^2 f(x)}{\partial x_i\,\partial x_j},$$

evaluated at $x_k$. As usual, we do not explicitly invert the Hessian matrix but instead use it to solve a linear system

$$H_f(x_k)\,s_k = -\nabla f(x_k)$$

for $s_k$, then take as next iterate

$$x_{k+1} = x_k + s_k.$$

The convergence rate of Newton's method for minimization is normally quadratic. As usual, however, Newton's method is unreliable unless started close enough to the solution.
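This iteration is straightforward to express in code. Below is a minimal sketch in Python with NumPy; the names (newton_minimize, grad, hess, tol, max_iter) and the simple gradient-norm stopping test are illustrative choices, not from the text, and the safeguards discussed later in this section are omitted.

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method for minimization: solve H_f(x_k) s_k = -grad f(x_k),
    then set x_{k+1} = x_k + s_k.  grad(x) and hess(x) return the gradient
    vector and Hessian matrix of f at x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # gradient is (nearly) zero: stop
            break
        s = np.linalg.solve(hess(x), -g)   # Newton step; the Hessian is never inverted
        x = x + s
    return x
```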
Example 6.6 Newton's Method. We illustrate Newton's method by again minimizing the function of Example 6.5,

$$f(x) = 0.5x_1^2 + 2.5x_2^2,$$

whose gradient and Hessian are given by

$$\nabla f(x) = \begin{bmatrix} x_1 \\ 5x_2 \end{bmatrix} \quad\text{and}\quad H_f(x) = \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix}.$$

If we take $x_0 = [\,5 \ \ 1\,]^T$ as starting point, the gradient is $\nabla f(x_0) = [\,5 \ \ 5\,]^T$. The linear system to be solved for the Newton step is therefore

$$\begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix} s_0 = \begin{bmatrix} -5 \\ -5 \end{bmatrix},$$

and hence the next approximate solution is

$$x_1 = x_0 + s_0 = \begin{bmatrix} 5 \\ 1 \end{bmatrix} + \begin{bmatrix} -5 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$

which is the exact solution for this problem. That Newton's method has converged in a single iteration in this case should not be surprising, since the function being minimized is a quadratic.
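Applied to the function of this example, the sketch above reproduces the one-step convergence (again assuming the illustrative names introduced earlier):

```python
grad = lambda x: np.array([x[0], 5.0 * x[1]])    # gradient of f(x) = 0.5 x1^2 + 2.5 x2^2
hess = lambda x: np.array([[1.0, 0.0],
                           [0.0, 5.0]])          # constant Hessian of this quadratic
x = newton_minimize(grad, hess, [5.0, 1.0])
print(x)                                         # [0. 0.] after a single Newton step
```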
Of course, the quadratic model used by Newton's method is not exact in general, but nevertheless it enables Newton's method to take a more global view of the problem, yielding much more rapid convergence than the steepest descent method.

Intuitively, unconstrained minimization is like finding the bottom of a bowl by rolling a marble down the side. If the bowl is oblong, then the marble will rock back and forth along the valley before eventually settling at the bottom, analogous to the zigzagging path taken by the steepest descent method. With Newton's method, the metric of the space is redefined so that the bowl becomes circular, and hence the marble rolls directly to the bottom.

Unlike the steepest descent method, Newton's method does not require a line search parameter because the quadratic model determines an appropriate length as well as direction for the step to the next approximate solution.
When started far from a solution, however, it may still be advisable to perform a line search along the direction of the Newton step $s_k$ in order to make the method more robust (this procedure is sometimes called the damped Newton method). Once the iterations are near the solution, the value $\alpha_k = 1$ for the line search parameter should suffice for subsequent iterations.
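A minimal sketch of such a damped Newton iteration, using a simple backtracking line search: the sufficient-decrease test and the constants 1e-4 and 0.5 are common illustrative choices, not prescribed by the text, and the sketch assumes the Newton direction is a descent direction (see the safeguards discussed below).

```python
def damped_newton(f, grad, hess, x0, tol=1e-10, max_iter=100):
    """Newton's method safeguarded by a backtracking line search
    along the Newton direction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = np.linalg.solve(hess(x), -g)   # Newton direction
        alpha = 1.0                        # always try the full Newton step first
        while f(x + alpha * s) > f(x) + 1e-4 * alpha * (g @ s):
            alpha *= 0.5                   # halve the step until sufficient decrease
        x = x + alpha * s                  # near the solution, alpha = 1 is accepted
    return x
```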
An alternative to a line search is a trust region method, in which an estimate is maintained of the radius of a region in which the quadratic model is sufficiently accurate for the computed Newton step to be reliable (see Section 5.3.5), and thus the next approximate solution is constrained to lie within the trust region. If the current trust radius is binding, minimizing the quadratic model function subject to this constraint may modify the direction as well as the length of the Newton step, as illustrated in Fig. 6.6. The accuracy of the quadratic model at a given step is assessed by comparing the actual decrease in the objective function value with that predicted by the quadratic model, and the trust radius is then increased or decreased accordingly. Once near a solution, the trust radius should be large enough to permit full Newton steps, yielding rapid local convergence.

Figure 6.6: Modification of Newton step by trust region method. (The figure shows contours of the quadratic model around the current iterate $x_k$, the trust radius, the full Newton step, the negative gradient direction, and the constrained step to $x_{k+1}$.)
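The radius-update logic can be sketched as follows. The thresholds 0.25 and 0.75 and the scaling factors are typical illustrative choices, not from the text; the computation of the constrained step itself is abstracted away, and model(s) is a hypothetical helper that evaluates the local quadratic model at a trial step s.

```python
def trust_region_update(f, model, x, s, radius):
    """Adjust the trust radius by comparing the actual decrease in f
    with the decrease predicted by the quadratic model, for a step s
    computed subject to ||s|| <= radius."""
    actual = f(x) - f(x + s)                        # actual decrease
    predicted = model(np.zeros_like(s)) - model(s)  # predicted decrease
    rho = actual / predicted                        # model accuracy ratio
    if rho < 0.25:
        radius *= 0.25                              # poor model: shrink region
    elif rho > 0.75 and np.isclose(np.linalg.norm(s), radius):
        radius *= 2.0                               # good model, binding radius: expand
    x_new = x + s if rho > 0 else x                 # accept step only if f decreased
    return x_new, radius
```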
If the objective function $f$ has continuous second partial derivatives, then the Hessian matrix $H_f$ is symmetric, and near a minimum it is positive definite. Thus, the linear system for the step to the next iterate can be solved by Cholesky factorization, requiring only about half of the work required by LU factorization. Far from a minimum, however, $H_f(x_k)$ may not be positive definite and thus may require a symmetric indefinite factorization. The resulting Newton step $s_k$ may not even be a descent direction for the function, i.e., we may not have

$$\nabla f(x_k)^T s_k < 0.$$

In this case, an alternative descent direction can be computed, such as the negative gradient or a direction of negative curvature (i.e., a vector $s_k$ such that $s_k^T H_f(x_k)\,s_k < 0$, which can be obtained readily from a symmetric indefinite factorization of $H_f(x_k)$), and a line search performed.
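A sketch of the descent test, with the negative gradient as the fallback direction (a more sophisticated implementation would instead extract a negative-curvature direction from a symmetric indefinite factorization, or apply the spectrum shift described next):

```python
def safeguarded_direction(g, H):
    """Return a descent direction: the Newton step if it passes the
    descent test grad f(x)^T s < 0, otherwise the negative gradient."""
    s = np.linalg.solve(H, -g)   # assumes H is nonsingular
    if g @ s < 0:                # directional derivative is negative: descent
        return s
    return -g                    # fall back to steepest descent, then line search
```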
Another alternative is to shift the spectrum of $H_f(x_k)$ so that it becomes positive definite, i.e., replace $H_f(x_k)$ with $H_f(x_k) + \mu I$, where $\mu$ is a scalar chosen so that the new matrix is positive definite. As $\mu$ varies, the resulting computed step interpolates between the standard Newton step and the steepest descent direction. Such alternative measures should become unnecessary once the approximate solution is sufficiently close to the true solution, so that the ultimate quadratic convergence rate of Newton's method can still be attained.

6.3.4 Quasi-Newton Methods

Newton's method usually converges very rapidly once it nears a solution, but it requires a substantial amount of work per iteration, specifically $O(n^3)$ arithmetic and $O(n^2)$ scalar function evaluations per iteration for a dense problem.