Heath - Scientific Computing (523150), page 55
If we define the components of the residual vector $r(x)$ by
\[
r_i(x) = y_i - f(t_i, x), \qquad i = 1, \ldots, m,
\]
then we wish to minimize the function $g(x) = \frac{1}{2} r^T(x)\, r(x)$. The gradient vector and Hessian matrix of $g$ are given by
\[
\nabla g(x) = J^T(x)\, r(x)
\]
and
\[
H_g(x) = J^T(x)\, J(x) + \sum_{i=1}^{m} r_i(x)\, H_i(x),
\]
where $J(x)$ is the Jacobian matrix of the vector function $r(x)$, and $H_i(x)$ denotes the Hessian matrix of the component function $r_i(x)$. Thus, if $x_k$ is an approximate solution, the Newton step $s_k$ is given by the linear system
\[
\Bigl[\, J^T(x_k)\, J(x_k) + \sum_{i=1}^{m} r_i(x_k)\, H_i(x_k) \,\Bigr] s_k = -J^T(x_k)\, r(x_k).
\]

6.4.1 Gauss-Newton Method

The $m$ Hessian matrices $H_i$ are usually inconvenient and expensive to compute.
Moreover, in $H_g$ each of these matrices is multiplied by the residual component function $r_i$, which should be small at a solution if the fit of the model function to the data is reasonably good. These features motivate the Gauss-Newton method for nonlinear least squares, in which the second-order term is dropped and the linear system
\[
J^T(x_k)\, J(x_k)\, s_k = -J^T(x_k)\, r(x_k)
\]
is solved for the approximate Newton step $s_k$ at each iteration. But we recognize this system as the normal equations (see Section 3.3) for the linear least squares problem
\[
J(x_k)\, s_k \approx -r(x_k),
\]
which can be solved more reliably by orthogonal factorization (see Section 3.4). The next approximate solution is then given by
\[
x_{k+1} = x_k + s_k,
\]
and the process is repeated until convergence.
In effect, the Gauss-Newton method replaces a nonlinear least squares problem with a sequence of linear least squares problems whose solutions converge to the solution of the original nonlinear problem.

Example 6.9 Gauss-Newton Method. We illustrate the Gauss-Newton method for nonlinear least squares by fitting the nonlinear model function
\[
f(t, x) = x_1 e^{x_2 t}
\]
to the data

    t     0.0   1.0   2.0   3.0
    y     2.0   0.7   0.3   0.1

For this model function, the entries of the Jacobian matrix of the residual function $r$ are given by
\[
\{J(x)\}_{i,1} = \frac{\partial r_i(x)}{\partial x_1} = -e^{x_2 t_i},
\qquad
\{J(x)\}_{i,2} = \frac{\partial r_i(x)}{\partial x_2} = -x_1 t_i e^{x_2 t_i}.
\]
If we take $x_0 = [\, 1 \;\; 0 \,]^T$ as starting point, then the linear least squares problem to be solved for the Gauss-Newton correction step $s_0$ is
\[
\begin{bmatrix} -1 & \phantom{-}0 \\ -1 & -1 \\ -1 & -2 \\ -1 & -3 \end{bmatrix} s_0 \approx
\begin{bmatrix} -1 \\ \phantom{-}0.3 \\ \phantom{-}0.7 \\ \phantom{-}0.9 \end{bmatrix}.
\]
The least squares solution to this system is $s_0 = [\, 0.69 \;\; -0.61 \,]^T$. We take $x_1 = x_0 + s_0$ as the next approximate solution and repeat the process until convergence. The sequence of iterations is shown next.

    x_k               ‖r(x_k)‖₂²
    1.000    0.000      2.390
    1.690   −0.610      0.212
    1.975   −0.930      0.007
    1.994   −1.004      0.002
    1.995   −1.009      0.002
    1.995   −1.010      0.002

Like all methods based on Newton's method, the Gauss-Newton method for solving nonlinear least squares problems may fail to converge if it is started too far from the solution. A line search can be used to improve its robustness, but additional modifications may be necessary to ensure that the computed step is a descent direction when far from the solution. In addition, if the residual function at the solution is too large, then the second-order term omitted from the Hessian matrix may not be negligible, which means that the Gauss-Newton approximation is not sufficiently accurate, so that the method converges very slowly at best and may not converge at all.
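The iteration of Example 6.9 is easy to reproduce numerically. The following is a minimal sketch using NumPy; the data, model, and Jacobian entries are those of the example, while the use of `np.linalg.lstsq` as the least squares solver is an implementation choice (it factors the matrix rather than forming the normal equations):

```python
# Gauss-Newton iteration for Example 6.9: fit f(t, x) = x1 * exp(x2 * t).
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 0.7, 0.3, 0.1])

def residual(x):
    # r_i(x) = y_i - x1 * exp(x2 * t_i)
    return y - x[0] * np.exp(x[1] * t)

def jacobian(x):
    # Entries derived in the example above.
    return np.column_stack((-np.exp(x[1] * t),
                            -x[0] * t * np.exp(x[1] * t)))

x = np.array([1.0, 0.0])  # starting point x0
for k in range(10):
    # Solve the linear least squares subproblem J(x_k) s_k ~= -r(x_k).
    s, *_ = np.linalg.lstsq(jacobian(x), -residual(x), rcond=None)
    x = x + s

print(x)                          # approx [ 1.995, -1.010 ]
print(residual(x) @ residual(x))  # squared residual norm, approx 0.002
```

The first step computed by this script matches $s_0 = [\, 0.69 \;\; -0.61 \,]^T$ from the example, and the iterates match the table above.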
In such “large-residual” cases, it may be best to use a general nonlinear minimization method that takes into account the true full Hessian matrix.

6.4.2 Levenberg-Marquardt Method

The Levenberg-Marquardt method is another useful alternative when the Gauss-Newton approximation is inadequate or yields a rank-deficient linear least squares subproblem. In this method, the linear system at each iteration is of the form
\[
\bigl( J^T(x_k)\, J(x_k) + \mu_k I \bigr)\, s_k = -J^T(x_k)\, r(x_k),
\]
where $\mu_k$ is a scalar parameter chosen by some strategy. The corresponding linear least squares problem to be solved is
\[
\begin{bmatrix} J(x_k) \\ \sqrt{\mu_k}\, I \end{bmatrix} s_k \approx
\begin{bmatrix} -r(x_k) \\ o \end{bmatrix}.
\]
This method, which is an example of a general technique known as regularization (see Section 8.6), can be variously interpreted as replacing the term omitted from the true Hessian by a scalar multiple of the identity matrix, or as shifting the spectrum of the approximate Hessian to make it positive definite (or equivalently, as boosting the rank of the corresponding least squares problem), or as using a weighted combination of the Gauss-Newton step and the steepest descent direction. With a suitable strategy for choosing the parameter $\mu_k$, the Levenberg-Marquardt method can be very robust in practice, and it forms the basis for several effective software packages for solving nonlinear least squares problems.

6.5 Constrained Optimization

A thorough study of constrained optimization is beyond the scope of this book, but the basic ideas of some of the concepts and algorithms involved are briefly sketched here. Consider the minimization of a nonlinear function subject to nonlinear equality constraints,
\[
\min_x f(x) \quad \text{subject to} \quad g(x) = o,
\]
where $f\colon \mathbb{R}^n \to \mathbb{R}$ and $g\colon \mathbb{R}^n \to \mathbb{R}^m$, with $m \le n$. From multivariate calculus, we know that a necessary condition for a feasible point $x$ to be a solution to this problem is that the negative gradient of $f$ lie in the space spanned by the constraint normals, i.e., that
\[
-\nabla f(x) = J_g^T(x)\, \lambda,
\]
where $J_g$ is the Jacobian matrix of $g$ and $\lambda$ is an $m$-vector of Lagrange multipliers.
This condition says that we cannot reduce the objective function without violating the constraints, and it motivates the definition of the Lagrangian function, $L\colon \mathbb{R}^{n+m} \to \mathbb{R}$, given by
\[
L(x, \lambda) = f(x) + \lambda^T g(x),
\]
whose gradient and Hessian are given by
\[
\nabla L(x, \lambda) =
\begin{bmatrix} \nabla_x L(x, \lambda) \\ \nabla_\lambda L(x, \lambda) \end{bmatrix} =
\begin{bmatrix} \nabla f(x) + J_g^T(x)\, \lambda \\ g(x) \end{bmatrix}
\]
and
\[
H_L(x, \lambda) =
\begin{bmatrix} B(x, \lambda) & J_g^T(x) \\ J_g(x) & O \end{bmatrix},
\]
where
\[
B(x, \lambda) = \nabla_{xx} L(x, \lambda) = H_f(x) + \sum_{i=1}^{m} \lambda_i H_{g_i}(x).
\]
Together, the necessary condition and the requirement of feasibility say that we are looking for a critical point of the Lagrangian function, which is expressed by the system of nonlinear equations
\[
\begin{bmatrix} \nabla f(x) + J_g^T(x)\, \lambda \\ g(x) \end{bmatrix} = o.
\]
It is important to note that the block $2 \times 2$ matrix $H_L$ is symmetric but cannot be positive definite, even if the matrix $B$ is positive definite (in general, $B$ is not positive definite, but an extra “penalty” term is sometimes added to the Lagrangian to make it so). Thus, a critical point of $L$ is necessarily a saddle point rather than a minimum or maximum.

If the Hessian of the Lagrangian is never positive definite, even at a constrained minimum, then how can we check a critical point of the Lagrangian for optimality? It turns out that a sufficient condition for a constrained minimum is that the matrix $B(x, \lambda)$ at the critical point be positive definite on the tangent space to the constraint surface, which is simply the null space of $J_g$ (i.e., the set of all vectors orthogonal to the rows of $J_g$).
If $Z$ is a matrix whose columns form a basis for this subspace, then we check whether the symmetric matrix $Z^T B Z$ is positive definite. This condition says that we need positive definiteness only with respect to locally feasible directions (i.e., parallel to the constraint surface), for movement orthogonal to the constraint surface would violate the constraints. A suitable matrix $Z$ can be obtained from an orthogonal factorization of $J_g^T$ (see Section 3.4.3).

Applying Newton's method to the foregoing nonlinear system, we obtain the system of linear equations
\[
\begin{bmatrix} B(x, \lambda) & J_g^T(x) \\ J_g(x) & O \end{bmatrix}
\begin{bmatrix} s \\ \delta \end{bmatrix} =
- \begin{bmatrix} \nabla f(x) + J_g^T(x)\, \lambda \\ g(x) \end{bmatrix}
\]
for the Newton step $(s, \delta)$ in $(x, \lambda)$ at each iteration.
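As a concrete sketch of the projected-Hessian optimality test just described, the following computes $Z$ from a full QR factorization of $J_g^T$ and checks $Z^T B Z$. The matrices are taken from the small problem of Example 6.10 below ($f(x) = 0.5 x_1^2 + 2.5 x_2^2$ with the linear constraint $x_1 - x_2 - 1 = 0$; since the constraint is linear, $B$ reduces to the Hessian of $f$):

```python
# Optimality check via the projected Hessian Z^T B Z.
import numpy as np

B = np.array([[1.0, 0.0],    # B = H_f for a linear constraint
              [0.0, 5.0]])
Jg = np.array([[1.0, -1.0]]) # Jacobian of the constraint (m=1, n=2)

# Full QR factorization of Jg^T: the trailing n - m columns of Q
# form an orthonormal basis for the null space of Jg.
Q, R = np.linalg.qr(Jg.T, mode="complete")
m, n = Jg.shape
Z = Q[:, m:]

# Sufficient condition for a constrained minimum:
# Z^T B Z positive definite (all eigenvalues > 0).
ZBZ = Z.T @ B @ Z
print(np.linalg.eigvalsh(ZBZ))                    # one eigenvalue, approx 3.0
print(bool(np.all(np.linalg.eigvalsh(ZBZ) > 0)))  # True
```

Here $Z$ spans the direction $[\,1 \;\; 1\,]^T/\sqrt{2}$ parallel to the constraint line, and the projected Hessian $(1 + 5)/2 = 3$ is positive, confirming a constrained minimum.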
Many of the algorithms for solving constrained optimization problems amount to different ways of solving this block $2 \times 2$ linear system or some variant of it. Methods for constrained optimization fall roughly into three categories:

• Range space methods, which are based on block elimination in the block $2 \times 2$ linear system, yielding an approach akin to the normal equations for linear least squares

• Null space methods, which are based on orthogonal factorization of the matrix of constraint normals, $J_g^T(x)$

• Methods that solve the entire block $2 \times 2$ system directly, with an appropriate pivoting strategy that takes advantage of its symmetry and sparsity

The methods just outlined for equality constraints can be extended to handle inequality constraints by using an active set strategy, in which the inequality constraints are provisionally divided into those that are satisfied already (and can therefore be temporarily disregarded) and those that are violated (and are therefore temporarily treated as equality constraints).
This division of the constraints is revised as iterations proceed until eventually the correct constraints that are binding at the solution are identified.

Example 6.10 Constrained Optimization. As a simple illustration of constrained optimization, we minimize the same quadratic function as in our previous examples,
\[
f(x) = 0.5 x_1^2 + 2.5 x_2^2,
\]
but this time subject to the constraint
\[
g(x) = x_1 - x_2 - 1 = 0.
\]
The Lagrangian function is given by
\[
L(x, \lambda) = f(x) + \lambda^T g(x) = 0.5 x_1^2 + 2.5 x_2^2 + \lambda (x_1 - x_2 - 1),
\]
where the Lagrange multiplier $\lambda$ is a scalar in this instance because there is only one constraint. Since
\[
\nabla f(x) = \begin{bmatrix} x_1 \\ 5 x_2 \end{bmatrix}
\quad \text{and} \quad
J_g(x) = [\, 1 \;\; -1 \,],
\]
we have
\[
\nabla_x L(x, \lambda) = \nabla f(x) + J_g^T(x)\, \lambda =
\begin{bmatrix} x_1 \\ 5 x_2 \end{bmatrix} + \lambda \begin{bmatrix} 1 \\ -1 \end{bmatrix}.
\]
Therefore, the system of equations to be solved for a critical point of the Lagrangian is
\[
x_1 + \lambda = 0, \qquad 5 x_2 - \lambda = 0, \qquad x_1 - x_2 = 1,
\]
which in this case is a linear system whose matrix formulation is
\[
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 5 & -1 \\ 1 & -1 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \lambda \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
\]
Solving this system, we obtain the solution
\[
x_1 = 0.833, \qquad x_2 = -0.167, \qquad \lambda = -0.833.
\]
The solution is illustrated in Fig.
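The $3 \times 3$ linear system of Example 6.10 is small enough to solve with a direct dense factorization; a short sketch in NumPy (for large problems one of the three method classes listed earlier would be used instead):

```python
# Solving the linear system of Example 6.10 for the critical point
# (x1, x2, lambda) of the Lagrangian.
import numpy as np

# Block structure:  [ B    Jg^T ] [ x      ]   [ 0 ]
#                   [ Jg    0   ] [ lambda ] = [ 1 ]
K = np.array([[1.0,  0.0,  1.0],
              [0.0,  5.0, -1.0],
              [1.0, -1.0,  0.0]])
b = np.array([0.0, 0.0, 1.0])

x1, x2, lam = np.linalg.solve(K, b)
print(f"{x1:.3f} {x2:.3f} {lam:.3f}")  # 0.833 -0.167 -0.833

# Sanity check: the constraint is satisfied at the solution.
print(abs(x1 - x2 - 1.0) < 1e-12)      # True
```

The exact solution is $x_1 = 5/6$, $x_2 = -1/6$, $\lambda = -5/6$, matching the rounded values in the example.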