When the function is sufficiently well behaved for the unmodified Gauss-Newton algorithm to work successfully, the linear search introduces some waste of effort. Furthermore, when there are large differences in scale between the elements of the Jacobian J and/or it is difficult to evaluate these accurately, then the condition of J^T J may deteriorate so that positive definiteness can no longer be guaranteed computationally and the result (17.19) is inapplicable. Nevertheless, I have found that a carefully coded implementation of Hartley's (1961) ideas using the Choleski decomposition and back-solution for consistent (but possibly singular) systems of linear equations is a very compact and efficient procedure for solving the nonlinear least-squares problem.
However, it has proved neither so simple to implement nor so generally reliable as the method which follows and, by virtue of its modifications to handle singular J^T J, no longer satisfies the conditions of the convergence result (17.19).

17.4. MARQUARDT'S METHOD

The problems of both scale and singularity of J^T J are attacked simultaneously by Marquardt (1963). Consider solutions q to the equations

   (J^T J + e 1) q = -J^T f                                          (17.20)

where e is some parameter. Then as e becomes very large relative to the norm of J^T J, q tends towards the steepest descents direction, while when e is very small compared to this norm, the Gauss-Newton solution is obtained. Furthermore, the scaling of the parameters

   x' = D x                                                          (17.21)

where D is a diagonal matrix having positive diagonal elements, implies a transformed Jacobian such that

   J' = J D^{-1}                                                     (17.22)

and equations (17.20) become

   [(J')^T J' + e 1] q' = -(J')^T f                                  (17.23a)
   (D^{-1} J^T J D^{-1} + e 1) D q = -D^{-1} J^T f                   (17.23b)

or

   (J^T J + e D^2) q = -J^T f                                        (17.24)

so that the scaling may be accomplished implicitly by solving (17.24) instead of (17.23a).

Marquardt (1963) and Levenberg (1944) have suggested the scaling choice

   D^2_{ii} = (J^T J)_{ii}.                                          (17.25)

However, to circumvent failures when one of the diagonal elements of J^T J is zero, I prefer to use

   D^2_{ii} = (J^T J)_{ii} + φ                                       (17.26)

where φ is some number chosen to ensure the scale is not too small (see Nash 1977).
A value of φ = 1 seems satisfactory for even the most pathological problems. The matrix J^T J + e D^2 is always positive definite, and by choosing e large enough can, moreover, be made computationally positive definite so that the simplest forms of the Choleski decomposition and back-solution can be employed. That is to say, the Choleski decomposition is not completed for non-positive definite matrices.
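It is worth noting how (17.24) with the scaling (17.26) is realised in algorithm 23 below: only the diagonal of J^T J has to be modified, the off-diagonal elements being left untouched. The following short derivation is my own working in the notation above, not a formula from the text:

   (J^T J + e D^2)_{jj} = (J^T J)_{jj} + e [ (J^T J)_{jj} + φ ]
                        = (J^T J)_{jj} (1 + e) + φ e
   (J^T J + e D^2)_{jk} = (J^T J)_{jk}                for k ≠ j.

With e called lambda and φ called phi, the first line is exactly the statement a[q]:=c[q]*(1.0+lambda)+phi*lambda that appears at STEP 6 of the listing.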
Marquardt (1963) suggests starting the iteration with e = 0·1, reducing it by a factor of 10 before each step if the preceding solution q has given

   S(x + q) < S(x)

and x has been replaced by (x + q). If

   S(x + q) > S(x)

then e is increased by a factor of 10 and the solution of equations (17.24) repeated. (In the algorithm below, e is called lambda.)

17.5. CRITIQUE AND EVALUATION

By and large, this procedure is reliable and efficient. Because the bias e is reduced after each successful step, however, there is a tendency to see the following scenario enacted by a computer at each iteration, that is, at each evaluation of J:

(i) reduce e, find S(x + q) > S(x);
(ii) increase e, find S(x + q) < S(x), so replace x by (x + q) and proceed to (i).

This procedure would be more efficient if e were not altered. In other examples one hopes to take advantage of the rapid convergence of the Gauss-Newton part of the Marquardt equations by reducing e, so a compromise is called for.
I retain10 as the factor for increasing e, but use 0·4 to effect the reduction. A furthersafeguard which should be included is a check to ensure that e does not approachzero computationally. Before this modification was introduced into my program,it proceeded to solve a difficult problem by a long series of approximateGauss-Newton iterations and then encountered a region on the sum-of-squaressurface where steepest descents steps were needed. During the early iterations eunderflowed to zero, and since JT J was singular, the program made many futileattempts to increase e before it was stopped manually.The practitioner interested in implementing the Marquardt algorithm will findthese modifications included in the description which follows.Algorithm 23.
Algorithm 23. Modified Marquardt method for minimising a nonlinear sum-of-squares function

procedure modmrt( n : integer; {number of parameters}
            var Bvec : rvector; {parameter vector}
            var X : rvector; {derivatives and best parameters}
            var Fmin : real; {minimum value of function}
            Workdata : probdata);
{alg23.pas == modified Nash Marquardt nonlinear least squares minimisation
              method.
              Copyright 1988 J.C.Nash}
var
  a, c : smatvec;
  delta, v : rvector;
  dec, eps, inc, lambda, p, phi, res : real;
  count, i, ifn, igrad, j, k, nn2, q : integer;
  notcomp, singmat, calcmat : boolean;
begin
  writeln('alg23.pas -- Nash Marquardt nonlinear least squares');
  with Workdata do
  begin {STEP 0 partly in procedure call}
    if nlls = false then halt; {cannot proceed if we do not have a nonlinear
                                least squares problem available}
    Fmin:=big; {safety setting of the minimal function value}
    inc:=10.0; {increase factor for damping coefficient lambda}
    dec:=0.4; {decrease factor for damping coefficient lambda}
    eps:=calceps; {machine precision}
    lambda:=0.0001; {initialize damping factor}
    phi:=1.0; {set the Nash damping factor}
    ifn:=0; igrad:=0; {set the function and gradient evaluation counts}
    calcmat:=true; {to force calculation of the J-transpose * J matrix and
                    J-transpose * residual on the first iteration}
    nn2:=(n*(n+1)) div 2; {elements in the triangular form of the inner
                           product matrix -- ensure this is an integer}
    p:=0.0; {STEP 1}
    for i:=1 to m do
    begin
      res:=nlres(i, n, Bvec, notcomp); {the residual}
      {writeln('res[',i,']=',res);}
      if notcomp then halt; {safety check on initial evaluation}
      p:=p+res*res; {sum of squares accumulation}
    end;
    ifn:=ifn+1; {count the function evaluation}
    Fmin:=p; {to save best sum of squares so far}
    count:=0; {to avoid convergence immediately}
    {STEP 2}
    while count<n do
    begin {main Marquardt iteration}
      {NOTE: in this version we do not reduce the damping parameter here.
       The structure of Pascal lends itself better to adjusting the damping
       parameter below.}
      if calcmat then
      begin {Compute sum of squares and cross-products matrix}
        writeln(igrad,' ',ifn,' sum of squares=',Fmin);
        for i:=1 to n do
        begin
          write(Bvec[i]:10:5,' ');
          if (7 * (i div 7) = i) and (i<n) then writeln;
        end; {loop on i}
        writeln;
        igrad:=igrad+1; {STEP 3}
        for j:=1 to nn2 do a[j]:=0.0;
        for j:=1 to n do v[j]:=0.0;
        for i:=1 to m do {STEP 4}
        begin
          nljac(i, n, Bvec, X); {puts i'th row of Jacobian in X}
          res:=nlres(i, n, Bvec, notcomp); {This calculation is not really
            necessary. The results of the sum of squares calculation
            can be saved in a residual vector to avoid the
            recalculation. However, this way saves storing a possibly
            large vector. NOTE: we ignore notcomp here, since the
            computation of the residuals has already been proven
            possible at STEP 1.}
          for j:=1 to n do
          begin
            v[j]:=v[j]+X[j]*res; {to accumulate the gradient}
            q:=(j*(j-1)) div 2; {to store the correct position in the
              row-order vector form of the lower triangle of a symmetric
              matrix}
            for k:=1 to j do a[q+k]:=a[q+k]+X[j]*X[k];
          end; {loop on j}
        end; {loop on i}
        for j:=1 to nn2 do c[j]:=a[j]; {STEP 5 -- copy a and b}
        for j:=1 to n do X[j]:=Bvec[j]; {to save the best parameters}
      end; {if calcmat}
      writeln('LAMBDA=',lambda:8); {STEP 6}
      for j:=1 to n do
      begin
        q:=(j*(j+1)) div 2;
        a[q]:=c[q]*(1.0+lambda)+phi*lambda; {damping added}
        delta[j]:=-v[j]; {to set RHS in equations 17.24}
        if j>1 then
          for i:=1 to (j-1) do a[q-i]:=c[q-i];
      end; {loop on j}
      notcomp:=false; {to avoid undefined value}
      Choldcmp(n, a, singmat); {STEP 7 -- Choleski factorization}
      if (not singmat) then {matrix successfully decomposed}
      begin {STEP 8 -- Choleski back-substitution}
        Cholback(n, a, delta); {delta is the change in the parameters}
        count:=0; {to count parameters unchanged in update}
        for i:=1 to n do {STEP 9}
        begin
          Bvec[i]:=X[i]+delta[i]; {new = old + update}
          if (reltest+Bvec[i])=(reltest+X[i]) then count:=count+1;
          {Here the global constant reltest is used to test for the
           equality of the new and old parameters.}
        end; {loop on i over parameters}
        if count<n then {STEP 10: parameters have been changed}
        begin {compute the sum of squares, checking computability}
          p:=0.0; i:=0; {initialization}
          repeat
            i:=i+1; res:=nlres(i,n,Bvec,notcomp);
            if (not notcomp) then p:=p+res*res;
          until notcomp or (i>=m);
          ifn:=ifn+1; {count the function evaluation}
        end; {if count<n}
      end; {if not singmat}
      if count<n then
        if (not singmat) and (not notcomp) and (p<Fmin) then
        begin {successful in reducing sum of squares}
          lambda:=lambda*dec; {to decrease the damping parameter}
          Fmin:=p; {to save best sum of squares so far}
          calcmat:=true; {to perform new iteration}
        end
        else {somehow have not succeeded in reducing sum of
              squares: matrix is singular, new point is not
              a feasible set of parameters, or sum of squares
              at new point not lower}
        begin
          lambda:=lambda*inc; {to increase the damping parameter}
          if lambda<eps*eps then lambda:=eps; {safety check}
          calcmat:=false; {since we simply renew search from the same
            point in the parameter space with new search step}
        end; {adjustment of damping factor}
        {This also ends 'if count<n'}
    end; {while count<n}
  end; {with Workdata}
end; {alg23.pas == modmrt}
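A remark on the storage used in the listing: J^T J is held as a row-order vector containing only its lower triangle, so element (j, k) with k <= j lies at position j(j-1)/2 + k and the diagonal element (j, j) at j(j+1)/2. This is why the code computes q:=(j*(j-1)) div 2 when accumulating the cross-products and q:=(j*(j+1)) div 2 when damping the diagonal. The short program below is my own check of that indexing and is not part of the book's software.

program trindex; {sketch: row-order lower-triangle indexing as used in algorithm 23}
const
  n = 4; {illustrative order}
var
  j, k, q, pos : integer;
begin
  pos := 0;
  for j := 1 to n do
    for k := 1 to j do
    begin
      pos := pos + 1;           {position when the triangle is stored row by row}
      q := (j*(j-1)) div 2 + k; {index formula used in the listing}
      writeln('(', j, ',', k, ') stored at ', q, '  expected ', pos);
      if (k = j) and (q <> (j*(j+1)) div 2) then
        writeln('diagonal index mismatch'); {never printed: j(j-1)/2 + j = j(j+1)/2}
    end
end.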
17.6. RELATED METHODS

Fletcher (1971) has proposed a Marquardt method in which the bias e (lambda) is computed on the basis of some heuristic rules rather than adjusted arbitrarily. His program, incorporated in the Harwell and NAG libraries, has had widespread usage and seems highly reliable. On straightforward problems I have found it very efficient, and in terms of evaluations of J and S(b) generally better than a FORTRAN version of algorithm 23 even on difficult problems.