Therefore, a restart is suggested in any of the following cases:

(i) t^T g > 0, that is, the direction of search is 'uphill'.

(ii) b' = b, that is, no change is made in the parameters by the linear search along t.

If either case (i) or case (ii) occurs during the first step after B has been set to the unit matrix, the algorithm is taken to have converged.

(iii) Since the method reduces S along t, it is expected that t^T g(b') will be greater (less negative) than t^T g(b). Figure 15.1 illustrates this idea. Therefore, t^T y = d1 should be positive. If it is not, there exists the danger that B may no longer be positive definite. Thus if t^T y < 0, the matrix B is reset to unity.

This completes the general description of a variable metric algorithm suitable for small computers. I am indebted to Dr R Fletcher of Dundee University for suggesting the basic choices of the Broyden-Fletcher-Shanno updating formula and the acceptable point search. Note particularly that this search does not require the gradient to be evaluated at each trial point.
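As a minimal sketch, using only variable names that appear in the Pascal listing below, the acceptable point test of STEP 11 compares the trial function value against a projection of the expected decrease along the search direction, so no gradient evaluation is needed at a trial point:

   accpoint := (not notcomp) and
               (f <= Fmin + gradproj*steplength*acctol);
   if not accpoint then
      steplength := steplength*stepredn; {shrink the step and test again}

Here gradproj = t^T g is negative for a descent direction, acctol is a small positive tolerance and stepredn is the step-reduction factor.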
Algorithm 21. Variable metric minimiser

The algorithm needs an order-n square matrix B and five order-n vectors b, x, c, g and t. Care should be taken when coding to distinguish between B and b.

procedure vmmin(n: integer; {the number of parameters in the
                    function to be minimised}
                var Bvec, X: rvector; {the parameter values on
                    input (Bvec) and output (X) from minmeth}
                var Fmin: real; {'minimum' function value}
                Workdata: probdata; {user defined data area}
                var fail: boolean; {true if method has failed}
                var intol: real); {user-initialized convergence
                    tolerance}
{alg21.pas == modified Fletcher variable metric method.
   Original method due to R. Fletcher, Computer Journal, vol 13,
   pp. 317-322, 1970.
   Unlike Fletcher-Reeves, we do not use a quadratic interpolation,
   since the search is often approximately a Newton step.
   Copyright 1988 J.C.Nash
}
const
   Maxparm = 25; {maximum allowed number of parameters in the
                  present code. May be changed by the user,
                  along with dependent constants below.}
   stepredn = 0.2; {factor to reduce stepsize in line search}
   acctol = 0.0001; {acceptable point tolerance -- see STEP 11}
   reltest = 10.0; {to check equality of parameters -- see STEP 8}
var
   accpoint : boolean; {to indicate an acceptable point}
   B : array[1..Maxparm, 1..Maxparm] of real; {approximation to inverse Hessian}
   c : rvector; {to store last gradient}
   count : integer; {to check for parameter equality}
   D1, D2 : real; {temporary working storage}
   f : real; {temporary function value}
   funcount : integer; {count of function evaluations}
   g : rvector; {to hold gradient}
   gradcount : integer; {count of gradient evaluations}
   gradproj : real; {gradient projection on search vector}
   i, j : integer; {working integers}
   ilast : integer; {records last step at which B was initialized to a unit matrix}
   notcomp : boolean; {non-computability flag}
   s : real; {inner product accumulator}
   steplength : real; {linear search steplength}
   t : rvector; {to store working vector for line search}
begin
   writeln('alg21.pas -- version 2 1988-03-24');
   writeln('Variable metric function minimiser');
   fail := false; {method has yet to fail}
   f := fminfn(n, Bvec, Workdata, notcomp); {initial fn calculation -- STEP 1}
   if notcomp then
   begin
      writeln('**** Function cannot be evaluated at initial parameters ****');
      fail := true; {method has failed}
   end
   else {proceed with minimisation}
   begin
      Fmin := f; {save the best value so far}
      funcount := 1; {function count initialized to 1}
      gradcount := 1; {initialize gradient count to 1}
      fmingr(n, Bvec, Workdata, g); {STEP 2}
      ilast := gradcount; {set count to force initialization of B}
      {STEP 3 -- set B to unit matrix -- top of major cycle}
      repeat {iteration for method -- terminates when no progress can be
              made, and B is a unit matrix so that search is a steepest
              descent direction.}
         if ilast = gradcount then
         begin
            for i := 1 to n do
            begin
               for j := 1 to n do B[i, j] := 0.0;
               B[i, i] := 1.0;
            end; {loop on i}
         end; {initialize B}
         writeln(gradcount, ' ', funcount, ' ', Fmin); {STEP 4}
         write('parameters ');
         for i := 1 to n do write(Bvec[i]:10:5, ' ');
         writeln;
         for i := 1 to n do
         begin
            X[i] := Bvec[i]; {save best parameters}
            c[i] := g[i]; {save gradient}
         end; {loop on i}
         {STEP 5 -- set t := -B*g and gradproj := t^T g}
         gradproj := 0.0; {to save t^T g inner product}
         for i := 1 to n do
         begin
            s := 0.0; {to accumulate element of B*g}
            for j := 1 to n do s := s - B[i, j]*g[j];
            t[i] := s; gradproj := gradproj + s*g[i];
         end; {loop on i for STEP 5}
         {STEP 6 -- test for descent direction}
         if gradproj < 0 then {if gradproj<0 then STEP 7 to perform linear
                               search; other parts of this step follow the
                               'else' below}
         begin {STEP 7 -- begin linear search}
            steplength := 1.0; {always try full step first}
            {STEP 8 -- step along search direction and test for a change}
            accpoint := false; {don't have a good point yet}
            repeat {line search loop}
               count := 0; {to count unchanged parameters}
               for i := 1 to n do
               begin
                  Bvec[i] := X[i] + steplength*t[i];
                  if (reltest + X[i]) = (reltest + Bvec[i]) then count := count + 1;
               end; {loop on i}
               if count < n then {STEP 9 -- main convergence test}
               begin {STEP 10 -- can proceed with linear search}
                  f := fminfn(n, Bvec, Workdata, notcomp); {function calculation}
                  funcount := funcount + 1;
                  accpoint := (not notcomp) and
                              (f <= Fmin + gradproj*steplength*acctol);
                  {STEP 11 -- a point is acceptable only if the function is
                   computable (not notcomp) and it satisfies the acceptable
                   point criterion}
                  if not accpoint then
                  begin
                     steplength := steplength*stepredn; write('*');
                  end;
               end; {compute and test for linear search}
            until (count = n) or accpoint; {end of loop for line search}
            if count < n then
            begin
               Fmin := f; {save function value}
               fmingr(n, Bvec, Workdata, g); {STEP 12}
               gradcount := gradcount + 1;
               D1 := 0.0; {STEP 13 -- prepare for matrix update}
               for i := 1 to n do
               begin
                  t[i] := steplength*t[i]; c[i] := g[i] - c[i]; {to compute vector y}
                  D1 := D1 + t[i]*c[i]; {to compute inner product t^T y}
               end; {loop on i}
               if D1 > 0 then {STEP 14 -- test if update is possible}
               begin {update}
                  D2 := 0.0; {STEP 15 -- computation of B*y, y^T B*y}
                  for i := 1 to n do
                  begin
                     s := 0.0;
                     for j := 1 to n do s := s + B[i, j]*c[j];
                     X[i] := s; D2 := D2 + s*c[i];
                  end; {loop on i}
                  D2 := 1.0 + D2/D1; {STEP 16 -- complete update}
                  for i := 1 to n do
                  begin
                     for j := 1 to n do
                     begin
                        B[i, j] := B[i, j] - (t[i]*X[j] + X[i]*t[j] - D2*t[i]*t[j])/D1;
                     end; {loop on j}
                  end; {loop on i -- update is now complete}
               end {update}
               else
               begin
                  writeln(' UPDATE NOT POSSIBLE');
                  ilast := gradcount; {to force a restart with B = 1(n)}
               end;
            end {if count<n} {STEP 17}
            else {count=n, cannot proceed}
            begin
               if ilast < gradcount then
               begin
                  count := 0; {to force a steepest descent try}
                  ilast := gradcount; {to reset to steepest descent search}
               end; {if ilast}
            end; {count=n}
         end {if gradproj<0 ... do linear search}
         else
         begin
            writeln('UPHILL SEARCH DIRECTION');
            ilast := gradcount; {to reset B to unit matrix}
            count := 0; {to ensure we try again}
         end;
      until (count = n) and (ilast = gradcount);
   end; {minimisation -- STEP 18}
   {STEP 31 -- save best parameters and function value found}
   writeln('Exiting from alg21.pas variable metric minimiser');
   writeln('   ', funcount, ' function evaluations used');
   writeln('   ', gradcount, ' gradient evaluations used');
end; {alg21.pas == vmmin}

Example 15.1. Illustration of the variable metric algorithm 21

The following output from an IBM 370/168 operating in single precision (six hexadecimal digits) was produced by a FORTRAN version of algorithm 21 as it minimised the Rosenbrock banana-shaped valley function (Nash 1976) using analytic derivatives.
The starting point b1 = -1·2, b2 = 1 was used.

####################################
ITNS=  1 # EVALNS=  1 FUNCTION= 0.24199860E+02
ITNS=  2 # EVALNS=  6 FUNCTION= 0.20226822E+02
ITNS=  3 # EVALNS=  9 FUNCTION= 0.86069937E+01
ITNS=  4 # EVALNS= 14 FUNCTION= 0.31230078E+01
ITNS=  5 # EVALNS= 16 FUNCTION= 0.28306570E+01
ITNS=  6 # EVALNS= 21 FUNCTION= 0.26346817E+01
ITNS=  7 # EVALNS= 23 FUNCTION= 0.20069408E+01
ITNS=  8 # EVALNS= 24 FUNCTION= 0.18900719E+01
ITNS=  9 # EVALNS= 25 FUNCTION= 0.15198193E+01
ITNS= 10 # EVALNS= 26 FUNCTION= 0.13677282E+01
ITNS= 11 # EVALNS= 27 FUNCTION= 0.10138159E+01
ITNS= 12 # EVALNS= 28 FUNCTION= 0.85555243E+00
ITNS= 13 # EVALNS= 29 FUNCTION= 0.72980821E+00
ITNS= 14 # EVALNS= 30 FUNCTION= 0.56827205E+00
ITNS= 15 # EVALNS= 32 FUNCTION= 0.51492560E+00
ITNS= 16 # EVALNS= 33 FUNCTION= 0.44735157E+00
ITNS= 17 # EVALNS= 34 FUNCTION= 0.32320732E+00
ITNS= 18 # EVALNS= 35 FUNCTION= 0.25737345E+00
ITNS= 19 # EVALNS= 37 FUNCTION= 0.20997590E+00
ITNS= 20 # EVALNS= 38 FUNCTION= 0.17693651E+00
ITNS= 21 # EVALNS= 39 FUNCTION= 0.12203962E+00
ITNS= 22 # EVALNS= 40 FUNCTION= 0.74170172E-01
ITNS= 23 # EVALNS= 41 FUNCTION= 0.39149582E-01
ITNS= 24 # EVALNS= 43 FUNCTION= 0.31218585E-01
ITNS= 25 # EVALNS= 44 FUNCTION= 0.25947951E-01
ITNS= 26 # EVALNS= 45 FUNCTION= 0.12625925E-01
ITNS= 27 # EVALNS= 46 FUNCTION= 0.78500621E-02
ITNS= 28 # EVALNS= 47 FUNCTION= 0.45955069E-02
ITNS= 29 # EVALNS= 48 FUNCTION= 0.15429037E-02
ITNS= 30 # EVALNS= 49 FUNCTION= 0.62955730E-03
ITNS= 31 # EVALNS= 50 FUNCTION= 0.82553088E-04
ITNS= 32 # EVALNS= 51 FUNCTION= 0.54429529E-05
ITNS= 33 # EVALNS= 52 FUNCTION= 0.57958061E-07
ITNS= 34 # EVALNS= 53 FUNCTION= 0.44057202E-10
ITNS= 35 # EVALNS= 54 FUNCTION= 0.0
ITNS= 35 # EVALNS= 54 FUNCTION= 0.0
B( 1)= 0.10000000E+01
B( 2)= 0.10000000E+01
# ITNS= 35 # EVALNS= 54 FUNCTION= 0.0
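The Rosenbrock function in this example is S(b) = 100(b2 - b1^2)^2 + (1 - b1)^2, so that S(-1·2, 1) = 24·2, matching the first line of the output. As a minimal sketch, not the book's own driver, the function and its analytic gradient might be coded for algorithm 21 as below; the names fminfn and fmingr and their argument lists mirror the calls made in the listing above, while the declarations of rvector and probdata are assumed to be supplied by the user's support code.

function fminfn(n: integer; var Bvec: rvector;
                var Workdata: probdata; var notcomp: boolean): real;
begin
   notcomp := false; {the Rosenbrock function is computable everywhere}
   fminfn := 100.0*sqr(Bvec[2] - sqr(Bvec[1])) + sqr(1.0 - Bvec[1]);
end; {fminfn}

procedure fmingr(n: integer; var Bvec: rvector;
                 var Workdata: probdata; var g: rvector);
begin
   {analytic gradient of 100*(b2 - b1^2)^2 + (1 - b1)^2}
   g[1] := -400.0*Bvec[1]*(Bvec[2] - sqr(Bvec[1])) - 2.0*(1.0 - Bvec[1]);
   g[2] := 200.0*(Bvec[2] - sqr(Bvec[1]));
end; {fmingr}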
Chapter 16

DESCENT TO A MINIMUM II: CONJUGATE GRADIENTS

16.1. CONJUGATE GRADIENTS METHODS

On a small computer, the principal objection to the Nelder-Mead and variable metric methods is their requirement for a working space proportional to n², where n is the number of parameters in the function to be minimised. The parameters b and gradient g require only n elements each, so it is tempting to consider algorithms which make use of this information without the requirement that it be collected in a matrix. In order to derive such a method, consider once again the quadratic form

   S(b) = ½ b^T H b - c^T b + (any scalar)                (15.11)

of which the gradient at b is

   g = H b - c.                                           (15.17)

Then if the search direction at some iteration j is t_j, we have

   y_j = g_(j+1) - g_j = k_j H t_j                        (15.18)

where k_j is the step-length parameter. If any initial step t_1 is made subject only to its being 'downhill', that is

   t_1^T g_1 < 0                                          (16.1)

then the construction of search directions t_i, i = 1, 2, . . . , n, conjugate with respect to the Hessian H, is possible via the Gram-Schmidt process (see Dahlquist and Björck 1974, pp 201-4).
That is to say, given an arbitrary new 'downhill' direction q_i at step i, it is possible to construct, by choosing coefficients z_ij, a direction

   t_i = q_i + Σ (j = 1 to i-1) z_ij t_j                  (16.2)

such that

   t_i^T H t_j = 0        for j < i.                      (16.3)

This is achieved by applying t_j^T H to both sides of equation (16.2), giving

   z_ij = -t_j^T H q_i / (t_j^T H t_j)                    (16.4)

by substitution of the condition (16.3) and the assumed conjugacy of the t_j, j = 1, 2, . . . , (i-1).
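As a minimal sketch, only to make the algebra of (16.2) and (16.4) concrete and not part of the book's algorithms, the construction could be coded as follows for an explicitly stored Hessian; the type rmatrix and the procedure name newconj are assumptions made for this illustration, and in the conjugate gradients method itself H is never formed.

procedure newconj(n, i: integer;
                  var H: rmatrix;  {symmetric positive definite Hessian}
                  var T: rmatrix;  {columns 1..(i-1) hold t_1 ... t_(i-1),
                                    already mutually conjugate w.r.t. H}
                  var q: rvector); {new 'downhill' direction q_i on entry;
                                    the conjugate direction t_i of (16.2) on exit}
var
   j, k, m: integer;
   num, den, s: real;
   Ht: rvector; {work vector for H*t_j}
begin
   for j := 1 to i-1 do
   begin
      for k := 1 to n do {form H*t_j}
      begin
         s := 0.0;
         for m := 1 to n do s := s + H[k, m]*T[m, j];
         Ht[k] := s;
      end;
      num := 0.0; den := 0.0;
      for k := 1 to n do
      begin
         num := num + Ht[k]*q[k];   {t_j^T H q_i}
         den := den + Ht[k]*T[k, j]; {t_j^T H t_j, nonzero for positive definite H}
      end;
      {z_ij of (16.4); because the t_j are mutually conjugate, accumulating
       into q directly yields the same t_i as the sum in (16.2)}
      for k := 1 to n do q[k] := q[k] - (num/den)*T[k, j];
   end;
end; {newconj}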
Note that the denominator of (16.4) cannot be zero if H is positive definite and t_j is not null.

Now if q_i is chosen to be the negative gradient

   q_i = -g_i                                             (16.5)

and H t_j is substituted from (15.18), then we have

   z_ij = g_i^T y_j / (t_j^T y_j).                        (16.6)

Moreover, if accurate line searches have been performed at each of the (i-1) previous steps, then the function S (still the quadratic form (15.11)) has been minimised on a hyperplane spanned by the search directions t_j, j = 1, 2, . . .