, (i-1), and g_i is orthogonal to each of these directions. Therefore, we have

    z_{ij} = 0     for j < (i-1)                                        (16.7)

and

    z_{i,i-1} = g_i^T (g_i - g_{i-1}) / [t_{i-1}^T (g_i - g_{i-1})]     (16.8)

Alternatively, using

    t_{i-1} = -g_{i-1} + z_{i-1,i-2} t_{i-2}                            (16.9)

which is a linear combination of g_j, j = 1, 2, . . . , (i-1), we obtain

    z_{i,i-1} = g_i^T (g_i - g_{i-1}) / (g_{i-1}^T g_{i-1})             (16.10)

and

    z_{i,i-1} = g_i^T g_i / (g_{i-1}^T g_{i-1})                         (16.11)

by virtue of the orthogonality mentioned above.

As in the case of variable metric algorithms, the formulae obtained for quadratic forms are applied in somewhat cavalier fashion to the minimisation of general nonlinear functions. The formulae (16.8), (16.10) and (16.11) are now no longer equivalent. For reference, these will be associated with the names: Beale (1972) and Sorenson (1969) for (16.8); Polak and Ribiere (1969) for (16.10); and Fletcher and Reeves (1964) for (16.11).
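To make the differences concrete, the short program below evaluates all three coefficients from the current gradient, the previous gradient and the previous search direction. It is only an illustrative sketch, not part of algorithm 22: the fixed dimension n, the array type rvec and the numerical data are assumptions chosen solely for demonstration.

program cgcoeff;
{Sketch only: compute the coefficient z(i,i-1) of (16.8), (16.10)
 and (16.11) from assumed vectors g = g(i), c = g(i-1), t = t(i-1).}
const
  n = 2; {illustrative dimension}
type
  rvec = array[1..n] of real;
var
  g, c, t : rvec;
  gg, gy, cc, ty : real; {inner products}
  j : integer;
begin
  g[1] := 1.0;  g[2] := -2.0;  {arbitrary test data}
  c[1] := 0.5;  c[2] := 1.5;
  t[1] := -0.5; t[2] := -1.5;
  gg := 0.0; gy := 0.0; cc := 0.0; ty := 0.0;
  for j := 1 to n do
  begin
    gg := gg + g[j]*g[j];          {g(i)^T g(i)}
    gy := gy + g[j]*(g[j] - c[j]); {g(i)^T (g(i) - g(i-1))}
    cc := cc + c[j]*c[j];          {g(i-1)^T g(i-1)}
    ty := ty + t[j]*(g[j] - c[j])  {t(i-1)^T (g(i) - g(i-1))}
  end;
  writeln('Beale-Sorenson  (16.8) : ', gy/ty);
  writeln('Polak-Ribiere   (16.10): ', gy/cc);
  writeln('Fletcher-Reeves (16.11): ', gg/cc)
end.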
All of these retain the space-saving two-term recurrence which makes the conjugate gradients algorithms so frugal of storage.

In summary, the conjugate gradients algorithm proceeds by setting

    t_1 = -g(b_1)                                   (16.12)

and

    t_j = z_{j,j-1} t_{j-1} - g_j(b_j)              (16.13)

with

    b_{j+1} = b_j + k_j t_j                         (16.14)

where k_j is determined by a linear search for a 'minimum' of S(b_j + k_j t_j) with respect to k_j.

16.2. A PARTICULAR CONJUGATE GRADIENTS ALGORITHM

A program to use the ideas of the previous section requires that a choice be made of (a) a recurrence formula to generate the search directions, and (b) a linear search.

Since the conjugate gradients methods are derived on the presumption that they minimise a quadratic form in n steps, it is also necessary to suggest a method for continuing the iterations after n steps. Some authors, for instance Polak and Ribiere (1969), continue iterating with the same recurrence formula. However, while the iteration matrix B in the variable metric algorithms can in a number of situations be shown to tend towards the inverse Hessian H^{-1} in some sense, there do not seem to be any similar theorems for conjugate gradients algorithms. Fletcher and Reeves (1964) restart their method every (n+1) steps with

    t_1 = -g_1                                      (16.15)

while Fletcher (1972) does this every n iterations.
Powell (1975a, b) has some much more sophisticated procedures for restarting his conjugate gradients method. I have chosen to restart every n steps or whenever the linear search can make no progress along the search direction. If no progress can be made in the first conjugate gradient direction, that of steepest descent, then the algorithm is taken to have converged.

The linear search used in the first edition of this book was that of §13.2. However, this proved to work well in some computing environments but poorly in others. The present code uses a simpler search which first finds an 'acceptable point' by stepsize reduction, using the same ideas as discussed in §15.3.
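The idea of the stepsize-reduction search can be sketched in a few lines. The fragment below is not the code of algorithm 22: the one-parameter test function S, the starting data and the precise form of the acceptance test are assumptions for illustration only, though the constants stepredn and acctol mirror those declared in the listing of algorithm 22 below.

program acceptpt;
{Sketch only: find an 'acceptable point' by repeated step reduction.
 The quadratic test function and data are illustrative assumptions.}
const
  stepredn = 0.2;    {factor to reduce stepsize, as in algorithm 22}
  acctol = 0.0001;   {acceptable point tolerance, as in algorithm 22}
var
  b, t, f0, gradproj, f, steplength : real;
  accpoint : boolean;
function S(x : real) : real; {assumed one-parameter test function}
begin
  S := (x - 3.0)*(x - 3.0)
end;
begin
  b := 0.0;                    {current parameter value}
  t := 10.0;                   {search direction (a descent direction)}
  f0 := S(b);                  {function value at the start of the search}
  gradproj := 2.0*(b - 3.0)*t; {projection of the gradient on t}
  steplength := 1.0;           {initial trial step}
  repeat
    f := S(b + steplength*t);
    {accept if at least a small fraction of the predicted decrease is made}
    accpoint := (f < f0 + steplength*gradproj*acctol);
    if not accpoint then steplength := steplength*stepredn
  until accpoint or (steplength < 1.0e-10);
  writeln('acceptable steplength = ', steplength, '  function value = ', f)
end.

Here the unit step fails the test, and one reduction by stepredn gives an acceptable point at steplength 0.2.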
Once an acceptable point hasbeen found, we have sufficient information to fit a parabola to the projection of thefunction on the search direction. The parabola requires three pieces of information.These are the function value at the end of the last iteration (or the initial point), theprojection of the gradient at this point onto the search direction, and the new (lower)function value at the acceptable point.
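Writing S0 for the first value, gp for the gradient projection and S1 for the value at the acceptable step s, the fitted parabola is q(k) = S0 + gp k + a k^2 with a = (S1 - S0 - gp s)/s^2, and its minimum lies at k = -gp/(2a). The fragment below sketches this calculation; the numerical values are assumptions carried over from the sketch above and are not part of algorithm 22.

program parafit;
{Sketch only: quadratic inverse interpolation for the step length.}
var
  S0, S1, gp, s, a, newstep : real;
begin
  S0 := 9.0;   {function value at the end of the last iteration}
  gp := -60.0; {projection of the gradient on the search direction}
  s := 0.2;    {acceptable step found by stepsize reduction}
  S1 := 1.0;   {lower function value at the acceptable point}
  a := (S1 - S0 - gp*s)/(s*s); {curvature of the fitted parabola}
  if a > 0.0 then
    newstep := -gp/(2.0*a)     {minimum of q(k) = S0 + gp*k + a*k*k}
  else
    newstep := s;              {no interpolated minimum: keep the step s}
  writeln('interpolated step = ', newstep)
end.

For these values the interpolated step is 0.3, which is in fact the exact minimum of the test function along the search direction.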
The step length resulting from the quadratic inverse interpolation is used to generate a new trial point for the function. If this proves to give a point lower than the latest acceptable point, it becomes the starting point for the next iteration. Otherwise we use the latest acceptable point, which is the lowest point so far.

A starting step length is needed for the search. In the Newton and variable metric (or quasi-Newton) methods, we can use a unit step length which is ideal for the minimisation of a quadratic function. However, for conjugate gradients, we do not have this theoretical support.
The strategy used here is to multiply the best step length found in the line search by some factor to increase the step. Our own usual choice is a factor 1.7. At the start of the conjugate gradients major cycle we set the step length to 1. If the step length exceeds 1 after being increased at the end of each iteration, it is reset to 1.

If the choice of linear search is troublesome, that of a recurrence formula is even more difficult. In some tests by the author on the 23-bit binary NOVA, the Beale-Sorenson formula (16.8) in conjunction with the linear search chosen above required more function and derivative evaluations than either formula (16.10) or formula (16.11).
A more complete comparison of the Polak-Ribiere formula (16.10) with that of Fletcher-Reeves (16.11) favoured the former. However, it is worth recording Fletcher's (1972) comment: 'I know of no systematic evidence which indicates how the choice should be resolved in a general-purpose algorithm.' In the current algorithm, the user is given a choice of which approach should be used.

In other sections, conjugate gradients algorithms specially adapted to particular problems will be reported. It is unfortunate that the general-purpose minimisation algorithm proposed here, and detailed in the next section, may prove extremely slow to converge when applied to such problems, for instance minimising the Rayleigh quotient to find a matrix eigensolution.

A detail which remains to be dealt with is that the initial step length needs to be established.
The value

    k = 1                                           (16.16)

is probably as good as any in the absence of other information, though Fletcher and Reeves (1964) use an estimate e of the minimum value of the function in the linear approximation

    k = [e - S(b)] / g^T t.                         (16.17)

In any event, it is advisable to check that the step does change the parameters and to increase k until they are altered. After a conjugate gradients iteration, the absolute value of k can be used to provide a step length in the next search.

Algorithm 22. Function minimisation by conjugate gradients

procedure cgmin(n: integer; {the number of parameters in the
                      function to be minimised}
                var Bvec, X: rvector; {the parameter values on
                      input (Bvec) and output (X) from minmeth}
                var Fmin: real; {'minimum' function value}
                Workdata: probdata; {user defined data area}
                var fail: boolean; {true if method has failed}
                var intol: real); {user-initialized convergence
                      tolerance}
{alg22.pas == modified conjugate gradients function
              minimisation method
  Original method due to
  R. Fletcher & C M Reeves, Computer Journal, vol 7, pp. 149-154, 1964
  Copyright 1988 J.C.Nash
}
type
  methodtype = (Fletcher_Reeves, Polak_Ribiere, Beale_Sorenson);
const
  Maxparm = 25;     {maximum allowed number of parameters in the
                     present code. May be changed by the user,
                     along with dependent constants below.}
  stepredn = 0.2;   {factor to reduce stepsize in line search}
  acctol = 0.0001;  {acceptable point tolerance -- see STEP 13}
  reltest = 10.0;   {to check equality of parameters -- see STEP 8}
var
  accpoint : boolean;  {to indicate an acceptable point}
  c : rvector;         {to store last gradient}
  count : integer;     {to check for parameter equality}
  cycle : integer;     {cycle count of cg process}
  cyclimit : integer;  {limit on cycles per major cg sweep}
  f : real;            {temporary function value}
  funcount : integer;  {count of function evaluations}
  g : rvector;         {to hold gradient}
  G1, G2 : real;       {temporary working storage}
  G3, gradproj : real; {temporary working storage}
  gradcount : integer; {count of gradient evaluations}
  i, j : integer;      {working integers}
  method : methodtype;
  newstep : real;      {interpolation steplength}
  notcomp : boolean;   {non-computability flag}
  oldstep : real;      {last stepsize used}
  s : real;            {inner product accumulator}
  setstep : real;      {to allow for adjustment of best step before
                        next iteration}
  steplength : real;   {linear search steplength}
  t : rvector;         {to store working vector for line search}
  tol : real;          {convergence tolerance}
begin
  writeln('alg22.pas -- Nash Algorithm 22 version 2 1988-03-24');
  writeln('Conjugate gradients function minimiser');
  {method:=Fletcher_Reeves;} {set here rather than have an input}
  writeln('Steplength saving factor multiplies best steplength found at the');
  writeln('end of each iteration as a starting value for next search');
  write('Enter a steplength saving factor (sugg. 1.7) -- setstep ');
  readln(infile, setstep);
  if infname<>'con' then writeln(setstep);
  write('Choose method (1=FR, 2=PR, 3=BS) ');
  readln(infile, i); if infname<>'con' then writeln(i);
  case i of
    1: method:=Fletcher_Reeves;
    2: method:=Polak_Ribiere;
    3: method:=Beale_Sorenson;
    else halt;
  end;
  case method of
    Fletcher_Reeves: writeln('Method: Fletcher Reeves');
    Polak_Ribiere: writeln('Method: Polak Ribiere');
    Beale_Sorenson: writeln('Method: Beale Sorenson');
  end;
  fail:=false; {method has not yet failed!}
  cyclimit:=n; {use n steps of cg before re-setting to steepest descent}
  if intol<0.0 then intol:=Calceps; {allows machine to set a value}
  tol:=intol*n*sqrt(intol); {gradient test tolerance}
  {Note: this tolerance should properly be scaled to the problem at hand.
   However, such a scaling presumes knowledge of the function and gradient
   which we do not usually have at this stage of the minimisation.}
  writeln('tolerance used in gradient test=', tol);
  f:=fminfn(n, Bvec, Workdata, notcomp); {initial fn calculation} {STEP 1}
  if notcomp then
  begin
    writeln('**** Function cannot be evaluated at initial parameters ****');
    fail := true; {method has failed}