However, in a small sample of runs, the cost of running the Fletcher program was higher than that of running the FORTRAN algorithm 23 (Nash 1977). Jones (1970) uses an explicit search along spirals between the Gauss-Newton and steepest descents directions. He reports very favourable performance for this procedure and some of my colleagues who have employed it speak well of it. Unfortunately, it does not seem possible to make it sufficiently compact for a 'small' computer.

Since the first edition was published, algorithm 23 has proved to be highly reliable. Nevertheless, I have at times been interested in using other approaches to minimising a nonlinear sum of squares. In LEQB05 (Nash 1984b), the singular-value decomposition is used to solve the Gauss-Newton linear least-squares problem of equation (17.16) without the formation of the Jacobian inner-products matrix. In Nash and Walker-Smith (1987), we added the facility to impose bounds constraints on the parameters.
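To make the SVD route concrete, a minimal sketch is given below (in Python/NumPy rather than the BASIC or Pascal of the published codes; the function name and tolerance are illustrative assumptions, not taken from LEQB05). It computes the Gauss-Newton step by treating J delta ≈ -f directly as a linear least-squares problem, so the inner-products matrix J^T J is never formed.

import numpy as np

def gauss_newton_step_svd(J, f, rtol=1e-10):
    # Solve min || J*delta + f || from the singular-value decomposition of J,
    # without forming the inner-products matrix J^T J.
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    # Singular values below a relative tolerance are treated as zero, which is
    # what makes this route safe when J is nearly rank-deficient.
    inv = np.where(s > rtol * s[0], 1.0 / s, 0.0)
    return Vt.T @ (inv * (U.T @ (-f)))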
We also considered changing the method of solving equation (17.16) from the current Choleski method to a scaled Choleski method, but found that this change slowed the iterations.

Example 17.1. Marquardt's minimisation of a nonlinear sum of squares

The following output from a Data General NOVA (machine precision = 2^-22) shows the solution of the problem stated in example 12.2 from the starting point b(0) = (200, 30, -0·4)^T.
The program which ran the example below used a test within the function (residual) subroutine which set a flag to indicate the function was not computable if the argument of the exponential function within the expression for the logistic model exceeded 50. Without such a test, algorithm 23 (and indeed the other minimisers) may fail to converge to reasonable points in the parameter space.

NEW
LOAD ENHMRT
LOAD ENHHBS
RUN
ENHMRT  FEB 16 76
REVISED MARQUARDT
# OF VARIABLES ? 1
# OF DATA POINTS ? 12
# OF PARAMETERS ? 3
ENTER VARIABLES
VARIABLE 1 :
? 5.308  ? 7.24   ? 9.638  ? 12.866 ? 17.609
? 23.192 ? 31.443 ? 38.558 ? 50.156 ? 62.948
? 75.995 ? 91.972
ENTER STARTING VALUES FOR PARAMETERS
B( 1 )=? 200
B( 2 )=? 30
B( 3 )=? -.4
ITN 1 SS= 23586.3   LAMBDA= .00004
ITN 2 SS= 327.692   LAMBDA= .000016
ITN 3 SS= 51.1076   LAMBDA= 6.4E-6
ITN 4 SS= 2.65555   LAMBDA= 2.56E-6
ITN 5 SS= 2.58732   LAMBDA= 1.024E-6
ITN 6 SS= 2.58727   LAMBDA= 4.096E-7
                    LAMBDA= 4.096E-6
                    LAMBDA= 4.096E-5
                    LAMBDA= 4.096E-4
                    LAMBDA= .004096
ITN 7 SS= 2.58726   LAMBDA= 1.6384E-3
                    LAMBDA= .016384
CONVERGED TO SS= 2.58726  # ITNS= 7  # EVALNS= 12
SIGMA^2= .287473
B( 1 )= 196.186   STD ERR= 11.3068    GRAD( 1 )= -7.18236E-6
B( 2 )= 49.0916   STD ERR= 1.68843    GRAD( 2 )= 1.84178E-5
B( 3 )= -.31357   STD ERR= 6.8632E-3  GRAD( 3 )= 8.48389E-3
RESIDUALS
 1.18942E-2  -3.27625E-2   9.20258E-2   .208776   .392632
-5.75943E-2  -1.10573      .71579      -.107643  -.348396
 .652573     -.287567
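As a rough modern sketch of the same kind of calculation (not a transcription of algorithm 23 or of the program that produced the output above), the following Python/NumPy fragment carries out Marquardt iterations on the augmented normal equations, includes the 'not computable' test on the exponential argument described above, and writes the logistic model as b1/(1 + b2 exp(b3 t)) so that the sign of b3 agrees with the output shown. The independent variable is taken to be the observation index 1, ..., 12, a general linear solver is used in place of the Choleski decomposition, and the lambda update factors (0·4 and 10) are simply read off the LAMBDA values printed above; these details are illustrative assumptions.

import numpy as np

def residuals(b, t, y):
    # Residuals of the logistic model; the flag is False ('not computable')
    # when the argument of the exponential exceeds 50, as described above.
    arg = b[2] * t
    if np.any(np.abs(arg) > 50.0):
        return None, False
    return b[0] / (1.0 + b[1] * np.exp(arg)) - y, True

def jacobian(b, t):
    # Analytic Jacobian of the residuals with respect to the parameters.
    e = np.exp(b[2] * t)
    d = 1.0 + b[1] * e
    J = np.empty((t.size, 3))
    J[:, 0] = 1.0 / d
    J[:, 1] = -b[0] * e / d ** 2
    J[:, 2] = -b[0] * b[1] * t * e / d ** 2
    return J

def marquardt(b, t, y, lam=1e-4, itmax=50, tol=1e-10):
    f, ok = residuals(b, t, y)
    S = f @ f
    for _ in range(itmax):
        J = jacobian(b, t)
        A, g = J.T @ J, J.T @ f
        while True:
            # Augmented equations (J^T J + lambda diag(J^T J)) delta = -J^T f
            delta = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
            if np.abs(delta).max() < tol * (np.abs(b).max() + tol):
                return b, S                 # no useful step remains
            fnew, ok = residuals(b + delta, t, y)
            if ok and (Snew := fnew @ fnew) < S:
                b, f, S, lam = b + delta, fnew, Snew, 0.4 * lam
                break                       # success: reduce lambda
            lam *= 10.0                     # failure: increase lambda
    return b, S

y = np.array([5.308, 7.24, 9.638, 12.866, 17.609, 23.192,
              31.443, 38.558, 50.156, 62.948, 75.995, 91.972])
t = np.arange(1.0, 13.0)
b, S = marquardt(np.array([200.0, 30.0, -0.4]), t, y)
print(b, S)   # should reach a sum of squares close to the value shown above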
UsingMinimising a nonlinear sum of squares217numerical approximation of the Jacobian as in §18.2 gives, from the samestarting point,b*=(196·251,49·1012,-0·313692)Twith S( b*)=2·6113 in 7 iterations and 16 function evaluations. By comparison, ifthe starting point isb (0)=(1, 1, 1)TS(b (0))=24349·5then using analytic derivatives yieldsb*=(196·151,49·0876,-0·313589)Twith S( b*)=2·58726 in 19 iterations and 33 evaluations, while numericallyapproximated derivatives giveb*=(194·503,48·8935,-0·314545)Twith S( b*)=2·59579 in 20 iterations and 36 evaluations.Previous Home NextChapter 18LEFT-OVERS18.1.
This chapter is entitled 'left-overs' because each of the topics (approximation of derivatives, constrained optimisation and comparison of minimisation algorithms) has not so far been covered, though none is quite large enough in the current treatment to stand as a chapter on its own. Certainly a lot more could be said on each, and I am acutely aware that my knowledge (and particularly my experience) is insufficient to allow me to say it. As far as I am aware, very little work has been done on the development of compact methods for the mathematical programming problem, that is, constrained minimisation with many constraints. This is a line of research which surely has benefits for large machines, but it is also one of the most difficult to pursue due to the nature of the problem.
The results of my own work comparing minimisation algorithms are to my knowledge the only study of such methods which has been made on a small computer. With the cautions I have given about results derived from experiments with a single system, the conclusions made in §18.4 are undeniably frail, though they are for the most part very similar to those of other workers who have used larger computers.

18.2. NUMERICAL APPROXIMATION OF DERIVATIVES

In many minimisation problems, the analytic computation of partial derivatives is impossible or extremely tedious.
Furthermore, the program code to compute the gradient elements

∂S(b)/∂b_j                                              (18.1)

in a general unconstrained minimisation problem, or the Jacobian elements

∂f_i(b)/∂b_j                                            (18.2)

in a nonlinear least-squares problem, may be so long as to use up a significant proportion of the working space of a small computer. Moreover, in my experience 9 cases out of 10 of 'failure' of a minimisation program are due to errors in the code used to compute derivatives. The availability of numerical derivatives facilitates a check of such possibilities as well as allowing programs which require derivatives to be applied to problems for which analytic expressions for derivatives are not practical to employ.

In the literature, a great many expressions exist for numerical differentiation of functions by means of interpolation formulae (see, for instance, Ralston 1965). However, in view of the large number of derivative calculations which must be made during the minimisation of a function, these are not useful in the present instance.
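The following short fragment (Python, with an assumed test function and tolerance; it is not program code from this book) illustrates the kind of check just mentioned: it compares an analytic gradient against forward-difference approximations, using the step length motivated in equation (18.5) below, and a large relative discrepancy in a component is the usual symptom of an error in the derivative code.

import numpy as np

EPS = np.finfo(float).eps                     # machine precision

def num_grad(S, b):
    # Forward-difference approximation along each parameter axis (formula
    # (18.4)), with the step length suggested in (18.5) below.
    g = np.empty_like(b)
    for j in range(b.size):
        h = np.sqrt(EPS) * (abs(b[j]) + np.sqrt(EPS))
        bh = b.copy()
        bh[j] += h
        g[j] = (S(bh) - S(b)) / h
    return g

def check_gradient(S, grad, b, tol=1e-4):
    # Flag the components where analytic and numerical gradients agree.
    ga, gn = grad(b), num_grad(S, b)
    return np.abs(ga - gn) / (np.abs(gn) + 1.0) < tol, ga, gn

# Rosenbrock's function with a deliberate error in dS/db2 (100 instead of 200):
S = lambda b: 100.0 * (b[1] - b[0] ** 2) ** 2 + (1.0 - b[0]) ** 2
bad_grad = lambda b: np.array([-400.0 * b[0] * (b[1] - b[0] ** 2) - 2.0 * (1.0 - b[0]),
                               100.0 * (b[1] - b[0] ** 2)])
print(check_gradient(S, bad_grad, np.array([-1.2, 1.0])))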
Recall that

∂S/∂b_j = lim (h→0) [S(b + h e_j) - S(b)]/h             (18.3)

where e_j is the jth column of the unit matrix of order n (b is presumed to have n elements). For explanatory purposes, the case n = 1 will be used. In place of the limit (18.3), it is possible to use the forward difference

D = [S(b + h) - S(b)]/h                                 (18.4)

for some value of h.

Consider the possible sources of error in using D.

(i) For h small, the discrete nature of the representation of numbers in the computer causes severe inaccuracies in the calculation of D.
The function S iscontinuous; its representation is not. In fact it will be a series of steps. Therefore,h cannot be allowed to be small. Another way to think of this is that since most ofthe digits of b are the same as those of (b+h), any function S which is not varyingrapidly will have similar values at b and (b+h), so that the expression (18.4)implies a degree of digit cancellation causing D to be determined inaccurately.(ii) For h large, the line joining the points (b , S(b)) and (b+h, S(b+h)) is nolonger tangential to the curve at the former. Thus expression (18.4) is in error dueto the nonlinearity of the function.
Even worse, for some functions there may be a discontinuity between b and (b + h). Checks for such situations are expensive of both human and computer time. The penalty for ignoring them is unfortunately more serious.

As a compromise between these extremes, I suggest letting

h_j = ε^(1/2) (|b_j| + ε^(1/2))                         (18.5)

where ε is the machine precision.
The parameter has once more been given a subscript to show that the step taken along each parameter axis will in general be different. The value for h given by (18.5) has the pleasing property that it cannot become smaller than the machine precision even if b_j is zero. Neither can it fail to change at least the right-most half of the digits in b_j since it is scaled by the magnitude of the parameter. Table 18.1 shows some typical results for this step-length choice compared with some other values.

Some points to note in table 18.1 are:

(i) The effect of the discontinuity in the tangent function in the computations for b = 1 and b = 1·57 (near π/2).
The less severely affected calculations for b = -1·57 suggest that in some cases the backward difference

D = [S(b) - S(b - h)]/h                                 (18.6)

may be preferred.

(ii) In approximating the derivative of exp(0·001) using h = 1·93024E-6 as in equation (18.5), the system used for the calculations printed identical values for exp(b) and exp(b + h) even though the internal representations were different enough to generate a reasonable derivative approximation.

TABLE 18.1. Derivative approximations computed by formula (18.4) on a Data General NOVA (extended BASIC system, machine precision = 16^-5). The columns give the approximations to the derivatives of log(b) (at b = 0·001, 1, 100), exp(b) (at b = -10, 0·001, 1, 100) and tan(b) (at b = 0, 1, 1·57, -1·57); the rows correspond to the step lengths h = 1, 0·0625, 3·90625E-3, 2·44141E-4, 1·52588E-5 and 9·53674E-7, to h computed by formula (18.5) (entries marked †), and to the analytic derivative (marked ‡). [The numerical entries of the table are not reproduced here.]
† h computed by formula (18.5).
‡ The analytic derivative reported has been evaluated on the NOVA system and may be in error to the extent that the special function routines of this system are faulty.