Table 18.5. Comparison of algorithm performance as measured by equivalent function evaluations (efe's).

The table compares algorithm 19+20 (Nelder-Mead), algorithm 21 (variable metric) with analytic and with numerically approximated gradient, algorithm 22 (conjugate gradients) with analytic and with numerically approximated gradient, and algorithm 23 (Marquardt) with analytic Jacobian, with numerically approximated Jacobian, and with problem 34 omitted†. For each algorithm it reports: type; code length‡; array elements required; (a) number of successful runs; (b) total efe's; (c) total parameters; (d) w1 = (b)/(c); (e) w0 = average efe's per parameter; (f) r = w0/w1; (g) a = 1/(2r - 1); (h) number of failures; (i) number of problems not run; and (j) successes as a percentage of problems run.

† Problem 34 of the set of 79 has been designed to have residuals f so that the second derivatives of these residuals cannot be dropped from equation (17.15) to make the Gauss-Newton approximation. The failure of the approximation in this case is reflected in the very slow (12000 efe's) convergence of algorithm 23.
‡ On Data General NOVA (23-bit mantissa).
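The summary statistics in rows (d) to (g) are simple functions of the per-problem efe counts and parameter counts. The small Python sketch below uses made-up per-problem data (not the values of table 18.5) and takes row (e) to be the per-problem average of efe's per parameter; both the data and that reading of row (e) are assumptions for illustration only.

    # Hypothetical per-problem results for one algorithm (NOT the table's data):
    # efes[i]  = equivalent function evaluations used on problem i
    # npars[i] = number of parameters in problem i
    efes  = [120, 85, 310, 44]
    npars = [4, 2, 10, 2]

    total_efes = sum(efes)                  # row (b): total efe's
    total_pars = sum(npars)                 # row (c): total parameters
    w1 = total_efes / total_pars            # row (d): pooled efe's per parameter
    w0 = sum(e / n for e, n in zip(efes, npars)) / len(efes)  # row (e), as assumed above
    r  = w0 / w1                            # row (f)
    a  = 1.0 / (2.0 * r - 1.0)              # row (g)
    print(w1, w0, r, a)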
time. Perry and Soland (1975) derive analytic solutions to this problem, but both to check their results and to determine how difficult the problem might be if such a solution were not possible, the following data were run through the Nelder-Mead simplex algorithm 19 on a Data General NOVA operating in 23-bit binary arithmetic:

K1 = 3·82821    α = 0·23047
K2 = 0·416      β = 0·12
K3 = 5·24263    γ = 0·648
F  = 8·78602    δ = 1·116.

Starting at the suggested values b^T = (7, 4, 300, 1621), S(b) = -77·1569, the algorithm took 187 function evaluations to find S(b*) = -77·1602 at (b*)^T = (6·99741, 3·99607, 300·004, 1621·11).
The very slow convergence here is cause for some concern, since the start and finish points are very close together. From b^T = (1, 1, 300, 1621), S(b) = 707·155, the Nelder-Mead procedure took 888 function evaluations to reach b* = (6·97865, 3·99625, 296·117, 1619·92)^T and S(b*) = -77·1578, where it was stopped manually. However, S was less than -77 after only 54 evaluations, so once again convergence appears very slow near the minimum.
Finally, from b^T = (1, 1, 1, 1), S(b) = 5·93981, the algorithm converged to S(b*) = -77·1078 in 736 evaluations with b* = (11·1905, 3·99003, 481·054, 2593·67)^T. In other words, if the model of the lottery operation, in particular the production function for the number of tickets sold, is valid, there is an alternative solution which 'maximises' revenue per unit time. There may, in fact, be several alternatives.

If we attempt the minimisation of S(b) using the variable metric algorithm 21 and analytic derivatives, we obtain the following results.

    Initial point b        S(b)        Final point b*                        S(b*)        efe's
(a) 7, 4, 300, 1621        -77·1569    6·99509, 4·00185, 300, 1621           -77·1574      46
(b) 1, 1, 300, 1621        707·155     591·676, 563·079, 501·378, 1821·55      0·703075   157
(c) 1, 1, 1, 1             5·93981     6·99695, 3·99902, 318·797, 1605·5     -77·0697     252

(The efe's are equivalent function evaluations; see §18.4 for an explanation.) In case (b), the price per ticket (second parameter) is clearly exorbitant and the duration of the draw (first parameter) over a year and a half.
The first prize (third parameter, measured in units 1000 times as large as the price per ticket) is relatively small. Worse, the revenue (-S) per unit time is negative! Yet the derivatives with respect to each parameter at this solution are small. A further point to note is that the algorithm was not able to function normally: at each step algorithm 21 attempts to update an iteration matrix, but under certain conditions described at the end of §15.3 it is inadvisable to do this and the method reverts to steepest descent.
In case (b) above, this occurred in 23 of the 25 attempts to perform the update, indicating that the problem is very far from being well approximated by a quadratic form. This is hardly surprising. The matrix of second partial derivatives of S is certain to depend very much on the parameters due to the fractional powers (α, β, γ, δ) which appear. Thus it is unlikely to be 'approximately constant' in most regions of the parameter space, as required of the Hessian in §15.2. This behaviour is repeated in the early iterations of case (c) above.

In conclusion, then, this problem presents several of the main difficulties which may arise in function minimisation:

(i) it is highly nonlinear;
(ii) there are alternative optima; and
(iii) there is a possible scaling instability in that parameters 3 and 4 (v and w) take values in the range 200-2000, whereas parameters 1 and 2 (t and p) are in the range 1-10 (a simple rescaling of the kind sketched below can lessen this difficulty).

These are problems which affect the choice and outcome of minimisation procedures.
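Since S(b) itself is defined earlier in the book and not reproduced here, the Python sketch below uses a stand-in objective (dummy_S) and illustrative magnitudes; it only shows the rescaling idea behind difficulty (iii): let the minimiser work on internal variables of comparable size and map back to the user's units inside the objective.

    # Minimal sketch of parameter rescaling.  dummy_S is a placeholder and is
    # NOT the lottery revenue function of example 12.4; the scale factors are
    # merely rough magnitudes of t, p, v and w assumed for illustration.
    scale = [1.0, 1.0, 300.0, 1600.0]

    def dummy_S(b):
        # placeholder objective in the user's original units
        return sum((bi - si) ** 2 for bi, si in zip(b, scale))

    def scaled_objective(x):
        b = [xi * si for xi, si in zip(x, scale)]   # internal -> original units
        return dummy_S(b)

    # A minimiser now sees variables that are all of order 1; for instance the
    # start b = (7, 4, 300, 1621) becomes x = (7, 4, 1.0, 1.013...).
    x_start = [7.0 / scale[0], 4.0 / scale[1], 300.0 / scale[2], 1621.0 / scale[3]]
    print(scaled_objective(x_start))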
The discussion leaves unanswered all questions concerning the reliability of the model or the difficulty of incorporating other parameters, for instance to take account of advertising or competition, which will undoubtedly cause the function to be more difficult to minimise.

Example 18.2. Market equilibrium and the nonlinear equations that result

In example 12.3 the reconciliation of the market equations for supply

q = K p^α

and demand

q = Z p^(-β)

has given rise to a pair of nonlinear equations. It has been my experience that such systems are less common than minimisation problems, unless the latter are solved by zeroing the partial derivatives simultaneously, a practice which generally makes work and sometimes trouble. One's clients have to be trained to present a problem in its crude form.
Therefore, I have not given any special method in this monograph for simultaneous nonlinear equations, which can be written

f(b) = 0                                        (12.5)

preferring to solve them via the minimisation of

f^T f = S(b)                                    (12.4)

which is a nonlinear least-squares problem. This does have a drawback, however, in that the problem has in some sense been 'squared', and criticisms of the same kind as have been made in chapter 5 against the formation of the sum-of-squares and cross-products matrix for linear least-squares problems can be made against solving nonlinear equations as nonlinear least-squares problems.
Nonetheless, a compact nonlinear-equation code will have to await the time and energy for its development. For the present problem we can create the residuals

f1 = q - K p^α
f2 = ln(q) - ln(Z) + β ln(p).

The second residual is the likely form in which the demand function would be estimated. To obtain a concrete and familiar form, substitute

q = b1        α = 1·5
p = b2        β = 1·2
K = 1         Z = exp(2)

so that

f2 = ln(b1) - 2 + 1·2 ln(b2).

Now minimising the sum of squares

S(b) = f1^2 + f2^2

should give the desired solution. The Marquardt algorithm 23 with numerical approximation of the Jacobian as in §18.2 gives

p = b2 = 2·09647        q = b1 = 3·03773

with S = 5·28328E-6 after five evaluations of the Jacobian and 11 evaluations of S.
This is effectively 26 function evaluations. The conjugate gradients algorithm 22 with numerical approximation of the gradient gave

p = 2·09739        q = 3·03747        S = 2·33526E-8

after 67 sum-of-squares evaluations. For both these runs, the starting point chosen was b1 = b2 = 1. All the calculations were run with a Data General NOVA in 23-bit binary arithmetic.
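Readers who wish to reproduce this small problem with modern tools could minimise the same two residuals with SciPy's least_squares routine, as in the Python sketch below. This is not the algorithm 23 code of this book: the default method is a trust-region reflective variant rather than Marquardt's, and the bounds are added here only to keep the logarithms defined.

    import numpy as np
    from scipy.optimize import least_squares

    # Residuals of example 18.2: supply q = p**1.5 (K = 1, alpha = 1.5) and
    # demand ln(q) = ln(Z) - 1.2*ln(p) with Z = exp(2), in variables b = (q, p).
    def residuals(b):
        q, p = b
        f1 = q - p**1.5
        f2 = np.log(q) - 2.0 + 1.2 * np.log(p)
        return np.array([f1, f2])

    # Start from b1 = b2 = 1 as in the text; positive bounds keep the logs defined.
    sol = least_squares(residuals, x0=[1.0, 1.0], bounds=(1e-6, np.inf))
    print(sol.x)               # expect roughly q = 3.0377, p = 2.0965
    print(np.sum(sol.fun**2))  # sum of squares near zero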
Example 18.3. Magnetic roots

Brown and Gearhart (1971) raise the possibility that certain nonlinear-equation systems have 'magnetic roots' to which algorithms will converge even if starting points are given close to other roots. One such system is what they call the cubic parabola. To solve it by means of algorithms 19, 21, 22 and 23, the residuals of the two equations were formed and their sum of squares minimised. The roots of the system are

R1: b1 = 0,     b2 = 0
R2: b1 = 1,     b2 = 1
R3: b1 = -0·75, b2 = 0·5625.

To test the claimed magnetic-root properties of this system, 24 starting points were generated, namely the eight points about each root formed by an axial step of ±0·5 and ±0·1. In every case the starting point was still nearest to the root used to generate it; a sketch of this testing protocol is given after the list of exceptions below. All the algorithms converged to the root expected when the starting point was only 0·1 from the root.
With the following exceptions they all converged to the expected root when the distance was 0·5.

(i) The Marquardt algorithm 23 converged to R3 from (-0·5, 0) = (b1, b2) instead of to R1.
(ii) The Nelder-Mead algorithm 19 found R2 from (0·5, 0) instead of R1.
(iii) The conjugate gradients algorithm 22 found R3 and the variable metric algorithm 21 found R1 when started from (1·5, 1), to which R2 is closest.
(iv) All algorithms found R1 instead of R3 when started from (-0·25, 0·5625).
(v) The conjugate gradients algorithm also found R1 instead of R3 from (-1·25, 0·5625).
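Because the cubic-parabola equations themselves are not reproduced in this excerpt, the following Python sketch illustrates only the testing protocol: it generates the 24 axial-step starting points about the three roots and shows how a minimiser's answer would be classified by its nearest root. Any residual function used with it would have to be supplied separately.

    import itertools
    import numpy as np

    # Roots of the cubic-parabola system as given in the text.
    roots = np.array([[0.0, 0.0], [1.0, 1.0], [-0.75, 0.5625]])

    def nearest_root(b):
        # index of the root closest (Euclidean distance) to the point b
        return int(np.argmin(np.linalg.norm(roots - np.asarray(b), axis=1)))

    # The 24 starting points: axial steps of +/-0.5 and +/-0.1 about each root.
    starts = []
    for r in roots:
        for step, axis in itertools.product((0.5, 0.1, -0.5, -0.1), (0, 1)):
            p = r.copy()
            p[axis] += step
            starts.append(p)

    for p in starts:
        # A minimiser of the sum of squared residuals would be run from p here
        # and its answer classified with nearest_root(); as the text notes,
        # every start is still nearest to the root used to generate it.
        print(p, "-> nearest root index:", nearest_root(p))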
Note that all the material in this chapter is from the first edition of the book. However, I believe it is still relevant today. We have, as mentioned in chapter 17, added a bounds-constraints capability to our minimisation codes included in Nash and Walker-Smith (1987). Also, the performance figures in this chapter relate to BASIC implementations of the original algorithms, so some of the results will alter; in particular, I believe the present conjugate gradients method would perform better than that used in the generation of table 18.5. Interested readers should refer to Nash and Nash (1988) for a more modern investigation of the performance of compact function minimisation algorithms.

Chapter 19

THE CONJUGATE GRADIENTS METHOD APPLIED TO PROBLEMS IN LINEAR ALGEBRA

19.1. INTRODUCTION
This monograph concludes by applying the conjugate gradients method, developed in chapter 16 for the minimisation of nonlinear functions, to linear equations, linear least-squares and algebraic eigenvalue problems. The methods suggested may not be the most efficient or effective of their type, since this subject area has not attracted a great deal of careful research. In fact much of the work which has been performed on the sparse algebraic eigenvalue problem has been carried out by those scientists and engineers in search of solutions. Stewart (1976) has prepared an extensive bibliography on the large, sparse, generalised symmetric matrix eigenvalue problem in which it is unfortunately difficult to find many reports that do more than describe a method. Thorough or even perfunctory testing is often omitted and convergence is rarely demonstrated, let alone proved. The work of Professor Axel Ruhe and his co-workers at Umeå is a notable exception to this generality.
Partly, the lack of testing is due to the sheer size of the matrices that may be involved in real problems and the cost of finding eigensolutions.

The linear equations and least-squares problems have enjoyed a more diligent study. A number of studies have been made of the conjugate gradients method for linear-equation systems with positive definite coefficient matrices, of which one is that of Reid (1971).
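As a reference point for what follows, here is a minimal Python sketch of the classical conjugate gradients iteration for A x = b with a symmetric positive definite A. It is not the specialised code developed later in this chapter, and the test matrix, starting point and tolerance are merely illustrative.

    import numpy as np

    def conjugate_gradients(A, b, x0=None, tol=1e-10, max_iter=None):
        # Solve A x = b for symmetric positive definite A by conjugate gradients.
        n = len(b)
        x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
        r = b - A @ x            # residual
        p = r.copy()             # first search direction
        rs_old = r @ r
        for _ in range(max_iter or n):
            Ap = A @ p
            alpha = rs_old / (p @ Ap)        # step length along p
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p    # new conjugate direction
            rs_old = rs_new
        return x

    # Small illustrative system (symmetric positive definite).
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(conjugate_gradients(A, b))   # approximately [0.0909, 0.6364]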