such that the algorithm runs with $c = c_1$ until it finds a value of $f$ below $F_1$, then changes to $c = c_2$, and so on. When $F \gg f_{\min}$, there will be many neighbours of points which result in $\mathrm{d}f < 0$, and so even when $c$ is large it will still be the case that $E(\mathrm{d}f) < 0$ on states with $f > F$. Once we decrease $F$, however, we have to consider states with fewer neighbours giving negative $\mathrm{d}f$, and moreover the depth will be greater because of the monotonicity of $d(F)$.
The control value will have to be smaller to ensure that uphill steps are usually rejected and $E(\mathrm{d}f) < 0$ still holds. The result is that as we become more ambitious in our target value $F$, we are forced to decrease the control and wait longer.

4.2. Varying the control parameter

The criterion discussed above for varying $c$ is not the only approach one could adopt. In fact, as we have stated it, it is not helpful as a practical bound, since it is hard to estimate the depth and $a$. Ideally, for any function $f$ and space $X$ with given matrix $P$, we would like to find a sequence $\{c_i\}$, possibly state-dependent, such that the error
$$E = \max_i \| e_i^{\mathrm T} T(c_1) T(c_2) \cdots T(c_R) - e \|$$
is minimized.
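For a toy instance this error can be evaluated directly. The sketch below (plain Python; the three-state cost function `f` and proposal matrix `P` are invented for illustration) builds the transition matrix $T(c)$ of the annealing chain, assuming the usual Metropolis acceptance probability $\min(1, \exp(-(f_j - f_i)/c))$ and taking $e$ to be the point mass on the global minimum, with the $\ell_1$ norm used for definiteness:

```python
import math

# Invented three-state example: state 0 is the unique global minimum.
f = [0.0, 1.0, 2.0]
P = [[0.0, 0.5, 0.5],   # symmetric neighbour-selection probabilities
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
n = len(f)

def T(c):
    """Transition matrix at control value c: move i -> j is proposed with
    probability P[i][j] and accepted with min(1, exp(-(f[j]-f[i])/c))."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                M[i][j] = P[i][j] * min(1.0, math.exp(-(f[j] - f[i]) / c))
        M[i][i] = 1.0 - sum(M[i])          # rejected proposals stay put
    return M

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def error(schedule):
    """E = max_i || e_i^T T(c_1) ... T(c_R) - e ||_1, where e is the
    point mass on the global minimum (state 0)."""
    M = [[float(i == j) for j in range(n)] for i in range(n)]
    for c in schedule:
        M = matmul(M, T(c))
    e = [1.0, 0.0, 0.0]
    return max(sum(abs(row[j] - e[j]) for j in range(n)) for row in M)
```

A short, hot schedule leaves $E$ near its maximum (the chain is still close to uniform), while a long, gradually cooled schedule drives $E$ toward zero.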
(Note that $c_r = c_{r+1} = \cdots = c_{r+q}$ is possible; also, we are allowed to have $c_{r+1} > c_r$, though this would only happen in a state-dependent scheme that decided it was necessary to increase the control to remove the system from a likely local minimum.)

Optimal $\{c_i\}$ sequences could doubtless be found for trivial problems, but it seems unlikely that this will be possible for real ones. To have a true algorithm, however, we need some way of specifying changes in $c$ without operator intervention.
The best way is probably to use feedback: the value of $c$ depends on the history and present behaviour of the system. We do not yet know of a satisfactory feedback controller for the annealing method. Instead, we adopt the following pragmatic approach, which is designed to terminate after a number of steps that is essentially proportional to $\log|X|$, but without any guarantee of convergence. We have found over many simulations that this method performs well in terms of reaching good values of $f$, though rather conservatively in terms of the number of steps needed.
We emphasize that this method is heuristic, and is not in accord with the theory developed above.

Choice of $c_0$. Choose an initial value $c_0$ large enough for $\pi(c_0)$ to be close to uniform: this means, if $\phi_j = f_j - f_1$, that $\exp(-\phi_j/c_0)$ must be close to 1 for all $j$, so $c_0$ must be large compared with every $\phi_j$. We can take $c_0 \gg U$, where $U$ is any upper bound on $\max_j \phi_j$. This is usually easy to find: in travelling salesman problems we could take $U = \sum_j d_j$, where $d_j$ is the longest link leaving the $j$'th city.

Choice of $\mathrm{d}c$.
M. Lundy, A. Mees / Annealing algorithm

We make $\pi(c - \mathrm{d}c)$ close to $\pi(c)$, so that if we were already in equilibrium, we would nearly be in the new equilibrium state after one step (because $\pi(c) T(c - \mathrm{d}c)$ is close to $\pi(c - \mathrm{d}c)$). To first order in $\mathrm{d}c$, the normalization constants in $\pi(c)$ and $\pi(c - \mathrm{d}c)$ are equal and
$$\pi_j(c) = \exp\bigl(\phi_j \,\mathrm{d}c / (c(c - \mathrm{d}c))\bigr)\, \pi_j(c - \mathrm{d}c).$$
Thus we need the exponential term to be nearly 1, so $\phi_j \,\mathrm{d}c \ll c(c - \mathrm{d}c)$ for all $j$.
Thus $\mathrm{d}c/c^2 = \delta/(U + c\delta)$ for some $\delta \ll 1$, and so, substituting for $\mathrm{d}c$ in $c_{i+1} = c_i - \mathrm{d}c_i$,
$$c_{i+1} = c_i/(1 + \beta c_i)$$
for some $\beta \ll 1/U$. Note that the $c\delta$ term is only significant in the early stages, when $c$ is comparable with $U$. An almost equivalent method is therefore to choose $\mathrm{d}c$ to be $\min(\beta_1 c, \beta_2 c^2)$ for small constants $\beta_1$ and $\beta_2$. For small values of $c$ this is slower than the exponential cooling scheme used by Kirkpatrick et al.

Choice of $c_f$. The choice of $c_f$ depends on the acceptable error $\epsilon$ in the solution together with the error probability $a$. We need $a/(1-a) > (|X|-1)\,e^{-\epsilon/c}$, and hence it is sufficient to take
$$c_f \le \epsilon/(\log(|X|-1) - \log a).$$
Note that the $\log a$ term will usually be swamped by the $\log(|X|-1)$ term: that is, if we really were at equilibrium, the error probability could be made very small indeed without significantly altering $c_f$.

It is trivial to show that the number $R$ of steps before termination using the above control schedule satisfies
$$R < (\log|X| - \log a)/(\beta\epsilon).$$
Note that this means $R$ is essentially proportional to $\log|X|$, because the $\log a$ term is negligible, as we have just seen.

In implementing the above scheme, a practical point arises.
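Taken together, the three choices give a complete schedule. It can be sketched as follows (plain Python; the specific constants, such as $c_0 = 10U$ and the sample arguments, are illustrative choices, not prescribed by the text):

```python
import math

def schedule(U, size_X, eps, a, beta):
    """Heuristic cooling schedule: start at c_0 >> U, cool by
    c_{i+1} = c_i / (1 + beta * c_i) with beta << 1/U, and stop once
    c <= c_f = eps / (log(|X| - 1) - log a)."""
    c = 10.0 * U                       # 'large compared with every phi_j'
    c_f = eps / (math.log(size_X - 1) - math.log(a))
    cs = []
    while c > c_f:
        cs.append(c)
        c = c / (1.0 + beta * c)       # equivalently, 1/c increases by beta
    return cs
```

Since $1/c_{i+1} = 1/c_i + \beta$, the number of control values produced is bounded by $(\log|X| - \log a)/(\beta\epsilon)$, in line with the bound on $R$ above.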
The value of $c_f$ will nearly always be so small that, on any real computer working to finite accuracy and having a pseudo-random number generator, long before we reach $c_f$ we will have precisely zero probability of accepting nearly any uphill step that becomes a candidate. Consequently we might as well decide that as soon as $c$ gets small enough for this to happen (say, $c < -1/\log p_{\min}$, where $p_{\min}$ is some small probability), the system is 'frozen'.
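In code, the acceptance rule and this freezing test might look as follows (a sketch; the unit uphill step implicit in the $c < -1/\log p_{\min}$ criterion, and the particular value of $p_{\min}$, are assumptions):

```python
import math
import random

def accept(df, c, rng):
    """Metropolis acceptance: downhill (df <= 0) always; uphill with
    probability exp(-df/c), which underflows to exactly 0.0 once c is
    small enough relative to df."""
    return df <= 0 or rng.random() < math.exp(-df / c)

def frozen(c, p_min=1e-12):
    """True once c < -1/log(p_min): an uphill step of unit size is then
    accepted with probability below p_min, so in finite precision the
    system is effectively frozen."""
    return c < -1.0 / math.log(p_min)
```

Once `frozen(c)` holds, attempting further randomized uphill moves is wasted effort, which motivates the switch to exhaustive search described next.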
In the physical analogy, the domains are now fixed and further cooling only improves the ordering within them. Rather than waste time attempting uphill steps which will always be rejected, we may as well perform an exhaustive search by selecting neighbours of the current point without replacement. If a neighbour is found that has a better objective function value, we accept it and start checking again; if there are no such neighbours, we stop.

This may destroy the polynomial time termination property, since even though there is a polynomial bound on the number of neighbours of a point, there is no guarantee that there will only be a polynomial number of points to be visited before a local optimum is reached. In practice this is unlikely to happen (see e.g. Tovey, 1983). If it did, one could decide to stop after $R$ steps even though further improvement was possible.

5. Conclusions

The attraction of the annealing method is that it is general yet simple to apply. Solving a problem with it requires only that one provide an adequate way of generating neighbours of solution points.

We have shown that the method converges, but that it is not possible to ensure convergence in less than exponential time for general problems. The concept of the depth of a state was introduced and was shown to have a major effect on the convergence rate.
We conjectured that for problems where the annealing algorithm has been found to perform well, the depth is bounded by a quantity which grows slowly with problem size. (Specifically, if the growth is logarithmic then the decision problem related to the optimization problem is solved, on average, in polynomial time.) For practical implementation, a sequence of control parameter changes was introduced and shown to terminate in polynomial time.
Computational experience with this control schedule (Lundy 1984, 1985) is encouraging.

References

[1] A. Berman and R.J. Plemmons, Non-negative matrices in the mathematical sciences (Academic Press, New York, 1979).
[2] V. Cerny, "A thermodynamical approach to the travelling salesman problem: An efficient simulation algorithm", Journal of Optimization Theory and Applications (1984), to appear.
[3] A.W.F. Edwards, "Minimal evolution" (unpublished, 1966).
[4] B. Everitt, Cluster analysis (Heinemann Educational Books, London, 1977).
[5] M.R. Garey and D.S. Johnson, Computers and intractability: A guide to the theory of NP-completeness (W.H. Freeman, San Francisco, 1979).
[6] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images", IEEE Transactions on Pattern Analysis and Machine Intelligence 6(6) (1984) 721-741.
[7] D.P. Heyman and M.J. Sobel, Stochastic models in operations research, Volume 1: Stochastic processes and operating characteristics (McGraw-Hill, New York, 1982).
[8] D.S. Johnson, private communication, 1984.
[9] F.P. Kelly, Reversibility and stochastic networks (Wiley, Chichester, 1979).
[10] S. Kirkpatrick, C.D. Gelatt, Jr. and M.P. Vecchi, "Optimization by simulated annealing", Research Report RC 9355, IBM (Yorktown Heights, NY, 1982).
[11] M. Lundy, "Applications of the annealing algorithm to combinatorial problems in statistics", Biometrika 72(1) (1985) 191-198.
[12] M. Lundy, "The annealing algorithm", Ph.D. thesis, University of Cambridge (Cambridge, 1984).
[13] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth and A.H. Teller, "Equation of state calculations by fast computing machines", Journal of Chemical Physics 21 (1953) 1087-1092.
[14] F. Romeo and A. Sangiovanni-Vincentelli, "Probabilistic hill-climbing algorithms: Properties and applications", Memorandum No. UCB/ERL M84/34, University of California (Berkeley, CA, March 1984).
[15] B.M. Schwarzschild, "Statistical mechanics algorithm for Monte Carlo optimization", Physics Today (May 1982) 17-19.
[16] E. Seneta, Non-negative matrices and Markov chains (Springer-Verlag, New York, 2nd edition, 1981).
[17] E.A. Thompson, "The method of minimum evolution", Annals of Human Genetics 36 (1973) 333-340.
[18] C. Tovey, "On the number of iterations of local improvement algorithms", Operations Research Letters 2(5) (1983) 231-238.