At a local minimum r = n + δ it performs a random walk starting at r = n + δ, with an absorbing barrier to the left (r = n − δ) and a reflecting barrier to the right (r = n + 2δ). The time for absorption is therefore O(1); for example, except near a corner (i.e. away from the lines x_1 = ±x_2) the probability of reaching n − δ from n + δ in two steps is arbitrarily close to p_L = 1/16, while the probability of reaching n + 2δ is arbitrarily small. At a corner there are minor complications (absorption takes at least 3 steps) but the time is still O(1). Denote by E_1 the expected passage time from r = n + δ to r = n − δ.
The expected time E(S) to reach the global minimum from r = r(x_0) is (if r ≠ [r]) close to 4[r]/δ + [r]E_1 steps, which is O(N). So the orders of magnitude of the times to solve the problem are O(N) for the annealing method as against O(N^2) for the descent method. In higher dimensions the effect will be even more pronounced: E_1 will change, as will the factor 4 above, but the time will still be O(N) as against O(N^d) for the descent method, where d is the problem dimension.
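The barrier argument can be checked with a short simulation. The sketch below (Python; the grid size and per-step move probabilities are our own illustrative assumptions, since the paper's example fixes them from the geometry near the minimum) estimates the mean absorption time and shows that nothing in it depends on the problem size N.

```python
import random

def absorption_time(K=4, p_down=0.25, p_up=0.25, trials=10_000):
    """Mean time to hit the absorbing barrier at 0 from position 1.

    Positions 0..K stand for the levels r = n - delta, ..., n + 2*delta;
    0 absorbs (escape from the local minimum), K reflects.  The move
    probabilities are assumed values, not the paper's.
    """
    total = 0
    for _ in range(trials):
        pos, steps = 1, 0
        while pos > 0:
            u = random.random()
            if u < p_down:
                pos -= 1
            elif u < p_down + p_up:
                pos = min(pos + 1, K)   # reflect at the right barrier
            # otherwise stay put
            steps += 1
        total += steps
    return total / trials

# The estimate is a constant depending only on K and the move
# probabilities; nothing here grows with the problem size N.
print(absorption_time())
```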
3. Equilibrium theory

We are given a function f : X → R and asked to find its minimum. Assume the states i ∈ X are labelled 1, ..., |X| such that i < j if f_i < f_j. The problem is therefore to find f_1 in a reasonable time without having access to the labelling which orders objective function values.

With each state i we associate a neighbourhood N_i of all states which can be reached from i in a single transition. Neighbourhoods are implicitly defined by way of a perturbation matrix P = {p_ij}. Here p_ij is the probability of generating state j as a candidate next state, when we are in state i.
Of course, N_i = {j : p_ij > 0}. We assume that for any two states i_0 and i_f, there exists a finite sequence of K states j_1, j_2, ..., j_K such that i_0 = j_1, i_f = j_K, and p_{j_m j_{m+1}} > 0 for m = 1, 2, ..., K − 1. This ensures X is connected and P is irreducible (Seneta, 1981).

The algorithm accepts j with probability a_ij(c). We require a_ij to be a continuous function of c with the following properties:

$$\begin{aligned}
a_{ij}(c) &= 1 && \text{for all } i > j,\ \text{for all } c \ge 0,\\
0 < a_{ij}(c) &< 1 && \text{for all } i < j,\ \text{for all } c > 0,\\
a_{ij}(0) &= 0 && \text{for all } i < j,\\
a_{ij}(c) &\to 1 && \text{for all } i < j,\ \text{as } c \to \infty.
\end{aligned}$$

If state j is not accepted the algorithm stays in state i and tries again to generate a neighbour.

The sequence of solutions is generated by a Markov process with transition matrix T = {t_ij} where

$$t_{ij}(c) = \begin{cases} a_{ij}(c)\,p_{ij} & (i \ne j),\\ 1 - \sum_{k \ne i} a_{ik}(c)\,p_{ik} & (i = j).\end{cases}$$

For c > 0, irreducibility of P together with a_ij(c) > 0 implies T is irreducible.
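To make these definitions concrete, here is a minimal Python/NumPy sketch that assembles T(c) for a hypothetical five-state problem on a ring: the objective values, the topology, and the exponential acceptance rule (introduced formally below) are assumptions made for the example, not part of the paper's formal development.

```python
import numpy as np

# Hypothetical toy instance: five states on a ring, objective values f
# (state 0 is the global minimum), and a symmetric perturbation matrix P
# (each state proposes one of its two ring neighbours with probability 1/2).
f = np.array([0.0, 1.0, 1.5, 2.0, 3.0])
n = len(f)
P = np.zeros((n, n))
for i in range(n):
    P[i, (i - 1) % n] = 0.5
    P[i, (i + 1) % n] = 0.5

def acceptance(c):
    """a_ij(c): 1 for downhill moves, exp(-(f_j - f_i)/c) for uphill moves."""
    A = np.ones((n, n))
    for i in range(n):
        for j in range(n):
            if f[j] > f[i]:
                A[i, j] = np.exp(-(f[j] - f[i]) / c)
    return A

def transition(c):
    """T(c): t_ij = a_ij(c) p_ij off the diagonal; diagonal makes rows sum to 1."""
    A = acceptance(c)
    T = A * P
    np.fill_diagonal(T, 0.0)
    np.fill_diagonal(T, 1.0 - T.sum(axis=1))
    return T

T = transition(1.0)
assert np.allclose(T.sum(axis=1), 1.0)   # a valid stochastic matrix
```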
As c → 0, T approaches a lower triangular matrix. In particular, t_ii(0) = 1 if state i is a local minimum: that is, if j ∈ N_i implies j > i, so p_ij = 0 for j < i. The effect of accepting uphill steps for positive c is that there is a path from every local minimum to the global minimum. (This is just a restatement of irreducibility.)

Let e_i be the i'th unit vector in R^{|X|}. Then if state i_0 is a local minimum,

$$e_{i_0} T(0) = e_{i_0}.$$

In particular, we always have e_1 T(0) = e_1.
Now for fixed c > 0, T has a unique equilibrium vector π with

$$\pi(c)\,T(c) = \pi(c).$$

The annealing algorithm works because π(c) → e_1 as c → 0, rather than π(c) → e_{i_0} for some local minimum i_0. To prove this, we calculate π(c).

We make two additional assumptions:

Assumption I. P is symmetric.

Assumption II.
For all c > 0, a_ij(c) a_jk(c) = a_ik(c) if i < j < k.

The first of these will hold in any straightforward neighbour-generating scheme. It says that if j is a neighbour of i then i is a neighbour of j, and they are chosen with the same probability. Usually there will be some fixed size Q for all N_i, and neighbours will be selected with uniform probability 1/Q.
We will show later how this assumption may be altered to produce a slightly different convergence result. The second assumption holds for the standard acceptance criterion

$$a_{ij}(c) = \begin{cases} \exp(-(f_j - f_i)/c) & \text{if } f_j > f_i,\\ 1 & \text{if } f_j \le f_i.\end{cases}$$

It obviously also holds for many other functional forms of a_ij: anything of the form a_ij(c) = φ(f_i, c)/φ(f_j, c) will do.

We now come to the main result.

Theorem. If P is irreducible and Assumptions I and II hold, then for c > 0 the unique equilibrium vector corresponding to T(c) is

$$\pi(c) = \pi_1(c)\,\bigl(1,\ a_{12}(c),\ a_{13}(c),\ \ldots,\ a_{1,|X|}(c)\bigr).$$
Proof. We show that π(c) satisfies the detailed balance equations

$$\pi_i t_{ij} = \pi_j t_{ji} \quad \text{for all } i, j$$

(see Kelly, 1979). Suppose 1 < j < i. Then

$$\begin{aligned}
\pi_i t_{ij} &= (\pi_1 a_{1i})(a_{ij} p_{ij}) = \pi_1 a_{1i} p_{ij} && (a_{ij} = 1 \text{ since } i > j)\\
&= \pi_1 a_{1j} a_{ji} p_{ij} && \text{by Assumption II}\\
&= (\pi_1 a_{1j})(a_{ji} p_{ji}) && \text{by Assumption I}\\
&= \pi_j t_{ji}.
\end{aligned}$$

The case 1 < i < j is similar. When i = j the result is trivial. □

This explicit formula for the equilibrium distribution agrees with results derived from physical reasoning and quoted by Metropolis (1953) and Cerny (1984). The same result has been obtained by Sangiovanni et al. (1984).
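The theorem invites a direct numerical check. Continuing the toy ring instance from the sketch in Section 3 (so transition() and f are as defined there, P is symmetric, and the acceptance rule is the standard exponential one), the following compares the left unit-eigenvector of T(c) with the closed form; the value c = 0.7 is arbitrary.

```python
c = 0.7
T = transition(c)

# Stationary vector: left eigenvector of T(c) for eigenvalue 1.
w, v = np.linalg.eig(T.T)
pi_num = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi_num /= pi_num.sum()

# Closed form from the theorem: pi_j(c) proportional to a_1j(c),
# which for the exponential rule is exp(-(f_j - f_1)/c).
pi_formula = np.exp(-(f - f[0]) / c)
pi_formula /= pi_formula.sum()

print(np.allclose(pi_num, pi_formula))   # True, up to round-off
```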
The consequences of the theorem are as follows.
1. The equilibrium distribution is independent of the topology.
2. π(c) → e_1 as c → 0.
3. π(c) → (1/|X|, 1/|X|, ..., 1/|X|) as c → ∞.
4. With the standard exponential form of a_ij(c), the equilibrium probability that we are no more than ε > 0 above the optimum (i.e. that we are in a state i with f_i − f_1 ≤ ε) is at least 1 − (|X| − 1) e^{−ε/c}.
Conclusion (1) tells us that, subject to the space being connected, the choice of topology can only have an effect through the convergence rate, not on the final result. Conclusion (2) gives us convergence with probability as close as we like to 1, but says nothing about how long the convergence takes. Conclusion (3) confirms the intuitive idea that at high enough control values the function values should be irrelevant. Conclusion (4) allows us to determine a value of c at which to terminate the process: for error probability a we require a/(1 − a) > (|X| − 1) e^{−ε/c}.
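Conclusion (4) gives a one-line recipe for the terminal value of c: rearranging a/(1 − a) > (|X| − 1) e^{−ε/c} yields c < ε / log((1 − a)(|X| − 1)/a). A small helper evaluates this (the function name and sample numbers are ours):

```python
import math

def terminal_c(eps, a, num_states):
    """Largest c at which conclusion (4) guarantees error probability below a."""
    ratio = (1.0 - a) * (num_states - 1) / a
    if ratio <= 1.0:
        return math.inf   # the inequality already holds for every c > 0
    return eps / math.log(ratio)

# e.g. a million states, tolerance eps = 0.01, error probability a = 0.05:
print(terminal_c(0.01, 0.05, 10**6))   # about 6e-4
```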
Note that we can incorporate complex constraints into the problem by disallowing certain neighbours. If we do so, P is no longer symmetric and the size of the neighbourhood N_i varies with i, so the above Theorem no longer applies. However, it is easy to check that if Assumption I is replaced by

Assumption I′. p_ij = 1/|N_i| for j ∈ N_i and p_ij = 0 if j ∉ N_i,

then π_j(c) is proportional to |N_j| a_{1j}(c). There is an obvious change in the above inequality for a.

It is also true that constraints can sometimes be usefully handled using penalty functions; in such cases (which are perhaps more relevant to continuous problems than to combinatorial ones) the annealing algorithm is still easy to apply.

4. Reaching equilibrium

We now understand the equilibrium behaviour of the annealing algorithm, but we have not yet discussed the problem of how to get to equilibrium.
Indeed, it is not apparent so far why we should vary c at all: why not just choose c = c_f in the first place?

For any fixed value of c, the rate of convergence to the steady state is geometric since the state space is finite, and the convergence factor is given by the second-largest eigenvalue of the transition matrix. The latter is, however, impossible to compute for matrices of size |X|. It is to be expected that when c is small it will be very close to 1 for problems with local optima; in that case, convergence will be impossibly slow if we simply start at a small value of c.
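This slowdown is easy to exhibit numerically. Reusing transition() from the Section 3 sketch, but with an assumed objective that places a local minimum on the ring, the second-largest eigenvalue modulus of T(c) climbs towards 1 as c shrinks:

```python
# State 2 (f = 1.0) is a local minimum: both ring neighbours lie higher.
# (Labels are not sorted by f here; only the topology matters for T(c).)
f = np.array([0.0, 2.0, 1.0, 2.5, 3.0])
for c in [2.0, 1.0, 0.5, 0.25, 0.1]:
    moduli = np.sort(np.abs(np.linalg.eigvals(transition(c))))
    print(c, moduli[-2])   # second-largest modulus approaches 1 as c -> 0
```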
4.1. The effect of the topology

Rather than study the optimization problem directly, we work with the related recognition problem

D: Given F, is there a state j with f_j < F?

We try to estimate how long it would take the annealing algorithm to confirm that the answer to D is 'yes' when the answer is indeed 'yes'. This is a common simplification in dealing with the performance of algorithms. For a full discussion of the relationship between recognition problems and optimization problems, see Garey and Johnson (1979).

Observe first that for arbitrary topologies E(S) = O(|X|) may easily occur. For example, let p_{i,i+m} = 1 for some m > 0; then E(S) = O(|X|/m).

We now analyse the problem more formally. In yes-instances of D the annealing algorithm stops as soon as it finds a suitable f_j.
Let this happen at step S(i) when the initial state is i. Let df_j be the change in the value of f after one step starting in state j. Then

$$E(df_j) = \sum_k a_{jk}\, p_{jk}\, (f_k - f_j).$$

Suppose that df_j > −M and E(df_j) ≤ −α < 0 when f_j > F. Write E(S|i) for the expected number of steps to give a yes answer starting from state i. A trivial extension of Wald's equation (Heyman and Sobel, 1982) would give

$$E(S \mid i) \le (f_i - F + M)/\alpha. \tag{4.1.1}$$

If there are any local optima, the assumption E(df_j) ≤ −α is too strong and instead we should assume

$$E(d^r f_j) \le -r\alpha_r < 0 \tag{4.1.2}$$

where d^r f_j is the change in f after r steps from state j and α_r is then a bound on the improvement per step, averaged over the r steps.
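For a purely illustrative reading of these bounds, with hypothetical values f_i − F = 10, M = 1 and α = 0.05, (4.1.1) gives E(S|i) ≤ (10 + 1)/0.05 = 220 steps; if only (4.1.2) holds, with r = 4 and α_4 = 0.05, the corresponding bound in the Proposition below weakens to (10 + 4)/0.05 = 280 steps.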
Proposition. If (4.1.2) holds for some r, then

$$E(S \mid i) \le (f_i - F + rM)/\alpha_r.$$

For each j ∈ X, define its depth

$$d(j) = \min\{r - 1 : (T^r)_{jk} > 0 \text{ for some } k < j\}$$

and define d(F) = max{d(j) : f_j > F}. Note that the d(j)'s and d(F) are independent of c. The integer r in (4.1.2) is to be chosen so that r > d(F). For the travelling salesman problem d(F) ≤ n − 3, but we conjecture that in typical instances of it, d(F) is less than 10.

To understand how E(S) varies with n we need a lower bound for α_r. In general we can merely state

$$\alpha_r = \max_{j:\, f_j > F}\ \sum_k (f_j - f_k)\,(T^r)_{jk} \ \ge\ \Delta f\, T_{\min}^r$$

where Δf = min_k (f_{k+1} − f_k) and T_min = min{T_jk : T_jk > 0}. A polynomial bound is thus only guaranteed if d(F) = O(log n).
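Because the depth depends only on the support of T, which is the same for every c > 0, it can be computed by boolean powers of that support. A sketch in Python/NumPy (assuming, as in Section 3, that states are labelled in increasing order of f; depths() is our own illustrative helper):

```python
import numpy as np

def depths(T):
    """d(j) = min{r - 1 : (T^r)_{jk} > 0 for some k < j}, labels sorted by f."""
    n = T.shape[0]
    support = (T > 0).astype(int)
    reach = support.copy()       # support of T^r, starting from r = 1
    d = np.full(n, -1)
    d[0] = 0                     # the global minimum has no lower state
    for r in range(1, n + 1):    # a simple path suffices, so n steps are enough
        for j in range(1, n):
            if d[j] < 0 and reach[j, :j].any():
                d[j] = r - 1
        reach = np.minimum(reach @ support, 1)
    return d

def depth_F(T, f, F):
    """d(F) = max{d(j) : f_j > F}."""
    d = depths(T)
    return max(d[j] for j in range(len(f)) if f[j] > F)

# Usage with the earlier sketch: depths(transition(1.0));
# any c > 0 gives the same support, hence the same depths.
```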
In simulations we have found, in agreement with Cerny (1984), that two topologies for the travelling salesman problem having very similarly structured P matrices behave quite differently with respect to convergence speed. This is plausibly explained in the light of the above discussion if one of the two topologies results in a small bound on the depth while the other does not.

Finally, let us see how all this would be seen in terms of the real annealing algorithm, which varies c. Imagine a sequence of recognition problems {c_1, F_1}, {c_2, F_2}, ...