Trouve (1121214), страница 2
Текст из файла (страница 2)
Let A E . We say that a set g of arrows i!j in Ac E is A-graphi:(1) for each i 2 Ac, there exists an unique j 2 E such that i!j 2 g;(2) for each i 2 Ac, there is a path in g ending on a conguration in A.We denote by G(A) the set of the A-graphs.Denition 1.5. Let q be an irreducible Markov kernel on E , let 2 [1; +1[ andlet Q 2 A(q; ). Let V be the associated communication cost.
For each i 2 E andOPTIMAL CONVERGENCE SPEED EXPONENT FOR G.S.AP7each g 2 G(fig), we denote V (g) = i!j2g V (i; j ). We say that W : E ?!R+ is thevirtual energy associated with Q if and only if for each i 2 EW (i) = inf f V (g) j g 2 G(fig) g:Remark 3. Since V (i; j ) < +1 if and only if q(i; j ) > 0 and since q is irreducible onE , one easily veries that W (i) < +1. The above construction of W has been givenby Wentzell and Freidlin in [5] as well as the following proposition.Proposition 1.6. Let q be an irreducible Markov kernel on E , let 2 [1; +1[ andlet Q 2 A(q; ).
For each T > 0, QT is irreducible and we denote T its uniqueinvariant probability measure. Then if W is the virtual energy associated with Q wehaveT ln(T (i))?!? (W (i) ? min W ):T !0Remark 4. This proposition shows that T (fi 2 E j W (i) = min W g)!1 when thetemperature vanishes. Moreover, T (i) is of the order of exp(?(W (i) ? min W )=T )for low temperature which should be compared to T (i) of the order of exp(?(U (i) ?min U )=T ) for the sequential simulated annealing.
Hence a G.S.A. is a optimizatingprocess for the virtual energy.1.3. Outlines. In the general framework, one can easily shows that the generalizedsimulated annealing algorithm converges for suciently slowly decreasing coolingschedule to the global minima of a virtual energy function W . For cooling scheduleTn = c=(ln(n + 2)), a necessary and sucient condition on the value of c for thisconvergence is given in [3],[10], [9] and [13] without giving the optimal convergencespeed in particular because such a result needs large deviation estimates uniformin the cooling schedules as established by O. Catoni in [2].
Our main goal will behere to prove that the approach of O. Catoni can be successfully extended to thegeneral framework and gives the value of the optimal convergence speed exponentconjectured by R. Azencott in [1].We will start in Section 2 with the cycle decomposition of the state space E .Roughly speaking, a cycle is a subset of E which can be considered as a point forsuciently low temperature.
One of there important properties is that at constanttemperature T , the exit time of a cycle is of order eHe()=T whatever the startingpoint in is. The non negative number He () is called the exit height of . Thecycles are structured on a tree whose root is E and leaves are the singletons sinceALAIN TROUVE8two cycles 1 and 2 are either disjoint or included one in the other. We will recallthe recursive construction of the cycles which depends only on the communicationcost V of the G.S.A. and we will show the link between the virtual energy and thecycle decomposition.In Section 3, we will introduce for each A E , i 2 A and j 2 E a new costCA (i; j ) called the renormalized communication cost in A.
Starting from i 2 A, theprobability that the process at constant temperature visits the edge (i; j ) before theexit time of A (included) is of order exp(?CA(i; j )=T ). This renormalized cost in Awill be expressed in function of the cycle decomposition.Both previous sections are in fact devoted to the combinatorial denitions of thecycle decomposition and of the renormalized costs without any results on their probabilistic meaning. We will state in Section 4, large deviation estimates on exit time andexit point of subset of E which will enlighten the probabilistic role of the quantitiesmentioned above. We will handle large deviation estimates for arbitrary decreasingcooling schedules which are the keys for the study of the convergence of the G.S.A.As done for the sequential simulated annealing, the problem of convergence can bewritten in the following way: Let > 0 and denotesE = f i 2 E j W (i) min W + g:Our rst convergence result, established in Section 5, concerns the necessary andsucient condition on the cooling schedule T for which the mass in E vanisheswhen the number of steps increases:Let q be an irreducible Markov kernel on E and let 2 [1; +1[.
Let X be a G.S.A.with parameter (q; ; P ) and let T be the underlying cooling schedule. Assume thatT is decreasing (Tn Tn+1) and vanishes when n tends to innity. Then for all > 0, there exists ? 0 (independent of T ) such thatX ?? =Tnsup P ( Xn 2 E j X0 = i)n?!0ie= +1:!+1i2En0The value of ? is explicitly given in function of the cycle decomposition by? = supf He () j cycle such that iinfW (i) min W + g:2Finally, in Section 6, we study the optimal convergence speed and we establishthe following result which says that if q is an irreducible Markov kernel on E , if 2 [1; +1[ and if X is a G.S.A with parameter (q; ; P ) whose underlying coolingOPTIMAL CONVERGENCE SPEED EXPONENT FOR G.S.A9schedule T is decreasing, then, assuming that the underlying family Q 2 A(q; )satises an additional weak condition C1 (which is fullled for instance if QT (i; j ) =P k ?bki;j =T P k ?dki;j =T= k ci;j e), there exists b > 0 (which depends on Q but not on T )k ai;j esuch that for all level > 0 and all n > 0,sup P (Xn 2 E j X0 = i ) nbi2Ewithmin W j cycle, W () ? min W g = inf f W ()H?()ewhere W () = inf i2 W (i).We nally establish a last result which say we can construct for each N a decreasingcooling schedule T N such that if PN = P(Q;T N ;0) then there exists K1 > 0 such thatsup PN ( XN 2 E j X0 = i) NK^1i2Ewheremin W ) ^ j cycle g:^ = inf f (W () ?H ()eThis exponent is optimal for suciently small > 0 since we have then ^ = .
Asa corollary, we prove a conjecture of Azencott [1] on the optimal convergence speedexponent for the convergence towards the global minima of the virtual energy W .2. The cycle decompositionLet q be an irreducible Markov kernel on E and let 1. Let Q = (QT )T 0 2A(q; ) and denotes V the underlying communication cost. We present now thecycle decomposition of E which depends only on V . The proof of the probabilisticproperties of the cycles are postponed to Section 4 where we will study exit time andexit point from a cycle.The cycle decomposition for a given communication cost has been introduced byWentzell and Freidlin [5] and we recall it briey for completness.
We need rst todene the usual notion of path associated with a cost function.Denition 2.1. Let F be a nite set, C : F F ![0; +1] be a function, and i andj be two distinct congurations of F .10ALAIN TROUVE(1) We denote PthF (i; j ) the set of all paths (gk )0kn in F such that g0 = i andgn = j . The path dependent integer n will be denoted ng and called the lengthof g.(2) Let g in PthF (i; j ), we dened C (g) byC (g) =nXg ?1k=0C (gk ; gk+1 ):We adopt the convention that C (g) = +1 if one of the summation terms is+1.The decomposition in cycles is dened in a recursive way.
First the set E 0 of thecycle of order 0 is dened by:E 0 = f fig j i 2 E g:Let us consider the communication cost function V 0 on E 0 dened byV 0(fig; fj g) = V (i; j ):Assume that the set E k of the cycles of order k has been constructed as well as acommunication cost function V k on E k . The construction of the pair E k+1 , V k+1 canbe split in several steps:(1) From V k , we dene an other communication cost Vk on E k by(if = 0;k0V (; ) = 0V k (; 0) ? H k () otherwise,ek00where Hek () = inf00 6= V (; ):(2) Consider on E k the equivalence relation Rk dened byRk 0 if = 0 or if there exists g 2 PthEk (; 0) and g0 2 PthEk (0; )such that Vk (g) = Vk (g0) = 0:(3) Dene E k+1 byE k+1 = f 0 R[ 0 j 2 E k g:k(4) We dene now a communication cost V k+1 on E k+1 byV k+1 (0k+1; k1+1) = Hmk+1 (k0+1)+inf f V k (k0; k1 ) ? Hek (k0) j (k0 ; k1) 2 E k E k ; k0 k0+1; k1 k1+1 gwhere Hmk+1 (k0+1) = supf Hek (k0 ) j k0 k0+1; k0 2 E k g:OPTIMAL CONVERGENCE SPEED EXPONENT FOR G.S.A11The construction continues until E k = fE g.
We denote nE the integer such thatE nE 6= fE g and E nE +1 = fE g.This procedure gives an hierarchical decomposition of the states space on a treebeginning with the singletons and ending with the whole space. From this tree wewill introduce the following subsets.Denition 2.2. (1) We denote C (E ) the set of all the cycles that isE +1 E kC (E ) = [nk=0(2) Let A E . We dene the maximal partition M(A) of A by:M(A) = f 2 C (E ) j is a maximal element in CA (E ) gwhere CA (E ) = f 2 C (E ) j Ag: We dene the maximal proper partition M(A) of A by:M(A) = f 2 C (E ) j is a maximal element in CA (E ) gwhere CA (E ) = f 2 C (E ) j A; 6= A g:We will now introduce in the following denition some important parameters onthe cycles.Denition 2.3.
Let 2 C (E ).(1) We denote He () the real valued number(fHek () j k nE 2 E k g if 6= E;He () = sup+1if = E:The real He () will be called the exit height of . For = fig we will oftenprefer the notation He(i) to He (fig).(2) We denote Hm () the real numberHm() = supf He (0) j 0 2 M() g _ 0:The real Hm () will be mixing height of .(3) We denote W () the real valued numberW () = inf f W (i) j i 2 g:ALAIN TROUVE12(4) We denote F () the setF () = f i 2 j W (i) = W () g:The set F () will be called the bottom of .Remark 5. The probabilistic interpretation of He () and Hm () can be easily given.For T > 0, let X be a G.S.A. with parameter (q; ; PT ) where the underlying coolingschedule T = (Tn)n2N is assuming to be constant and equal to T (Tn = T , 8n 2 N).For each cycle the exit time from is of order eHe()=T for any starting point in and the probability to go from i 2 to j 2 within a time of order eHm()=T tendsto 1 when the temperature T vanishes.Some parts of the above denition can be extended to an arbitrary set.Denition 2.4.
Let A E .(1) We dene the exit height He (A) of A byHe (A) = supf He () j 2 C (E ); A g _ 0:(2) We denote W (A) the real valued numberW (A) = inf f W (i) j i 2 Ag:(3) We dene the bottom F (A) of A byF (A) = f i 2 A j W (i) = W (A) g:Denition 2.5. Let i 2 E , we dene by induction the increasing family of cycles(ik )0knE by i0 = fig andik+1 2 E k+1 ; ik ik+1 for k nE :Remark 6. A set A which is not a cycle should be considered as an inhomogeneoussubset of E for the exit time from A.