Trouve (1121214)
Текст из файла
Rough Large Deviation Estimates forthe Optimal Convergence SpeedExponent of Generalized SimulatedAnnealing AlgorithmsAlain TROUVELaboratoire de Mathematiques, UA 762 du CNRSDepartement de Mathematiques et d'InformatiqueEcole Normale SuperieureLMENS - 94 - 8April 1994ROUGH LARGE DEVIATION ESTIMATESFOR THE OPTIMAL CONVERGENCE SPEED EXPONENTOF GENERALIZED SIMULATED ANNEALING ALGORITHMSALAIN TROUVESeptember 18, 1993Abstract. We study the convergence speed of generalized simulated annealingalgorithms. Large deviations estimates uniform in the cooling schedule are established for the exit from the cycles in the spirit of Catoni's sequential annealing work[2]. We compute the optimal convergence speed exponent proving a conjectured ofR.
Azencott [1].1. IntroductionContents1.1. From sequential to generalized simulated annealing1.2. Theoretical framework1.3. Outlines22. The cycle decomposition3. Renormalization of the communication cost4. The localization theorems913195. Necessary and sucient condition for convergence6. Optimal convergence speed exponent36387. Conclusion454.1.
Emptying kernels4.2. Localization kernels4.3. Main large deviation estimates6.1. Upper bound6.2. Lower bound1ALAIN TROUVE2References451. Introduction1.1. From sequential to generalized simulated annealing. Sequential simu-lated annealing is a very general and exible method for nding good solutions tohard optimization problems. Its principle is surprisingly simple and was formulatedin the early 1980s [12]. Consider a real valued function U to be minimized on a niteset E called the state space (or the conguration space). Now, let an irreducibleMarkov kernel q on E be given and dene for all T > 0 the Markov kernel QT on Esuch that for any i; j 2 E , i 6= j(1)QT (i; j ) = q(i; j ) exp(?(U (j ) ? U (i))+=T );where x+ denotes the positive part of x. For T = 0, denote Q0 = limT !0;T>0 QT .An homogeneous Markov chain X = (Xn )n2N with transition kernel Q0 satisesU (Xn ) U (Xn+1 ) and is trapped in a nite number of steps into a local minimaof U .
For T > 0, QT is a perturbation of the previous mechanism allowing hillclimbing moves. Assuming that q(i; j ) = q(j; i), QT admits T as unique equilibriumprobability measure which satises for all i 2 E(2)T (i) = exp(?U (i)=T )=ZT :The parameter T is physically interpreted as a temperature and one veries that(3)lim (fi 2 E j U (i) = min U g) = 1:T !0 TThis particular feature is the key of the simulated annealing approach.
Considering adecreasing cooling schedule T = (Tn)n2N i.e. a sequence of decreasing temperatures,we dene an inhomogeneous Markov chain X = (Xn )n2N with transition kernel QTn+1at time n. The intuitive idea is that for suciently slowly decreasing cooling schedules, the law of Xn should be close to Tn and we should have(4)sup P (U (Xn ) 6= min U j X0 = i)?!n!10:i2EMany results are known on sequential simulated annealing.
Since Hajek's paper [8],one knows that there exists H 0 such that for all decreasing cooling schedulesT vanishing to zero, (4) holds if and only if Pn0 e?H=Tn = +1. This problem ofOPTIMAL CONVERGENCE SPEED EXPONENT FOR G.S.A3convergence can be extended to the more general problem of convergence on levelsets E = f i 2 E j U (i) min U + g for > 0. This problem has been successfullysolved by O. Catoni in [2] who proves that there exists ? 0 such that if T is adecreasing cooling cooling schedule vanishing to zero, thenX ?? =Tn(5)= +1:esup P (Xn 2= E j X0 = i)?!0in!1i2En0More crucial for practical use of sequential simulated annealing is the rate of convergence.
More precisely, letMn = T0Tinf(6)sup P (Xn 2 E j X0 = i):1 Tni2ECatoni's work [2] gives that there exist 0, ^ 0, K1 > 0 and K2 > 0 such thatK1 M K2 :(7)nnn^The lower bound says that the mass in E cannot vanish faster than a power of 1=nand that this power is upper bounded by . Conversely, the upper bound says thatfor each N 2 N there exists a decreasing cooling schedule T N such that for thiscooling schedule(8)sup P (XN 2 E j X0 = i) 2nK^2 :i2ENote that we may have T N 6= T N +1 and in fact the proof of (8) shows that oneshould designed the cooling schedule according to the horizon. The value of and^ are given explicitly in function of U and q (even if their numerical computationis an hard combinatorial problem).
Since for suciently small value of we have = ^ we deduce that if(9)opt = lim = lim ^ ;!0 !0 then we cannot expect for the convergence to the set of global minima of U a betterspeed exponent with decreasing cooling schedules than opt.In many practical situation, computer scientists have proposed modied versionof the sequential simulated annealing in order to increase the speed of convergence.One of the most promising issues is the parallelization of the algorithm in orderto distribute the computations on several processes. As an illustration of such anapproach, let us have a look on sequential simulated annealing in image processing(see [6]).
In this case, the conguration space E is a product space E = LS where S4ALAIN TROUVEis a set of pixels and L a set of labels (usually grey levels). For each i 2 E , i(s) is thevalue of the grey level at pixel s 2 S . The sequential annealing process has in thisframework the following particular form. One dene rst a family of local updatingkernels (Qs;T )s2S;T 0 by(10)8U (il;s )=T )< P exp(?exp(if j = il;s;0Qs;T (i; j ) = : l06=l;l0 2L ?U (il ;s)=T )0otherwisewhere il0;s is the conguration j 2 E such that j (s) = l0 and j (s0) = i(s0) for s0 6= s(only the pixel s is changed).
Now, let a family (sp)1pjSj exhausting the elementsof S be given (generally we choose a sweeping line by line of the image). We denethe family Q = (QT )T 0 of transition kernels of the sequential simulated annealingby(11)QT (i; j ) = (Qs1;T Qs2;T Qsp;T )(i; j ):Rigorously speaking, QT cannot be written of the form (1) and the previous results do not apply. However, an extension of them to this case does not imply deepmodications of the simulated annealing theory since T is still the unique invariant probability measure of QT and since QT is reversible for T > 0.
Now, considerthe following parallelization of the sequential process: assume that one has a smallprocessor attaches to each pixel and assume that all the pixels are updating synchronously according to the local updating rule. We dene then a new transitionmatrix given by(12)KT (i; j ) =Ys2SQs;T (i; ij(s);s):One can expect that the speed of convergence is increased by a factor jS j since allthe labels are updating in one unit of time (instead of jS j units of time previously).However, some very important changes have occurred owing to the interactions between processors and we have to generalize the framework of simulated annealing.Instead of the form (1), we can only say that there exist an irreducible Markov kernelq on E , an real valued number 2 [1; +1[ and a family of non negative numbers(V (i; j ))i;j2E such that1 q(i; j )e?V (i;j)=T K (i; j ) q(i; j )e?V (i;j)=T :(13)TOPTIMAL CONVERGENCE SPEED EXPONENT FOR G.S.A5For the classical sequential annealing, we have = 1 and V (i; j ) = (U (j ) ? U (i))+so that V (i; j ) ? V (j; i) = U (j ) ? U (i).
This property which is a consequence ofthe reversibility of the sequential annealing transition kernel does not hold any morefor the kernel KT (in fact, there does not exist any function W such that V (i; j ) ?V (j; i) = W (j ) ? W (i)). As a consequence, none of the previous results can apply andwe need an extended theory to handle the convergence properties of parallel versionof sequential simulated annealing. This extension will be the main subject of thispaper. We should emphasize the fact that many stochastic optimization algorithmsfall into the scope of this extension and not only the parallel versions of the sequentialannealing for image processing presented above. This extension covers in fact all thesmall random perturbations (with discrete time and nite state space) of dynamicalsystems as developed by Wentzell and Freidlin in [5].
The point is to handle here theconvergence properties of such processes when the perturbation vanishes with time.1.2. Theoretical framework. Let us now introduce precisely our theoretical framework. Let E be a nite set which will stay xed throughout this work.Denition 1.1. Let 2 [1; +1[ and let q be an irreducible Markov kernel on E .We say that a continuously parametered family Q = (QT )T 0 of Markov kernel on Eis admissible for and q if and only if there exists a family V = (V (i; j ))i;j2E suchthat(1) V (i; j ) 2 [0; +1] and V (i; j ) < +1 i q(i; j ) > 0,(2) for all T 0, all i; j 2 E1 q(i; j )e?V (i;j)=T Q (i; j ) q(i; j )e?V (i;j)=T :TFor T = 0, we take the convention that e?V (i;j)=0 = 1V (i;j)=0.
The family V is calledthe communication cost and the set of all the admissible families Q for q and isdenoted by A(q; ).Remark 1. As a main extension of the standard sequential simulated annealing transition kernels, we do not assume that q is symmetric and and V can be completelyarbitrary.In the following denition, we dene the probability measure associated with anabstract annealing process on E .ALAIN TROUVE6Denition 1.2.
Let X = (Xn )n2N be the coordinate process on E N. For all =(Q; T ; 0) where Q = (QT )T 0 is a continuously parametered family of Markov kernels on E , T = (Tn)n2N is a sequence of non negative real valued numbers called coolingschedule, 0 is a probability measure on E called the initial distribution,we denote P the unique probability measure on E N for the -algebra (Xn ; n 0)such that X is under P a Markov chain with initial distribution 0 and transitionkernel QTn+1 at time n.We can now dene the extended notion of generalized simulated annealing.Denition 1.3.
Let q be an irreducible Markov kernel on E , let 2 [1; +1[ andlet X = (Xn )n2N be the coordinate process on E N. We say that X is a generalizedsimulated annealing process (G.S.A.) with parameter (q; ; P ) if there exist Q 2A(q; ), a cooling schedule T and an initial distribution 0 such that P = P with = (Q; T ; 0). Moreover, as usual, P ( j X0 = i) will denote the probability measurePi where for all i 2 E , i = (Q; T ; i).Remark 2.
An equivalent framework has been considered by Hwang and Sheu in [10].Their results will be reported below in this section.Considering an arbitrary communication cost V , a G.S.A. is not dened around anexplicit energy function U . However, there exists an implicit function W introducedby the denitions below which plays for low temperatures the same role. This functiondepends on the communication cost V through a functional on A-graphs that wedene now.Denition 1.4.
Характеристики
Тип файла PDF
PDF-формат наиболее широко используется для просмотра любого типа файлов на любом устройстве. В него можно сохранить документ, таблицы, презентацию, текст, чертежи, вычисления, графики и всё остальное, что можно показать на экране любого устройства. Именно его лучше всего использовать для печати.
Например, если Вам нужно распечатать чертёж из автокада, Вы сохраните чертёж на флешку, но будет ли автокад в пункте печати? А если будет, то нужная версия с нужными библиотеками? Именно для этого и нужен формат PDF - в нём точно будет показано верно вне зависимости от того, в какой программе создали PDF-файл и есть ли нужная программа для его просмотра.