JOTA: VOL. 104, NO. 1, JANUARY 2000

3. Assumptions for the Convergence Results

In Ref. 15, the next candidate point $X_{k+1}$ is generated inside the intersection of the feasible set $X$ with a sphere with center in the current point $Y_k$ and radius $R$. Here, we consider algorithms which satisfy the following condition:

$$\operatorname{supp}(D(\cdot\,; z_k)) = S(Y_k, R_k) \cap X,$$

where $\operatorname{supp}(\cdot)$ denotes the support of a distribution. Therefore, the new candidate point $X_{k+1}$, generated by the distribution $D(\cdot\,; z_k)$, must belong to the intersection of the feasible set $X$ with the sphere whose center is the current point $Y_k$ and whose radius is $R_k$.
The basic idea is that not only should the temperature $t_k$ decrease to 0 when we are close to the current record [see (4)], but also that the same should be true for the radius $R_k$ of the support sphere for the distribution $D(\cdot\,; z_k)$; i.e., we should have

$$R_k \ge \delta_{\bar\varepsilon} > 0, \quad \text{if } f(Y_k) - f_k^* > \bar\varepsilon,$$  (6a)
$$R_k = r_k, \quad \text{otherwise},$$  (6b)

where $\{r_k\}$ is a deterministic nonincreasing sequence converging to 0.
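The two-regime rule (6) can be sketched in code as follows; the constants `eps_bar` and `delta`, and the placeholder sequence $r_k = 1/(k+1)$, are illustrative choices, not the paper's:

```python
def support_radius(f_Yk, f_record, k, eps_bar=0.1, delta=0.5):
    """Radius R_k of the support sphere, following rule (6).

    If the current point is poor relative to the record (gap > eps_bar),
    keep the radius bounded away from 0, as in (6a); otherwise set it to
    the deterministic nonincreasing sequence r_k -> 0, as in (6b).
    The sequence r_k = 1/(k+1) is an illustrative placeholder.
    """
    r_k = 1.0 / (k + 1)          # example nonincreasing sequence -> 0
    if f_Yk - f_record > eps_bar:
        return max(delta, r_k)   # (6a): R_k >= delta > 0
    return r_k                   # (6b): R_k = r_k
```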
The meaning of (6) is that, if we are at a point whose function value is poor with respect to the current record, then we guarantee the possibility of performing steps $X_{k+1} - Y_k$ whose length is bounded away from 0, while if we are close to the record, we decrease to 0 the maximum allowed steplength.

We need to introduce some assumptions, more restrictive than the ones presented in Ref. 15, but still quite general. The first assumption restricts the possible choices for the distribution $D$ of the next candidate point.

Assumption 3.1. For any $z_k$, the distribution $D(\cdot\,; z_k)$ of the next candidate point $X_{k+1}$ is equivalent to the uniform distribution over $S(Y_k, R_k) \cap X$; i.e., there exist constants $g, G > 0$, $g \le G$, such that

$$g\,[m(B)/m(S(Y_k, R_k) \cap X)] \le D(B; z_k) \le G\,[m(B)/m(S(Y_k, R_k) \cap X)],$$
$$\forall B \subseteq S(Y_k, R_k) \cap X, \ \forall Y_k \in X.$$  (7)

The second assumption introduces restrictions on the objective function $f$ and the feasible set $X$.

Assumption 3.2.
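The simplest distribution satisfying Assumption 3.1 is the uniform distribution itself, which attains (7) with $g = G = 1$; it can be realized by rejection sampling. The sketch below is a minimal illustration; the membership oracle `in_X` and the box feasible set are assumptions for the example, not part of the paper:

```python
import random

def uniform_in_sphere_cap(Yk, Rk, in_X, rng=random):
    """Draw X_{k+1} uniformly from S(Yk, Rk) ∩ X by rejection sampling.

    Proposes uniformly in the bounding box of the ball around Yk and
    rejects points outside the ball or outside X; the accepted point is
    uniform on S(Yk, Rk) ∩ X, so (7) holds with g = G = 1.
    `in_X` is a membership oracle for the feasible set X.
    """
    n = len(Yk)
    while True:
        y = [Yk[i] + rng.uniform(-Rk, Rk) for i in range(n)]
        if sum((y[i] - Yk[i]) ** 2 for i in range(n)) <= Rk ** 2 and in_X(y):
            return y

# Illustrative feasible set: the unit box [0, 1]^2.
in_box = lambda y: all(0.0 <= yi <= 1.0 for yi in y)
```

Since $X$ is compact, convex, and full-dimensional and $Y_k \in X$, the intersection has positive measure, so the rejection loop terminates with probability 1.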
The feasible set $X$ is compact, convex, and full-dimensional, and the objective function $f$ is continuous; moreover, $f$ has a unique global optimum over $X$ (the extension to the case of finitely many global optima is trivial).

Note that the continuity of $f$ and the compactness of $X$ imply that $f$ is uniformly continuous over $X$; i.e.,

$$\forall \delta > 0, \ \exists \gamma = \gamma(\delta) > 0: \ d(x, y) \le \gamma \Rightarrow |f(x) - f(y)| \le \delta.$$  (8)
Next, we introduce an assumption about the speed of convergence to 0 for the sequences $\{c_k\}$ and $\{r_k\}$. First, we need to introduce some notation. Let

$$s_k = \min\Big\{s: \sum_{i=0}^{s} r_{k+i} \ge 2\,\mathrm{diam}(X)\Big\}, \qquad \mathrm{diam}(X) = \max_{x, y \in X} d(x, y),$$

and let

$$\Delta F_k = \max_{x \in X} \max_{y \in S(x, r_k) \cap X} [f(y) - f(x)].$$

Moreover, we define the following sequence of integers:

$$q_0 = 1, \qquad q_{j+1} = q_j + s_{q_j}, \quad \forall j \in \{0, 1, \ldots\}.$$
Then, the assumption is the following.

Assumption 3.3. The sequences $\{c_k\}$ and $\{r_k\}$ decrease to 0 slowly enough; specifically, the following condition must be satisfied:

$$\sum_{j=1}^{\infty} \psi^{s_{q_j}} \exp\{-[s_{q_j} \Delta F_{q_j} / c_{q_{j+1}}]\} = \infty,$$  (9)

where $\psi \in (0, 1)$ is a given constant.

While condition (9) is quite general, it is quite difficult to appreciate its dependency on the two sequences $\{c_k\}$ and $\{r_k\}$. In particular, while the sequence $\{c_k\}$ appears explicitly in (9), the sequence $\{r_k\}$ appears only implicitly through $s_{q_j}$ and $\Delta F_{q_j}$. In order to make the latter sequence appear explicitly, we can introduce two further requirements. The first one is that the sequence $\{r_k\}$ is updated only at iterations $q_j$, $j = 1, 2, \ldots$; i.e.,

$$r_k = r_{q_j}, \quad \forall k \in [q_j, q_{j+1}).$$

Then, it follows that

$$s_{q_j} = \min\{s: s\,r_{q_j} \ge 2\,\mathrm{diam}(X)\} = 2\,\mathrm{diam}(X)/r_{q_j}.$$

The second requirement is that the function $f$ is Lipschitz with Lipschitz constant $L$. Then, an upper bound for $\Delta F_{q_j}$ is given by $L r_{q_j}$. If both these requirements hold, (9) is implied by the following condition:

$$\sum_{j=1}^{\infty} \psi^{2\,\mathrm{diam}(X)/r_{q_j}} \exp\{-2L\,\mathrm{diam}(X)/c_{q_{j+1}}\} = \infty,$$

where also the sequence $\{r_k\}$ appears explicitly.

Under the above assumptions, the following result can be proved.
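Under these two extra requirements, the epoch indices $q_j$ and the terms of the explicit sufficient condition can be computed directly. In the sketch below, the schedules `r_of_j`, `c_of_k` and the values of $\mathrm{diam}(X)$, $\psi$, $L$ are all illustrative placeholders:

```python
import math

def build_schedule(r_of_j, diam_X, num_epochs):
    """Iterations q_j at which r_k is updated: q_0 = 1 and
    q_{j+1} = q_j + s_{q_j}, with s_{q_j} = min{s : s * r_{q_j} >= 2 diam(X)}."""
    q = [1]
    for j in range(num_epochs):
        s_qj = math.ceil(2 * diam_X / r_of_j(j))
        q.append(q[-1] + s_qj)
    return q

def series_term(j, r_of_j, c_of_k, diam_X, psi, L, q):
    """j-th term of the explicit sufficient condition for (9):
    psi**(2 diam(X)/r_{q_j}) * exp(-2 L diam(X) / c_{q_{j+1}})."""
    return psi ** (2 * diam_X / r_of_j(j)) * math.exp(
        -2 * L * diam_X / c_of_k(q[j + 1]))
```

For instance, if $r_{q_j}$ is kept constant and $c_k$ is bounded below, every term is bounded below by a positive constant and the series trivially diverges; the interesting regimes are those where both sequences tend to 0.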
Theorem 3.1. Let $t_k$ and $R_k$ be given by (4) and (6), respectively, and let Assumptions 3.1–3.3 hold. Then,

$$P[\exists k: Y_k \in B_\varepsilon] = 1, \quad \forall \varepsilon > 0.$$  (10)

Proof. See Theorem 2 in Ref. 11. □

Basically, Theorem 3.1 states that, under the given assumptions, the record value $f_k^*$ converges to the global optimum value $f^*$ with probability 1. However, Assumptions 3.1–3.3 are not enough to ensure the stronger result

$$\lim_{k \to \infty} P[Y_k \in B_\varepsilon] = 1, \quad \forall \varepsilon > 0.$$  (11)

In order to get this result, further assumptions have to be introduced. The first one imposes restrictions on the possible choices of the value $\bar\varepsilon$ which appears in (4) and (6).

Assumption 3.4.
The value $\bar\varepsilon$ satisfies

$$m(S(x, \rho_\varepsilon) \cap B_\varepsilon) > 0, \quad \forall x \in B_{2\bar\varepsilon}, \ \forall \varepsilon > 0.$$

This assumption is the counterpart of condition (5) with $R$ replaced by $\rho_\varepsilon$ and can be interpreted in the same way; see the comment after (5). We point out that a drawback of the above assumption is that generally it is not possible to know in advance a suitable value for $\bar\varepsilon$. We delay the discussion of this difficulty to the following section.

Next, we need another assumption, basically requiring that the function $f$ is locally Lipschitz in a neighborhood of the global optimum.

Assumption 3.5.
There exists $v_1 > 0$ such that, $\forall x \in B_{\bar\varepsilon}$ and $\forall y \in S(x, r_k)$,

$$f(y) \le f(x) + v_1 r_k.$$

While Assumption 3.5 requires that the function does not increase too fast in a neighborhood of the global optimum, the next assumption requires that the function is not too flat. Indeed, it is required that, while we are in a neighborhood of the global optimum, it is always possible, with a strictly positive probability, to get close enough to the global optimum, both from the point of view of the function value and from the point of view of the distance.

Assumption 3.6.
Let $x^*$ be the global optimizer, and let $\varepsilon \in (0, \bar\varepsilon]$. For some $v_2 = v_2(\varepsilon) > 0$, let $D_k$ denote the event

$$\{f(Y_{k+1}) \le f(Y_k) - v_2 r_k, \ d(Y_{k+1}, x^*) \le d(Y_k, x^*) - (1/2) r_k\},$$

i.e., the event of getting close enough to $x^*$. Then, there exist $p > 0$ and a positive integer $K_\varepsilon$ such that

$$P[D_k \mid Y_k = x, z_k] \ge p, \quad \forall x \in B_{2\bar\varepsilon} \setminus B_{\varepsilon/2}, \ \forall k \ge K_\varepsilon, \ \forall z_k.$$

It can be proved, for instance, that Assumption 3.6 holds if $f$ is twice continuously differentiable in a neighborhood of the global minimum and if the second-order sufficient optimality conditions are satisfied at the global optimum. However, the assumption is satisfied also for more general functions.

The final assumption requires that, in a neighborhood of the global optimum, the probability of accepting an ascent step decreases to 0 as the iteration counter $k$ increases.

Assumption 3.7.
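As a concrete check, one can estimate $P[D_k \mid Y_k = x]$ numerically for the quadratic $f(y) = \|y\|^2$, which satisfies the second-order sufficient conditions at $x^* = 0$. The sketch below ignores the conditioning on $z_k$ and the feasible set, and every numerical value ($x$, $r_k$, $v_2$) is an illustrative assumption:

```python
import math
import random

def prob_Dk(x, r_k, v2, x_star=(0.0, 0.0), trials=20000, rng=random):
    """Monte Carlo estimate of P[D_k | Y_k = x] for f(y) = ||y||^2, with the
    candidate drawn uniformly from the ball S(x, r_k):
    D_k = {f(y) <= f(x) - v2*r_k  and  d(y, x*) <= d(x, x*) - r_k/2}."""
    f = lambda p: sum(pi * pi for pi in p)
    hits = 0
    for _ in range(trials):
        # uniform proposal in the ball of radius r_k around x (rejection)
        while True:
            y = tuple(xi + rng.uniform(-r_k, r_k) for xi in x)
            if math.dist(y, x) <= r_k:
                break
        if f(y) <= f(x) - v2 * r_k and math.dist(y, x_star) <= math.dist(x, x_star) - r_k / 2:
            hits += 1
    return hits / trials
```

With, e.g., $x = (0.3, 0)$, $r_k = 0.1$, $v_2 = 0.2$, the estimated probability is strictly positive, consistent with the assumption.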
There exists a sequence $b_k = b_k(\varepsilon) \to 0$ such that

$$P[f(Y_{k+1}) > f(Y_k) \mid Y_k = x, R_k = r_k, z_k] \le b_k, \quad \forall x \in B_{2\bar\varepsilon} \setminus B_{\varepsilon/2}, \ \forall z_k.$$  (12)

While Assumption 3.7 is what we need for the proof of (11), we would like to introduce an alternative assumption, which is less general than Assumption 3.7, but gives a better indication about how $\{c_k\}$ and $\{r_k\}$ should be chosen in order to enforce it.

Assumption 3.8. There exists a sequence $\{u_k\}$ such that $c_k / u_k \to 0$ as $k \to \infty$, and a sequence $\rho_k = \rho_k(\varepsilon) \to 0$ such that

$$m(X \cap S(x, r_k) \cap \{y: f(x) < f(y) < f(x) + u_k\}) / m(S(x, r_k) \cap X) \le \rho_k, \quad \forall x \in B_{2\bar\varepsilon} \setminus B_{\varepsilon/2}.$$

Intuitively, Assumption 3.8 requires that $r_k$ decreases to 0 at a rate slower than $u_k$, and consequently at a rate also slower than $c_k$. Indeed, if $u_k$ is smaller than $r_k$, the measure of the set of points in $S(x, r_k) \cap X$ with function value bigger than $f(x)$, but by not more than $u_k$, is typically also small with respect to the measure of the whole set $S(x, r_k) \cap X$.
Under Assumption 3.1, Assumption 3.7 is always implied by Assumption 3.8. Indeed, for any $x \in B_{2\bar\varepsilon} \setminus B_{\varepsilon/2}$, the probability in (12) can be split into the following sum:

$$P[f(Y_{k+1}) \ge f(Y_k) + u_k \mid Y_k = x, R_k = r_k, z_k] + P[f(Y_k) < f(Y_{k+1}) < f(Y_k) + u_k \mid Y_k = x, R_k = r_k, z_k].$$  (13)

By observing the Metropolis acceptance function (1), it follows that an upper bound for the first probability in (13) is the value $\exp\{-(u_k / c_k)\}$, while from Assumption 3.8 and (7) with

$$B = X \cap S(x, r_k) \cap \{y: f(x) < f(y) < f(x) + u_k\},$$

it follows that an upper bound for the second probability in (13) is $G \rho_k$. Therefore,

$$P[f(Y_{k+1}) > f(Y_k) \mid Y_k = x, R_k = r_k, z_k] \le b_k = \exp\{-(u_k / c_k)\} + G \rho_k \to 0,$$

i.e., Assumption 3.7 holds.
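The first bound above comes directly from the Metropolis rule: an ascent of at least $u_k$ is accepted with probability at most $\exp\{-(u_k/c_k)\}$, since the acceptance probability is nonincreasing in the ascent size. A minimal check, assuming the acceptance function has the standard Metropolis form $\min\{1, \exp(-\Delta f / c)\}$ with temperature parameter $c$:

```python
import math

def metropolis_accept_prob(delta_f, c):
    """Metropolis acceptance probability min{1, exp(-delta_f / c)} for a
    proposed step with objective change delta_f at temperature c > 0.
    Descent steps (delta_f <= 0) are always accepted."""
    return min(1.0, math.exp(-delta_f / c))

# For any ascent delta_f >= u, the acceptance probability is at most
# exp(-u / c), which is the bound used for the first term of (13).
```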
4. Convergence Results

Now, we are ready to present the new convergence results. The first one states the convergence with probability 1 of the sequence $\{Y_k\}$ to the global optimum.

Theorem 4.1. If Assumptions 3.1–3.7 hold, then there exists a value $\tilde\varepsilon > 0$ such that, if $\bar\varepsilon \in (0, \tilde\varepsilon]$ is chosen in (4) and (6), then

$$\lim_{k \to \infty} P[Y_k \in B_\varepsilon] = 1, \quad \forall \varepsilon > 0.$$

Proof. See Theorem 3 in Ref.