This means that z′ ∈ N_p^(t)(z) differs from z only in z(t) or α(t) and remains the same for the other variables. This is different from CSA, which perturbs z in the overall variable space. As in CSA, α_i is not perturbed when h_i(z(t)) = 0 is satisfied. Last, Line 10 accepts z′ with the Metropolis probability A_T(z, z′) similar to that in (25):

A_T(z, z′) = exp(−(L_m(z′) − L_m(z))^+ / T),   if z′ = (z′, α, γ)^T,
A_T(z, z′) = exp(−(L_m(z) − L_m(z′))^+ / T),   if z′ = (z, α′, γ)^T or z′ = (z, α, γ′)^T.   (31)

When t = N + 1 (Line 11), it represents a global exploration step.
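The asymmetric acceptance rule in (31) can be sketched as follows. Function and argument names are illustrative assumptions; `L_curr` and `L_trial` stand for the penalty-function values L_m at the current and trial points.

```python
import math
import random

def accept(L_curr, L_trial, T, trial_kind):
    """Metropolis acceptance probability A_T(z, z') in the spirit of (31).

    trial_kind is 'z' when the trial point perturbs the original
    variables, i.e. z' = (z', alpha, gamma)^T, and 'penalty' when it
    perturbs a penalty multiplier, i.e. z' = (z, alpha', gamma)^T or
    z' = (z, alpha, gamma')^T.
    """
    if trial_kind == 'z':
        # Descend in the original-variable subspace: an increase in L_m
        # is accepted only probabilistically.
        delta = max(L_trial - L_curr, 0.0)
    else:
        # Ascend in the penalty subspace: a *decrease* in L_m is
        # accepted only probabilistically.
        delta = max(L_curr - L_trial, 0.0)
    return random.random() < math.exp(-delta / T)
```

Note that a move that improves L_m in the appropriate direction has delta = 0 and is accepted with probability one.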
In this case, Line 12 generates a random trial point z′ ∈ N_p^(g)(z) using a generation probability G^(g)(z, z′) that satisfies a condition similar to that in (28). Assuming N_m1(z(g)) to be the mixed neighborhood of z(g) and Π(g) = R^p, z′ is obtained by perturbing z(g) and γ in their neighborhood N_p^(g)(z):

N_p^(g)(z) = {(z′, α, γ)^T ∈ S where z′ ∈ N_p1^(g)(z)} ∪ {(z, α, γ′)^T ∈ S where γ′ ∈ N_p2^(g)(γ)},   (32)

N_p1^(g)(z) = {z′ where z′(g) ∈ N_m1(z(g)) and z′_i = z_i ∀ z_i ∉ z(g)},   (33)

N_p2^(g)(γ) = {γ′ ∈ Π(g) where (γ′_i < γ_i or γ′_i > γ_i if H_i(z) ≠ 0) and (γ′_i = γ_i if H_i(z) = 0)}.   (34)

Again, z′ is accepted with probability A_T(z, z′) in (31) (Line 13). Note that both N_p^(t)(z) and N_p^(g)(z) ensure the ergodicity of the Markov chain, which is required for achieving asymptotic convergence.

When compared to CSA, CPSA reduces the search complexity through constraint partitioning.
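The partitioned local step above, which perturbs only the variables owned by one constraint subset and leaves the rest unchanged as in (33), can be sketched as follows. The list representation of z, the index set, and the ±1 discrete move are illustrative assumptions, not the paper's definitions.

```python
import random

def probe_in_subset(z, subset, rng):
    """Return a trial point z' that differs from z in exactly one
    variable owned by the current constraint subset (cf. (33))."""
    z_new = list(z)
    i = rng.choice(subset)           # variable index owned by this subset
    z_new[i] += rng.choice([-1, 1])  # illustrative discrete neighborhood move
    return z_new

rng = random.Random(1)
z = [0, 0, 0, 0]
# Only indices {0, 1} belong to the chosen subset, so all other
# variables of z stay fixed, as required by (33).
zp = probe_in_subset(z, subset=[0, 1], rng=rng)
```

This is what makes each CPSA subproblem search an exponentially smaller subspace than CSA's full-space perturbation.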
Since both CSA and CPSA need to converge to an equilibrium distribution of variables at a given temperature before the temperature is reduced, the total search time depends on the convergence time at each temperature. By partitioning the constraints into subsets, each subproblem only involves an exponentially smaller subspace with a small number of variables and penalties. Thus, each subproblem takes significantly less time to converge to an equilibrium state at a given temperature, and the total time for all the subproblems to converge is also significantly reduced. This reduction in complexity is experimentally validated in Sect. 5.

J Glob Optim

Fig. 4 Greedy ESPC search method (GEM)

3.3 Greedy ESPC search method (GEM)

In this section, we present a dynamic penalty method based on a greedy search of an ESP. Instead of probabilistically accepting a probe as in CSA and CPSA, our greedy approach accepts the probe if it improves the value of the penalty function and rejects it otherwise.

One simple approach that does not work well is to gradually increase α** until α** > α*, while minimizing the penalty function with respect to z using an existing local-search method. This simple iterative search does not always work well because the penalty function has many local minima that satisfy the second inequality in (13), but some of these local minima do not satisfy the first inequality in (13) even when α** > α*.
Hence, the search may generate stationary points that are local minima of the penalty function but are not feasible solutions to the original problem.

To address this issue, Fig. 4 shows a global search called the Greedy ESPC Search Method (GEM) [32]. GEM uses the following penalty function:

L_g((z, α)^T) = f(z) + Σ_{i=1}^{m} α_i |h_i(z)| + (1/2) ||h(z)||².   (35)

Lines 5–8 carry out N_g iterative descents in the z subspace. In each iteration, Line 6 generates a probe z′ ∈ N_m1(z) neighboring to z. As defined in (24) for CSA, we select z′ with uniform probability across all the points in N_m1(z). Line 7 then evaluates L_g((z′, α)^T) and accepts z′ only when it reduces the value of L_g. After the N_g descents, Line 9 updates the penalty vector α in order to bias the search toward resolving those violated constraints.

When α** reaches its upper bound during a search but a local minimum of L_g does not correspond to a CLM_m of P_m, we can reduce α** instead of restarting the search from a new starting point.
The decrease will change the terrain of L_g and "lower" its barrier, thereby allowing a local search to continue in the same trajectory and move to another local minimum of L_g. In Line 10, we reduce the penalty value of a constraint when its maximum violation is not reduced for three consecutive iterations. To reduce the penalties, Line 11 multiplies each element in α by a random real number uniformly generated between 0.4 and 0.6.
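A minimal sketch of this greedy loop on the toy problem used later in (37), assuming a finite discrete neighborhood. The additive increase rule for α (constant `ETA` below) is an assumption, since the excerpt does not give the exact form of Line 9; the three-iteration stagnation window and the uniform [0.4, 0.6] reduction follow the text.

```python
import random

def gem_sketch(f, h_list, z0, neighbors, n_outer=100, n_g=20, seed=0):
    """Illustrative sketch of GEM's main loop (Fig. 4), not the paper's code."""
    rng = random.Random(seed)
    ETA = 1.0                      # assumed step for increasing alpha (Line 9)
    alpha = [1.0] * len(h_list)
    z = z0
    stagnant = 0
    best_viol = float('inf')

    def L_g(z, alpha):
        # Penalty function (35): f(z) + sum_i alpha_i|h_i(z)| + 1/2||h(z)||^2
        viol = [abs(h(z)) for h in h_list]
        return (f(z) + sum(a * v for a, v in zip(alpha, viol))
                + 0.5 * sum(v * v for v in viol))

    for _ in range(n_outer):
        for _ in range(n_g):       # Lines 5-8: N_g greedy descents in z
            cand = neighbors(z)
            if cand:
                probe = rng.choice(cand)           # Line 6: uniform probe
                if L_g(probe, alpha) < L_g(z, alpha):
                    z = probe                      # Line 7: accept only if L_g drops
        viol = [abs(h(z)) for h in h_list]
        alpha = [a + ETA * v for a, v in zip(alpha, viol)]  # Line 9 (assumed form)
        max_v = max(viol)
        if max_v >= best_viol:     # Line 10: stagnation counter
            stagnant += 1
        else:
            stagnant, best_viol = 0, max_v
        if stagnant >= 3:          # Line 11: shrink each alpha_i by U(0.4, 0.6)
            alpha = [a * rng.uniform(0.4, 0.6) for a in alpha]
            stagnant = 0
    return z

# Toy run: minimize -y^2 subject to |(y - 0.6)(y - 1.0)| = 0 on a grid.
grid = [round(0.5 + 0.1 * i, 1) for i in range(8)]
def nb(y):
    return [g for g in grid if g != y and abs(g - y) < 0.15]
y_final = gem_sketch(lambda y: -y * y,
                     [lambda y: (y - 0.6) * (y - 1.0)], 0.5, nb)
```

The alternation of penalty increases and random reductions is what lets the greedy descent escape infeasible local minima of L_g.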
By repeatedly increasing α** to its upper bound and by reducing it to some lower bound, a local search will be able to escape from local traps and visit multiple local minima of the penalty function. We leave the presentation of the parameters used in GEM and its experimental results to Sect. 5.

4 Asymptotic convergence of CSA and CPSA

In this section, we show the asymptotic convergence of CSA and CPSA to a constrained global minimum in discrete constrained optimization problems.
Without repeating the definitions in Sect. 1, we can similarly define a discrete nonlinear programming problem (P_d), a discrete neighborhood (N_d(y)), a discrete constrained local minimum (CLM_d), a discrete constrained global minimum (CGM_d), and a penalty function in discrete space (L_d).

4.1 Asymptotic convergence of CSA

We first define the asymptotic convergence property. For a global minimization problem, let Ω be its search space, Ω_s be the set of all global minima, and ω(j) ∈ Ω, j = 0, 1, . . ., be a sequence of points generated by an iterative procedure ψ until some stopping conditions hold.

Definition 8 Procedure ψ is said to have asymptotic convergence to a global minimum, or simply asymptotic convergence [3], if ψ converges with probability one to an element in Ω_s; that is, lim_{j→∞} P(ω(j) ∈ Ω_s) = 1, independent of ω(0), where P(w) is the probability of event w.

In the following, we first prove the asymptotic convergence of CSA to a CGM_d of P_d with probability one when T approaches 0 and when T is reduced according to a specific cooling schedule. We model CSA by an inhomogeneous Markov chain, show that the chain is strongly ergodic, prove that the chain minimizes an implicit virtual energy based on the framework of generalized SA (GSA) [25,26], and show that the virtual energy is at its minimum at any CGM_d.
We state the main theorems and illustrate them by examples, while leaving the proofs to the appendices.

CSA can be modeled by an inhomogeneous Markov chain that consists of a sequence of homogeneous Markov chains of finite length, each at a specific temperature in a cooling schedule. Its one-step transition probability matrix is P_T = [P_T(y, y′)], where:

P_T(y, y′) = G(y, y′) A_T(y, y′),                          if y′ ∈ N_d(y),
P_T(y, y′) = 1 − Σ_{y′′ ∈ N_d(y)} G(y, y′′) A_T(y, y′′),   if y′ = y,
P_T(y, y′) = 0,                                            otherwise.   (36)

Example 2 Consider the following simple discrete minimization problem:

min_y f(y) = −y²   subject to   h(y) = |(y − 0.6)(y − 1.0)| = 0,   (37)

where y ∈ Y = {0.5, 0.6, . . . , 1.2}. The corresponding penalty function is:

L_d((y, α)^T) = −y² + α · |(y − 0.6)(y − 1.0)|.   (38)

Fig. 5 The Markov chain with the transition probabilities defined in (36) for the example problem in (37) and the corresponding penalty-function value at each state. The four ESPs are shaded in (a)

By choosing α ∈ {2, 3, 4, 5, 6}, with the maximum penalty value α_max at 6, the state space is S = Y × {2, 3, 4, 5, 6}, with |S| = 8 × 5 = 40 states.
At y = 0.6 or y = 1.0 where the constraint is satisfied, we can choose α* = 1, and any α** > α*, including α_max, would satisfy (13) in Theorem 1.

In the Markov chain, we define N_d(y) as in (21), where N_d1(y) and N_d2(α) are as follows:

N_d1(y) = {y − 0.1, y + 0.1 | 0.6 ≤ y ≤ 1.1} ∪ {y + 0.1 | y = 0.5} ∪ {y − 0.1 | y = 1.2},   (39)

N_d2(α) = {α − 1, α + 1 | 3 ≤ α ≤ 5, y ≠ 0.6 and y ≠ 1.0} ∪ {α − 1 | α = 6, y ≠ 0.6 and y ≠ 1.0} ∪ {α + 1 | α = 2, y ≠ 0.6 and y ≠ 1.0}.   (40)

Figure 5 shows the state space S of the Markov chain. In this chain, an arrow from y to y′ ∈ N_d(y) (where y′ = (y′, α)^T or (y, α′)^T) means that there is a one-step transition from y to y′ with P_T(y, y′) > 0.
For y = 0.6 and y = 1.0, there is no transition among the points in the α dimension because the constraints are satisfied at those y values (according to (22)). There are two ESPs in this Markov chain at (0.6, 5)^T and (0.6, 6)^T, which correspond to the local minimum at y = 0.6, and two ESPs at (1.0, 5)^T and (1.0, 6)^T, which correspond to the local minimum at y = 1.0. CSA is designed to locate one of the ESPs at (0.6, 6)^T and (1.0, 6)^T. These correspond, respectively, to the CLM_d at y* = 0.6 and y* = 1.0.

Let y_opt = {(y*, α_max)^T | y* ∈ Y_opt}, and N_L be the maximum of the minimum number of transitions required to reach y_opt from all y ∈ S.
By properly constructing N_d(y), we state without proof that P_T is irreducible and that N_L can always be found. This property is shown in Fig. 5, in which any two nodes can always reach each other. Let N_T, the number of trials per temperature, be N_L.
The following theorem states the strong ergodicity of the Markov chain, where strong ergodicity means that state y of the Markov chain has a unique stationary probability π_T(y). (See the proof in Appendix A.)

Theorem 3 The inhomogeneous Markov chain is strongly ergodic if the sequence of temperatures {T_k, k = 0, 1, 2, . . .} satisfies:

T_k ≥ (N_L · ΔL) / log_e(k + 1),   (41)

where T_k > T_{k+1}, lim_{k→∞} T_k = 0, and ΔL = 2 max_{y∈S, y′∈N_d(y)} {|L_d(y′) − L_d(y)|}.

Example 2 (cont'd) In the Markov chain in Fig. 5, ΔL = 0.411 and N_L = 11. Hence, the Markov chain is strongly ergodic if we use a cooling schedule T_k ≥ 4.521 / log_e(k + 1). Note that the cooling schedule used in CSA (Line 10 of Fig.
1) does not satisfy the condition.

Our Markov chain also fits into the framework of GSA [25,26] when we define an irreducible Markov kernel P_T(y, y′) and its associated communication cost v(y, y′), where v: S × S → [0, +∞] and y′ ∈ N_d(y):

v(y, y′) = (L_d(y′) − L_d(y))^+,   if y′ = (y′, α)^T,
v(y, y′) = (L_d(y) − L_d(y′))^+,   if y′ = (y, α′)^T.   (42)

Based on the communication costs over all directed edges, the virtual energy W(y) (according to Definition 2.5 in [25,26]) is the cost of the minimum-cost spanning tree rooted at y:

W(y) = min_{g ∈ G(y)} V(g),   (43)

where G(y) is the set of spanning trees rooted at y, and V(g) is the sum of the communication costs over all the edges of g.

The following quoted result shows the asymptotic convergence of GSA in minimizing W(i):

Proposition 1 "(Proposition 2.6 in [15,25,26]). For every T > 0, the unique stationary distribution π_T of the Markov chain satisfies:

π_T(i) → exp(−(W(i) − W(E)) / T)   as T → 0,   (44)

where W(i) is the virtual energy of i, and W(E) = min_{i∈S} W(i)."

In contrast to SA that strives to minimize a single unconstrained objective, CSA does not minimize L_d((y, α)^T).
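The limiting behavior in Proposition 1 can be illustrated with a toy computation: for fixed virtual-energy values W, a Gibbs-style distribution concentrates all probability mass on the minimum-W states as T → 0. The W values below are made up for illustration and are not the chain's actual virtual energy.

```python
import math

# Toy virtual energies; state 'a' has the minimum, playing the role of W(E).
W = {'a': 0.0, 'b': 0.3, 'c': 0.8}

def pi(T):
    """Normalized Gibbs distribution exp(-(W(i) - W_min)/T) over the states."""
    w_min = min(W.values())
    weights = {s: math.exp(-(w - w_min) / T) for s, w in W.items()}
    Z = sum(weights.values())
    return {s: v / Z for s, v in weights.items()}

for T in (1.0, 0.1, 0.01):
    print(T, pi(T)['a'])
# pi_T('a') approaches 1 as T decreases toward 0
```

This is the mechanism by which CSA, whose chain minimizes the implicit virtual energy W rather than L_d itself, ends up at a minimum-W state in the limit.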