We define Y_opt to be the set of all CGM_d of P_e: Y_opt = {y* | y* is a CGM_d of P_e}. The goal of constrained partitioned simulated annealing (CPSA) is to find an extended saddle point with the minimum objective value, i.e., a CGM_d of P_t, by solving the set of partitioned subproblems and by resolving the violated global constraints. The CPSA algorithm is based on the framework in Figure 3.3.
Instead of considering all the constraints together without partitioning as in CSA, CPSA performs searches in multiple subproblems, each with a small subset of the constraints.

Before presenting the CPSA procedure, we illustrate its idea in Figure 3.4. Unlike the original CSA, which probes the whole space and takes all the constraints into account in each step, CPSA partitions the constraints into N + 1 subsets. For subproblem t, t = 0, ..., N, we still perform constrained simulated annealing, but only in the smaller space of those variables relevant to the local constraints h(t)(y(t)), including the original variables y(t) and the penalties α(t). In addition, there is a global search that perturbs the original variable y and the penalties γ for the global constraints. This additional search component is needed in order to resolve any violated global constraint in H(y) and to ensure the asymptotic convergence of CPSA to a CGM_d.

[Figure 3.4 here: for each subproblem t = 0, ..., N, a local probe y′ = N_t(y) over the local constraints h(t)(y(t)), the variables y(t), and the penalties α(t); for t = N + 1, a global probe y′ = N_global(y) over the global constraints H(y), the variables y, and the penalties γ.]

Figure 3.4: The search procedure of constraint partitioned simulated annealing (CPSA).

1.  procedure Constraint Partitioned Simulated Annealing (CPSA)
2.      set starting values of y = (y, α, γ);
3.      set starting temperature T = T0 and cooling rate 0 < ξ < 1;
4.      set NT (number of trials per temperature);
5.      while stopping condition is not satisfied do
6.          for k ← 1 to NT do
7.              set subproblem t to be a random integer from 0 to N + 1;
8.              if 0 ≤ t ≤ N
9.                  generate trial point y′ from N_t(y) using q_t(y, y′);
10.                 accept y′ with probability A_T(y, y′);
11.             else /* t = N + 1 */
12.                 generate trial point y′ from N_global(y) using q_global(y, y′);
13.                 accept y′ with probability A_T(y, y′);
14.             end if
15.         end for
16.         reduce temperature by T ← ξ · T;
17.     end while
18. end procedure

Figure 3.5: CPSA: the constraint partitioned simulated annealing algorithm.
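To make the control flow of Figure 3.5 concrete, the following is a minimal Python transliteration; it is our sketch under our own conventions, not code from the dissertation. Here penalty_L stands in for the penalty function L, and sample_local and sample_global are hypothetical samplers for N_t(y) and N_global(y) that return both the trial point and which subspace ('y', 'alpha', or 'gamma') was perturbed; concrete versions are sketched after Eqs. (3.37)-(3.42) below.

    import math
    import random

    def cpsa(y0, penalty_L, sample_local, sample_global, N,
             T0=100.0, xi=0.95, NT=50, T_min=1e-3):
        y, T = y0, T0                          # Lines 2-3: starting point and temperature
        while T > T_min:                       # Line 5: stopping condition
            for _ in range(NT):                # Line 6: NT trials per temperature
                t = random.randint(0, N + 1)   # Line 7: pick a random subproblem
                if t <= N:
                    y_new, kind = sample_local(y, t)   # Line 9: probe in N_t(y)
                else:
                    y_new, kind = sample_global(y)     # Line 12: probe in N_global(y)
                # Lines 10/13: acceptance probability A_T of Eq. (3.42):
                # descend in the y-subspace, ascend in the penalty subspaces.
                dL = penalty_L(y_new) - penalty_L(y)
                if kind != 'y':
                    dL = -dL
                if dL <= 0 or random.random() < math.exp(-dL / T):
                    y = y_new
            T *= xi                            # Line 16: geometric cooling
        return y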
Figure 3.5 describes the CPSA procedure. It begins from a starting point y = (y, α, γ) (Line 2), where y can be either user-provided or randomly generated, and α = γ = 0. The initial temperature T0 is set (Line 3) to be so large that almost all trial points y′ will be accepted. Line 4 sets N_T, the number of iterations performed at each temperature. At each fixed temperature, we generate N_T probes. In each step, we randomly pick a subproblem t between 0 and N + 1. The probability p_t(i) that t = i can be chosen arbitrarily as long as:

$$\sum_{i=0}^{N+1} p_t(i) = 1 \quad\text{and}\quad p_t(i) > 0.\tag{3.36}$$

When t is an integer between 0 and N, we generate a probe in the search space of subproblem t that perturbs y(t) and α(t) (Line 9); when t = N + 1, it is a global exploration step in which we generate a probe that perturbs y and γ in the global constraints (Line 12).

Line 9 generates a random trial point y′ in the neighborhood N_t(y) of the current point y = (y, α, γ) in the search space S = D^w × R^p × R^q, where N_t(y) is defined as follows:

$$N_t(y) = \{(y', \alpha, \gamma) \in S \text{ where } y' \in N_t^1(y)\} \cup \{(y, \alpha', \gamma) \in S \text{ where } \alpha' \in N_t^2(\alpha)\},$$

$$N_t^1(y) = \left\{\, y' \mid y'(t) \in N_d(y(t)) \text{ and } \forall y_i \notin y(t),\ y'_i = y_i \,\right\},\tag{3.37}$$

$$N_t^2(\alpha) = \left\{\, \alpha' \mid \alpha'(t) \in \mathbb{R}^{P(t)} \text{ and } \forall \alpha_i \notin \alpha(t),\ \alpha'_i = \alpha_i \,\right\}.\tag{3.38}$$

Here, N_d(y(t)) is a user-defined discrete-space neighborhood of the variable vector y(t) in stage t. Intuitively, N_t(y) is the neighborhood of the original variable vector y(t) and the penalty vector α(t) in subproblem t. In solving subproblem t, we generate a point y′ ∈ N_t(y) with a generation probability q_t(y, y′) that can be arbitrarily defined as long as the following condition is satisfied:

$$0 \le q_t(y, y') \le 1 \quad\text{and}\quad \sum_{y' \in N_t(y)} q_t(y, y') = 1.\tag{3.39}$$
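Below is one possible sampler for N_t(y) in Eqs. (3.37)-(3.38), again our own sketch rather than the dissertation's code. The state is represented as nested lists, with y_vars[t] and alpha[t] holding the stage-t variables and penalties; the names, the toy domain, and the step size alpha_step are all illustrative assumptions. A probe either moves one variable of y(t) inside its discrete neighborhood N_d(y(t)) (Eq. 3.37) or perturbs one penalty of α(t) (Eq. 3.38); all components outside stage t are copied unchanged.

    import random

    def sample_local(state, t, domain=range(-5, 6), alpha_step=0.5):
        y_vars, alpha, gamma = state
        y_vars = [list(v) for v in y_vars]   # copy so components outside stage t stay intact
        alpha = [list(a) for a in alpha]
        if random.random() < 0.5:            # probe y(t): a discrete move in N_d(y(t))
            i = random.randrange(len(y_vars[t]))
            y_vars[t][i] = random.choice(list(domain))
            return (y_vars, alpha, list(gamma)), 'y'
        else:                                # probe alpha(t): a real-valued move
            i = random.randrange(len(alpha[t]))
            alpha[t][i] += random.uniform(-alpha_step, alpha_step)
            return (y_vars, alpha, list(gamma)), 'alpha'

With a uniform choice over a fixed finite neighborhood, the induced generation probability is nonnegative and sums to one, in the spirit of Eq. (3.39).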
Line 12 generates a random trial point y′ in the neighborhood N_global(y) of the current point y = (y, α, γ), where N_global(y) is defined as follows:

$$N_{\text{global}}(y) = \{(y', \alpha, \gamma) \in S \text{ where } y' \in N^3(y)\} \cup \{(y, \alpha, \gamma') \in S \text{ where } \gamma' \in N^4(\gamma)\},$$

$$N^3(y) = \bigcup_{t=0}^{N} \left\{\, y' \mid y'(t) \in D^{y(t)} \text{ and } y'_i = y_i\ \forall y_i \notin y(t) \,\right\},\tag{3.40}$$

$$N^4(\gamma) = \left\{\, \gamma' \mid \gamma' \in \mathbb{R}^q \,\right\}.\tag{3.41}$$

Intuitively, probes in N_global(y) are made to resolve violated global constraints by updating γ and to perform a global exploration in the y-space by perturbing y. N_global(y) ensures the connectivity of the underlying Markov-chain model of the process, which is required to achieve the asymptotic convergence of the algorithm. We will identify where this definition is used in the proof of asymptotic convergence.

We accept y′ according to the acceptance probability A_T(y, y′):

$$A_T(y, y') = \begin{cases} \exp\!\left(-\dfrac{\left(L(y') - L(y)\right)^+}{T}\right) & \text{if } y' = (y', \alpha, \gamma), \\[2ex] \exp\!\left(-\dfrac{\left(L(y) - L(y')\right)^+}{T}\right) & \text{if } y' = (y, \alpha', \gamma) \text{ or } y' = (y, \alpha, \gamma'), \end{cases}\tag{3.42}$$

where $(a)^+ = a$ if $a > 0$, and $(a)^+ = 0$ otherwise, for all $a \in \mathbb{R}$.
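The acceptance rule (3.42) reads directly as a function. The helper below is ours (the probe is then accepted when random.random() < accept_probability(...)), with probed_subspace playing the same role as the kind tag in the sketches above.

    import math

    def accept_probability(L_old, L_new, T, probed_subspace):
        # probed_subspace: 'y' for a move in the variable subspace,
        # 'alpha' or 'gamma' for a move in a penalty subspace.
        if probed_subspace == 'y':
            diff = max(L_new - L_old, 0.0)   # (L(y') - L(y))^+ : descend in y
        else:
            diff = max(L_old - L_new, 0.0)   # (L(y) - L(y'))^+ : ascend in penalties
        return math.exp(-diff / T)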
The CPSA algorithm fits in the ESPC search framework in Figure 3.3a by performing ascents in the penalty subspace and descents in the y subspace in order to solve each subproblem individually. However, when compared to the deterministic search algorithm in Figure 3.3b, CPSA has two major differences. First, instead of deterministically enumerating the subproblems from 0 to N in a round-robin fashion, CPSA randomly picks a subproblem in each step. We use this random selection in order to have a memoryless Markovian process in CPSA: since a round-robin selection is not memoryless, the random selection of subproblems facilitates the convergence analysis in a Markov-chain model. Second, instead of deterministic decreases of the penalty function in the y subspace and deterministic increases in the penalty subspaces for α and γ as in Figure 3.3b, CPSA accepts new probes in all three subspaces stochastically, using an acceptance probability controlled by a decreasing temperature. This is the technique used in constrained simulated annealing to ensure global convergence. Based on the above modifications, we prove in the next section that CPSA converges asymptotically to a CGM_d with probability one.

Compared to CSA, CPSA reduces the search complexity through constraint partitioning. Since both CSA and CPSA need to converge to an equilibrium distribution of variables at a given temperature before the temperature is reduced, the total search time depends on the convergence time at each fixed temperature. By partitioning the constraints into subsets, each subproblem involves an exponentially smaller subspace with only a small number of variables and penalty values. Thus, each subproblem takes significantly less time to converge to an equilibrium distribution at a given temperature, and the total time for all the subproblems to converge is also reduced significantly.
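To give a sense of the scale of this reduction, consider an illustrative case (our numbers, not from the text): suppose the w discrete variables split evenly across the N + 1 stages, so that each stage has w/(N + 1) variables over a common domain D. CSA then probes a variable space of size |D|^w, whereas each CPSA subproblem probes only |D|^{w/(N+1)}; with |D| = 10, w = 30, and N + 1 = 10, that is 10^30 states in total versus 10^3 states per subproblem.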
3.3.2  Asymptotic convergence of CPSA

In this section, we prove the asymptotic convergence of CPSA to a CGM_d y* of P_d by modelling the process as an inhomogeneous Markov chain and by following a line of reasoning similar to the original proof of the asymptotic convergence of CSA [98]. We show that the Markov chain is strongly ergodic, that the Markov chain minimizes an implicit virtual energy based on the framework of generalized SA (GSA) [82], and that the virtual energy is at its minimum at any CGM_d.

Inhomogeneous Markov Chain

CPSA can be modelled by an inhomogeneous Markov chain that consists of a sequence of homogeneous Markov chains of finite length, each at a specific temperature in a given temperature schedule.
In order to model the CPSA algorithm by a Markov chain, we need a tuple with three components to describe the state space of the chain: y = (y, α, γ), where y ∈ D^w is the vector of original variables and α, γ are the penalty vectors. According to the generation probability q(y, y′) and the acceptance probability A_T(y, y′), the one-step transition probability of the Markov chain is:

$$P_T(y, y') = \begin{cases} p_t(i)\, q_i(y, y')\, A_T(y, y') & \text{if } y' \in N_i(y),\ i = 0, \cdots, N, \\ p_t(N+1)\, q_{N+1}(y, y')\, A_T(y, y') & \text{if } y' \in N_{\text{global}}(y), \\ 1 - \sum_{i=0}^{N} \sum_{y'' \in N_i(y)} P_T(y, y'') - \sum_{y'' \in N_{\text{global}}(y)} P_T(y, y'') & \text{if } y' = y, \\ 0 & \text{otherwise,} \end{cases}\tag{3.43}$$

and the corresponding transition matrix is P_T = [P_T(y, y′)]. Note that when y′ ∈ N_i(y), i = 0, ..., N, y′ in fact differs from y only in y(i) of the i-th subproblem and remains the same in the other parts. This is different from CSA, which perturbs y in the overall variable space.
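As a sanity check on Eq. (3.43) — a toy construction of ours, not part of the dissertation — the following snippet assembles one row of P_T on a three-state chain and verifies that the self-loop entry absorbs exactly the remaining probability mass, so the row is stochastic. A single lumped selection probability p_select stands in for the p_t(i) factors, and a constant stands in for A_T.

    states = ['a', 'b', 'c']                     # stand-ins for tuples (y, alpha, gamma)
    neighbors = {'a': ['b', 'c'], 'b': ['a'], 'c': ['a']}

    def transition_row(y, p_select, q, A):
        row = {yp: p_select * q(y, yp) * A(y, yp) for yp in neighbors[y]}
        row[y] = 1.0 - sum(row.values())         # the self-loop case of Eq. (3.43)
        return row

    q = lambda y, yp: 1.0 / len(neighbors[y])    # uniform generation, satisfies Eq. (3.39)
    A = lambda y, yp: 0.7                        # any acceptance value in (0, 1]
    row = transition_row('a', p_select=1.0, q=q, A=A)
    assert abs(sum(row.values()) - 1.0) < 1e-12  # the row sums to one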
Let y_opt = {(y*, α, γ) | y* ∈ Y_opt}, and let N_L be the maximum of the minimum numbers of transitions required to reach y_opt from any y in the Markov space. Consider the sequence of temperatures {T_k, k = 0, 1, 2, ...}, where T_k > T_{k+1} and lim_{k→∞} T_k = 0, and set N_T, the number of trials per temperature, to be N_L. The following theorem proves the strong ergodicity of the Markov chain.

Theorem 3.7  The inhomogeneous Markov chain is strongly ergodic if the sequence of temperatures {T_k} satisfies:

$$T_k \ge \frac{N_L\, \Delta_L}{\ln(k+1)},\tag{3.44}$$

where $\Delta_L = \max_y \{\, |L(y') - L(y)|,\ y' \in N(y) \,\}$.

Proof. The proof of strong ergodicity follows the steps used to show the weak ergodicity of CSA [98] and uses the strong ergodicity conclusions of [2, 3]. Before we provide the detailed proof, we outline its strategy and point out its difference from the original proof of strong ergodicity of CSA [98].

Let τ_1(P) be the coefficient of ergodicity of transition matrix P. Based on the strong ergodicity conclusions of [2, 3], the key of the proof is to show the following inequality:

$$\sum_{k=0}^{\infty}\left[1 - \tau_1\!\left(P_{T_k}^{N_T}\right)\right] = \infty.\tag{3.45}$$

This is established by bounding the N_T-step transition probabilities from below: any transition sequence from y = y_0 to y′ = y_{N_T} satisfies the following:

$$P_{T_k}^{N_T}(y, y') \;\ge\; P_{T_k}(y_0, y_1)\, P_{T_k}(y_1, y_2) \cdots P_{T_k}(y_{N_T-1}, y_{N_T}) \;\ge\; \left(P_G\, e^{-\Delta_L/T_k}\right)^{N_T}.$$
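For orientation, here is how the schedule (3.44) feeds the divergence (3.45); this is our sketch of the standard argument (cf. the CSA proof in [98]), where P_G denotes a lower bound on the one-step generation probabilities in the product above, $\bar{y}$ is any point of y_opt (reachable from every y within N_T = N_L transitions, by the definition of N_L), and the first inequality is a standard property of the coefficient of ergodicity:

$$1 - \tau_1\!\left(P_{T_k}^{N_T}\right) \;\ge\; \min_{y}\, P_{T_k}^{N_T}(y, \bar{y}) \;\ge\; \left(P_G\, e^{-\Delta_L/T_k}\right)^{N_T} \;\ge\; P_G^{N_L}\, e^{-N_L\,\frac{\ln(k+1)}{N_L}} \;=\; \frac{P_G^{N_L}}{k+1},$$

so that $\sum_{k=0}^{\infty}\left[1 - \tau_1(P_{T_k}^{N_T})\right] \ge P_G^{N_L} \sum_{k=0}^{\infty} \frac{1}{k+1} = \infty$, which is exactly (3.45).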