where P_{T_k}^{N_T} is the N_T-step transition matrix of the Markov chain under temperature T_k, and τ1(P) is known as the coefficient of ergodicity of a transition matrix P. Since the N_T-step transition matrix P_{T_k}^{N_T} is the convolution of the one-step transition matrix P_{T_k}, we show that all entries in the one-step transition matrix P_{T_k} are larger than a lower bound. CSA has derived such a lower bound for P_{T_k}. In this proof, we derive a different lower bound for P_{T_k} based on the new transition matrix of CPSA.

a) Let:

    G = \min_{y \in S,\ y' \in N_i(y),\ i = 0, \dots, N+1} q_i(y, y'), \qquad P = \min_{i = 0, \dots, N+1} p_t(i).    (3.46)

For all y ∈ S and y' ∈ N(y), we have

    P_{T_k}(y, y') = \sum_{i=0}^{N+1} p_t(i)\, q_i(y, y')\, A_{T_k}(y, y') \;\ge\; P\, G\, e^{-\Delta L / T_k},    (3.47)

because (L(y') − L(y))+ ≤ ΔL for y' = (y', α, γ), and (L(y) − L(y'))+ ≤ ΔL for y' = (y, α', γ) or y' = (y, α, γ'), according to the definition of ΔL.

b) Based on (3.47), for all y, y' ∈ S and k ≥ k0, the N_T-step transition probability from y to y' is bounded from below. Then the lower bound of 1 − τ1(P_{T_k}^{N_T}) is:

    1 - \tau_1(P_{T_k}^{N_T}) = \min_{y, y' \in S} \sum_{z \in S} \min\{P_{T_k}^{N_T}(y, z),\ P_{T_k}^{N_T}(y', z)\}
                            \ge \min_{y, y' \in S} \min_{z \in S} \min\{P_{T_k}^{N_T}(y, z),\ P_{T_k}^{N_T}(y', z)\}
                            \ge \left(P\, G\, e^{-\Delta L / T_k}\right)^{N_T} = P^{N_T} G^{N_T} e^{-\Delta L\, N_T / T_k}.

Then, using any temperature schedule that satisfies (3.44), the following holds:

    \sum_{k=0}^{\infty} \left[\,1 - \tau_1(P_{T_k}^{N_T})\,\right] \;\ge\; \sum_{k=k_0}^{\infty} P^{N_T} G^{N_T} e^{-\Delta L\, N_T / T_k} \;\ge\; \sum_{k=k_0}^{\infty} \frac{1}{k+1} = \infty.    (3.48)

Therefore, the Markov chain is weakly ergodic.

c) In addition, because the transition probability P_{T_k}(y, y') for all y, y' ∈ S belongs to the exponential rationals in a closed class of asymptotically monotone functions (CAM) [2, 3], the Markov chain is strongly ergodic. Strong ergodicity implies that the Markov chain has a unique stationary distribution πT, where πT(y) is the probability of hitting point y during the search of CPSA.
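To make the quantities in the weak-ergodicity bound (3.48) concrete, the following is a minimal numerical sketch (my own illustration, not part of the thesis): it computes the coefficient of ergodicity of an N_T-step matrix whose one-step entries all sit above a floor of the form P·G·e^{-ΔL/T}, and checks the entry-wise lower bound used above. The matrix size and the constants are arbitrary illustrative values.

```python
import numpy as np

def ergodicity_coefficient(P):
    """tau_1(P) = 1 - min_{y,y'} sum_z min(P[y,z], P[y',z])."""
    n = P.shape[0]
    overlap = min(np.minimum(P[i], P[j]).sum() for i in range(n) for j in range(n))
    return 1.0 - overlap

# Toy 3-state stand-in for the one-step CPSA matrix P_{T_k}: every entry is at
# least floor = P*G*exp(-dL/T), so every entry of the N_T-step matrix is at
# least floor**N_T, which in turn lower-bounds 1 - tau_1.
P_, G_, dL, T, N_T = 0.3, 0.4, 1.5, 2.0, 3      # arbitrary illustrative constants
floor = P_ * G_ * np.exp(-dL / T)
P1 = np.full((3, 3), floor)
P1 += np.diag(1.0 - P1.sum(axis=1))             # put the remaining mass on the diagonal
Pn = np.linalg.matrix_power(P1, N_T)            # N_T-step transition matrix
assert 1.0 - ergodicity_coefficient(Pn) >= floor ** N_T
print(1.0 - ergodicity_coefficient(Pn), floor ** N_T)
```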
Comparing the ΔL for CSA and CPSA, although it depends on the user-defined neighborhood, the ΔL for CPSA is usually smaller than the ΔL for CSA. Since ΔL is the maximum difference of the penalty-function value between two neighboring points, CPSA has a smaller ΔL because its neighborhood is partitioned and two neighboring points only differ by a small subset of the variables, which leads to small differences in their penalty-function values. In contrast, two neighboring points in CSA can differ in all variables and therefore have larger variations in their penalty-function values. A smaller ΔL for CPSA means that it can reduce the temperature and converge to CGM faster than CSA.
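The effect of partitioning on ΔL can be seen on a toy example. The sketch below (my own illustration, with a made-up separable penalty over four binary variables split into two partitions) compares the largest one-move change of L under a CSA-style unpartitioned neighborhood and a CPSA-style partitioned one.

```python
import itertools
import numpy as np

# Made-up separable penalty L(y) = sum_i c_i * y_i over four binary variables.
c = np.array([1.0, 2.0, 3.0, 4.0])
L = lambda y: float(np.dot(c, y))
space = list(itertools.product([0, 1], repeat=4))

def max_gap(neighbors):
    """dL: largest |L(y') - L(y)| over all points y and their neighbors y'."""
    return max(abs(L(yp) - L(y)) for y in space for yp in neighbors(y))

# CSA-style neighborhood: any subset of the variables may change in one move.
csa = lambda y: [yp for yp in space if yp != y]
# CPSA-style neighborhood: only the variables of one partition change per move.
parts = [(0, 1), (2, 3)]
cpsa = lambda y: [yp for yp in space if yp != y and
                  any(all(yp[i] == y[i] for i in range(4) if i not in p) for p in parts)]

print("dL (CSA)  =", max_gap(csa))    # 10.0: all four variables may flip at once
print("dL (CPSA) =", max_gap(cpsa))   #  7.0: at most one partition flips per move
```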
Asymptotic Convergence to Constrained Global Minima

In this section, we briefly overview the framework of generalized simulated annealing (GSA) [82] and show that the Markov chain modelling CPSA fits into this framework, in which the Markov chain minimizes a virtual energy. Based on the results of GSA, we prove that CPSA has asymptotic convergence to a CGMd.

Generalized Simulated Annealing (GSA). GSA [82] aims to establish a new framework for a class of stochastic search procedures broader than the SA algorithm, where the family of transition probabilities is given as follows:

Definition 3.1 Communication Cost “(Definition 2.1 in [82]). Let (Q_T)_{T>0} be a family of Markov kernels on E. We say that (Q_T)_{T>0} is admissible for q and κ if there exists a family of positive real-valued numbers (V(i, j))_{(i,j) ∈ E×E} such that:

• V(i, j) < +∞ iff q(i, j) > 0;

• for all T > 0 and all i, j ∈ E,

    \frac{1}{\kappa}\, q(i, j)\, e^{-V(i,j)/T} \;\le\; Q_T(i, j) \;\le\; \kappa\, q(i, j)\, e^{-V(i,j)/T},    (3.49)

where the function V : E × E → [0, +∞] is called the communication cost function.”

Any one-step transition probability Q_T(i, j) that satisfies the two conditions in Definition 3.1 is called generalized simulated annealing (GSA). In the following, we quote the notion of A-graph and virtual energy as defined in [26, 82]:

Definition 3.2 A-Graph “(Definition 2.4 in [26, 82]). Let A ⊂ E. A set g of arrows i → j in A^c × E is an A-graph, where j ∈ N(i), iff a) for each i ∈ A^c, there exists a unique j ∈ E such that i → j ∈ g; b) for each i ∈ A^c, there is a path in g ending on a point j ∈ A.”

Here E is the set of all states (or nodes) in the Markov chain defined in Definition 3.1, A is a subset of E, and A^c is the complement of A in E. Accordingly, an A-graph g is actually a spanning tree rooted at set A, where the tree is defined over the digraph constructed by the neighborhood N(i). Let G(A) be the set of A-graphs. For each g ∈ G(A), the cost of g is defined as V(g) = \sum_{i \to j \in g} V(i, j).

Definition 3.3 Virtual Energy “(Definition 2.5 in [82]). For each state i ∈ E, its virtual energy W(i) is defined as:

    W(i) = \min_{g \in G(\{i\})} V(g),    (3.50)

which is the cost of the minimum spanning tree rooted at point i.”
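To make Definition 3.3 concrete, here is a brute-force sketch (my own illustration, not code from the thesis) that enumerates all {i}-graphs of a tiny digraph and returns the cheapest one; the states, arrows, and communication costs in V are made up.

```python
import itertools

def virtual_energy(V, root):
    """Brute-force W(root) = min over {root}-graphs g of sum_{i->j in g} V(i, j).

    V is a dict {(i, j): cost}; an arrow i -> j exists iff (i, j) is a key.
    A {root}-graph gives every state i != root exactly one outgoing arrow such
    that following the arrows from any state eventually reaches the root.
    """
    states = sorted({i for i, _ in V} | {j for _, j in V})
    others = [i for i in states if i != root]
    choices = [[j for (i, j) in V if i == s] for s in others]
    best = float("inf")
    for pick in itertools.product(*choices):
        succ = dict(zip(others, pick))
        def reaches_root(i, seen=()):
            if i == root:
                return True
            if i in seen:           # cycle that never reaches the root
                return False
            return reaches_root(succ[i], seen + (i,))
        if all(reaches_root(i) for i in others):
            best = min(best, sum(V[(i, succ[i])] for i in others))
    return best

# Tiny 3-state example with arbitrary communication costs.
V = {(0, 1): 1.0, (1, 0): 2.0, (1, 2): 0.5, (2, 1): 1.5, (0, 2): 3.0, (2, 0): 0.25}
print({i: virtual_energy(V, i) for i in (0, 1, 2)})    # {0: 0.75, 1: 1.25, 2: 1.5}
```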
The following proposition shows the asymptotic convergence of GSA in minimizing the virtual energy W(i).

Proposition 3.1 “(Proposition 2.6 in [26, 82]). For every T > 0, the unique stationary distribution πT of the Markov chain satisfies:

    \pi_T(i) \longrightarrow \exp\left(-\frac{W(i) - W(E)}{T}\right) \quad \text{as } T \longrightarrow 0,    (3.51)

where W(i) is the virtual energy of i, and W(E) = \min_{i \in E} W(i).”
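As a sanity check of Proposition 3.1 (again a toy illustration of mine, not from the thesis), the chain below is a reversible Metropolis-type kernel whose communication cost is (E_j − E_i)+; for this particular chain the brute-force helper above gives W(i) − W(E) = E_i − min E, and −T·log of the stationary probabilities, normalized at their maximum, matches that limit.

```python
import numpy as np

# Reversible Metropolis-type chain on three states with energies E, uniform
# proposals along the path 0 - 1 - 2, and acceptance min(1, exp(-(E_j-E_i)/T)).
# Its communication cost is (E_j - E_i)^+, and (e.g. with the brute-force helper
# above) W(i) - W(E) = E[i] - min(E), so Proposition 3.1 predicts
# -T * log(pi_T(i) / max_j pi_T(j)) -> E[i] - min(E) as T -> 0.
E = np.array([0.0, 1.0, 0.5])
neighbors = {0: [1], 1: [0, 2], 2: [1]}

def kernel(T):
    Q = np.zeros((3, 3))
    for i, nbrs in neighbors.items():
        for j in nbrs:
            Q[i, j] = 0.5 * min(1.0, np.exp(-(E[j] - E[i]) / T))
        Q[i, i] = 1.0 - Q[i].sum()          # remaining probability: stay put
    return Q

def stationary(Q):
    w, v = np.linalg.eig(Q.T)
    p = np.real(v[:, np.argmax(np.real(w))])
    return p / p.sum()

for T in (1.0, 0.5, 0.2, 0.1):
    pi = stationary(kernel(T))
    print(T, -T * np.log(pi / pi.max()))    # close to [0.0, 1.0, 0.5] = W(i) - W(E)
```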
Asymptotic Convergence of CPSA. Our Markov chain (3.43) fits into the framework of GSA (3.49) if we define an irreducible Markov kernel Q_T(i, j) = P_T(y, y') and its associated communication cost V(y, y'):

    V(y, y') =
    \begin{cases}
    (L(y') - L(y))^{+} & \text{if } y' = (y', \alpha, \gamma) \\
    (L(y) - L(y'))^{+} & \text{if } y' = (y, \alpha', \gamma) \text{ or } y' = (y, \alpha, \gamma').
    \end{cases}    (3.52)

Obviously, V(y, y') ≥ 0, and the function V : S × S → [0, +∞].
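Read as a function, (3.52) is just a case split on which part of the state a move changes. The sketch below is a hedged transcription under my own conventions (a state is a tuple (y, α, γ) and L evaluates the penalty function on a full state); it is not code from the thesis.

```python
def communication_cost(s, s2, L):
    """V(s -> s2) from (3.52); s and s2 are states of the form (y, alpha, gamma)."""
    y, a, g = s
    y2, a2, g2 = s2
    if a2 == a and g2 == g and y2 != y:         # the move changes y only
        return max(L(s2) - L(s), 0.0)
    if y2 == y and (a2 != a) != (g2 != g):      # the move changes alpha or gamma only
        return max(L(s) - L(s2), 0.0)
    raise ValueError("not a one-step CPSA move")

# Made-up penalty for illustration: moving downhill in L costs nothing.
L = lambda s: s[0] ** 2 + s[1] * abs(s[0] - 1)
print(communication_cost((2, 1.0, 0.5), (1, 1.0, 0.5), L))   # (L(y') - L(y))^+ = 0.0
```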
Since CPSA fits into the framework of GSA, according to the result of GSA, any procedure that can be modelled by GSA minimizes an implicit virtual energy W(y) and converges to the global minimum of W(y) with probability one. Here, the virtual energy W(y) is the cost of the minimum spanning tree rooted at point y of the digraph governed by N(y). Therefore, in order to prove that CPSA converges asymptotically to a CGMd, we need to show that W(y) is minimized at (y*, α, γ) for certain α, γ. This is stated in the following theorem.

Theorem 3.8 The Markov chain modeling CPSA converges asymptotically to a CGMd y* ∈ Yopt with probability one.

[Figure 3.6: Proof strategy for Theorem 3.8.]

Proof. We outline the proof strategy and point out the difference between the proof for CPSA and CSA before presenting the details. The proof is similar to the original proof for the asymptotic convergence of CSA [98]. The proof consists of two steps, as shown in Figure 3.6. First, we show that, for a given y, the virtual energy satisfies W((y, α′, γ)) ≤ W((y, α, γ)) for any α′ > α and W((y, α, γ′)) ≤ W((y, α, γ)) for any γ′ > γ. Second, we show that W((y*, αmax, γmax)) < W((y, αmax, γmax)) at the maximum value of the penalty values, where y* ∈ Yopt and y ∈ Dw − Yopt. Hence, W(y) is minimized at (y*, αmax, γmax), and the Markov chain converges to CGMd y* ∈ Yopt with probability one.

Figure 3.6 illustrates the key difference between the proof for CSA and that for CPSA in the second step: CSA was proved under an unpartitioned neighborhood, whereas CPSA is proved under a partitioned neighborhood here.
The difference is in the second step, where we need to show that W((y*, αmax, γmax)) < W((y, αmax, γmax)) at the maximum value of the penalty values. Since the virtual energy W((y*, αmax, γmax)) is defined to be the total cost of a minimum spanning tree rooted at y*, the proof strategy in CSA is to construct a path from y to y* in the minimum spanning tree of an arbitrary point y ∈ Dw − Yopt, reverse the path, and prove that we can construct a spanning tree of y* that has less total weight than the minimum spanning tree of y. Since the minimum spanning tree of y* has an even lower total cost than the constructed tree, we can conclude that W((y*, αmax, γmax)) < W((y, αmax, γmax)).

The key difference, as illustrated in Figure 3.6, between the proof for CPSA and that for CSA is that, while the neighborhood of CSA is defined in the entire search space without partitioning, CPSA has partitioned neighborhoods. Therefore, the key point in our proof is to show that we can still construct a path from any point y ∈ Dw − Yopt to a point y* ∈ Yopt by linking the partitioned neighborhoods together, and that the spanning tree of y* obtained from reversing the path has a total cost less than the cost of the minimum spanning tree rooted at y.

a) The proof for the first step is the same as that in the first step of the proof of Theorem 3.1 in Wang's Thesis on the asymptotic convergence of CSA [98], except that the penalty vectors (α, γ) here are equivalent to λ in Wang's Thesis.

b) From (a), we have that W((y, αmax, γmax)) ≤ W((y, α, γ)) for any α, γ and for all y ∈ Dw. Accordingly, we only need to compare W((y, αmax, γmax)) (where y ∈ Dw − Yopt) with W((y*, αmax, γmax)) (where y* ∈ Yopt) at the maximum αmax, γmax of the penalty values.

Let MT(y) be the minimum spanning tree of y = (y, αmax, γmax), and let its associated virtual energy be W(y). There must exist a path P = y0(= y*) → y1 → ··· → yr−1 → yr(= y) of length r from y* = (y*, αmax, γmax) to y in MT(y). This path can be constructed by linking all the partitioned neighborhoods together. In fact, we can find the following paths:

    y \in N_0(y_{0,1}),\ y_{0,1} \in N_0(y_{0,2}),\ \cdots,\ y_{0,i_0} \in N_0(y_0^*)    (3.56)

    y_0^* \in N_1(y_{1,1}),\ y_{1,1} \in N_1(y_{1,2}),\ \cdots,\ y_{1,i_1} \in N_1(y_1^*)    (3.57)

    \cdots

    y_{N-1}^* \in N_N(y_{N,1}),\ y_{N,1} \in N_N(y_{N,2}),\ \cdots,\ y_{N,i_N} \in N_N(y_N^*)    (3.58)

where y_i^* = (y', αmax, γmax) and y'(j) = y*(j) for j = 0, ···, i. Based on the definition of Nd(y(i)), there is a path from y_{i−1}^* to y_i^* for i = 1, ···, N.
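To illustrate the construction behind (3.56)-(3.58), the sketch below (my own simplification, not the thesis's notation) walks from y to y* by rewriting the variables of one partition at a time, so every hop stays inside a single partitioned neighborhood; in the actual proof each such hop may itself be a chain of moves inside N_i, and the penalty components αmax, γmax stay fixed throughout.

```python
def link_partitions(y, y_star, partitions):
    """Path from y to y_star that changes one partition's variables per hop."""
    path, cur = [tuple(y)], list(y)
    for part in partitions:          # hop i plays the role of y*_{i-1} -> y*_i
        for idx in part:
            cur[idx] = y_star[idx]
        path.append(tuple(cur))
    return path

# Six variables split into three partitions; each intermediate point agrees with
# y_star on all partitions handled so far, like the points y*_i in the proof.
print(link_partitions((0, 0, 0, 0, 0, 0), (1, 2, 3, 4, 5, 6),
                      [(0, 1), (2, 3), (4, 5)]))
```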