J89 (1121213), страница 9
Текст из файла (страница 9)
11b). Then thecosts of P1 and P1 satisfy:C(P1 ) = v y , (y1 , α )T + · · · + v (yj−2 , α )T , (yj−1 , α )T + v (yj−1 , α )T , (ŷ, α )T $ %+= v y, (y1 , α)T + · · · + v (yj−2 , α)T , (yj−1 , α)T + Ld (ŷ, α )T − Ld (yj−1 , α )T $ %+≥ v y, (y1 , α)T + · · · + v (yj−2 , α)T , (yj−1 , α)T + Ld (ŷ, α)T − Ld (yj−1 , α)T= C(P1 ),αFig.
9 Strategy for provingTheorem 4W spaceα = αmax≥≥≥≥≥≥≥≥≥≥≥≥≥≥≥≥≥≥≥≥≥≥≥≥yy = y∗yαyyyy = (y, α)T y = (y , α )Ta) MT (y)Fig. 10αy∗ = (y ∗ , α max )Ty = (y , α)T y = (y , α )Tb) constructed tree T (y )Proof of part (a1) in Theorem 4 (a solid arrow indicates an edge in the spanning tree)J Glob Optima)Fig. 11b)Proof of part (a2) in Theorem 4 (a solid arrow indicates an edge in the spanning tree)where v (yi−1 , α )T , (yi , α )T = v (yi−1 , α)T , (yi , α)T , i = 1, .
. . , j − 1, and j−1 T j−1 T Ld (y , α ) = Ld (y , α) are true because h(yi ) = 0. Further, Ld (ŷ, α )T ≥Ld (ŷ, α)T is true because h(ŷ) = 0 and α > α.Similarly, the costs of P2 and P2 satisfy:C(P2 ) = v (ŷ, α)T , (ȳl−1 , α)T + v (ȳl−1 , α)T , (ȳl−2 , α)T + · · · + v (ȳ1 , α)T , y$ %+= Ld (ȳl−1 , α)T − Ld (ŷ, α)T+ v (ȳl−1 , α )T , (ȳl−2 , α )T+ · · · + v (ȳ1 , α )T , y$ %+≥ Ld (ȳl−1 , α )T − Ld (ŷ, α )T+ v (ȳl−1 , α )T , (ȳl−2 , α )T+ · · · + v (ȳ1 , α )T , y= C(P2 ),where v (ȳi , α )T , (ȳi−1 , α )T = v (ȳi , α)T , (ȳi−1 , α)T , i = 1, .
. . , l − 1, and l−1 T l−1 T Ld (ȳ , α) = Ld (ȳ , α ) are true because h(ȳi ) = 0. Further, Ld (ŷ, α )T ≥Ld (ŷ, α)T is true because h(ŷ) = 0 and α > α.& '+Moreover, for any ŷ, v (ŷ, α)T , (ŷ, α )T = Ld (ŷ, α)T − Ld (ŷ, α )T=''&&++(α − α )T |h(ŷ)| = 0 in MT(y ), and v (ŷ, α )T , (ŷ, α)T = (α − α)T |h(ŷ)| ≥ 0 inMT(y). Hence, W(y ) ≤ W(y).In the second part, we compare W(y) and W(y ) when α is fixed at α max .
For anyy ∈ Y and α ∈ , there exists a path such that α < α1 < · ·· < α <α max . From the firstmaxpart, we know that W(yα ) ≤ W (y, α )T ≤ · · · ≤ W (y, α1 )T ≤ W(y). Hence, amaxprobabilistic descent algorithm starting from y will arrive at yα eventually. Accordmaxingly, if we can show that W(y∗ ) ≤ W(yα ), then the same probabilistic descent willconverge to y∗ .maxmaxmaxLet MT(yα ) be the minimum-cost spanning tree at yα , and W(yα ) be itsmaxassociated virtual energy. There must exist a path of length q from y∗ to yαinmaxmaxα∗αMT(y): P = y0 (= y ) → y1 → · · · → yq−1 → yq (= y) (Fig. 12a).
Reversingmaxthis path, we obtain a path from yα to y∗ and also a spanning tree T(y∗ ) at y∗ with∗cost V(y ) (Fig. 12b). These costs satisfy:J Glob Optima)b)Fig. 12Proof of the second part in Theorem 4W(yαmax) − V(y∗ ) =q &'+ &'+ Ld (yk ) − Ld (yk−1 ) − Ld (yk−1 ) − Ld (yk )k=1qLd (yk ) − Ld (yk−1 ) = Ld (yq ) − Ld (y0 )=k=1= Ld (y, α max )T − Ld (y∗ , α max )T > 0based on the definition of α max and on evaluating the two possibilities Ld (yk ) ≥Ld (yk−1 ) and Ld (yk ) < Ld (yk−1 ). Because W(y∗ ) ≤ V(y∗ ), we have W(y∗ ) ≤ V(y∗ ) <maxW(yα ).By combining the two parts of the proof, we conclude that W(y) is minimized aty = y∗ .
Thus, the Markov chain converges to CGMd y∗ ∈ Yopt with probability oneaccording to Proposition 1.Appendix C: proof of Theorem 5The proof is similar to that of Theorem 4. The key difference, however, lies in thepartitioning of the neighborhood. The proof is centered on constructing, for y =(y, α max , γ max )T under a partitioned neighborhood, a minimum spanning tree rootedat y has a cost higher than that at y∗ = (y∗ , α max , γ max )T , where y ∈ Y − Yopt andy∗ ∈ Yopt .
We first construct a path from y to y∗ in the minimum-cost spanning treerooted at y. We then reverse the path and prove that a tree rooted at y∗ has lesstotal cost than that rooted at y. However, because the neighborhoods in CPSA arepartitioned by their constraints, the construction of the path from y to y∗ is differentand must be done across the partitioned neighborhoods. By proving that the minimum-cost spanning tree rooted at y∗ has less cost than that rooted at y, we concludethat W(y∗ ) < W(y). Finally, we use Proposition 1 to show that the Markov chainconverges to y∗ with probability one.The proof consists of two parts.J Glob OptimFig. 13 An illustration of the approach for proving Theorem 5, where (y∗ (t), α max , γ max )T is thepoint with the minimum virtual energy in the tth subspace(a) We show that W (y, α max , γ max )T ≤ W (y, α, γ )T for any y.
This can be doneby showing W (y, α , γ )T ≤ W (y, α, γ )T and W (y, α, γ )T ≤ W (y, α, γ )T forα > α, γ > γ . This proof is the same as the first part of the proof of Theorem 4,except that α and γ are used instead of α.Figure 13 shows the proof of of W (y, α , γ )T ≤ W (y, α, γ )T . The tth box inthe figure represents the subspace of (y(t), α(t))T similar to that in Fig. 9. Note that,although y(1), . . . , y(N) may overlap with each other, we have drawn the subspaceswithout overlap for clarity. In a way similar to that in Fig.
9, the search in the tthsubspace results in the solution (y∗ (t), α max (t))T with the minimum virtual energy(indicated by a solid shaded circle). Likewise, along the γ dimension of each subproblem, we can prove that W (y, α, γ )T ≤ W (y, α, γ )T .These observations lead to the conclusion that, for any y, α, and γ , W (y, α max , γ )T≤ W (y, α, γ )T and W (y, α, γ max )T ≤ W (y, α, γ )T , which can be combined to get:W (y, α max , γ max )T ≤ W (y, α, γ )T .(58)(b) We show that W(y∗ ) < W(y), where y = (y, α max , γ max )T and y ∈ Y − Yopt . Thisis done by constructing a path from y to y∗ that passes through the solution in eachsubproblem (the dashed path that joins the N solid circles in Fig. 13). We then showthat the reverse path has less cost.Let MT(yq ) and W(yq ) be the minimum-cost spanning tree of yq and its associated virtual energy.
For this tree, there must exist a path from y∗ to yq : P = y0 (=y∗ ) → y1 → · · · → yq−1 → yq of length q. The path exists because the Markov chainmodeling CPSA is ergodic.Consider the spanning tree T(y∗ ) at y∗ with the following path from yq to y∗ :yq → y1,1 → y1,2 · · · → y∗1 → y2,1 → y2,2 · · · → yi,1 → y∗2→ · · · → y∗N−1 → yN,1 · · · → y∗N = y∗ ,J Glob OptimFig. 14 The construction of a path from y to y∗ , where (y∗ (t), α max , γ max )T is the point with theminimum virtual energy in the tth subspacewhere yq ∈ N p(1) (y1,1 ), y1,1 ∈ N p(1) (y1,2 ), . .
. , y1,i ∈ N p(1) (y∗1 ),1∗y1 ∈N p(2) (y2,1 ),y2,1 ∈···∗yN−1 ∈N p(N) (yN,1 ),N p(2) (y2,2 ), . . . , y2,i2∈ N p(2) (y∗2 ),yN,1 ∈ N p(N) (yN,2 ), . . . , yN,iN ∈ N p(N) (y∗N )and y∗i = (y , α max , γ max )T and y (j) = y∗ (j) for j = 1, . . . , i.Figure 14 shows the construction of this path, where unshaded circles show thepartitioned components yq (1) to yq (N) of yq , solid circles show y∗ (1) to y∗ (N) of y∗ ,and shaded circles indicate those components of y∗ that may be changed during thepath-construction process.In the first step, we find a path from yq to y∗1 . Since the only difference betweenthese two points is y∗ (1), we only need to find a path from yq (1) to y∗ (1).
Such a pathalways exists due to the ergodicity of the Markov chain. After moving from yq (1) toy∗ (1), the values of yq (2), . . . , yq (N) may be changed to yq (2), . . . , yq (N) because theymay share some variables with yq (1).In the second step, we find a path from y∗1 to y∗2 . Since y∗ (1) has already beenreached, the only difference between these two points is y∗ (2), we only need to finda path from yq (2) to y∗ (2). Again, such a path must exist due to the ergodicity of theMarkov chain.In general, the path from y∗t−1 to y∗t for t = 2, . .
. , N, exists because the onlydifference between these two points is in one component, and the ergodicity of theMarkov-chain ensures the existence of the path. We continue the process until wereach y∗N = (y∗ (0), y∗ (1), . . . , y∗ (N))T = y∗ .By comparing W(yq ) of MT(yq ) and the cost V(y∗ ) of T(y∗ ), we have:W(yq ) − V(y∗ ) =q &'+ &'+ Ld (yk ) − Ld (yk−1 ) − Ld (yk−1 ) − Ld (yk )k=1q=k=1Ld (yk ) − Ld (yk−1 ) = Ld (yq ) − Ld (y0 )= Ld (y, α max , γ max )T − Ld (y∗ , α max , γ max )T > 0J Glob Optimbased on the definitions of α max and γ max and on evaluating the two possibilitiesLd (yk ) ≥ Ld (yk−1 ) and Ld (yk ) < Ld (yk−1 ).