The reduction in time can be more than an order of magnitude for large problems, such as ZAMB2-8 and READING6. CPSA can also match or exceed the quality and success ratio of CSA for most of the instances tested. For example, for LAUNCH, CPSA achieves an average quality of 21.85, a best quality of 9.01, and a success ratio of 100%, whereas CSA achieves, respectively, 26.94, 9.13, and 90%.

The nonlinear continuous optimization benchmarks evaluated in this section are meant to demonstrate the effectiveness of CSA and CPSA as dynamic penalty methods. We have studied these benchmarks because their formulations and solutions are readily available and because benchmarks on nonlinear discrete constrained optimization are scarce. These benchmarks, however, have continuous and differentiable functions and, therefore, can be solved much better by solvers that exploit such properties. In fact, the best solution of most of these problems can be found by a licensed version of SNOPT [16] (Version 6.2) in less than one second of CPU time! In this respect, CSA and CPSA are not meant to compete with these solvers. Rather, CSA and CPSA are useful as constrained optimization methods for solving discrete, continuous, and mixed-integer problems whose constraint and objective functions are not necessarily continuous, differentiable, or in closed form. In these applications, penalty methods are invariably used as an effective solution approach.
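To make this setting concrete, the following minimal sketch shows how a dynamic-penalty objective of the form L_d(x, alpha) = f(x) + sum_i alpha_i |h_i(x)| can be sampled by a Metropolis-style move for a mixed-integer problem whose objective is non-differentiable and available only as a lookup. The problem data, neighborhood, and cooling rule here are hypothetical and purely illustrative; they are not taken from the paper and do not reproduce the exact CSA procedure.

import math
import random

# Illustrative sketch only: the problem data below are hypothetical and are not
# taken from the paper. x = (n, t) mixes an integer n with a real t, the
# objective is a table lookup plus a non-smooth term (no derivatives available),
# and h collects equality constraints whose violations are penalized.

def f(x):
    n, t = x
    table = {0: 3.0, 1: 1.0, 2: 4.0, 3: 0.5}      # "black-box" objective values
    return table.get(n, 5.0) + abs(t - 1.5)

def h(x):
    n, t = x
    return [n + t - 4.0]                           # single equality constraint h(x) = 0

def L_d(x, alpha):
    # Penalty function L_d(x, alpha) = f(x) + sum_i alpha_i * |h_i(x)|.
    return f(x) + sum(a * abs(v) for a, v in zip(alpha, h(x)))

def random_neighbor(x):
    # Perturb either the integer or the real variable to generate a trial point.
    n, t = x
    if random.random() < 0.5:
        return (n + random.choice([-1, 1]), t)
    return (n, t + random.uniform(-0.2, 0.2))

def trial_move(x, alpha, T):
    # One Metropolis-style descent trial on L_d at temperature T.
    x_new = random_neighbor(x)
    delta = L_d(x_new, alpha) - L_d(x, alpha)
    if delta <= 0 or random.random() < math.exp(-delta / T):
        return x_new
    return x

if __name__ == "__main__":
    x, alpha, T = (0, 0.0), [1.0], 1.0
    for _ in range(2000):
        x = trial_move(x, alpha, T)
        T *= 0.995                                 # simple geometric cooling, for illustration only
    print(x, h(x), L_d(x, alpha))

In CSA itself, the penalty vector alpha is not held fixed as in this sketch but is raised by probabilistic ascents whenever constraints remain violated, which is what drives the search toward an ESP.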
As an example, a recent application uses CSA to optimize out-of-core code generation for a special class of imperfectly nested loops encoding tensor contractions [20].

Table 1 Experimental results comparing CPSA, CSA, GEM, P3, and P4 in solving selected nonlinear continuous problems from CUTE. [Per-instance entries are not reproduced here. For CPSA and CSA (each with κ = 0.8 and 100 runs) the table reports Qavg, Qbest, Tavg, and Psucc (%); for GEM, P3, and P4 it reports Qavg, Tavg, and Psucc (%), on instances ranging from AVION2 to ZAMB2-8.] Each instance was solved by each solver 100 times from random starting points; only ten runs were made for the instances marked "a" because of the extensive CPU time required for each run. The best Qavg (resp., Qbest) among the five solvers is underlined, and "–" means that no feasible solution was found within a time limit of 36,000 s. All runs were done on an AMD Athlon MP2800 PC with RH Linux AS4.

6 Conclusions

We have reported in this paper CSA and CPSA, two dynamic-penalty methods for finding constrained global minima of discrete constrained optimization problems. Based on the theory of ESPs, our methods look for the local minima of a penalty function when the penalties are larger than some thresholds and when the constraints are satisfied.
To reach an ESP, our methods perform probabilistic ascents in the penalty subspace, in addition to probabilistic descents in the problem-variable subspace as in conventional SA. Because both methods are based on sampling the search space of a problem during their search, they can be applied to solve continuous, discrete, and mixed-integer optimization problems without requiring continuity or differentiability.

Based on the decomposition of the ESP condition into multiple necessary conditions [27], we have shown that many benchmarks with highly structured and localized constraint functions can be decomposed into loosely coupled subproblems that are related by a small number of global constraints. By exploiting constraint partitioning, we have demonstrated that CPSA can significantly reduce the complexity of CSA.
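The following hypothetical sketch, which is illustrative only and not taken from the paper, shows the kind of structure that constraint partitioning exploits: variables are grouped into stages whose constraints are local, and a single global constraint couples the stages. All names and problem data below are invented for illustration.

from typing import List

# Hypothetical partitioned problem, for illustration only (not from the paper):
# the variables are grouped into stages; each stage has its own local
# constraints, and one global constraint couples the stages.

def local_constraints(stage: int, x: List[float]) -> List[float]:
    # Local equality constraints h_t(x_t) = 0 involving only stage t's variables.
    return [x[0] + x[1] - 1.0]

def global_constraint(xs: List[List[float]]) -> float:
    # A single global constraint coupling the last variable of every stage.
    return sum(x[-1] for x in xs) - 2.0

def stage_penalty(stage: int, x: List[float], alpha: List[float]) -> float:
    # Per-stage penalty: a stand-in local objective plus weighted local violations.
    f_local = (x[0] - 0.3) ** 2 + abs(x[1])
    return f_local + sum(a * abs(v) for a, v in zip(alpha, local_constraints(stage, x)))

def partitioned_penalty(xs: List[List[float]],
                        alphas: List[List[float]],
                        gamma: float) -> float:
    # Total penalty = sum of loosely coupled stage penalties
    #               + a penalty on the single global constraint.
    total = sum(stage_penalty(t, x, a) for t, (x, a) in enumerate(zip(xs, alphas)))
    return total + gamma * abs(global_constraint(xs))

if __name__ == "__main__":
    xs = [[0.2, 0.8], [0.4, 0.6], [0.5, 0.5]]      # three stages, two variables each
    alphas = [[1.0], [1.0], [1.0]]                 # one local penalty weight per stage
    print(partitioned_penalty(xs, alphas, gamma=1.0))

Because each stage penalty depends only on its own stage's variables, a trial move within one stage changes only that stage's term plus the global-constraint penalty, which is the source of the reduced complexity that constraint partitioning offers relative to annealing the whole problem at once.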
Last, we have proved the asymptotic convergence of CSA and CPSA to a CGM with probability one. The result is theoretically important because it extends SA, which guarantees asymptotic convergence in discrete unconstrained optimization, to discrete constrained optimization. Moreover, it establishes a condition under which optimal solutions can be found in constraint-partitioned nonlinear optimization problems.

Appendix A: proof of Theorem 3

The proof of strong ergodicity follows the steps used to show the weak ergodicity of SA [1] and uses the strong-ergodicity conclusions in [2,3].
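For reference, the coefficient of ergodicity used in part (c) below is the standard (Dobrushin) one, which we assume coincides with the definition given earlier in the paper: for a stochastic matrix $P$ on $S$,
\[
1 - \tau_1(P) \;=\; \min_{y, y' \in S} \sum_{y'' \in S} \min\bigl\{P(y, y''),\, P(y', y'')\bigr\},
\]
and, by a standard criterion for inhomogeneous Markov chains, weak ergodicity follows if $\sum_{k}\bigl(1 - \tau_1(P_{T_k}^{N_T})\bigr)$ diverges.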
Let $G$ be the generation probability that satisfies (23).

(a) Let $G_{\min} = \min\{G(y, y') : y \in S,\ y' \in N_d(y)\}$. For all $y \in S$ and $y' \in N_d(y)$, we have
\[
P_{T_k}(y, y') \;=\; G(y, y')\, A_{T_k}(y, y') \;\geq\; G_{\min}\, e^{-\Delta L / T_k}. \qquad (55)
\]
The above is true because, according to the definition of $\Delta L$ in the theorem, $\bigl(L_d(y') - L_d(y)\bigr)^{+} \leq \Delta L$ for a move in the variable subspace, $y' = (y', \alpha)^T$, and $\bigl(L_d(y) - L_d(y')\bigr)^{+} \leq \Delta L$ for a move in the penalty subspace, $y' = (y, \alpha')^T$.

(b) Let $\hat S$ be the set of points that are local maxima of $L_d(y, \alpha)$ with respect to $y$ for any given $\alpha$. Then for every $y = (y, \alpha)^T \in S - \hat S$ there always exists some $y' = (y', \alpha)^T \in N_d(y)$ such that $L_d(y') > L_d(y)$; let
\[
\delta \;=\; \min_{y \in S - \hat S} \bigl\{L_d(y') - L_d(y)\bigr\} \;>\; 0,
\]
where $y'$ denotes such an ascending neighbor of $y$. We have
\[
P_{T_k}(y, y) \;=\; 1 - \sum_{y'' \in N_d(y)} G(y, y'')\, A_{T_k}(y, y'')
\;\geq\; 1 - G(y, y')\, e^{-\delta / T_k} - \sum_{y'' \in N_d(y),\, y'' \neq y'} G(y, y'')
\;=\; G(y, y')\bigl(1 - e^{-\delta / T_k}\bigr)
\;\geq\; G_{\min}\bigl(1 - e^{-\delta / T_k}\bigr).
\]
Because $T_k$ is a decreasing sequence, it is always possible to find $k_0 > 0$ such that, for all $k \geq k_0$, $1 - e^{-\delta / T_k} \geq e^{-\Delta L / T_k}$. Thus, for $y \in S - \hat S$, we get
\[
P_{T_k}(y, y) \;\geq\; G_{\min}\, e^{-\Delta L / T_k}. \qquad (56)
\]

(c) Based on (55) and (56), for all $y, y' \in S$ and $k \geq k_0$, the $N_T$-step transition probability from $y = y_0$ to $y' = y_{N_T}$, taken along any sequence $y_0, y_1, \ldots, y_{N_T}$ in which consecutive points are identical or neighbors, satisfies
\[
P_{T_k}^{N_T}(y, y') \;\geq\; P_{T_k}(y_0, y_1)\, P_{T_k}(y_1, y_2) \cdots P_{T_k}(y_{N_T - 1}, y_{N_T}) \;\geq\; \Bigl(G_{\min}\, e^{-\Delta L / T_k}\Bigr)^{N_T}.
\]
Let $\tau_1(P)$ be the coefficient of ergodicity of matrix $P$. Then the lower bound of $1 - \tau_1\bigl(P_{T_k}^{N_T}\bigr)$ is
\[
1 - \tau_1\bigl(P_{T_k}^{N_T}\bigr) \;=\; \min_{y, y' \in S} \sum_{y'' \in S} \min\Bigl\{P_{T_k}^{N_T}(y, y''),\ P_{T_k}^{N_T}(y', y'')\Bigr\}
\;\geq\; \min_{y, y' \in S}\, \min_{y'' \in S} \Bigl\{P_{T_k}^{N_T}(y, y''),\ P_{T_k}^{N_T}(y', y'')\Bigr\}
\;\geq\; \Bigl(G_{\min}\, e^{-\Delta L / T_k}\Bigr)^{N_T} \;=\; G_{\min}^{N_T}\, e^{-\Delta L\, N_T / T_k}.
\]
Hence, the following holds when using any cooling schedule that satisfies (41):
\[
\sum_{k=0}^{\infty} \Bigl(1 - \tau_1\bigl(P_{T_k}^{N_T}\bigr)\Bigr) \;\geq\; \sum_{k=k_0}^{\infty} G_{\min}^{N_T}\, e^{-\Delta L\, N_T / T_k} \;\geq\; G_{\min}^{N_T} \sum_{k=k_0}^{\infty} \frac{1}{k+1} \;=\; \infty. \qquad (57)
\]
Therefore, the Markov chain is weakly ergodic.

(d) In addition, because the transition probability $P_{T_k}(y, y')$ for all $y, y' \in S$ belongs to the exponential rationals in a closed class of asymptotically monotone functions (CAM) [2,3], the Markov chain is strongly ergodic.
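Remark: the comparison with $\sum_k 1/(k+1)$ in the last step of (57) rests on the cooling schedule in (41) decreasing no faster than logarithmically; assuming it has the usual form $T_k \geq N_T\, \Delta L / \ln(k+1)$, the step follows because
\[
e^{-\Delta L\, N_T / T_k} \;\geq\; e^{-\ln(k+1)} \;=\; \frac{1}{k+1},
\]
and the resulting series diverges.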
Alternatively, the proof canbe accomplished based on the approach of Mitra et al. [23] by using inhomogeneousMarkov chains.The proof consists of two parts. First, we show that the virtual energy decreases withincreasing α for a given y (any horizontal direction in Fig. 9); that is, W(y ) ≤ W(y)where y = (y, α)T and y = (y, α )T for any α > α. Second, we show that W is minimized at y∗ when α is at the maximum penalty value α max (the vertical direction alongmaxthe α = α max column in Fig. 9); that is, W(y∗ ) < W(yα ) where y∗ = (y∗ , α max )T ,maxyα = (y, α max )T , y∗ ∈ Yopt , and y ∈ Y − Yopt . These two parts allow us to concludethat W(y) is minimized at y∗ .In the first part, we compare W(y) and W(y ) when y is fixed. The comparisondepends on whether h(y) = 0 is satisfied or not.(a1) Consider the case in which h(y) = 0 and y ∈ N d (y).
(a1) Consider the case in which $h(y) \neq 0$ and $y' \in N_d(y)$. This means that at least one $h_i(y) \neq 0$ and that there exists an edge $y \to y'$. Let $MT(y)$ be a minimum-cost spanning tree rooted at $y$ (Fig. 10a). We construct a spanning tree $T(y')$ rooted at $y'$ (Fig. 10b) as follows: (1) add the edge $y \to y'$ to $MT(y)$, and (2) delete the edge $y' \to \tilde y$, where $\tilde y$ is on the path from $y'$ to $y$ in $MT(y)$. Note that $y' \to \tilde y$ always exists. Then $V(y')$, the cost of spanning tree $T(y')$, satisfies
\[
V(y') \;=\; W(y) + v(y, y') - v(y', \tilde y) \;=\; W(y) - v(y', \tilde y) \;\leq\; W(y).
\]
The equation is true because, according to (42), $v(y, y') = \bigl[L_d(y) - L_d(y')\bigr]^{+} = \bigl[(\alpha - \alpha')^T |h(y)|\bigr]^{+} = 0$ and $v(y', \tilde y) \geq 0$. In addition, $W(y') \leq V(y')$ due to the fact that $W(y')$ is the cost of a minimum-cost spanning tree rooted at $y'$. Therefore, we have $W(y') \leq V(y') \leq W(y)$.

(a2) Consider the case in which $h(y) = 0$. This means that there is no edge from $y$ to $y'$, because $h(y) = 0$ is satisfied and $\alpha$ is not allowed to change according to (22). The minimum-cost spanning tree rooted at $y$ must have a directed path from $y'$ to $(\hat y, \alpha')^T$: $P_1 = y' \to (y_1, \alpha')^T \to \cdots \to (y_{j-1}, \alpha')^T \to (\hat y, \alpha')^T$; and a directed path from $(\hat y, \alpha)^T$ to $y$: $P_2 = (\hat y, \alpha)^T \to (\bar y_{l-1}, \alpha)^T \to \cdots \to (\bar y_1, \alpha)^T \to y$ (Fig. 11a). Here, $(\hat y, \alpha)^T$ and $(\hat y, \alpha')^T$ are points shown as shaded nodes in Fig. 11 with $h(\hat y) \neq 0$ (meaning that at least one constraint is not satisfied at $\hat y$), $h(y_i) = 0$ ($i = 1, 2, \ldots, j-1$), and $h(\bar y_i) = 0$ ($i = 1, 2, \ldots, l-1$). Such $j$ and $l$ always exist due to the ergodicity of the Markov chain proved in Theorem 3, and path $P_1$ may differ from path $P_2$. Note that there is no relationship between $f(y)$ and $f(\hat y)$ and that the tree edges at $y_i$ and $\bar y_i$ can only move along the $y$ subspace, because the constraints at these points are all satisfied.

In contrast, the minimum-cost spanning tree rooted at $y'$ must have a directed path from $y$ to $(\hat y, \alpha)^T$: $P_1' = y \to (y_1, \alpha)^T \to \cdots \to (y_{j-1}, \alpha)^T \to (\hat y, \alpha)^T$; and another from $(\hat y, \alpha')^T$ to $y'$: $P_2' = (\hat y, \alpha')^T \to (\bar y_{l-1}, \alpha')^T \to \cdots \to (\bar y_1, \alpha')^T \to y'$ (Fig.