Chapter 8 Theory and Practice of simulated Annealing (1121211), страница 3
Текст из файла (страница 3)
The probability density in phase space of the point representing S isproportional towhere b is the Boltzmann constant, and T is the absolute temperature of the surroundings. Therefore the proportion of time that the system spends in solution Sis proportional to (11) (Hammersley and Handscomb, 1964), hence the equilibriumprobability density for allisThe expectation of any valid solution function f(S) is thusUnfortunately, for many solution functions, (13) cannot be evaluated analytically.Hammersley and Handscomb (1964) notes that one could theoretically use naive MonteCarlo techniques to estimate the value of the two integrals in (11). However, this often294D.
Henderson et al.fails in practice since the exponential factor means that a significant portion of the integrals is concentrated in a very small region of the solution spaceThis problem canbe overcome using importance sampling (see Bratley et al., 1987; Chapter 2), by generating solutions with probability density (12). This approach would also seem to fail,because of the integral in the denominator of (12). However, Metropolis et al. (1953)solves this problem by first discretizing the solution space, such that the integrals in (12)and (13) are replaced by summations over the set of discrete solutionsand thenby constructing an irreducible, aperiodic Markov chain with transition probabilitiessuch thatwhereNote that to compute the equilibrium distributionthe denominator of (13) (anormalizing constant) does not need to be calculated.
Instead, the ratiosneed only be computed and a transition matrix P defined that satisfies (14). Hammersley and Handscomb (1964) show that Metropolis et al. (1953) accomplishes this bydefining P as the product of symmetric solution generation probabilitiesandthe equilibrium ratioswhereThe use of stationary probability ratios to define the solution acceptance probabilities, combined with symmetric solution generation probabilities, enable Metropoliset al.
(1953) to use the reversibility condition in (8) to show that (16) and (17)satisfy (14).Homogeneous proofs of convergence for simulated annealing become more difficult to establish when the reversibility condition is not satisfied. Note that the existenceof a unique stationary distribution for each outer loop iteration k is easily shown byspecifying that each transition matrixbe irreducible and aperiodic. On the otherhand, it becomes very difficult to derive an explicit closed-form expression for eachstationary distributionthat remains analytically tractable as the problem’s solutionspace becomes large.
One can no longer use (8) to describe each stationary distribution,because in general, the multiplicative condition is not met. Instead, one must directlysolve the system of equations formed with (6) and (7). For example, Davis (1991)The Theory and Practice of Simulated Annealingattempts to obtain a closed-form expression for(6) and (7) as295by using Cramer’s rule and rewritingandrespectively, where boldface type indicates vector/matrix notation, I is the identitymatrix, andis a column vector of ones.
Note that thetransitionmatrixassociated with (18) is of rank(1974). Therefore, bydeleting any one equation from (18), and substituting (19), the result is the set oflinearly independent equationswhere the square matrixis obtained by substituting the ith column of matrixwith a column vector of ones. The vector is a row vector of zeroes, exceptfor a one in the i th position. Sinceis of full rank, then its determinant (writtenasis non-zero. Defineto be the same matrix asexceptthat the elements of therow ofare replaced by the vectorTherefore,for all iterations k,Davis (1991) attempts to solve (21) for eachvia a multivariate Taylor seriesexpansion of each determinant, but is not able to derive a closed-form analyticalexpression.Overall, the difficulty of explicitly expressing the stationary distributions for largesolution spaces, combined with bounding the transition matrix condition number forlarge k, suggests that it is very difficult to prove asymptotic convergence of the simulatedannealing algorithm by treating (5) and (6) as a linear algebra problem.Lundy and Mees (1986) notes that for each fixed outer loop iteration k, convergenceto the solution equilibrium probability distribution vector(in terms of the Euclideandistance betweenis geometric since the solution space isfinite, and the convergence factor is given by the second largest eigenvalue of the transition matrixThis result is based on a standard convergence theorem for irreducible,aperiodic homogeneous Markov chains (see Çinlar, 1974).
Note that a large solutionspace precludes practical calculation of this eigenvalue. Lundy and Mees (1986) conjectures that when the temperature is near zero, the second largest eigenvalue will beclose to one for problems with local optima, and thus convergence to the equilibriumdistribution will be very slow (recall that the dominant eigenvalue foris one, withalgebraic multiplicity one (Isaacson and Madsen, 1976). Lundy and Mees (1986) usesits conjecture to justify why simulated annealing should be initiated with a relativelyhigh temperature. For an overview of current methods for assessing non-asymptoticrates of convergence for general homogeneous Markov chains, see Rosenthal (1995).The assumption of stationarity for each outer loop iteration k limits practical application of homogeneous Markov chain theory—Romeo and Sangiovanni-Vincentelli(1991) shows that if equilibrium (for a Markov chain that satisfies the reversibilitycondition) is reached in a finite number of steps, then it is achieved in one step.
Thus,296 D. Henderson et al.Romeo and Sangiovanni-Vincentelli (1991) conjectures that there is essentially nohope for the most-used versions of simulated annealing to reach equilibrium in a finitenumber of iterations.The second convergence approach for simulated annealing is based on inhomogeneous Markov chain theory (Anily and Federgruen, 1987; Gidas, 1985; Mitra et al.,1986). In this approach, the Markov chain need not reach a stationary distribution (e.g.,the simulated annealing inner loop need not be infinitely long) for each outer loop k. Onthe other hand, an infinite sequence of (outer loop) iterations k must still be examined,with the condition that the temperature parameter cool sufficiently slowly.
The proofgiven by Mitra et al. (1986) is based on satisfying the inhomogeneous Markov chainconditions of weak and strong ergodicity (Isaacson and Madsen, 1976; Seneta, 1981).The proof requires four conditions:1. The inhomogeneous simulated annealing Markov chain must be weakly ergodic(i.e., dependence on the initial solution vanishes in the limit).2. An eigenvectorwith eigenvalue one must exist such that (6) and (7) hold forevery iteration k.3. The Markov chain must be strongly ergodic (i.e., the Markov chain must beweakly ergodic and the sequence of eigenvectorsmust converge to a limitingform), i.e.,4.
The sequence of eigenvectors must converge to a form where all probability massis concentrated on the set of globally optimal solutionsTherefore,whereis the equilibrium distribution with only global optima having probabilities greater than zero. (Note that weak and strong ergodicity are equivalentfor homogeneous Markov chain theory.)Mitra et al. (1986) satisfies condition (1) (weak ergodicity) by first forming a lowerbound on the probability of reaching any solution from any local minimum, and thenshowing that this bound does not approach zero too quickly. For example, they definethe lower bound for the simulated annealing transition probabilities in (5) aswhere m is the number of transitions needed to reach any solution from any solutionof non-maximal objective function value,is a lower bound on the one-stepsolution generation probabilities, andis the maximum one-step increase in objectivefunction value between any two solutions.
Mitra et al. (1986) shows that the Markovchain is weakly ergodic ifThe Theory and Practice of Simulated Annealing297Therefore, weak ergodicity is obtained if the temperatureis reduced sufficientlyslowly to zero such that (25) is satisfied. In general, the (infinite) sequence oftemperaturesmust satisfywhereis a problem-dependent constant, and k is the number ofiterations.
Mitra et al. (1986) shows that conditions (2), (3), and (4) are satisfied byusing the homogeneous Markov chain theory developed for the transition probabilities(5), and assuming that the solution generation function is symmetric.Romeo and Sangiovanni-Vincentelli (1991) notes that while the logarithmic coolingschedule in (26) is a sufficient condition for the convergence, there are only a fewvalues for which make the logarithmic rule also necessary. Furthermore, there existsa unique choice for which makes the logarithmic rule both necessary and sufficientfor the convergence of simulated annealing to the set of global optima.
Hajek (1988)was the first to show that the logarithmic cooling schedule (26) is both necessary andsufficient, by developing a tight lower bound fornamely the depth of the deepestlocal minimum which is not a global minimum, under a weak reversibility assumption.(Note that Hajek requires the depth of global optima to be infinitely deep.) Hajek definesa Markov chain to be weakly reversible if, for any pair of solutionsand forany non-negative real number h, is reachable at height h fromif and only ifis reachable at height h from Note that Hajek (1988) does not attempt to satisfy theconditions of weak and strong ergodicity, but instead uses a more general probabilisticapproach to develop a lower bound on the probability of escaping local, but not globaloptima.