Chapter 8: The Theory and Practice of Simulated Annealing
Connors and Kumar (1989) substantiates the necessary and sufficient conditions in Hajek (1988) using the orders of recurrence. Connors and Kumar (1989) shows that these orders of recurrence quantify the asymptotic behavior of each solution's probability in the solution distribution. The key result is that the simulated annealing inhomogeneous Markov chain converges in a Cesaro sense to the set of solutions having the largest recurrence orders. Borkar (1992) improves this convergence result by using a convergence/oscillation dichotomy result for martingales. Tsitsiklis (1989) uses bounds and estimates for singularly perturbed, approximately stationary Markov chains to develop a convergence theory that subsumes the condition of weak reversibility in Hajek (1988). (Note that Tsitsiklis (1989) defines, for each h, the set of all local minima (in terms of objective function value) of depth h + 1 or more. Therefore, the critical depth parameter is the smallest h such that all local (but not global) minima have depth h or less.
Tsitsiklis (1989) conjectures that without some form of reversibility, there does not exist any h such that the global optima are contained in the set of local optima.) Note that Chiang and Chow (1988, 1994), Borkar (1992), Connors and Kumar (1989), Hajek (1988), and Mitra et al. (1986) all require (either explicitly or implicitly) the multiplicative condition (9) for their proofs of convergence.

Anily and Federgruen (1987) uses perturbation analysis techniques (e.g., see Meyer, 1980) to prove convergence of a particular stochastic hill-climbing algorithm, by bounding the deviations of the sequence of stationary distributions of the particular hill-climbing algorithm against the sequence of known stationary distributions corresponding to a simulated annealing algorithm.
In general, this convergence proof approach is only useful for a restrictive class of simulated annealing algorithms, since the transition matrix condition number grows exponentially as the number of iterations k becomes large.

Anily and Federgruen (1987) also presents a proof of convergence for simulated annealing with general acceptance probability functions. Using inhomogeneous Markov chain theory, it proves convergence under the following necessary and sufficient conditions:

1. The acceptance probability function must, for any iteration k, allow any hill-climbing transition to occur with positive probability.
2. The acceptance probability function must be bounded and asymptotically monotone, with limit zero for hill-climbing solution transitions.
3. In the limit, the stationary probability distribution must have zero probability mass for every non-globally optimal solution.
4. The probability of escaping from any locally (but not globally) optimal solution must not approach zero too quickly.
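For concreteness, the classical exponential acceptance function satisfies conditions (1) and (2) whenever the temperature is positive and decreases to zero; the sketch below is added here as an illustration and is not taken from Anily and Federgruen (1987).

```python
import math

def acceptance_probability(delta: float, t_k: float) -> float:
    """Classical simulated annealing acceptance probability (minimization).

    delta: objective change f(candidate) - f(current); delta > 0 is a hill-climbing move.
    t_k:   temperature at iteration k (assumed positive and non-increasing in k).
    """
    if delta <= 0:
        return 1.0                     # improving (or equal) moves are always accepted
    return math.exp(-delta / t_k)      # condition (1): positive for every t_k > 0;
                                       # condition (2): bounded by 1, monotone as t_k
                                       # decreases, and tends to zero as t_k -> 0
```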
Anily and Federgruen (1987) uses condition (3) to relax the acceptance function multiplicative condition (9). However, in practice, condition (3) would be very difficult to check without assuming that (9) holds. Condition (4) provides the necessary condition on the rate at which the probability of hill-climbing transitions approaches zero. Condition (4) is expressed quantitatively as follows: using the definitions in (2), define the minimum one-step acceptance probability at iteration k as the smallest acceptance probability over all solutions and all of their neighboring solutions. Define the set of local optima as the set of all solutions whose objective function value is no greater than that of any of their neighbors, and let any solution be reachable from any locally optimal solution in q transitions or less. Then, if (non-globally) locally optimal solutions exist, the minimum one-step acceptance probabilities raised to the power q sum to infinity over the iterations k, and conditions (1), (2), and (3) hold, the simulated annealing algorithm will asymptotically converge to the set of global optima with probability one. However, if (non-globally) locally optimal solutions exist and this sum is finite, then the probability of each solution is asymptotically dependent upon the initial solution. Therefore, the simulated annealing algorithm will not always converge to the set of global optima with probability one.
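To make this summability dichotomy concrete, the following check (added here for illustration; it is not part of the original text) uses the classical exponential acceptance probability and a logarithmic cooling schedule. Write $a_k$ for the minimum one-step acceptance probability at iteration k, let $\Delta$ denote the largest objective function increase between any pair of neighboring solutions, and let $t_k$ be the temperature:

\[
a_k = e^{-\Delta/t_k}, \qquad t_k = \frac{c}{\ln(k+1)} \quad\Longrightarrow\quad (a_k)^q = e^{-q\Delta\ln(k+1)/c} = (k+1)^{-q\Delta/c},
\]

so $\sum_{k \ge 1} (a_k)^q$ diverges exactly when $c \ge q\Delta$. A logarithmic schedule with a sufficiently large constant therefore retains enough hill-climbing probability to meet the convergence condition, while faster cooling makes the series finite and permits entrapment in a locally optimal solution.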
Johnson and Jacobson (2002b) relaxes the sufficient conditions found in Anily and Federgruen (1987) by using a path argument between global optima and local (but not global) optima.

Yao and Li (1991) and Yao (1995) also discuss simulated annealing algorithms with general acceptance probabilities, though their primary contribution is with respect to general neighborhood generation distributions. Schuur (1997) provides a description of acceptance functions ensuring the convergence of the associated simulated annealing algorithm to the set of global optima.

The inhomogeneous proof concept is stronger than the homogeneous approach in that it provides necessary conditions for the rate of convergence, but its asymptotic nature suggests that practical implementation may not be feasible. Romeo and Sangiovanni-Vincentelli (1991) notes "there is no reason to believe that truncating the logarithmic temperature sequence would yield a good configuration, since the tail of the sequence is the essential ingredient in the proof." In addition, the logarithmic cooling schedule dictates a very slow rate of convergence. Therefore, most recent work has focused on methods of improving simulated annealing's finite-time behavior and modifying or blending the algorithm with other search methods such as genetic algorithms (Liepins and Hilliard, 1989), tabu search (Glover, 1994), or both (Fox, 1993).

3 RELATIONSHIP TO OTHER LOCAL SEARCH ALGORITHMS

The hill-climbing strategy inherent in simulated annealing has led to the formulation of other such algorithms (e.g., threshold accepting, the noising method).
Moreover, though different in how they traverse the solution space, both tabu search and genetic algorithms share with simulated annealing the objective of using local information to find global optima over solution spaces contaminated with multiple local optima.

3.1 Threshold Accepting

Questioning the very need for a randomized acceptance function, Dueck and Scheuer (1990), and independently, Moscato and Fontanari (1990), propose the threshold accepting algorithm, in which a candidate solution is accepted whenever the resulting increase in the objective function value does not exceed the threshold value at iteration k; the threshold is typically set to be a deterministic, non-increasing step function in k.
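A minimal sketch of this acceptance rule and of a simple step schedule follows; the function names, schedule form, and parameter values are illustrative assumptions, not taken from Dueck and Scheuer (1990).

```python
def threshold_accept(delta: float, q_k: float) -> bool:
    """Threshold accepting rule: accept the candidate whenever the objective
    increase delta = f(candidate) - f(current) does not exceed the current
    threshold q_k.  No random number is required."""
    return delta <= q_k

def step_threshold(k: int, q0: float = 10.0, period: int = 100,
                   factor: float = 0.5) -> float:
    """An illustrative deterministic, non-increasing step schedule: the
    threshold is cut by `factor` every `period` iterations."""
    return q0 * factor ** (k // period)
```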
Dueck and Scheuer (1990) reports computational results that suggest dramatic improvements in traveling salesman problem solution quality and algorithm run-time over basic simulated annealing. Moscato and Fontanari (1990) reports more conservative results; they conjecture that simulated annealing's probabilistic acceptance function does not play a major role in the search for near-optimal solutions.

Althofer and Koschnick (1991) develops a convergence theory for threshold accepting based on the concept that simulated annealing belongs to the convex hull of threshold accepting. The idea presented in Althofer and Koschnick (1991) is that (for a finite threshold sequence) there can exist only finitely many threshold accepting transition matrices, but simulated annealing can have infinitely many transition matrices because of the real-valued nature of the temperature at each iteration. However, every simulated annealing transition matrix for a given problem can be represented as a convex combination of the finitely many threshold accepting transition matrices.
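The convex-combination idea can be seen on a single transition (an illustration added here, not taken from Althofer and Koschnick): for a candidate move that increases the objective by $\Delta > 0$ at temperature $t$, simulated annealing accepts with probability $p = e^{-\Delta/t}$, and

\[
p \cdot \mathbf{1}\{\Delta \le Q_{\mathrm{high}}\} + (1-p) \cdot \mathbf{1}\{\Delta \le Q_{\mathrm{low}}\} = p \cdot 1 + (1-p) \cdot 0 = p
\]

for any thresholds $Q_{\mathrm{high}} \ge \Delta > Q_{\mathrm{low}}$, so the randomized acceptance decision is a mixture of two deterministic threshold accepting decisions. Establishing such a mixture simultaneously for all solution pairs with finitely many thresholds is the content of the matrix-level statement above.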
Althofer and Koschnick (1991) is unable to prove that threshold accepting will asymptotically reach a global minimum, but it does prove the existence of threshold schedules that provide convergence to within an ε-neighborhood of the optimal solutions. Jacobson and Yücesan (2002a) proves that if the threshold value approaches zero as k approaches infinity, then the algorithm does not converge in probability to the set of globally optimal solutions.

Hu et al. (1995) modifies threshold accepting to include a non-monotonic, self-tuning threshold schedule in the hope of improving the algorithm's finite-time performance. Hu et al. (1995) allows the threshold to change dynamically (either up or down), based on the perceived likelihood of being near a local minimum. These changes are accomplished using a principle they call dwindling expectation: when the algorithm fails to move to neighboring solutions, the threshold is gradually increased, in the hope of eventually escaping a local optimum. Conversely, when solution transitions are successful, the threshold is reduced, in order to explore local optima. The experimental results based on two traveling salesman problems presented in Hu et al. (1995) showed that the proposed algorithm outperformed previous hill-climbing methods in terms of finding good solutions earlier in the optimization process.
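A minimal sketch of a dwindling-expectation style threshold update in the spirit of Hu et al. (1995) follows; the multiplicative factors and floor value are assumptions for illustration, not values reported by the authors.

```python
def update_threshold(q: float, move_accepted: bool,
                     grow: float = 1.1, shrink: float = 0.9,
                     q_min: float = 1e-6) -> float:
    """Self-tuning threshold update: loosen the threshold when the search appears
    stuck near a local minimum (no acceptable neighbor found), and tighten it
    after a successful transition so the current region is searched more finely."""
    if move_accepted:
        return max(q * shrink, q_min)   # successful move: reduce the threshold
    return q * grow                      # no move accepted: gradually increase it
```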
Threshold accepting's advantages over simulated annealing lie in its ease of implementation and its generally faster execution time, due to the reduced computational effort in avoiding acceptance probability computations and generation of random numbers (Moscato and Fontanari, 1990). However, compared to simulated annealing, relatively few threshold accepting applications are reported in the literature (Lin et al., 1995; Scheermesser and Bryngdahl, 1995; Nissen and Paul, 1995).

3.2 Noising Method

Charon and Hudry (1993) advocates a simple descent algorithm called the noising method.
The algorithm first perturbs the solution space by adding random noise to the problem's objective function values. The noise is gradually reduced to zero during the algorithm's execution, allowing the original problem structure to reappear. Charon and Hudry (1993) provides computational results, but does not prove that the algorithm will asymptotically converge to the set of globally optimal solutions. Charon and Hudry (2001) shows how the noising method is a generalization of simulated annealing and threshold accepting.
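A minimal sketch of the noising idea, with an assumed uniform noise model and linear decay schedule (Charon and Hudry consider more general noising schemes):

```python
import random

def noised_objective(f_value: float, noise_rate: float) -> float:
    """Evaluate a noised objective value: add zero-mean random noise whose
    amplitude is controlled by noise_rate; as noise_rate shrinks to zero the
    original objective function reappears."""
    return f_value + random.uniform(-noise_rate, noise_rate)

def noise_schedule(k: int, k_max: int, initial_rate: float) -> float:
    """Illustrative linear decay of the noise amplitude to zero over k_max
    iterations of the descent algorithm."""
    return initial_rate * max(0.0, 1.0 - k / k_max)
```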
Storer et al. (1992) proposes an optimization strategy for sequencing problems, by integrating fast, problem-specific heuristics with local search. Its key contribution is to base the definition of the search neighborhood on a heuristic problem pair (h, p), where h is a fast, known, problem-specific heuristic and p represents the problem data. By perturbing the heuristic, the problem, or both, a neighborhood of solutions is developed. This neighborhood then forms the basis for local search. The hope is that the perturbations will cluster good solutions close together, thus making it easier to perform local search.

3.3 Tabu Search

Tabu search (Glover, 1994) is a general framework for a variety of iterative local search strategies for discrete optimization. Tabu search uses the concept of memory by controlling the algorithm's execution via a dynamic list of forbidden moves. This allows the tabu search algorithm to intensify or diversify its search of a given problem's solution space in an effort to avoid entrapment in local optima.
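A minimal sketch of the memory mechanism described above, using an assumed fixed-length list of recently visited solutions (attribute-based tabu lists and aspiration criteria are omitted):

```python
from collections import deque

def tabu_search_step(current, neighbors, f, tabu):
    """One illustrative tabu search move: choose the best neighbor that is not on
    the tabu list (even if it is worse than the current solution), then record it
    so the search cannot immediately revisit it."""
    candidates = [s for s in neighbors(current) if s not in tabu]
    if not candidates:
        return current                  # every neighbor is forbidden; stay put
    best = min(candidates, key=f)       # best admissible neighbor
    tabu.append(best)                   # deque(maxlen=...) discards the oldest entry
    return best

# usage sketch (assumed tenure of 7 recently visited solutions):
# tabu = deque(maxlen=7)
# for _ in range(1000):
#     current = tabu_search_step(current, neighbors, f, tabu)
```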