Their experiments suggest that optimal cooling schedules are not monotone decreasing in temperature. They also show that for the test problem (a white noise surface), geometric and linear cooling schedules perform better than inverse logarithmic cooling schedules, when sufficient computing effort is allowed. Moreover, their experiments do not show measurable performance differences between linear and geometric cooling schedules. They also observe that geometric cooling schedules are not greatly affected by excessively high initial temperatures.
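For concreteness, the three schedule families compared in these experiments can be expressed as simple temperature-update rules. The sketch below is illustrative only; the parameter values shown are assumptions of this example, not the settings used in the cited experiments.

```python
import math

def geometric(t0, alpha, k):
    """Geometric cooling: T_k = t0 * alpha**k, with 0 < alpha < 1."""
    return t0 * alpha ** k

def linear(t0, delta, k):
    """Linear cooling: T_k = t0 - delta * k, floored to stay positive."""
    return max(t0 - delta * k, 1e-12)

def inverse_log(c, k):
    """Inverse logarithmic cooling: T_k = c / log(k + 2)."""
    return c / math.log(k + 2)

if __name__ == "__main__":
    # Illustrative parameters only; suitable values are problem dependent.
    for k in (0, 10, 100, 1000):
        print(k, geometric(100.0, 0.95, k),
              linear(100.0, 0.1, k), inverse_log(100.0, k))
```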
The results presented suggest that even the most robust adaptive cooling schedule “produces annealing trajectories which are never in equilibrium” (Strenski and Kirkpatrick, 1991). However, they also conclude that the transition acceptance rate is not sensitive to the degree of closeness to the equilibrium distribution.

Christoph and Hoffmann (1993) also attempts to characterize optimal cooling schedules. They derive a relationship between a finite sequence of optimal temperature values (i.e., outer loops) and the number of iterations (i.e., inner loops) at each respective temperature for several small test problems to reach optimality (i.e., the minimal mean final energy).
They find that this scaling behavior is of the form

$$\gamma_m = a\,\nu^{-b/m} \qquad (32)$$

where a and b are scaling coefficients, $\gamma_m$ is referred to as the temperature, $\nu$ is the number of inner loop iterations at temperature $\gamma_m$, and m is the number of outer loops at which the temperature $\gamma_m$ is reduced. The proposed approach is to solve for the coefficients a and b based on known temperature and iteration parameter values for an optimal schedule, based on several replications of the algorithm using $\nu$ iterations for each replication, and then use (32) to interpolate the optimal cooling schedule for intermediate iterations. They however do not make any suggestions on how to efficiently solve for the necessary optimal cooling schedules for a (typically large) problem instance.

Romeo and Sangiovanni-Vincentelli (1991) presents a theoretical framework for evaluating the performance of the simulated annealing algorithm. They discuss annealing schedules in terms of the initial temperature T, the number of inner loops for each value of T, the rate at which the temperature T decreases (i.e., the cooling schedule), and the criteria for stopping the algorithm.
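This decomposition maps directly onto the structure of a generic simulated annealing implementation. The following minimal sketch makes the four components explicit; all function names, default values, and the geometric cooling rule are illustrative assumptions of this example, not the authors' formulation.

```python
import math
import random

def simulated_annealing(x0, energy, neighbor, t0=10.0, inner_loops=100,
                        cool=lambda t: 0.9 * t, t_min=1e-3):
    """Generic SA skeleton exposing the four schedule components discussed
    above: initial temperature t0, number of inner loops per temperature,
    cooling rule `cool`, and a stopping criterion (temperature floor t_min)."""
    x, t = x0, t0
    best, best_e = x0, energy(x0)
    while t > t_min:                      # stopping criterion (outer loop)
        for _ in range(inner_loops):      # inner loops at temperature t
            y = neighbor(x)
            delta = energy(y) - energy(x)
            # Metropolis rule: accept improvements outright, and accept
            # uphill moves with probability exp(-delta / t).
            if delta <= 0 or random.random() < math.exp(-delta / t):
                x = y
                if energy(x) < best_e:
                    best, best_e = x, energy(x)
        t = cool(t)                       # cooling schedule
    return best, best_e

if __name__ == "__main__":
    # Toy demo: minimize x**2 over the integers via +/-1 moves.
    print(simulated_annealing(50, lambda x: x * x,
                              lambda x: x + random.choice((-1, 1))))
```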
Romeo and Sangiovanni-Vincentelli conclude that the theoretical results obtained thus far have not been able to explain why simulated annealing is so successful even when a diverse collection of static cooling schedule heuristics is used. Many heuristic methods are available in the literature to find optimal cooling schedules, but the effectiveness of these schedules can only be compared through experimentation. They conjecture that the neighborhood and the corresponding topology of the objective function are responsible for the behavior of the algorithm.

Conn and Fielding (1999) conducts a detailed analysis of various cooling schedules and how they affect the performance of simulated annealing. Convergent simulated annealing algorithms are often too slow in practice, whereas a number of non-convergent algorithms may be preferred for good finite-time performance. They analyze various cooling schedules and present cases where repeated independent runs using a non-convergent cooling schedule provide acceptable results in practice.
They provide examples of when it is both practically and theoretically justified to use a very high, fixed temperature, or even fast cooling schedules which have a small probability of reaching global minima, and apply these cooling schedules to traveling salesman problems of various sizes. Fielding (2000) computationally studies fixed temperature cooling schedules for the traveling salesman problem, the quadratic assignment problem, and the graph partitioning problem, and demonstrates that a fixed temperature cooling schedule can yield superior results in locating optimal and near-optimal solutions.
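As an illustration of this strategy, the sketch below runs independent replications of a constant-temperature annealer and keeps the best solution found; the temperature, iteration counts, and function names are assumptions of this example rather than settings from the studies cited above.

```python
import math
import random

def fixed_temperature_run(x0, energy, neighbor, t, iters):
    """A single run whose temperature never changes (a constant schedule)."""
    x = x0
    best, best_e = x0, energy(x0)
    for _ in range(iters):
        y = neighbor(x)
        delta = energy(y) - energy(x)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            x = y
            e = energy(x)
            if e < best_e:
                best, best_e = x, e
    return best, best_e

def repeated_runs(make_x0, energy, neighbor, t, iters, replications):
    """Independent restarts at a fixed temperature; keep the best result."""
    runs = (fixed_temperature_run(make_x0(), energy, neighbor, t, iters)
            for _ in range(replications))
    return min(runs, key=lambda run: run[1])
```

Because the replications are independent, if each run locates an acceptable solution with some probability p, the probability that all replications fail decays geometrically in the number of replications, which is one intuition for why repeated non-convergent runs can work well in practice.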
Orosz and Jacobson (2002a,b) present finite-time performance measures for simulated annealing with fixed temperature cooling schedules. They illustrate their measures using randomly generated instances of the traveling salesman problem.

Another approach to increasing the speed of simulated annealing is to implement a two-staged simulated annealing algorithm.
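A minimal sketch of the two-stage structure follows; the greedy first stage and all parameter values are illustrative assumptions chosen for this example, not the method of any particular paper.

```python
import math
import random

def greedy_stage(x, energy, neighbor, steps=1000):
    """Stage 1: a fast, improving-moves-only heuristic, standing in for
    whatever fast heuristic replaces high-temperature annealing."""
    for _ in range(steps):
        y = neighbor(x)
        if energy(y) < energy(x):
            x = y
    return x

def annealing_stage(x, energy, neighbor, t0, alpha=0.95, inner=50,
                    t_min=1e-3):
    """Stage 2: conventional annealing started at a moderate temperature."""
    t = t0
    while t > t_min:
        for _ in range(inner):
            y = neighbor(x)
            delta = energy(y) - energy(x)
            if delta <= 0 or random.random() < math.exp(-delta / t):
                x = y
        t *= alpha
    return x

def two_staged_sa(x0, energy, neighbor, t0=1.0):
    # Choosing t0 is the delicate step discussed in the text: too low
    # risks entrapment, too high wastes iterations on hill climbing.
    return annealing_stage(greedy_stage(x0, energy, neighbor),
                           energy, neighbor, t0)
```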
In two-staged simulated annealing algorithms, a fast heuristic is used to replace simulated annealing at higher temperatures, with a traditional simulated annealing algorithm implemented at lower temperatures to improve on the fast heuristic solution. In addition to implementing an intelligent cooling schedule, finding the initial temperature with which to initialize the traditional simulated annealing algorithm is important to the success of the two-staged algorithm. Varanelli and Cohoon (1999) proposes a method for determining an initial temperature for two-staged simulated annealing algorithms using traditional cooling schedules. They note that if the initial temperature is too low at the beginning of the traditional simulated annealing phase, the algorithm can get trapped in an inferior solution, while if the initial temperature is too high, the algorithm can waste too many iterations (hence computing time) by accepting too many hill-climbing moves.

4.2 Choice of Neighborhood

A key problem-specific choice concerns the neighborhood function definition.
The efficiency of simulated annealing is highly influenced by the neighborhood function used (Moscato, 1993). The choice of neighborhood serves to enforce a topology—Eglese (1990) reports that “a neighborhood structure which imposes a ‘smooth’ topology where the local minima are shallow is preferred to a ‘bumpy’ topology where there are many deep local minima.” Solla et al. (1986) and Fleischer and Jacobson (1999) report similar conclusions. This also supports the result in Hajek (1988) that shows that asymptotic convergence to the set of global optima depends on the depth of the local minima.
Another factor to consider when choosing neighborhood functions is the neighborhood size. No theoretical results are available, other than the necessity of reachability (in a finite number of steps) from any solution to any other solution. Cheh et al. (1991) reports that small neighborhoods are best, while Ogbu and Smith (1990) provides evidence that larger neighborhoods result in better simulated annealing performance. Goldstein and Waterman (1988) conjectures that if the neighborhood size is small compared to the total solution space cardinality, then the Markov chain cannot move around the solution space fast enough to find the minimum in a reasonable time.
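To make the notion of neighborhood size concrete, the sketch below contrasts a small and a larger neighborhood for solutions encoded as binary vectors; the k-flip variant and its parameters are illustrative assumptions.

```python
import random

def flip_one(x):
    """Small neighborhood: flip one bit (n neighbors for an n-bit vector)."""
    i = random.randrange(len(x))
    return x[:i] + [1 - x[i]] + x[i + 1:]

def flip_k(x, k=3):
    """Larger neighborhood: flip k distinct bits (on the order of C(n, k)
    neighbors); requires k <= len(x)."""
    y = list(x)
    for i in random.sample(range(len(x)), k):
        y[i] = 1 - y[i]
    return y

if __name__ == "__main__":
    x = [0, 1, 0, 0, 1]
    print(flip_one(x), flip_k(x, k=2))
```

Note that any move set must still let the chain reach every solution from every other; in practice a large-move neighborhood is often mixed with single-bit moves to guarantee this.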
On the other hand, a very large neighborhood has the algorithm merely sampling randomly from a large portion of the solution space, and thus is unable to focus on specific areas of the solution space. It is reasonable to believe that neighborhood size is heavily problem-specific. For example, problems where the smoothness of the solution space topology is moderately insensitive to different neighborhood definitions may benefit from larger neighborhood sizes.

Fleischer (1993) and Fleischer and Jacobson (1999) use concepts from information theory to show that the neighborhood structure can affect the information rate or total uncertainty associated with simulated annealing. Fleischer (1993) shows that simulated annealing tends to perform better as the entropy level of the associated Markov chain increases, and thus conjectures that an entropy measure could be useful for predicting when simulated annealing would perform well on a given problem.
However, efficient ways of estimating the entropy are needed to make this a practical tool.

Another issue in neighborhood function definition addresses the solution space itself. Chardaire et al. (1995) proposes a method for addressing 0–1 optimization problems, in which the solution space is progressively reduced by fixing the value of strongly persistent variables (which have the same value in all optimal solutions).
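One way such a variable-fixing step might be realized is sketched below: the mean of each binary variable is estimated over a window of recently visited solutions, and variables whose estimates sit near 0 or 1 are frozen. The threshold values and helper names are assumptions of this illustration, not details of the cited method.

```python
def fix_persistent(history, low=0.05, high=0.95):
    """Estimate the mean of each binary variable over recently visited
    solutions and freeze those that appear persistent (near 0 or 1).
    Thresholds are illustrative, not values from the cited paper."""
    n = len(history[0])
    means = [sum(x[i] for x in history) / len(history) for i in range(n)]
    fixed = {}
    for i, m in enumerate(means):
        if m <= low:
            fixed[i] = 0
        elif m >= high:
            fixed[i] = 1
    return fixed  # variable index -> frozen value

def apply_fixed(x, fixed):
    """Project a candidate solution onto the reduced space by forcing
    frozen variables to their fixed values."""
    y = list(x)
    for i, value in fixed.items():
        y[i] = value
    return y
```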
Chardaire et al. isolate the persistent variables during simulated annealing’s execution by periodically estimating the expectation of the random variable (a vector of binary elements) that describes the current solution, and fixing the value of those elements in the random variable that meet threshold criteria.

4.3 Domains—Types of Problems with Examples

Simulated annealing has developed into a popular tool for optimization in the last decade.
It has been used to address numerous discrete optimization problems as well as continuous variable problems. Several application and computational survey articles have been published on simulated annealing. Johnson et al. (1989, 1991) present a series of articles on simulated annealing applied to certain well-studied discrete optimization problems. The first in the series of articles uses the graph partitioning problem to illustrate simulated annealing and highlight the effectiveness of several modifications to the basic simulated annealing algorithm. The second in the series focuses on applying lessons learned from the first article to the graph coloring and number partitioning problems.