Learning Rate Based Branching Heuristic for SAT Solvers
Jia Hui Liang, Vijay Ganesh, Pascal Poupart, and Krzysztof Czarnecki
University of Waterloo, Waterloo, Canada

Abstract. In this paper, we propose a framework for viewing solver branching heuristics as optimization algorithms where the objective is to maximize the learning rate, defined as the propensity for variables to generate learnt clauses.
By viewing online variable selection in SAT solvers as an optimization problem, we can leverage a wide variety of optimization algorithms, especially from machine learning, to design effective branching heuristics. In particular, we model the variable selection optimization problem as an online multi-armed bandit, a special case of reinforcement learning, to learn branching variables such that the learning rate of the solver is maximized. We develop a branching heuristic that we call learning rate branching, or LRB, based on a well-known multi-armed bandit algorithm called exponential recency weighted average, and implement it as part of MiniSat and CryptoMiniSat. We upgrade the LRB technique with two additional novel ideas to improve the learning rate by accounting for reason side rate and exploiting locality. The resulting LRB branching heuristic is shown to be faster than the VSIDS and conflict history-based (CHB) branching heuristics on 1975 application and hard combinatorial instances from the 2009 to 2014 SAT Competitions. We also show that CryptoMiniSat with LRB solves more instances than the one with VSIDS.
To the best of our knowledge, the VSIDS branching heuristic and its variants have been the dominant class of such heuristics for SAT solvers since 2001, until we introduced a novel class of machine-learning-based branching heuristics.

1 Introduction

Modern Boolean SAT solvers are a critical component of many innovative techniques in security, software engineering, hardware verification, and AI, such as solver-based automated testing with symbolic execution [9], bounded model checking [11] for software and hardware verification, and planning in AI [27], respectively.
Conflict-driven clause-learning (CDCL) SAT solvers [29, 23, 24, 4, 12, 6] in particular have made these techniques feasible as a consequence of their surprising efficacy at solving large classes of real-world Boolean formulas. The development of various heuristics, notably the Variable State Independent Decaying Sum (VSIDS) [24] branching heuristic (and its variants) and conflict analysis techniques [23], has dramatically pushed the limits of CDCL solver performance. The VSIDS heuristic is used in the most competitive CDCL SAT solvers such as Glucose [4], Lingeling [6], and CryptoMiniSat [29].
Since its introduction in 2001, VSIDS has remained one of the most effective and dominant branching heuristics despite intensive efforts by many researchers to replace it [16, 7, 28, 15]. In 2015, we provided the first branching heuristic, which we call the conflict history-based (CHB) branching heuristic, that was much more effective than VSIDS. The branching heuristic introduced in this paper, which we refer to as learning rate branching (LRB), significantly outperforms CHB and VSIDS.

In this paper, we introduce a general principle for designing branching heuristics wherein online variable selection in SAT solvers is viewed as an optimization problem. The objective to be maximized is called the learning rate (LR), a numerical characterization of a variable's propensity to generate learnt clauses. The goal of the branching heuristic, given this perspective, is to select branching variables that will maximize the cumulative LR during the run of the solver.
Intuitively, achieving a perfect LR of 1 implies the assigned variable is responsible for every learnt clause generated during its lifetime on the assignment trail.

We put this principle into practice in this paper. Although there are many algorithms for solving optimization problems, we show that multi-armed bandit learning (MAB) [31], a special case of reinforcement learning, is particularly effective in our context of selecting branching variables. In MAB, an agent selects from a set of actions to receive a reward. The goal of the agent is to maximize the cumulative rewards received through the selection of actions. As we will describe in more detail later, we abstract the branching heuristic as the agent, the available branching variables as the actions, and LR as the reward.
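To make the LR objective concrete, the following is a minimal, hypothetical sketch of per-variable bookkeeping consistent with the intuition above; the class name, counters, and the 0-by-convention base case are our illustration, not the paper's exact definition.

```python
class LRTracker:
    """Hypothetical per-variable learning-rate bookkeeping (illustrative only).

    A variable's LR over one assignment lifetime is read here as the
    fraction of learnt clauses generated in that period that the variable
    helped produce; an LR of 1 means it participated in every one.
    """

    def __init__(self):
        self.participated = 0   # learnt clauses this variable helped generate
        self.total_learnt = 0   # learnt clauses generated during its lifetime

    def on_learnt_clause(self, involved):
        # Called once per learnt clause while the variable is assigned.
        self.total_learnt += 1
        if involved:
            self.participated += 1

    def learning_rate(self):
        # LR is undefined when no clauses were learnt; return 0 by convention.
        if self.total_learnt == 0:
            return 0.0
        return self.participated / self.total_learnt
```

For example, a variable that participates in 3 of the 4 clauses learnt during its lifetime gets an LR of 0.75 under this reading.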
Abstracting online variable selection as a MAB problem provides the bridge to apply MAB algorithms from the literature directly as branching heuristics. In our experiments, we show that the MAB algorithm called exponential recency weighted average (ERWA) [31], in our abstraction, surpasses the VSIDS and CHB branching heuristics at solving the benchmarks from the 4 most recent SAT Competitions in an apples-to-apples comparison.
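As a sketch of how a bandit algorithm plugs into this abstraction, the toy agent below keeps a per-variable reward estimate and branches greedily on the highest one; its update rule is the standard ERWA step with a fixed step size. The class and method names are ours and this is not the paper's MiniSat implementation.

```python
class BanditBranching:
    """Illustrative sketch: branching heuristic as a bandit agent.

    Each variable is an 'arm'; branching on it is an action, and the
    reward observed afterwards is that variable's learning rate (LR).
    """

    def __init__(self, num_vars, step_size=0.4):
        self.q = [0.0] * num_vars   # estimated reward (LR) per variable
        self.alpha = step_size      # step-size parameter of ERWA

    def pick_branching_variable(self, unassigned):
        # Greedy selection: branch on the variable with the highest
        # estimated reward among the currently unassigned ones.
        return max(unassigned, key=lambda v: self.q[v])

    def observe_reward(self, var, lr):
        # ERWA update: move the estimate toward the observed LR.
        self.q[var] += self.alpha * (lr - self.q[var])
```

A purely greedy policy is shown only for brevity; a practical bandit agent also needs some form of exploration so that rarely-chosen variables still get their estimates refreshed.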
Additionally, we provide two extensions to ERWA that increase its ability to maximize LR and its performance as a branching heuristic. The final branching heuristic, called learning rate branching (LRB), is shown to dramatically outperform CryptoMiniSat [29] with VSIDS.

1.1 Contributions

Contribution I: We define a principle for designing branching heuristics, that is, a branching heuristic should maximize the learning rate (LR). We show that this principle yields highly competitive branching heuristics in practice.

Contribution II: We show how to abstract online variable selection in the multi-armed bandit (MAB) framework.
This abstraction provides an interface for applying MAB algorithms directly as branching heuristics. Previously, we developed the conflict history-based (CHB) branching heuristic [19], also inspired by MAB. The key difference between this paper and CHB is that in the case of CHB the rewards are known a priori, and there is no metric being optimized, whereas in this work the learning rate is being maximized and is unknown a priori, which requires a bona fide machine learning algorithm to optimize under uncertainty.

Contribution III: We use the MAB abstraction to develop a new branching heuristic called learning rate branching (LRB). The heuristic is built on a well-known MAB algorithm called exponential recency weighted average (ERWA). Given our domain knowledge of SAT solving, we extend ERWA to take advantage of reason side rate and locality [20] to further maximize the learning rate objective.
We show in comprehensive apples-to-apples experiments that it outperforms the current state-of-the-art VSIDS [24] and CHB [19] branching heuristics on 1975 instances from four recent SAT Competition benchmarks from 2009 to 2014 in the application and hard combinatorial categories.
Additionally, we show that a modified version of CryptoMiniSat with LRB outperforms Glucose, and is very close to matching Lingeling over the same set of 1975 instances.

2 Preliminaries

2.1 Simple Average and Exponential Moving Average

Given a time series of numbers $\langle r_1, r_2, \ldots, r_n \rangle$, the simple average is computed as $\mathrm{avg}(\langle r_1, \ldots, r_n \rangle) = \sum_{i=1}^{n} \frac{1}{n} r_i$. Note that every $r_i$ is given the same coefficient (also called weight) of $\frac{1}{n}$.

In a time series, however, recent data is more pertinent to the current situation than old data. For example, consider a time series of the price of a stock. The price of the stock from yesterday is more correlated with today's price than the price of the stock from a year ago. The exponential moving average (EMA) [8] follows this intuition by giving recent data higher weights than past data when averaging.
Incidentally, the same intuition is built into the multiplicative decay in VSIDS [5, 20]. The EMA is computed as $\mathrm{ema}_\alpha(\langle r_1, \ldots, r_n \rangle) = \sum_{i=1}^{n} \alpha (1-\alpha)^{n-i} r_i$, where $0 < \alpha < 1$ is called the step-size parameter; $\alpha$ controls the relative weights between recent and past data. EMA can be computed incrementally as $\mathrm{ema}_\alpha(\langle r_1, \ldots, r_n \rangle) = (1-\alpha) \cdot \mathrm{ema}_\alpha(\langle r_1, \ldots, r_{n-1} \rangle) + \alpha r_n$, and we define the base case $\mathrm{ema}_\alpha(\langle \rangle) = 0$.

2.2 Multi-Armed Bandit (MAB)

We will explain the MAB problem [31] through a classical analogy. Consider a gambler in a casino with n slot machines, where the objective of the gambler is to maximize the payouts received from these machines.
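The closed-form and incremental definitions of EMA from Section 2.1 can be checked against each other with a short sketch; the function names below are ours.

```python
def ema(rewards, alpha):
    """Exponential moving average via the incremental update.

    Implements ema_a(<r1..rn>) = (1-a) * ema_a(<r1..r_{n-1}>) + a*rn
    with base case ema_a(<>) = 0, which unrolls to the closed form
    sum_{i=1}^{n} a * (1-a)^(n-i) * r_i.
    """
    estimate = 0.0
    for r in rewards:
        estimate = (1 - alpha) * estimate + alpha * r
    return estimate

def simple_average(rewards):
    # Every reward gets the same weight 1/n.
    return sum(rewards) / len(rewards)
```

For instance, with $\alpha = 0.5$ the sequence $\langle 1, 0, 1 \rangle$ has EMA $0.625$: the most recent reward carries weight $0.5$, the middle one $0.25$, and the oldest $0.125$, versus the uniform weight $\frac{1}{3}$ of the simple average.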