Learning Rate Based Branching Heuristic for SAT Solvers (1185836), страница 2

Файл №1185836 Learning Rate Based Branching Heuristic for SAT Solvers (Презентации лекций) 2 страницаLearning Rate Based Branching Heuristic for SAT Solvers (1185836) страница 22020-08-252020-08-25СтудИзба

Презентации лекций

Просмтор этого файла доступен только зарегистрированным пользователям. Но у нас супер быстрая регистрация: достаточно только электронной почты!

Регистрация/авторизация

Текст из файла (страница 2)

Each slot machine hasa probability distribution describing its payouts, associating a probability withevery possible value of payout. This distribution is hidden from the gambler.At any given point in time, the gambler can play one of the n slot machines,and hence has n actions to choose from. The gambler picks an action, playsthe chosen slot machine, and receives a reward in terms of monetary payout bysampling that slot machine’s payout probability distribution. The MAB problemis to decide which actions to take that will maximize the cumulative payouts.If the probability distributions of the slot machines were revealed, then thegambler would simply play the slot machine whose payout distribution hasthe highest mean.

This will maximize expected payouts for the gambler. Sincethe probability distribution is hidden, a simple MAB algorithm called sampleaverage [31] estimates the true mean of each distribution by averaging the samples of observed payouts. For example, suppose there are 2 slot machines. Thegambler plays the first and the second slot machine 4 times each, receiving the4 payouts h1, 2, 3, 4i and h5, 4, 3, 2i respectively. Then the algorithm will estimate the mean of the first and second slot machines’ payout distributions asavg(h1, 2, 3, 4i) = 2.5 and avg(h5, 4, 3, 2i) = 3.5 respectively. Since the secondslot machine has a higher estimated mean, the choice is to play the second slotmachine.

This choice is called greedy, that is, it chose the action it estimates tobe the best given extant observations. On the other hand, choosing a non-greedyaction is called exploration [31].The sample-average algorithm works well if the hidden probability distribution is fixed. If the distributions change over time, then the problem is callednonstationary, and requires different algorithms. For example, suppose a slot machine gives smaller and smaller payouts the more it has been played.

The olderthe observed payout, the bigger the difference between the current probabilitydistribution and the distribution from which the payout was sampled. Hence,older observed payouts should have smaller weights. This gives rise to the exponential recency weighted average [31] (ERWA) algorithm. Instead of computingthe simple average of the observed payouts, use EMA to give higher weightsto recent observations relative to distant observations. Continuing the prior example, ERWA estimates the mean payout of the first and second slot machinesas emaα (h1, 2, 3, 4i) = 3.0625 and emaα (h5, 4, 3, 2i) = 2.5625 respectively whereα = 0.5. Therefore ERWA estimates the first slot machine to have a higher mean,and hence the greedy action is to play the first slot machine.2.3Clause LearningThe defining feature of CDCL solvers is to analyze every conflict it encountersto learn new clauses to block the same conflicts, and up to exponentially similar conflicts, from re-occurring.

The solver maintains an implication graph, adirected acyclic graph where the vertices are assigned variables and edges recordthe propagations between variables induced by Boolean constraint propagation.A clause is falsified when all of its literals are assigned to false, and in this circumstance, the solver can no longer proceed with the current assignment. Thesolver analyzes the implication graph and cuts the graph into two sides: theconflict side and the reason side. The conflict side must contain all the variablesfrom the falsified clause and the reason side must contain all the decision variables.

A learnt clause is generated on the variables from the reason side incidentto the cut by negating the current assignments to those variables. In practice,the implication graph is typically cut at the first unique implication point [33].Upon learning a clause, the solver backtracks to an earlier state where no clausesare falsified and proceeds from there.2.4The VSIDS Branching HeuristicVSIDS can be seen as a ranking function that maintains a floating point number for each Boolean variable in the input formula, often called activity.

Theactivities are modified in two interweaving operations called the bump and themultiplicative decay. Bump increases the activity of a variable additively by 1whenever it appears in either a newly learnt clause or the conflict side of theimplication graph.

Decay periodically decreases the activity of every variable bymultiplying all activities by the decay factor δ where 0 < δ < 1. Decay typicallyoccurs after every conflict. VSIDS ranks variables in decreasing order of activity, and selects the unassigned variable with the highest activity to branch onnext.

This variable is called the decision variable. A separate heuristic, typicallyphase-saving [26], will select the polarity to assign the decision variable.3Contribution I: Branching Heuristic as Learning Rate(LR) OptimizationThe branching heuristic is responsible for assigning variables through decisionsthat the SAT solver makes during a run.

Although most of the assignmentswill eventually revert due to backtracking and restarts, the solver guaranteesprogress due to the production of learnt clauses. It is well-known that branchingheuristics play a significant role in the performance of SAT solvers. To framebranching as an optimization problem, we need a metric to quantify the degreeof contribution from an assigned variable to the progress of the solver, to serve asan objective to maximize. Since producing learnt clauses is a direct indication ofprogress, we define our metric to be the variable’s propensity to produce learntclauses.

We will now define this formally.Clauses are learnt via conflict analysis on the implication graph that thesolver constructs during solving. A variable v participates in generating a learntclause l if either v appears in l or v is resolved during the conflict analysis thatproduces l (i.e., appears in the conflict side of the implication graph inducedby the cut that generates l). In other words, v is required for the learning of lfrom the encountered conflict.

Note that only assigned variables can participatein generating learnt clauses. We define I as the interval of time between theassignment of v until v transitions back to being unassigned. Let P be the numberlearnt clauses in which v participates during interval I and let L be the numberof learnt clauses generated in interval I. The learning rate (LR) of variable vat interval I is defined as PL . For example, suppose variable v is assigned bythe branching heuristic after 100 learnt clauses are produced. It participates inproducing the 101-st and 104-th learnt clause. Then v is unassigned after the105-th learnt clause is produced.

In this case, P = 2 and L = 5 and hence theLR of variable v is 52 .The exact LR of a variable is usually unknown during branching. In theprevious example, variable v was picked by the branching heuristic after 100learnt clauses are produced, but the LR is not known until after the 105-th learntclause is produced.

Therefore optimizing LR involves a degree of uncertainty,which makes the problem well-suited for learning algorithms. In addition, theLR of a variable changes over time, due to modifications to the learnt clausedatabase, stored phases, and assignment trail. As such, estimating LR requiresnonstationary algorithms to deal with changes in the underlying environment.4Contribution II: Abstracting Online Variable Selectionas a Multi-Armed Bandit (MAB) ProblemGiven n Boolean variables, we will abstract online variable selection as an narmed bandit optimization problem.

A branching heuristic has n actions tochoose from, corresponding to branching on any of the n Boolean variables.The expressions assigning a variable and playing an action will be used interchangeably. When a variable v is assigned, then v can begin to participate ingenerating learnt clauses. When v becomes unassigned, the LR r is computedand returned as the reward for playing the action v. The terms reward and LRwill be used interchangeably. The MAB algorithm uses the reward to update itsinternal estimates of the action that will maximize the rewards.The MAB algorithm is limited to picking actions corresponding to unassignedvariables, as the branching heuristic can only branch on unassigned variables.This limitation forces some exploration, as the MAB algorithm cannot select thesame action again until the corresponding variable is unassigned due to backtracking or restarting.

Although the branching heuristic is only assigning onevariable at a time, it indirectly assigns many other variables through propagation. We include the propagated variables, along with the branched variables, asplays in the MAB framework. That is, branched and propagated variables willall receive their own individual rewards corresponding to their LR, and the MABalgorithm will use all these rewards to update its internal estimates. This alsoforces some exploration since a variable ranked poorly by the MAB algorithmcan still be played through propagation.5Contribution III: Learning Rate Branching (LRB)HeuristicGiven the MAB abstraction, we first use the well-known ERWA bandit algorithmas a branching heuristic. We will upgrade ERWA with two novel extensions toarrive at the final branching heuristic called the learning rate branching (LRB)heuristic. We will justify these extensions experimentally through the lens ofMAB, that is, these extensions are better at maximizing the LR rewards.

Характеристики

Тип файла

PDF-файл

Размер

425 Kb

Материал

Презентации лекций

Тип материала

Лекции

Предмет

Boolean SAT-SMT Solvers for Software Engineering

Высшее учебное заведение

МГУ им. Ломоносова

Список файлов лекций

Поделитесь ссылкой:

Ставлю 10/10
Все нравится, очень удобный сайт, помогает в учебе. Кроме этого, можно заработать самому, выставляя готовые учебные материалы на продажу здесь. Рейтинги и отзывы на преподавателей очень помогают сориентироваться в начале нового семестра. Спасибо за такую функцию. Ставлю максимальную оценку.

Лучшая платформа для успешной сдачи сессии
Познакомился со СтудИзбой благодаря своему другу, очень нравится интерфейс, количество доступных файлов, цена, в общем, все прекрасно. Даже сам продаю какие-то свои работы.

Студизба ван лав ❤
Очень офигенный сайт для студентов. Много полезных учебных материалов. Пользуюсь студизбой с октября 2021 года. Серьёзных нареканий нет. Хотелось бы, что бы ввели подписочную модель и сделали материалы дешевле 300 рублей в рамках подписки бесплатными.

Отличный сайт
Лично меня всё устраивает - и покупка, и продажа; и цены, и возможность предпросмотра куска файла, и обилие бесплатных файлов (в подборках по авторам, читай, ВУЗам и факультетам). Есть определённые баги, но всё решаемо, да и администраторы реагируют в течение суток.

Маленький отзыв о большом помощнике!
Студизба спасает в те моменты, когда сроки горят, а работ накопилось достаточно. Довольно удобный сайт с простой навигацией и огромным количеством материалов.

Студ. Изба как крупнейший сборник работ для студентов
Тут дофига бывает всего полезного. Печально, что бывают предметы по которым даже одного бесплатного решения нет, но это скорее вопрос к студентам. В остальном всё здорово.

Спасательный островок
Если уже не успеваешь разобраться или застрял на каком-то задание поможет тебе быстро и недорого решить твою проблему.

Всё и так отлично
Всё очень удобно. Особенно круто, что есть система бонусов и можно выводить остатки денег. Очень много качественных бесплатных файлов.

Отзыв о системе "Студизба"
Отличная платформа для распространения работ, востребованных студентами. Хорошо налаженная и качественная работа сайта, огромная база заданий и аудитория.

Отличный помощник
Отличный сайт с кучей полезных файлов, позволяющий найти много методичек / учебников / отзывов о вузах и преподователях.

Отлично помогает студентам в любой момент для решения трудных и незамедлительных задач
Хотелось бы больше конкретной информации о преподавателях. А так в принципе хороший сайт, всегда им пользуюсь и ни разу не было желания прекратить. Хороший сайт для помощи студентам, удобный и приятный интерфейс. Из недостатков можно выделить только отсутствия небольшого количества файлов.

Спасибо за шикарный сайт
Великолепный сайт на котором студент за не большие деньги может найти помощь с дз, проектами курсовыми, лабораторными, а также узнать отзывы на преподавателей и бесплатно скачать пособия.