Impact of Community Structure on SAT Solver Performance (Презентации лекций)
Описание файла
Файл "Impact of Community Structure on SAT Solver Performance" внутри архива находится в следующих папках: Презентации лекций, Статьи к лекциям. PDF-файл из архива "Презентации лекций", который расположен в категории "". Всё это находится в предмете "boolean sat-smt solvers for software engineering" из 11 семестр (3 семестр магистратуры), которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .
Просмотр PDF-файла онлайн
Текст из PDF
Impact of Community Structure onSAT Solver PerformanceZack Newsham1 , Vijay Ganesh1 ,Sebastian Fischmeister1 , Gilles Audemard2 , and Laurent Simon31University of Waterloo, Waterloo, Ontario, CanadaLaboratoire Bordelais de Recherche en Informatique, Bordeaux Cedex, FranceUniversité Lille-Nord de France, CRIL - CNRS UMR 8188, Artois, F-62307 Lens23Abstract. Modern CDCL SAT solvers routinely solve very large industrial SAT instances in relatively short periods of time. It is clear thatthese solvers somehow exploit the structure of real-world instances. However, to-date there have been few results that precisely characterise thisstructure. In this paper, we provide evidence that the community structure of real-world SAT instances is correlated with the running time ofCDCL SAT solvers.
It has been known for some time that real-worldSAT instances, viewed as graphs, have natural communities in them. Acommunity is a sub-graph of the graph of a SAT instance, such that thissub-graph has more internal edges than outgoing to the rest of the graph.The community structure of a graph is often characterised by a qualitymetric called Q. Intuitively, a graph with high-quality community structure (high Q) is easily separable into smaller communities, while the onewith low Q is not.
We provide three results based on empirical data whichshow that community structure of real-world industrial instances is a better predictor of the running time of CDCL solvers than other commonlyconsidered factors such as variables and clauses. First, we show that thereis a strong correlation between the Q value and Literal Block Distancemetric of quality of conflict clauses used in clause-deletion policies inGlucose-like solvers. Second, using regression analysis, we show that thethe number of communities and the Q value of the graph of real-worldSAT instances is more predictive of the running time of CDCL solversthan traditional metrics like number of variables or clauses. Finally, weshow that randomly-generated SAT instances with 0.05 ≤ Q ≤ 0.13 aredramatically harder to solve for CDCL solvers than otherwise.1IntroductionIn the last few years, we have witnessed impressive improvements in the performance of conflict-driven clause-learning (CDCL) Boolean SAT solvers overreal-world industrial SAT instances, despite the fact that the Boolean satisfiability problem is known to be NP-complete and the worst-case time complexityof our best solvers is exponential in the size of the formula.
What is even moreimpressive is that these solvers perform extremely well even for never-beforeseen classes of large industrial instances, where the biggest instances may haveupwards of 10 million clauses and millions of variables in them. In other words,one cannot reasonably argue that these solvers are being hand-tuned for everyclass of real-world instances. It is all but clear that CDCL solvers employ a verygeneral class of techniques, that have been robustly implemented and continuously tested for many applications ranging from software engineering to AI.
Allof this begs the question why CDCL solvers are so efficient, and whether theyare exploiting some structural features of real-world instances. It is this questionthat we address in this paper.In this paper, we present three results that show that there is correlationbetween the presence of natural communities [6] in real-world SAT instances [3,4] and the running time of MiniSAT CDCL solver [7] (by extension many otherCDCL SAT solvers that are either built using MiniSAT code or use the mostimportant techniques employed by CDCL SAT solvers). Informally, a community [6] in a SAT formula, when viewed as the variable-incidence graph 4 , is asub-graph that has more edges internal to itself than going out to the remainder of the graph. There is previous work pointing to some correlation betweencommunity structure in SAT instances and performance of CDCL solvers [2].However, we provide much stronger evidence as discussed in the Contributionssub-section below.We characterise the structure of SAT instances through a well-known metriccalled the Q value [6] and the number of communities present in its graph.
The Qvalue is a widely-accepted quality metric that measures whether the communitiesin a graph are easily separable. In particular, formulas 5 with high Q tend to havefew inter-community edges relative to the number of communities, while thosewith low Q have lots of inter-community edges.Contributions:1.
We show that there is a strong correlation between Literal Block Distance(LBD), introduced in a paper [5] by some of the authors on learnt clausequality, and number of communities. This correlation fits better and betteras the search progresses. In their original paper [5], the authors suggestedthat the quality of a learnt clause can be measured using the LBD metric.I.e. the lower the LBD the better the learnt clause.
They also suggested alearnt clause deletion policy, wherein clauses with high LBD were marked fordeletion. The result we found in this paper suggests that low LBD clausesalso are shared by very few communities.2. We performed a regression analysis of the performance of the MiniSAT [7]solver over SAT 2013 competition instances [1], using a variety of factorsthat characterise Boolean formulas including number of variables, numberof clauses, number of communities, Q and even ratios between some of these45A variable-incidence graph of a Boolean SAT formula is one where the variables ofthe formula are nodes and there is an edge between two nodes if the correspondingvariables occur in the same clause.In the rest of the paper, we do not distinguish a formula from its variable-incidencegraph.factors. We found that the number of communities and Q were more correlated with the running time of MiniSAT over these instances (real-world,hard combinatorial, and random) than the traditional factors like number ofvariables, clauses or the clause-variable ratio.3.
Additionally, we generated approximately 500,000 random Boolean formulas and made the surprising finding that MiniSAT finds it hard to solveinstances with Q value lying in the range from 0.05 to 0.13, whereas it wasable to easily solve the ones outside this range. While previous work [11]has shown that the clause-to-variable ratio is predictive of solver run timeon randomly-generated instances (phase transition at clause-to-variable ratio of 4.2 [11]), this metric is not predictive at all of solver efficiency onreal-world instances [15]. By contrast, according to our experiments, Q andnumber of communities measure for both real-world and random instancesare correlated with the running time of MiniSAT (and by extension all solversthat are significantly similar to it algorithmically) on these instances.2BackgroundIn this Section, we provide some background on regression analysis, the conceptof the community structure of graphs and how it relates to SAT formulas.2.1Community Structure of SAT FormulasThe idea of decomposing graphs into natural communities [6, 17] arose in thestudy of complex networks such as the graph of biological systems or the Internet.
Informally, a network or graph is said to have community structure, ifthe graph can be decomposed into sub-graphs where the sub-graphs have moreinternal edges than outgoing edges. Each such sub-graph (aka module) is calleda community. Modularity is a measure of the quality of the community structureof a graph. The idea behind this measure is that graphs with high modularityhave dense connections between nodes within sub-graph but have few intermodule connections.
It is easy to see informally that maximising modularity isone way to detect the optimal community structure inherent in a graph. Manyalgorithms [6, 17] have been proposed to solve the problem of finding an optimalcommunity structure of a graph, the most well-known among them being theone from Girvan and Newman [6]. The quality measure for optimal communitystructure is often referred to as the Q value, and we will continue to call it similarly. There are many different ways of computing the Q value and we refer thereader to these paper [6, 17, 12].We experimented with two different algorithms the Clauset-Neuman-Moore(CNM) algorithm [6] and the online community detection algorithm (OL) [17].While we did find that the CNM algorithm resulted in a better communitystructure — evidenced by fewer communities with few links between them —we chose the OL algorithm because of its vastly superior run time. This was ofparticular importance due to the sheer size and number of the SAT instanceswe processed.