Impact of Community Structure on SAT Solver Performance
Our initial experiments were conducted with an implementation of the CNM algorithm, then repeated with the OL algorithm. The results we present in Section 3.4 were observed regardless of the choice of algorithm.

2.2 Linear Regression

In this paper we make use of linear regression techniques for the result that correlates the Q value and number of communities with the running time of the MiniSAT CDCL SAT solver. In case the reader is not familiar with this topic, we provide a very brief description of the basic ideas involved.

Given multiple independent factors and a single dependent variable, linear regression can be used to determine the relationship between the factors and the variable based on a provided model. For the scope of this paper the dependent variable will always be log(time), while the independent factors (such as Q value, number of communities, variables, and clauses) will be appropriately specified for each experiment in Section 3.4.
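As an illustration, a model of this kind can be fitted with ordinary least squares. The sketch below uses numpy; the per-instance values for Q, community count, and running time are made up for the example and are not the paper's data.

```python
import numpy as np

# Hypothetical per-instance measurements (illustration only, not the
# paper's data): Q value, number of communities, running time (seconds).
Q = np.array([0.85, 0.91, 0.62, 0.74, 0.95])
n_comms = np.array([97.0, 128.0, 35.0, 60.0, 140.0])
time_s = np.array([120.0, 45.0, 900.0, 300.0, 20.0])

# Design matrix with an intercept column; the dependent variable is
# log(time), as in the model described above.
X = np.column_stack([np.ones_like(Q), Q, n_comms])
y = np.log(time_s)

# Ordinary least squares: solve for the coefficients beta.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R^2 and Adjusted R^2; the latter only rewards an added factor if it
# actually contributes to the fit.
resid = y - X @ beta
ss_res = float(resid @ resid)
ss_tot = float(((y - y.mean()) ** 2).sum())
r2 = 1.0 - ss_res / ss_tot
n, p = X.shape                      # p counts the intercept column too
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
```

Adjusted R2 is never larger than R2, which is why it is the safer goodness-of-fit measure when factors are being added to a model.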
This model can either look only at the main effects of the factors specified, or at both the effects of factors and the interactions between them. We provide a few important definitions below:

ANOVA stands for analysis of variance. In the scope of this paper, it is generated by the linear regression and used to understand the influence that specific factors, and interactions between factors, have on the dependent variable.

R2 represents the amount of variability in the data that has been accounted for by the model, and is used to measure the goodness of fit of the model.
It ranges from zero to one, with one representing a perfect model. Due to the nature of the calculation, the R2 value will increase when additional factors are added to the model. In this paper, we refer to the Adjusted R2, which is modified to only increase if an added factor contributes positively to the model.

Confidence Levels are used to specify a certain level of confidence that a given statement is true.
They can be used to calculate the likelihood of a given set of input values resulting in a given output (for example, time), or they can be used to estimate the likelihood that a factor in a model is significant. They are measured in percent; typical values are 99.9, 99, and 95. Any result with a confidence level below 95% is considered unimportant in the context of this paper.

Confidence Intervals are used to provide a range for a value at a given confidence level, which is usually set to 95% or 99%. They show that, with a given percentage probability, an estimated value will lie within a certain range.

The Kolmogorov–Smirnov test is used to provide quantitative assurance that a provided sample belongs to a specified distribution. It results in a value between zero and one, with values approaching zero indicating that the provided sample does belong to the specified distribution.

A Residual is the difference between a fitted dependent variable and the corresponding provided dependent variable.
It represents the amount of error for a given set of input factors when calculating the output.

3 Experimental Results

In this section we describe our experimental results that correlate the Q value and the number of communities with the running time of the two CDCL solvers we considered in our experiments, namely MiniSAT [8] and Glucose [5].

3.1 Correlation between LBD and Community Structure

In this section, we propose to link the number of communities contained in a clause with an effective measure used in recent CDCL (Conflict-Driven Clause Learning) solvers to score learnt clause usefulness.
Interestingly, this measure is used in most of today's best SAT solvers.

We assume that the reader is familiar with the basic concepts and techniques relevant to CDCL SAT solvers. Briefly, these solvers branch by making decisions on the values of literals, and at any step of the search ensure that all the unit clauses w.r.t. the current partial assignment are correctly propagated, until either an empty clause is found (the input formula is UNSAT) or a complete assignment is found (the input formula is SAT). The important point to be emphasised is that solvers learn clauses at a fast rate (generally around 5,000 per second), which can overwhelm the memory of the system unless some steps are taken to routinely get rid of some of them.
Such learnt clause deletion policies have come to be recognised as crucial to the efficiency of solvers. The trick to the success of such deletion policies is that somehow the solver has to differentiate good clauses from the ones that are not so good.

Prior to the 2009 version of the Glucose solver [5], deletion policies were primarily based on the past VSIDS activity of the clauses, i.e., learnt clauses with low VSIDS scores were deleted. However, in their paper [5], the authors proposed that a better scoring mechanism for learnt clauses is to rank them by the number of distinct decision levels the variables in these clauses belong to.
This measure is called the Literal Block Distance (LBD) of a clause. The smaller the LBD score of a clause, the higher its rank. The intuition behind this scoring mechanism is the following: the lower the LBD score of a clause, the fewer the number of decision levels needed for this clause to be unit propagated or falsified again during the search. Clauses with an LBD score of 2 are called glue clauses, because they allow for merging (gluing) two blocks of propagations with the aim of removing one decision variable (this notion in turn inspired the name of the Glucose solver). It is important to note that this behaviour is more likely to happen when using the phase saving heuristic for branching [14], because all variables are set to their last propagated value when possible.

The hypothesis we test in this sub-section is that the notion of blocks of propagations (i.e., a decision variable and all the variables propagated by unit propagation at the same decision level) is highly correlated with the idea of communities.
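As a minimal sketch of the LBD measure just described: the LBD of a clause is the number of distinct decision levels among its variables. The names below are illustrative stand-ins for solver internals, not Glucose's actual data structures.

```python
def lbd(clause, decision_level):
    """Literal Block Distance of a clause: the number of distinct
    decision levels among its variables.

    `clause` is an iterable of literals (non-zero ints, DIMACS style);
    `decision_level` maps each variable to the level at which it was
    assigned. Both are hypothetical stand-ins for solver state.
    """
    return len({decision_level[abs(lit)] for lit in clause})

# Example: a clause whose three variables sit on only two decision
# levels has LBD 2 -- a "glue" clause in Glucose's terminology.
levels = {1: 3, 2: 3, 3: 7}
assert lbd([1, -2, 3], levels) == 2
```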
To be more precise, if an input instance has high-quality community structure (communities with few inter-community edges), then we hypothesise that the conflict clauses that are shared between fewer communities are likely to cause more propagation per decision, and hence are likely to have a lower LBD score. We verify our hypothesis, namely, that there is indeed a strong relationship between the number of communities of a clause (initial or learnt) and its LBD score computed by Glucose.

Intuitively, we consider clauses that are shared between very few communities as higher quality than the ones shared between many communities, because such clauses are localised to a small set of communities, possibly enabling the solver to in effect partition the problem into many "small sets of communities" and solve them one at a time.

Experiment Setup: We limit our study to the set of industrial instances of the SAT 2013 competition (Applications category).
This is in line with our observation that SAT instances obtained from real-world applications have good community structure, and consequently the notion of LBD scoring will likely have a bigger impact on performance for such instances than for others. Put differently, if there is indeed a relationship between the LBD and the community structure of SAT instances, we hope to characterise it on this set of problems first. For our experiment, we store, for each formula of the 189 instances (SAT'13 Competition, Application Track), the LBD value and the number of distinct communities of each learnt clause.
(There are 300 application instances in the SAT'13 competition; however, we were able to compute the communities of only 189 of them due to resource constraints.)

We would like to emphasise a few points here: (1) First, all instances were preprocessed using the SatELite simplifier before any experiments were conducted for this study.
All CDCL solvers have pre-processing simplification routines in them. It is very likely that these kinds of simplified formulas are representative of the inherent structure of the formulas CDCL solvers are efficient on. (2) Second, we computed the community structure using the Newman algorithm [12] (aka the CNM algorithm) on the variable-incidence graph representation of the formula. Results were stored in a separate file once for all the experiments.
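The variable-incidence graph mentioned here connects two variables whenever they occur together in some clause. The sketch below builds such a graph from a clause list; the adjacency-set representation is our own assumption for illustration, not the paper's code, and a community-detection algorithm (e.g. a CNM-style greedy modularity method) would then be run on the resulting graph.

```python
from itertools import combinations

def variable_incidence_graph(clauses):
    """Build the variable-incidence graph (VIG) of a CNF formula:
    nodes are variables, and two variables are adjacent iff they
    occur together in at least one clause.

    `clauses` is a list of clauses, each a list of DIMACS literals.
    Returns a dict mapping each variable to its set of neighbours.
    """
    adj = {}
    for clause in clauses:
        variables = {abs(lit) for lit in clause}
        for u, v in combinations(sorted(variables), 2):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        for v in variables:  # keep variables from unit clauses too
            adj.setdefault(v, set())
    return adj

# Tiny example formula: (x1 v -x2) & (x2 v x3) & (x4 v -x5).
cnf = [[1, -2], [2, 3], [4, -5]]
g = variable_incidence_graph(cnf)
```

Community detection on `g` would then assign each variable a community label, which is the map used in the next step.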
For each SAT instance, the corresponding communities were first labelled. Then, for every instance, we maintained a map from each variable occurring in that instance to the label of the community that variable belonged to. (3) Third, Glucose was launched on these instances without any information regarding their community structure computed in step (2). Glucose computed the LBD values for the input instances and stored them in a large trace file that was analysed later: for each instance, for each learnt clause, we compared the LBD value of that learnt clause with the number of distinct communities the learnt clause contained, computed thanks to (2).
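The per-clause analysis of step (3) can be sketched as follows, assuming the hypothetical variable-to-community map produced in step (2); the names and labels are illustrative only.

```python
def clause_communities(clause, community_of):
    """Number of distinct communities a clause touches, given the
    variable -> community-label map computed in step (2).

    `clause` is an iterable of DIMACS literals; `community_of` maps
    each variable to its community label. Both names are illustrative,
    not Glucose internals.
    """
    return len({community_of[abs(lit)] for lit in clause})

# Hypothetical labelling: variables 1-2 in community 0, 3-4 in 1.
community_of = {1: 0, 2: 0, 3: 1, 4: 1}
assert clause_communities([1, -2], community_of) == 1
assert clause_communities([-2, 3, 4], community_of) == 2
```

For each learnt clause in the trace, this count would be paired with the LBD that Glucose recorded for the same clause.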
Finally, note that we used a maximum number of conflicts for our study, not a time out. In Section 3.2, the maximum number of conflicts studied is set to the first 20,000 conflicts. In Section 3.3, it is set to 100,000. These values were chosen w.r.t. the statistical tools we used.

3.2 Observing the Clear Relationship between Communities and LBD by Heatmaps

It is not trivial to express a relationship between thousands of values, following unknown distribution functions, on hundreds of problems. Hence, to show the general trend, we chose to build one heatmap per instance: we compute the number of occurrences of each pair (LBD, Community) on the considered problem and assign the intensity of a colour with respect to this number of occurrences.

Fig. 1: Relationship between Literal Block Distance (LBD) and communities for a selection of instances: (a) dated-5-13-u (138,808 / 97,775 / 0.9), (b) aaai10-planning (50,277 / 12,776 / 0.91), (c) rbcl xits 14 (2,220 / 725 / 0.53). The x-axis corresponds to LBD, the y-axis to communities. Blue intensity represents the frequency of learnt clauses with the considered LBD and community value. For each instance, we provide in parentheses the number of variables, the number of communities, and the quality value (Q). The figure is analysed in Section 3.2.
The result is shown for some characteristic instances in Figure 1. As we can see, there is an obvious and clear relationship on the first two instances (a, b), which represent the most frequent cases. Intense colours follow the diagonal: many clauses have approximately the same LBD and the same number of communities. This behaviour appears in most cases. All heatmaps are available on the web (a temporary URL for the reviewing process is available on request), with a more or less strong diagonal shape. From these figures we can conclude that there has to be a strong relationship between LBD and the number of communities.
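The per-instance counting behind these heatmaps can be sketched as follows; the (LBD, communities) pairs below are made up for illustration, whereas the real data comes from the Glucose trace files.

```python
from collections import Counter

# Hypothetical (LBD, communities) pairs recorded for one instance's
# learnt clauses (illustration only).
pairs = [(2, 2), (3, 3), (3, 4), (5, 5), (2, 2), (4, 4)]

# Number of occurrences of each (LBD, Community) pair.
counts = Counter(pairs)

# Dense grid of occurrence counts: row index = LBD - 1,
# column index = communities - 1.
max_lbd = max(l for l, _ in pairs)
max_com = max(c for _, c in pairs)
grid = [[counts[(l, c)] for c in range(1, max_com + 1)]
        for l in range(1, max_lbd + 1)]
# `grid` is what a plotting library would render as the heatmap,
# with colour intensity proportional to each cell's count; a strong
# diagonal means LBD tracks the number of communities.
```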