Forgetting variables in d4v2 model counter circuits

As part of my work for my MSc thesis, I've recently started to play with d4v2, a model counter that also supports compilation in d-DNNF, and I managed to extend it to support a very specific use case. As I find this topic very interesting, I decided to write this post, so let's dive in!

A gentle introduction to knowledge compilation

It's well known that the Boolean Satisfiability (SAT) problem is NP-complete, and despite having incredibly fast and sophisticated SAT solvers, sometimes extracting information out of a formula is still very complex and can require too much time. By information, I'm specifically referring to some properties of the formula such as satisfiability, validity, its number of models, equivalence with other formulas or clauses, and so on.

Knowledge compilation is a field that studies ways of compiling formulas in a language (for example, CNF) to a target language that allows for a quicker extraction of some information. In particular, the target language I'm interested in is the d-DNNF, or deterministic-Decomposable Negation Normal Form.

NNF, DNNF and d-DNNF

NNF is a simple language in which a formula is structured as follows:

There are only conjunctions ( $\land$ ) and disjunctions ( $\lor$ ).
All negations ( $\neg$ ) appear only in front of variables.

An example of a formula in NNF is

\phi = (A \lor \neg B) \land (C \land (\neg D \lor E))

To transform it in DNNF we have to add decomposability, which means that, for every conjunction, the conjuncts don't share variables. To give some examples:

$C \land (\neg D \lor E)$ , which is decomposable because $\{C\} \cap \{D, E\} = \empty$ .
$A \land (\neg A \lor B)$ , which is not decomposable because $\{A\} \cap \{A,B\} = \{A\}$ .

Lastly, we need to add determinism, which means that for every disjunction, each two disjuncts are logically contradicting. For example:

$(A \land \neg B) \lor (\neg A \land B)$ is deterministic because $(A \land \neg B) \land (\neg A \land B) = \bot$ .
$A \lor (A \land \neg B)$ is not deterministic because the disjuncts are not mutually exclusive as their conjunction $A \land (A \land \neg B)$ is satisfiable by $\mu = \{A, \neg B\}$ .

If a formula satisfies all these previous properties, we can say that it is in d-DNNF.

Forgetting variables

Another useful technique is called forgetting, and it is used to remove some variables in a formula. Consider, for example,

\phi = (A \land B) \lor (\neg A \land C)

Suppose we want to forget about variable $A$ , what we can do is to suppose that either $A=\top$ or $A = \bot$ . So,

\text{forget}(\phi, A) = \phi_{A=\top} \lor \phi_{A=\bot} \\ = \Big( (\top \land B) \lor (\neg \top \land C) \Big) \lor \Big((\bot \land B) \lor (\neg \bot \land C)\Big) \\ = \Big( (B) \lor (\bot \land C) \Big) \lor \Big((\bot \land B) \lor (\top \land C)\Big) \\ = B \lor C

As you can see, we managed to reduce $\phi$ without variable $A$ to just $B \lor C$ .

D4v2, and what I've done

D4v2 is a tool that allows to do model counting and compiling formulas in d-DNNF. It accepts inputs in either DIMACS or BC-S1.2, which is a custom circuit format they made that allows for faster compilation and resulting in smaller formulas. After all this, you are probably asking: what is the problem? Well, you can forget variables with DIMACS but not with BC-S1.2. So, I looked into the source code and realized it was possible to add this functionality simply by extending the format and parser! I decided to add a new optional statement P v1 ... vn to the format and the parser, which specifies which variables we want to keep, or in other terms which variables we want the others to be projected on. It's basically a way to forget variables that are not listed. The result? it worked! In fact, consider this simple circuit:

c BC-S1.2
I a
I b
P a
G g1 := O a b
G g2 := O -b a
G g3 := A g1 g2
T g3

It is equivalent to the formula

\phi = (a \lor b) \land (\neg b \lor a)

But, due to the P a statement, we are telling D4 to forget variable $b$ . In fact, the output of the tool is

o 1 0
t 2 0
1 2 1 0

This is a specific format that the tool uses, and it's saying that the final formula is a DAG starting from node 1 and going to node 2, where in node 2 we have variable 1 (= $a$ ) set to true. It's a success!

In contrast, if we omit the P a statement, the output would be

o 1 0
t 2 0
1 2 1 2 4 5 0

As you can see, it contains many more variables.