(With: Aditya Nori, Sriram Rajamani and Akash Lal)
Statistical relational learning consists of inferring new relations from an existing data corpus using probabilistic formulas as specifications. We advance the state-of-the-art in relational learning significantly using techniques developed in program verification. In particular, our technique efficiently deals with axioms which are probabilistic formulas with probability 0 or 1. These axioms are required to be satisfied by any inferred valuations to the relations (each valuation to the relations is called a world). Our technique is inspired by the popular Counterexample Guided Abstraction Refinement (CEGAR) scheme, which is widely used in software model checking and theorem proving. We show how to apply CEGAR in this new setting of relational learning. We also show that our algorithm, which adds axioms lazily and iteratively in the spirit of CEGAR using an underlying relational learner, is guaranteed to produce optimal results assuming that the underlying relational learner used by our technique is optimal.
We draw parallels between using CEGAR in verification, and CEGAR in relational inference. Techniques such as generalization used for accelerating convergence of CEGAR in verification have counterparts in CEGAR for relational inference. Checking if a world inferred by a relational learner is consistent with the axioms is non-trivial, since the size of the world can be large. We show how to check for such consistency by using database querying techniques.
We also present an implementation of our algorithm in a tool SOFT-CEGAR. We empirically compare our approach to state-of-the-art statistical relational learning approaches such as TUFFY and ALCHEMY over four real-world applications. We find that SOFT-CEGAR significantly outperforms these approaches both in terms of runtime and quality of results