E-graph

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search

In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.

Definition and operations

[edit | edit source]

Let Σ be a set of uninterpreted functions, where Σn is the subset of Σ consisting of functions of arity n. Let 𝕚𝕕 be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of fΣn to e-class IDs i1,i2,,in𝕚𝕕 is denoted f(i1,i2,,in) and called an e-node.

The e-graph then represents equivalence classes of e-nodes, using the following data structures:[1]

  • A union-find structure U representing equivalence classes of e-class IDs, with the usual operations find, add and merge. An e-class ID e is canonical if find(U,e)=e; an e-node f(i1,,in) is canonical if each ij is canonical (j in 1,,n).
  • An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
    • a hashcons H (i.e. a mapping) from e-nodes to e-class IDs, and
    • an e-class map M that maps e-class IDs to e-classes, such that M maps equivalent IDs to the same set of e-nodes: i,j𝕚𝕕,M[i]=M[j]find(U,i)=find(U,j)

Invariants

[edit | edit source]

In addition to the above structure, a valid e-graph conforms to several data structure invariants.[2] Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes f(i1,,in),f(j1,,jn) are congruent when find(U,ik)=find(U,jk),k{1,,n}. The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.

Operations

[edit | edit source]

E-graphs expose wrappers around the add, find, and merge operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.

Equivalent formulations

[edit | edit source]

An e-graph can also be formulated as a bipartite graph G=(Nid,E) where

  • id is the set of e-class IDs (as above),
  • N is the set of e-nodes, and
  • E(id×N)(N×id) is a set of directed edges.

There is a directed edge from each e-class to each of its members, and from each e-node to each of its children.[3]

E-matching

[edit | edit source]

Let V be a set of variables and let Term(Σ,V) be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, Term(Σ,V) is the smallest set such that VTerm(Σ,V), Σ0Term(Σ,V), and when x1,,xnTerm(Σ,V) and fΣn, then f(x1,,xn)Term(Σ,V). A term containing variables is called a pattern, a term without variables is called ground.

An e-graph E represents a ground term tTerm(Σ,) if one of its e-classes represents t. An e-class C represents t if some e-node f(i1,,in)C does. An e-node f(i1,,in)C represents a term g(j1,,jn) if f=g and each e-class M[ik] represents the term jk (k in 1,,n).

e-matching is an operation that takes a pattern pTerm(Σ,V) and an e-graph E, and yields all pairs (σ,C) where σV×𝕚𝕕 is a substitution mapping the variables in p to e-class IDs and C𝕚𝕕 is an e-class ID such that the term σ(p) is represented by C. There are several known algorithms for e-matching,[4][5] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.[6]

Extraction

[edit | edit source]

Given an e-class and a cost function that maps each function symbol in Σ to a natural number, the extraction problem is to find a ground term with minimal total cost that is represented by the given e-class. This problem is NP-hard.[7] There is also no constant-factor approximation algorithm for this problem, which can be shown by reduction from the set cover problem. However, for graphs with bounded treewidth, there is a linear-time, fixed-parameter tractable algorithm.[8]

Complexity

[edit | edit source]
  • An e-graph with n equalities can be constructed in O(n log n) time.[9]

Equality saturation

[edit | edit source]

Equality saturation is a technique for building optimizing compilers using e-graphs.[10] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.

Applications

[edit | edit source]

E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3[11] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers.[12] In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates.[13] E-graphs are also used in the Simplify theorem prover of ESC/Java.[14]

Equality saturation is used in specialized optimizing compilers,[15] e.g. for deep learning[16] and linear algebra.[17] Equality saturation has also been used for translation validation applied to the LLVM toolchain.[18]

E-graphs have been applied to several problems in program analysis, including fuzzing,[19] abstract interpretation,[20] and library learning.[21]

References

[edit | edit source]
  1. ^ (Willsey et al. 2021)
  2. ^ (Willsey et al. 2021)
  3. ^ (Goharshady, Lam & Parreaux 2024)
  4. ^ (de Moura & Bjørner 2007)
  5. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  6. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  7. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  8. ^ (Goharshady, Lam & Parreaux 2024)
  9. ^ (Flatt et al. 2022, p. 2)
  10. ^ (Tate et al. 2009)
  11. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  12. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  13. ^ (Flatt et al. 2022, p. 2)
  14. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  15. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  16. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  17. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  18. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  19. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  20. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
    Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  21. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  • Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  • Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  • Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  • Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  • Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
[edit | edit source]