Formulating Extensional Equality for Prehistories in Lean

AHart · November 24, 2025, 10:50pm

In a meeting, I talked about a problem and solution I had while developing the model checker for the semantics of HBB. Several people were interested and it was suggested I write it up here for future reference. It’s now successful enough that I’m confident I did the right thing.

Some background: The HBB paper has, as part of its semantics, a datatype called a prehistory. In the paper, it’s described as a set of triples of “places” (elements of an arbitrary set, P, of places), events (again, from an arbitrary set, E, we label “events”), and prehistories. This means it’s an inductively defined set; the least fixed point of X = FinSet (P × E × X). These triples are called “Worlds” in the paper, so you’ll see them in a few places in the formalization under that name.

Another way to formulate this is to say that the type of prehistories is the cofree comonad of the finite set functor specialized to pairs of Ps and Es. Not that this is that enlightening of a description, but it’s something I noticed.

So, what’s the problem? Well, my job is to formalize stuff in Lean, and while Lean does have a finite sets functor. it has terrible support for it. In particular, it does not accept using finite sets as recursive layers in an inductive type. The technical reason is that Lean’s positivity checker, something used to make sure every fully instantiated instance of an inductive type is, in fact, finite, can’t see through the FinSet functor, nor quotients, nor anything obviously equivalent. So, what to do?

Thankfully, the HBB paper already has a solution; use lists instead. As it turns out, nothing in the HBB paper is sensitive to using finite sets vs lists, so if we swap FinSet for List, we can write

inductive PreHistory (P Event : Type*) where
  | mk : List (P × MaybeEvent Event × PreHistory P Event) → PreHistory P Event

Oh ya, we also have to use Maybe E instead of raw E; not important for this post, but that is there.

And the paper is right, this is fine. The complete formalization of every theorem in the paper goes through just fine using lists; morally, it’s a fine decision.

So, what’s the problem I’m writing this post about? Well, my new job is to write a model checker for this semantics. The strategy I’m persuing involved taking a formula from the HBB paper semantics, and generating a CNF boolean SAT formula such that a model decoded from an instance is satisfying iff and only iff it satisfies the high-level hbb semantics. The exact theorem I’m going for looks like this

lemma encodeFormula_adequate (b : Bounds S) (φ : Formula S) (w : WId b)
    (st : EncState b) (σ : SAT.Assignment (Var b)) (hWF : WF b σ)
    (hClauses : (encodeFormula b φ w st).2.clauses.all (SAT.Clause.eval σ) = true) :
    (σ (FVar.toVar b (encodeFormula b φ w st).1) = true) ↔
    Sat (modelOf b σ hWF) w.p (b.decodeMaybeEvent w.ei) (decodePre b σ hWF w.ti) φ

To give some guidance, (b : Bounds S) contains our events and participants as finite, enumerated types. (w : WId b) locates the world of the semantic problem as an id within the SAT formula. (st : EncState b) is used to keep track of variables as they generate; it contains a gensym and other stuff. hWF are side conditions; other parts of the CNF that makes sure models are the correct types but aren’t sensitive to the formula, while hClauses is the CNF for the formula. The overall CNF is the conjunction of hWF and hClauses. The full details aren’t important, but, at a high level, this is the goal. I won’t describe the encoding in any detail in this post, as it’s like a hundred times more complicated than this one problem, but Jamie floated the idea of making a paper out of it. I’ll consider it. For now;

Notice that this is an equivalence. This cannot be weakened to simple simplication due to implication itself being a constructor of the underlying logic. Because it’s contrapositive in its first argument, proving one direction requires the other. At that point, the whole thing needs to be an equivalence.

Okay, so, what’s the actual problem? From context, it has something to do with CNF encoding lists. Why is that an issue?

The issue is, CNF encoding a fixed point of lists is hard. At every layer, we have to have enough bits in our CNF to specify the permutation at that layer and connect each index to the appropriate child. If we had to do this, this wouldn’t be a problem; it’s just the job. The issue is that, morally, we don’t want to care about these extra bits. They make the problem harder and bigger while basically encoding noise. Very undesirable. Morally, we want to treat Prehistories as sets. But Lean won’t let us. What to do?

The solution I hit on was to define a notion of existential equality for Prehistories. This would treat them as being made of sets. I then reformulated the semantics to be invariant to this specific notion of equality. Ultimately, the goal was to prove

lemma congr_histEq
    {M : Model S P}
    {p : P} {evt : MaybeEvent S.EventType}
    {H H' : PreHistory P (S.EventType)}
    (hHist : histEq H H')
    (φ : Formula S) :
    Sat M p evt H φ ↔ Sat M p evt H' φ

where histEq is my new notion of equality. If this is proven, and the SAT encoding provably treats prehistories the way that histEq does, then we can encode prehistories as sets with no problem.

There were two technical problems to solve.

How to define histEq in the first place.
How to modify the definition of Sat so it was actually invariant, despite prehistories actually being lists.

Conceptually, the definition of histEq is easy. The conceptual challenge is realizing that we must enforce being set-like at all layers, in a histEq-invarient way. If we just state that every layer is a set, that’s not enough, each element must be represented up to the fact that the elements themselves could be different but histEq. What we want is a mutually recursive definition like

mutual

def histEqAt : PreHistory P Event → PreHistory P Event → Prop
  | PreHistory.mk xs, PreHistory.mk ys =>
      (∀ w₁ ∈ xs, ∃ w₂ ∈ ys, worldEqAt n w₁ w₂) ∧
      (∀ w₂ ∈ ys, ∃ w₁ ∈ xs, worldEqAt n w₁ w₂)

def worldEqAt : World P Event → World P Event → Prop
  fun w₁, w₂ =>
      World.place w₁ = World.place w₂ ∧
      World.event w₁ = World.event w₂ ∧
      histEqAt n (World.time w₁) (World.time w₂)

end

this expresses exactly what we want. So, what’s wrong with it? The issue is termination. Lean cannot see that these functions terminate. The issue is the existential quantifier. When we have ∃ w₂ ∈ ys, the termination checker sees this as a fresh variable, and doesn’t generate the hypotheses needed to establish that the history in w₂ is syntactically smaller than ys. This means we can’t manually prove termination either; the needed hypotheses aren’t generated at all. So it’s impossible to convince lean that these functions are terminating.

I tried a few different things to get around this. I tried writing a prehistory canonizer; I tried making sophisticated permuting relations that represented permutation constructions inductively. Both ran into the same issue; if this were a multiset it wouldn’t be so hard, but because we need to handle de-duplication, well, it’s a giant mess that makes most obvious things fail. And even if I succeeded, I’d need to prove these equivalent to what I actually want in the form of the above characterization. Eventually I realized that I was overcomplicating this.

What we can instead do is use a “fuel” construction. This is a very common method to prove functions terminating. What you do is define a function whose output obviously structurally decreases at each call. You then define the functions you want by setting the fuel (which can vary from input to input) high enough that it never runs out in real calls. We can define

mutual
  private def heightList {P Event} :
      List (World P Event) → Nat
    | []      => 0
    | t :: ts => Nat.max (height t.2.2) (heightList ts)

  def height {P Event} : PreHistory P Event → Nat
    | mk l => heightList l + 1
end

then define

mutual

/-- Extensional equality of histories bounded by `n` unfolding steps. -/
def histEqAt : Nat → PreHistory P Event → PreHistory P Event → Prop
  | 0, _, _ => False
  | Nat.succ n, PreHistory.mk xs, PreHistory.mk ys =>
      (∀ w₁ ∈ xs, ∃ w₂ ∈ ys, worldEqAt n w₁ w₂) ∧
      (∀ w₂ ∈ ys, ∃ w₁ ∈ xs, worldEqAt n w₁ w₂)

/-- Extensional equality of worlds bounded by `n` unfolding steps. -/
def worldEqAt : Nat → World P Event → World P Event → Prop
  | 0, _, _ => False
  | Nat.succ n, w₁, w₂ =>
      World.place w₁ = World.place w₂ ∧
      World.event w₁ = World.event w₂ ∧
      histEqAt n (World.time w₁) (World.time w₂)

end

/-- Semantic history equality instantiated with a sufficient fuel bound. -/
def histEq (h₁ h₂ : PreHistory P Event) : Prop :=
  histEqAt (height h₁ + height h₂) h₁ h₂

/-- Semantic world equality instantiated with a sufficient fuel bound. -/
def worldEq (w₁ w₂ : World P Event) : Prop :=
  worldEqAt (World.height w₁ + World.height w₂) w₁ w₂

then, with some effort, we can prove that the fuel never runs out, and get the original defining equations as theorems

lemma worldEq_spec (w₁ w₂ : World P Event) :
    worldEq w₁ w₂ ↔
      World.place w₁ = World.place w₂ ∧
      World.event w₁ = World.event w₂ ∧
      histEq (World.time w₁) (World.time w₂)

lemma histEq_spec (h₁ h₂ : PreHistory P Event) :
    histEq h₁ h₂ ↔
      (∀ w₁ ∈ h₁, ∃ w₂ ∈ h₂, worldEq w₁ w₂) ∧
      (∀ w₂ ∈ h₂, ∃ w₁ ∈ h₁, worldEq w₁ w₂)

Great! That’s problem 1. Now for problem 2. As it turns out, this issue is much easier. There are a handful of areas in the semantics which care about the exact history through one of two means;

We start with an input history and ask for that history, verbatim.
It literally asks for two histories to be equal.

To exemplify both, we have

/-- Strict accessibility: `t' ≪ t` when `t'` belongs to the time component of `t`. -/
def accessible (t' t : World P Event) : Prop :=
  t' ∈ World.time t

/-- Participant `p` is sequential in `h` when its event-tuples are linearly ordered by accessibility. -/
def isSequential (p : P) (h : PreHistory P Event) : Prop :=
  ∀ {t₁ t₂ : World P Event},
    t₁ ∈ h →
    t₂ ∈ h →
    World.place t₁ = p →
    World.place t₂ = p →
      (t₁ ≪ t₂ ∨
        t₂ ≪ t₁ ∨
        t₁ = t₂)

The solutions are pretty simple. In the case that we use exact equality, simply replace it with histEq. In cases where a specific history is asked for, replace it with an assertion that there exists an equivalent history. So the new versions are

def accessible (t' t : World P Event) : Prop :=
  ∃ v ∈ World.time t, worldEq t' v

def isSequential (p : P) (h : PreHistory P Event) : Prop :=
  ∀ {t₁ t₂ : World P Event},
    t₁ ∈ h →
    t₂ ∈ h →
    World.place t₁ = p →
    World.place t₂ = p →
      (t₁ ≪ t₂ ∨
        t₂ ≪ t₁ ∨
        worldEq t₁ t₂)

And that’s it! There were maybe half a dozen or so places such modifications needed to be made, but, once done, the semantics truely treated prehistories as sets, even though they were lists. The modle checker still isn’t done, but this unblocked the hardest part of the development so far. Hopefully those reading this found this interesting, and maybe useful.

Jamie pointed out that this is a generic way to get around Leans limitations; if we have a function of any kind that’s supposed to be invariant to some quotienting, we can use this to accomplish that in practice without actually using a quotient anywhere.

Thanks for reading!

Topic		Replies	Views
Data & function encoding Protocol Design	37	720	December 18, 2024
Towards a general proof type Protocol Design	22	365	August 30, 2024
Resources as purely functional objects RM	1	83	April 29, 2025
Proper abstraction of instruction sets / virtual machines Protocol Design	19	295	September 5, 2024
Simple Merkleization System Protocol Design	12	326	May 11, 2024

Formulating Extensional Equality for Prehistories in Lean

Related topics