How to implement dependent type theory III

I spent a week trying to implement higher-order pattern unification. I looked at couple of PhD dissertations, talked to lots of smart people, and failed because the substitutions were just getting in the way all the time. So today we are going to bite the bullet and implement de Bruijn indices and explicit substitutions.

The code is available on Github in the repository andrejbauer/tt (the blog-part-III branch).

People say that de Bruijn indices and explicit substitutions are difficult to implement. I agree, I spent far too long debugging my code. But because every bug crashed and burnt my program immediately, I at least knew I was not done. In contrast, “manual” substitutions hide their bugs really well, and so are even more difficult to get right. I am convinced that my implementation from part II is still buggy.

Blitz introduction to de Bruijn indices and explicit substitution

If you do not know about de Bruijn indices and explicit substitutions you should first read the relevant Wikipedia pages, and perhaps the original paper on explicit substitutions, written by a truly impressive group of authors. Here is an inadequate short explanation for those who cannot be bothered to click on links.

We keep looking up variables in a context by their names, which seems a bit inefficient. We might have the bright idea of referring to positions in the context directly. We can indeed do this, and because a context is like a stack there are two choices:

  • de Bruijn levels are positions as counted from the bottom of the stack,
  • de Bruijn indices are positions as counter from the top of the stack.

We will use the indices. Thus, when the context grows all the old indices have to be shifted by one, which sounds more horrible than it is, as levels bring their own problems (which?). For instance, the $\lambda$-term $\lambda x \,.\, \lambda y \,.\, x$ is written with de Bruijn indices as $\lambda \, (\lambda \, 1)$, whereas $\lambda x \,.\, \lambda y \,.\, y$ is written as $\lambda \, (\lambda \, 0)$. (Just go read the Wikipedia article on de Bruijn indices if you have not seen this before.)

The shifting and pushing of new things onto the context is expressed with explicit substitutions:

type substitution =
  | Shift of int
  | Dot of expr * substitution

Read Shift k as “add $k$ to all indices” and Dot(e,s) as “push $e$ and use $s$”. In mathematical notation we write $\uparrow^n$ instead of Shift n and $e \cdot \sigma$ instead of Dot(e,sigma). An explicit substitution $\sigma$ acts on an expression $e$ to give a new expression $[\sigma] e$. For example:

  • $[\uparrow^k] (\mathtt{Var}\, m) = \mathtt{Var} (k + m)$
  • $[e \cdot \sigma)] (\mathtt{Var}\, 0) = e$
  • $[e \cdot \sigma)] (\mathtt{Var}\, (k+1)) = [\sigma](\mathtt{Var}\, k).$

Below we will read off the other equations from the source code. Substitutions are performed on demand, which means that $[\sigma] e$ is an expression that needs to be accounted for in the syntax.

Splitting the syntax

The user is going to type in syntax with names, which we have to convert to an internal syntax that uses the indices. We should also keep the original names around for pretty-printing purposes. Therefore we need a datatype Input.exp for parsing,

(** Abstract syntax of expressions as given by the user. *)
type expr = expr' * Common.position
and expr' =
  | Var of Common.variable
  | Universe of int
  | Pi of abstraction
  | Lambda of abstraction
  | App of expr * expr

(** An abstraction [(x,t,e)] indicates that [x] of type [t] is bound in [e]. *)
and abstraction = Common.variable * expr * expr

and a datatype Syntax.expr for the internal syntax:

(** Abstract syntax of expressions, where de Bruijn indices are used to represent
    variables. *)
type expr = expr' * Common.position
and expr' =
  | Var of int                   (* de Briujn index *)
  | Subst of substitution * expr (* explicit substitution *)
  | Universe of universe
  | Pi of abstraction
  | Lambda of abstraction
  | App of expr * expr

(** An abstraction [(x,t,e)] indicates that [x] of type [t] is bound in [e]. We also keep around
    the original name [x] of the bound variable for pretty-printing purposes. *)
and abstraction = Common.variable * expr * expr

(** Explicit substitutions. *)
and substitution =
  | Shift of int
  | Dot of expr * substitution

Conversion from one to the other is done by Desugar.desugar. Notice that we do not throw away variable names, but rather keep them around in the internal syntax so that we can print them out later. Strangely enough, beautify.ml gets shorter with de Bruijn indices.

Explicit substitutions

The Syntax module contains a couple of functions for handling explicit substitutions. First we have Syntax.composition which tells us how substitutions are composed:

let rec compose s t =
  match s, t with
    | s, Shift 0 -> s
    | Dot (e, s), Shift m -> compose s (Shift (m - 1))
    | Shift m, Shift n -> Shift (m + n)
    | s, Dot (e, t) -> Dot (mk_subst s e, compose s t)

In mathematical notation:

  • $\sigma \circ \uparrow^0 = \sigma$
  • $(e \cdot \sigma) \circ \uparrow^{m} = s \circ \uparrow^{m-1}$
  • $\uparrow^{m} \circ \uparrow^{n} = \uparrow^{m + n}$
  • $\sigma \circ (e \cdot \tau) = [\sigma] e \cdot (\sigma \circ \tau)$

Of course, composition $\circ$ is the operation characterized by the equation $[\sigma \circ \tau] e = [\sigma]([\tau] e)$. Next we have Syntax.subst which explains how substitutions are performed:

(** [subst s e] applies explicit substitution [s] in expression [e]. It does so
    lazily, i.e., it does just enough to expose the outermost constructor of [e]. *)
let subst =
  let rec subst s ((e', loc) as e) =
    match s, e' with
      | Shift m, Var k -> Var (k + m), loc
      | Dot (e, s), Var 0 -> subst idsubst e
      | Dot (e, s), Var k -> subst s (Var (k - 1), loc)
      | s, Subst (t, e) -> subst s (subst t e)
      | _, Universe _ -> e
      | s, Pi a -> Pi (subst_abstraction s a), loc
      | s, Lambda a -> Lambda (subst_abstraction s a), loc
      | s, App (e1, e2) -> App (mk_subst s e1, mk_subst s e2), loc
  and subst_abstraction s (x, e1, e2) =
    let e1 = mk_subst s e1 in
    let e2 = mk_subst (Dot (mk_var 0, compose (Shift 1) s)) e2 in
      (x, e1, e2)
  in
    subst

The code is not very readable, but in mathematical notation the interesting bits say:

  • $[\uparrow^m](\mathtt{Var}\,k) = \mathtt{Var}\,(k + m)$
  • $[e \cdot \sigma] (\mathtt{Var}\,0) = e$
  • $[e \cdot \sigma] (\mathtt{Var}\,k) = [\sigma](\mathtt{Var}(k-1))$
  • $[\sigma](\lambda\, e) = \lambda \, ([\mathtt{Var}\,0 \cdot (\uparrow^1 \circ \sigma)] e)$
  • $[\sigma](e_1\,e_2) = ([\sigma]e_1)([\sigma]e_2)$

There is also Syntax.occurs which checks whether a given index appears freely in an expression. This is not entirely trivial because explicit substitutions and abstractions change the indices, so the function has to keep track of what is what.

You may wonder what happened to $\beta$-reduction. If you look at Norm.norm you will discover it burried in the code for normalization of applications:
$$(\lambda \, e_1)\, e_2 = [e_2 \cdot \uparrow^0] e_1.$$

Normalization

In the last part we demonstrated normalization by evaluation. We always normalized everything all the way, which is an overkill. For example, during equality checking the weak head normal form suffices to get the comparison started, and then we normalize on demand. So I replaced normalization by evaluation with direct normalization, as done in norm.ml. We still need normal forms when the user asks for them. Luckily, a single function can perform both kinds of normalization.

Optimization

The source contains no optimizations at all because its purpose is to be as clear as possible. The whole program is still pretty small, we are at 824 lines while the core is just 247 lines. The speed is comparable to the previous version, but with a bit of effort we should be able to speed it up considerably. Here are some opportunities:

  • we normalize a definition every time we look it up in the context,
  • explicit substitutions tend to cancel out, and it is a good idea to look for common special cases, like composition with the identity substitution,
  • there is a lot of shifting happening when we look things up in the context, perhaps some of those could be avoided

If anyone wants to work on these, I would be delighted to make a pull request.

I really have to do some serious math and stop playing around, so do not expect the next part anytime soon.

5 thoughts on “How to implement dependent type theory III

  1. When i implemented a dependent type checker the best test i found was to take old Coq modules (old versions that didn’t have inductive types) and run then thru the checker… Really awesome feeling when you get it to work out some complicated examples…

  2. As far as I know, it doesn’t matter if you use de Bruijn levels or de Bruijn indices. When you use indices, you have to shift the indices of unbound variables. When you use levels, you have to shift the levels of bound variables. So the two are completely dual. However, if you write an interpreter without full normalization (as in you don’t normalize under abstractions), then you never see unbound variables. Therefore you never need to shift, which is why de Bruijn indices are preferable in that situation.

  3. Might you ever want to include a notation for the explicit substitutions in the input syntax? For instance, does it make sense to think of “let x := e in …” as an explicit substitution?

  4. I’m implementing an explicitly substituting lambda-calculus in form of self-rewriting tree. The goal is to make almost everything (module loading, function definition, program parsing + desugaring) on the core var-val-fun-app language.

    As Ali say, there are no unbounds (and the funarg-problem with tail-rec optimization doesn’t exist too) – but I will disagree about shifts: I cannot imagine what to do with “(//0 1) (/0)”. Being reduced, it becomes “(/ (/0) 0)” and we need to perform real or virtual shifting to distinct these pointers-at-zero.

    I want to ask, is there any significant difference between de Brujin indices and named bindings? Seems like in both cases we use distinct tokens (ints and strings) to determine, which hole suits for this value, but in case of names we need no shift at all.

    By the way, does anyone know any method to optimize explicit substitution other that store at every tree node a set of not-yet-bound varnames?

  5. Hi Mike, it sure does make sense to have lets for explicit substitutions. If you want to see a categorical model for a calculus of explicit substitutions (in the style of Abadi e at’s lambda-sigma) check Categorical Models of Explicit Substitutions (with Neil Ghani and Eike Ritter), Proc FOSSACS-99, LNCS 1578, Springer-Verlag, 1999, from http://www.cs.bham.ac.uk/~vdp/publications/papers.html.

Leave a Reply

Your email address will not be published. Required fields are marked *

2 + 7 =

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>