In a paper accepted at POPL 2016 Matt Brown and Jens Palsberg constructed a self-interpreter for System $F_\omega$, a strongly normalizing typed $\lambda$-calculus. This came as a bit of a surprise as it is “common knowledge” that total programming languages do not have self-interpreters.

Thinking about what they did I realized that their conditions allow a self-interpreter for practically any total language expressive enough to encode numbers and pairs. In the PDF note accompanying this post I give such a self-interpreter for Gödel’s System T, the weakest such calculus. It is clear from the construction that I abused the definition given by Brown and Palsberg. Their self-interpreter has good structural properties which mine obviously lacks. So what we really need is a better definition of self-interpreters, one that captures the desired structural properties. Frank Pfenning and Peter Lee called such properties **reflexivity**, but only at an informal level. Can someone suggest a good definition?

Where does this “common knowledge” come from? Is it some kind of (mis?)interpretation of the halting problem?

Nice post and pdf file, Andrej.

Mike: this common knowledge comes from the assumption that codes are in a type of natural numbers (or an isomorphic, discrete type of syntax trees), and this Andrej shows indeed not to be possible, confirming common knowledge.

The unfortunate thing about this common knowledge is that the assumptions usually aren’t made clear. For example,

https://existentialtype.wordpress.com/2014/03/20/old-neglected-theorems-are-still-theorems/

says:

“One limitation of total programming languages is that they are not universal: you cannot write an interpreter for T within T (see Chapter 9 of PFPL for a proof). More importantly, this limitation extends to any total language whatever.”

Andrej’s note and the paper of Brown and Palsberg make it clear that there are implicit assumptions in such statements. Nice work!

I’m curious about the relationship to similar statements. For example, this

https://mail.haskell.org/pipermail/haskell-cafe/2003-May/004343.html

very nice message by Conor McBride says “any total language misses some total programs”, and gives a diagonalization argument that “eval” can’t be implemented, assuming that programs are coded using the type Nat.

Thanks for this succinct note!

I have an off-topic question: I quite like the formatting in the PDF. What LaTeX package did you use? I often write down notes in LaTeX, but I haven’t been able to find a style set that I like yet.

I am using

There is another source for this, “common knowledge,” I think, which is that when people say, “self-interpreter,” they mean what the Brown-Palsberg paper calls a, “self-reducer,” which they have not defined. This requires encoding types $\mathsf{Exp}\,t$ and reduction functions $\mathsf{norm}_t : \mathsf{Exp}\,t \to \mathsf{Exp}\, t$, where each reduction function is required to yield encodings of a beta normal expression that is beta equivalent to the original encoded expression. Functions of type $\mathsf{Exp}\, t \to t$ are only half of a normalization-by-evaluation procedure for this.

Also, it seems unlikely that self-reduction is possible in total languages (regardless of whether you have one expression type or a family), because the functions $\mathsf{norm}_t$ are by hypothesis terminating functions that compute normal forms for the language. So the language has demonstrated its own weak normalization, and weak normalization implies consistency of the language interpreted as a logic. So it is dubious this is possible in an actual normalizing/consistent language.

I haven’t read PFPL, but I wouldn’t be surprised if this is the kind of thing it’s talking about, and the interpreters given in these two papers don’t actually qualify as, well, interpreters.

Sorry for the double comment….

Having thought about it some more, I think some of the “dubious” part above may actually be false intuition. For instance, if we consider Brown-Palsberg self-interpreters, we have a function $\mathsf{Exp}\, ? \to ?$ (assuming a type worthy of being called $?$ exists, which isn’t true of System T, but is true of System F?, for instance). But this looks even more obviously like, “I am consistent.” So, there must be something about dividing the expression type up that means ‘second incompleteness theorem through the lens of propositions as types’ has no force over it. So that might leave open self-reducers in the same style as well.

Of course, I also thought of an example of a ‘total language’ that, if it could self-represent (which might not be possible), would also be able to self-reduce. But it’s rather different than most total languages in use (intersection types).

Thanks Andrej ! I will give it a try.

Re Dan Christensen: it seems that the proof idea of McBride and of Bauer’s Thm. 2.2/Cor. 2.3 are very close, even though Bauer’s statement is interestingly different and the proof more elegant and general. Below are the gory details I’ve seen.

For instance, McBride $\mathsf{evil}$ is Bauer’s g, with $f = \mathsf{succ}$ inlined. If you match up the equations by renaming the matching ingredient, McBride proves $\mathsf{evil}\, 666 = \mathsf{succ} (\mathsf{evil}\, 666)$, that is (strictly speaking) becomes $f(u_{\nu \to \tau}\, n\, n) = f(f(u_{\nu \to \tau}\, n\, n))$, while Bauer uses $u_{\nu \to \tau}\, n\, n$ directly. Of course, once the proof is complete, these are all the same number, so it’s a very minor difference.

Critically, the assumed setting is different: McBride uses only what Bauer would call $u_{N \to N}: N \to N \to N$ (that is, $u_{\tau}: \nu \to \tau$ with $\nu = N$ and $\tau = N \to N$), while Bauer is more general (you could modify his proof to give only fixpoints for types $\tau$ where $u_\tau$ exists, if you wanted to do this weakening).

I think another problem with the abusive interpreter is that there no internal evidence that it actually returns the Gödel number of the term. We don’t really know if a ? × ? was created through quoting or just a random piece of data.

By the way, the same abusive interpreter was proposed [here](https://www.reddit.com/r/haskell/comments/3s93w1/a_selfinterpreter_for_fomega/cwwhn0d?context=3) before.

The definition you give of an interpreter in your linked note would disqualify anything written in “finally tagless” style as actual interpreters. [Neil Jones had the same issue with us calling our interpreters/partial evaluators by those names, for the same reason.]

But what we call an ‘interpreter’ in finally tagless works exactly as an interpreter does, observationally.

In other words, I think your definition of what an interpreter is, is too narrow.