# Formal proofs are not just deduction steps

I have participated in a couple of lengthy discussions about formal proofs. I realized that an old misconception is creeping in. Let me expose it.

In traditional mathematical logic (by which I mean first-order logic, as established by Hilbert, Ackermann, Skolem, Gödel and others) the concepts of logical formula and formal proof are the central notions. This is so because the main goal of traditional logic is the meta-mathematical study of provability, i.e., what can be proved in principle. Other concerns, such as what can be computed in principle, are relegated to other disciplines, such as computability theory.

It is too easy to forget that mathematical logic is only an idealization of what mathematicians actually do. Indeed, a bizarre reversal has occurred in which mathematicians have adopted the practice of dressing up their activity as a series of theorems with proofs, even when a different kind of presentation is called for. Definitions are allowed but seen as just convenient abbreviations, and logicians enforce this view with the Conservativity theorem. Some even feel embarrassed about placing too much motivation and explanatory text in between the theorems, and others are annoyed by a speaker who spends a moment on motivation instead of plunging right into a series of unexplained technical moves.

To show what I am talking about let us consider a typical situation when the theorem-proof form is inappropriate. Often we see a statement and a proof of the form

Theorem: There exists a gadget $x$ such that $\phi(x)$.

Proof. We construct $x$ as follows. (An explicit construction $C$ is given). QED

but then the rest of the the text clearly refers to the particular construction $C$ from the proof. At a formal level this is wrong because the theorem states $\exists x . \phi(x)$ and it therefore abstracts away the construction in the proof (this is not about excluded middle at all, in case you are wondering). Whatever is done inside the proof is inaccessible because proofs are irrelevant.

Lately Vladimir Voevodsky has been advocating a different style of writing in which we state Problems which are then solved by giving constructions (see for instance page 3 here). This is a strict generalization of traditional logic because a theorem with a proof can be seen as the problem/construction “construct the proof of the given statement”. Vladimir Voevodsky may have been motivated by Martin-Löf’s type theory, where this is the common view, but let us also note that Euclid did it as well. Remembering Euclid and paying attention to Martin-Löf’s teaching is a very positive development, but is not the one I would like to talk about.

Another crucial component of mathematical activity which is obscured by traditional logic is computation. Traditional logic, and to some extent also type theory, hides computation behind equality. Would you like to compute $2 + 2$? Just make a series of deduction steps whose conclusion is $2 + 2 = 4$. But how do we know what we are supposed to prove if we have not calculated the result yet? Computation is not about proving equalities, it is a process which leads from inputs to outputs. Moreover, I claim that computation is a fundamental process which requires no expression in terms of another activity, nor does it need an independent justification.

Another word for computation is manipulation of objects. Even in traditional logic we must admit that before logic itself comes manipulation of syntax. One has to be able not only to build and recognize syntactic objects, but also manipulate them in non-trivial ways by performing substitution. Once substitution is on the table we’re only a step away from $\lambda$-calculus.

The over-emphasis on formal derivations is making difficult certain discussions and design decisions about computer-verified mathematics. Some insist that formal derivations must be accessible, either explicitly as objects stored in memory or implicitly through applications of structural recursion, for independent proof-checking or proof transformations. I think this is fine as far as derivations and constructions go, but let us not forget computation. It is a design error to encode computations as chains of equations glued together by applications of transitivity. An independent verification of a computation involves independently re-running the computation – not verifying someone else’s trace of it encoded as a derivation. A transformation of a computation is not a transformation of a chain of equations – it is something else, but what? I am not sure.

Once computation is recognized as essential, irreducible and fundamental, we can start asking the right questions:

1. What is computation in general?
2. What form of computation should be allowed in proof checkers?
3. How do we specify computation in proof objects so that it can be independently verified by proof checkers?

We have a pretty good idea about the answer to the first question.

A good answer to the second question seems difficult to accept. Several modern proof assistants encode computation in terms of normalization of terms, which shows that they have not quite freed themselves from the traditional view that computation is about proving equalities. If we really do believe that computation is basic then proof checkers should allow general and explicit computation inside the trusted core. After all, if you do not trust your own computer to compute correctly, why would you trust it to verify proofs?

The third question is about design. Coq has Mtac, HOL and Andromeda essentially are metalevel programming languages, and Agda has rewrite rules. I suppose I do not have to explain my view here: there is little reason to make the user jump through hoops by having them encode computation as normalization, or application of tactics, or whatnot. Just give them a programming language!

Lest someone misunderstands me, let me conclude by a couple of disclaimers.

First, I am not saying that anything was wrong with the 20th century logic. It was amazing, it was a revolution, a pinnacle of human achievement. It’s just that the current century (and possible all the subsequent ones) belongs to computers. The 20th century logicians thought about what can be formally proved in principle, while we need to think about how to formally prove in practice.

Second, I am not advocating untrusted or insecure proof checkers. I am advocating flexible trusted proof checkers that allow users a direct expression of their mathematical activity, which is not possible as long as we stick to the traditional notion of formal derivation.

Supplemental: I think I should explain a bit more precisely how I imagine basic computations would be performed in a trusted kernel. A traditional kernel checks that the given certificate is valid evidence of derivability of some judgment. (Note: I did not say that kernels check formal derivations because they do not do that in practice. Not a single one I know.) For instance, in Martin-Löf type theory a typing judgment $\Gamma \vdash e : A$ contains enough information to decide whether it is derivable, so it can be used as a certificate for its own derivability. Now, sometimes it makes sense to compute parts of the judgment on the fly (typically $e$) instead of giving it explicitly, for various reasons (efficiency, modularity, automation). In such cases it should be possible to provide a program $p$ which computes those parts, and the kernel should know how to run $p$. (It is irrelevant whether $p$ is total,  but that is a separate discussion.) There is of course the  question of how we can trust computations. There are in fact several such questions:

1. Can we trust the kernel to faithfully execute programs? For instance, if the kernel uses the CPU to compute sums of 64-bit integers, can that be trusted? And what if the language interpreter has a bug? This is the same sort of trust as general trust in the kernel, so it is not really new: in order to know that the kernel works correctly we need to certify all components that it depends on (the CPU, the operating system, the compiler used to compile the kernel, the source code of the kernel, etc.)
2. Can the programs executed by the kernel perform illegal instructions that corrupt the it or trick it into doing something bad? This is a standard question about programming languages that is addressed by safety theorems.
3. Can we trust that the given program $p$ actually computes the intended object? In some situations this question is irrelevant because the evidence will be checked later on anyway. An example of this would be a program which computes (parts of) a witness $(a,b)$ for a statement $\sum_{x : A} B(x)$. We do not care where $(a,b)$ came from because the kernel is going to use them as certificates of $\sum_{x : A} B(x)$ and discover potential problems anyhow. In other situations we are very much interested in knowing that the program does the right thing, but this is a standard situation as well: if you need to know that your program works correctly, state and prove the correctness criterion.

So I think there’s nothing new or fishy about trust and correctness in what I am proposing. The important thing is that we let the kernel run arbitrary programs that the user can express directly the way programs are normally written in an general-purpose programming language. Insisting that computation take on a particular form (a chain of equations tied together by transitivity, prolog-like proof search, a confluent and terminating normalization procedure) is ultimately limiting.

## 24 thoughts on “Formal proofs are not just deduction steps”

1. I wonder what meaning to attach to this bit: “If we really do believe that computation is basic then proof checkers should allow general and explicit computation inside the trusted core. After all, if you do not trust your own computer to compute correctly, why would you trust it to verify proofs?”

For example, I trust my computer to compute, for two 32 binary natural numbers x and y, a third one, z, such that x+y is congruent to z modulo 2^32. Do you have in mind that we would manually link that computational ability to an appropriate formalization?

2. Computation, generally construed, can be non-deterministic. In the case of non-deterministic computation, if someone gives you the input and the output you also need the trace in order to reproduce the computation. Otherwise, reproducing the computation may require an exponential search. There is (almost) nothing wrong to look at computation as a trace of equations. What is missing is the direction: computation rules are equations plus a direction of evaluation. So in general a computation can (and should) be presented as a sequence of inequations. If the computation is a mix of deterministic and non-deterministic rules then the only inequations that cannot be easily recovered are the non-deterministic ones.

1. Why does computation have to be deterministic? For instance, suppose Coq impliements a multi-core kernel which runs things in parallel. That is going to be non-deterministic. Of course, my entire post is devoted to saying that you do not need the entire trace. What does the trace buy you, exactly? It does not buy you trust, nor does it buy you efficiency. You say “if somebody gives you the input and the output” but I do not understand where that came from. I explicitly talk about “somebody giving you a program to run”.

The argument that non-deterministic computation may be exponential is bogus. Deterministic computation can be exponential as well.

Non-determinstic computation may be quite desirable, especially in situations where you do not care which witness is found. Correctness of such computations is a separate issue which needs to be addressed – but certainly not by forcing everything to be deterministic.

As for computation being a chain of equations: everything is wrong with that because it prevents us from thinking about computation as an independent concept. Consequently, we do not have a good notion of transformation of computation (like we do for inductively defined structures such as formal derivations).

In a non-deterministic computation I can assert that some value Y can be produced by the program X but just by running X you cannot check the truth of my assertion. The only way you can check it is if I tell you what the choices whenever nondeterminism kicked in.

Do you really dispute this?

1. There are many things you can state as a criterion for a non-deterministic computation: there exists a value such that $\phi$, all values satisfy $\phi$, and so on (you know this better than I do). Your example is “it is possible that the value will be $Y$” if I understand you correctly. I am not making any assumptions about what is or should be possible to state about computations. I am just saying that computations should be there. Even computations for which no correctness criterion is stated can still be very useful.

1. Matt Oliveri says:

I think Dan Ghica is successfully refuting a very particular thing you said:

An independent verification of a computation involves independently re-running the computation – not verifying someone else’s trace of it encoded as a derivation.

In case “verifying a computation” means confirming that foo is a possible outcome, and the computation is nondeterministic, re-running the computation is a bad way to verify it.

Of course this doesn’t mean nondeterministic computations cannot be used, it just means that possible outcomes are generally hard to find. So a proof that an outcome of a computation is possible should not just be the computation itself.

2. I think you’re right. I should have said: “An independent performance of a computation involves indepndently re-running the computation – not verifying someone else’s trace of it encoded as a derivation.” My point is that computation just is, without any verification/condition/correctness/guarantee attached to it. And in this pure form, computation is useful. Let me give you an example. I will give you an example.

I am thinking of a number. I will tell you which number it is. It’s the one that is printed by the program:

k = 0
n = 1000000000000
while n <> 1:
if n % 2 == 0: n = n / 2
else: n = 3 * n + 1
k = k + 1
print(k)


Now we are both thinking of the same number. Now, in this particular case there are other, more efficient ways of communicating the same number, but that is not going to be the case in general. The most efficient way to communicate numbers (or any kind of information) is to send programs.

3. If we really do believe that computation is basic then
proof checkers should allow general and explicit computation inside
the trusted core.

Certainly not!

I have moved away from the LCF/Nuprl style[*] and now actively prefer
what Bob would call axiomatic or formal type theory with a normalizing
notion of judgmental equality.

To be usable in practice, a proof assistant needs a bunch of
convenience features like type inference, pattern matching,
idea to implement these features by LCF style elaboration on top of a
Nuprl-style core. This approach works great for verifying that a
complete proof is correct, but most of the time, our “proof” objects
are either incomplete or actively wrong. So we need very strong
guarantees about broken proofs even more than we do for correct
proofs.

Type inference offers a very good example of the issues
involved. Suppose we are doing type inference, and know that a
variable has type A ? B and that it is being used at X ? Y. To check
this, we must check for the judgmental equality of these types, and
algorithmically we want to proceed by checking that A = X and B = Y.
That is, we want judgemental equality to be injective for the function
type constructor, which in Nuprl is obviously false: for example,
consider (x:0) ? String and (x:0) ? Bool, which are equated in Nuprl’s
model of the universe.

In contrast, in pure MLTT, the injectivity of the ?-former is an
admissible property of judgemental equality. As a result, a type
inference procedure that relies upon this property will be complete
(at least in this respect), and so a failure of type inference will
reflect a genuine type error rather than a mere failure of a
heuristic.

The fragility resulting from heuristic tactics is not hypothetical:
George Gonthier’s proof of the four colour theorem no longer builds on
modern versions of Coq, due to small changes in how elaboration
works. (Still more inconceivable would be porting his proof to a
system like Matita, despite it having the same kernel logic as Coq!)

Basically, for a proof to be bearable, we need a bunch of convenience
features. Specifying these features in a reasonably declarative way
requires specifying when they *don’t* work just as much as when they
do, and doing this in a implementable way requires a bunch of
admissible properties from the kernel type theory, most of which do
not hold in Nuprl-style calculi.

[*] I have not, of course, rejected Nuprl-style semantics. For
example, it would be impossible to prove the decidability of
typechecking for the formal theories I prefer without logical
relations/realizability models. (Alas, usually two such models!)

1. You misunderstand me. Which bit of what I wrote made you think I am asking for NuPrl-style approach? So all of what you say about not wanting NuPrl-style is misdirected. In fact, I am for the axiomatic approach, as you call it. When I say that the trusted kernel must be able to run code that does not imply that the formalism is somehow tied to the code. Perhaps the code is there for other purposes, for instance to implement special-purpose decision procedures and such.

Regarding fragility of formal developements and non-portability of proof scripts: when we know how to solve that problem we’ll also know how to solve many, many other problems in software engineering.

1. Matt Oliveri says:

I think Neel was arguing against a Nuprl-style formal system, based on his example of injectivity of (->) not being admissible. (Computational type theory is a semantic approach, but Nuprl still has a formal system for the intended semantics.) Is that rule admissible in ETT? I would guess not, in which case his argument is not completely off the mark. I guess he’s assuming that an LCF-style proof assistant is only desirable for certain formal systems, which means that you may want to avoid certain formal systems in order to avoid an LCF-style assistant.

(I wouldn’t agree with this assumption.)

2. Hi Andrej, what made me think of the Nuprl approach is this comment from your post:

I suppose I do not have to explain my view here: there is little reason to make the user jump through hoops by having them encode computation as normalization, or application of tactics, or whatnot. Just give them a programming language!

I disagree with this, because I think there are very good reasons to encode computation as normalization: by keeping full computation out of the kernel, we get stronger properties for it (such as the decidability of typechecking), which makes giving reasonable declarative specifications of important features like type inference feasible.

Regarding fragility of formal developements and non-portability of proof scripts: when we know how to solve that problem we’ll also know how to solve many, many other problems in software engineering.

We do actually have a pretty good idea of how to solve such problems — type theory!

1. Matt Oliveri says:

I disagree with this, because I think there are very good reasons to encode computation as normalization: by keeping full computation out of the kernel, we get stronger properties for it (such as the decidability of typechecking), which makes giving reasonable declarative specifications of important features like type inference feasible.

I think you’re confusing properties of the logic with properties of the proof assistant. A logic may or may not have decidable judgments; a proof assistant may or may not allow arbitrary computation in the kernel. But maybe I’m being too pedantic; they seem to be correlated.

Do you dislike undecidable type-checking? Do you have any defense of decidable type-checking besides “feasible” type inference? Do you have an argument that type inference cannot be “feasible” for any type system that’s not decidable? Or was decidability just an example of a property you get by crippling the computational content of type theory? (These are not rhetorical questions, and I don’t mean to pick on you. This is the first time I’ve run into anyone who seems willing to defend decidable MLTT in a discussion.)

1. I don’t know if I dislike undecidable typechecking or not — that seems too broad a question to me. Based on what Andrej is saying, though, I have misunderstood the point of his post sufficiently that this discussion would be off-topic. Why don’t you email me your questions, and I can turn it into an interview/blog post?

2. You still misunderstand what I am saying. All of what I am saying is completely agnostic about what particular formalism you want to use. Perhaps this will help: I am saying, among other things, that the user should be allowed to write programs that generate (missing parts of) terms which then get type-checked by the kernel. Perhaps that’ll help clear up the confusion.

The requirement that the kernel runs a terminating algorithm on its input (whatever it is) is bogus. There might be some advantage to knowing that the kernel’s object type theory has decidable type checking, but that is orthogonal to termination properties of the kernel, because the kernel is much more than just a decision procedure for type checking.

Look, this is all very practical. People “invent” techniques for Agda and Coq where they proudly present some trickery that makes Agda normalize less stuff and therefore it works faster (various forms of reflection, for instance). It may all be very clever, but it begs the question: are they not just fighting the built-in normalization procedure?

1. You still misunderstand what I am saying. All of what I am saying is completely agnostic about what particular formalism you want to use. Perhaps this will help: I am saying, among other things, that the user should be allowed to write programs that generate (missing parts of) terms which then get type-checked by the kernel.

I do indeed disagree with this. If you need to write a program which generates proof terms, then the proof assistant’s logic has failed to match a natural mode of reasoning. What about new logics? Well, you basically answer the question yourself:

People “invent” techniques for Agda and Coq where they proudly present some trickery that makes Agda normalize less stuff and therefore it works faster (various forms of reflection, for instance). It may all be very clever, but it begs the question: are they not just fighting the built-in normalization procedure?

All this amounts to is building a model for some little logic, working internally for a bit, and then taking an external view at the end. You and I do all the time on paper (for instance, we feel free to define functions with lambda-notation, because Set is a CCC). I think this is a reasonable thing to do, and should actually be more convenient than it currently is in Agda/Coq.

In fact, I wish people did it more on paper, too — eg, see Terry Tao’s notes on probability theory, which read to me as if he really just wants to work in the internal logic of the presheaf category on the category of probability spaces and extensions. (It really looks like the semantics of Algol!)

2. There is no logic which matches every possible natural mode of reasoning. We are therefore either doomed to have many proof assistants, or to have extensible proof assistants. If the latter, what for do such extensions take? Building internal models inside some logic is just one way of providing such flexibility. Another is to let the user describe to the proof assistant how to do things. That’s computation.

3. Matt Oliveri says:

We are therefore either doomed to have many proof assistants, or to have extensible proof assistants. If the latter, what [form] do such extensions take? Building internal models inside some logic is just one way of providing such flexibility.

How do internal models extend the proof assistant? It seems like you can always do this–modulo size issues–and it doesn’t affect proof assistant functionality.

Indeed, it seems that with non-extensible proof assistants, working internally to some defined logic is a prime example of something that’s harder than it ought to be. For example, setioid hell is working internally to the setoid model the hard way.

2. Matt Oliveri says:

However, I think it is a bad idea to implement these features by LCF style elaboration on top of a Nuprl-style core.

First of all, I think you got Andrej all confused by mentioning Nuprl. We’re talking about choice of proof system technology, not choice of theories or models. Nuprl is hardly ever brought up for being LCF-style, since it really kind of isn’t (it builds proof trees), and since the HOLs are the famous, modern LCF-style systems.

“LCF style elaboration” technically makes no sense, since “LCF-style” describes the style of trusted kernel, and has nothing to do with elaboration. Perhaps you meant elaboration like with Ltac-style tactics, which build some other proof format. Andrej’s not talking about that.

4. Louis Garde says:

If I understand you correctly, you are saying that we need to address more seriously the consequences of the Curry-Howard correspondance identifying proofs as programs.

The list of execution steps of a program is a very restrictive view of what the program is; reducing proofs to deductions steps would be a very restrictive limitation of what a proof can be.

1. No, you completely misunderstand me. This has nothing to do with the Curry-Howard correspondence. I would really like to know (from you and from Neel) what in the blog post makes you think I am talking about a particular formal system, or a family of them. Everything I say is completely uncommitted with respect to the underlying formalism. It applies equally well to an intuitionistic and classical setting.

1. Louis Garde says:

In my understanding, your post identifies 3 sorts of formal proofs: proofs as derivations (deduction steps), as constructions, and as computation : for instance, you wrote “I think this is fine as far as derivations and constructions go, but let us not forget computation”.
Derivations, constructions and computation are here about proofs, we are working in the meta-theory: the proof is an object, we want not only to prove a proposition, but also to prove that the proof is correct.
The meta-proof that a proof is correct can be done by checking derivation steps, or constructions.
But in a more general way, a meta-proof is a meta-program (by the Curry Howard correspondence), that is to say a program that is able to build a correct proof.
Limiting meta-proofs to the checking of deduction steps is therefore restrictive: this is how I understand the title of your post.

Does that make sense ?

5. TLA+ lets you to logically assert that a computation (written in TLA+, as a temporal-logic-of-actions formula, the model of which is a set of traces) behaves in a certain way (computes a certain result, terminates within so many steps, deadlocks etc.). Computation as such — not just as a means to reduce to an equality — is part of the core logic. Instead of proving the result deductively, you can compute it in the model checker. The process isn’t fully automatic — i.e., you still need to write: “PROOF OMITTED” and add a comment saying “check in the model checker”, but that’s a UI issue.

6. Günter Rote says:

This is a remark completely beside the main topic of this discussion and concerns only the introductory example of the post, regarding WRITING STYLE.
Clearly it is bad style to refer to the particulars of the proof when *using* a lemma/result. It is not just formally wrong, but it is also an annoyance for the reader.

1. Would the proposed “Problem/Construction” style change anything? Would it now be OK to refer to details of the Construction from “outside”?

2. I have been wondering why such “existential” statements like in the example are so ubiquitous. I would personally tend to replace “Theorem: There exists a gadget x such that ?(x).” by the statement “Construction C produces a gadget x such that ?(x)” because it is more specific, and it also states the constructive aspect of the result (as opposed to a purely existential result). But maybe Construction C is only of local interest for this theorem and does not deserve a name?
There seems to be no good tradition of how to write algorithmic (and constructive) results, and as Andrej points out, the enshrined Theorem-Proof style does cater well for this situation. For curiosity, I have checked a recent paper of mine, and we wrote “Theorem 1. We can decide … in O(n log n) time and O(n) space”, and in the arxiv version we did not even state a theorem: “We solve this problem in O(n log n) time in 4 dimensions.” The “Problem/Construction” style would also not work here, because the “Construction” is the main result of the paper, spanning many pages.