**Free access to the paper:** Five stages of accepting constructive mathematics (PDF)

The purpose of the zoo is to demonstrate design and implementation techniques, from dirty practical details to lofty theoretical considerations:

- functional, declarative, object-oriented, and procedural languages
- source code parsing with a parser generator
- recording of source code positions
- pretty-printing of values
- interactive shell (REPL) and non-interactive file processing
- untyped, statically and dynamically typed languages
- type checking and type inference
- subtyping, parametric polymorphism, and other kinds of type systems
- eager and lazy evaluation strategies
- recursive definitions
- exceptions
- interpreters and compilers
- abstract machine

There is still a lot of room for improvement and new languages. Contributions are welcome!


In traditional mathematical logic (by which I mean first-order logic, as established by Hilbert, Ackermann, Skolem, Gödel and others) the concepts of *logical formula* and *formal proof* are the central notions. This is so because the main goal of traditional logic is the meta-mathematical study of *provability*, i.e., what can be proved in principle. Other concerns, such as what can be computed in principle, are relegated to other disciplines, such as computability theory.

It is too easy to forget that mathematical logic is only an idealization of what mathematicians actually do. Indeed, a bizarre reversal has occurred in which mathematicians have adopted the practice of dressing up their activity as a series of theorems with proofs, even when a different kind of presentation is called for. Definitions are allowed but seen as just convenient abbreviations, and logicians enforce this view with the Conservativity theorem. Some even feel embarrassed about placing too much motivation and explanatory text in between the theorems, and others are annoyed by a speaker who spends a moment on motivation instead of plunging right into a series of unexplained technical moves.

To show what I am talking about, let us consider a typical situation in which the theorem-proof form is inappropriate. Often we see a statement and a proof of the form

> **Theorem:** There exists a gadget $x$ such that $\phi(x)$.
>
> *Proof.* We construct $x$ as follows. (An explicit construction $C$ is given.) QED

but then the rest of the the text clearly refers to the particular construction $C$ from the proof. At a formal level this is wrong because the theorem states $\exists x . \phi(x)$ and it therefore *abstracts away* the construction in the proof (this is *not* about excluded middle at all, in case you are wondering). Whatever is done inside the proof is inaccessible because proofs are irrelevant.

Lately Vladimir Voevodsky has been advocating a different style of writing in which we state *Problems* which are then solved by giving *constructions* (see for instance page 3 here). This is a strict generalization of traditional logic because a theorem with a proof can be seen as the problem/construction “construct the proof of the given statement”. Vladimir Voevodsky may have been motivated by Martin-Löf’s type theory, where this is the common view, but let us also note that Euclid did it as well. Remembering Euclid and paying attention to Martin-Löf’s teaching is a very positive development, but it is not the one I would like to talk about.

Another crucial component of mathematical activity which is obscured by traditional logic is *computation*. Traditional logic, and to some extent also type theory, hides computation behind equality. Would you like to compute $2 + 2$? Just make a series of deduction steps whose conclusion is $2 + 2 = 4$. But how do we know what we are supposed to prove if we have not calculated the result yet? Computation is *not* about proving equalities, it is a *process* which leads from inputs to outputs. Moreover, I claim that computation is a *fundamental* process which requires no expression in terms of another activity, nor does it need an independent justification.
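To see the difference in miniature, here is a sketch (my own toy example, with illustrative names) of computation as a process: an evaluator *produces* the value $4$ from the expression $2 + 2$, rather than deriving the equation $2 + 2 = 4$.

```haskell
-- Computation as a process: the evaluator transforms an input expression
-- into an output value. No equality is being proved along the way.
data Expr = Lit Int | Add Expr Expr

eval :: Expr -> Int
eval (Lit n)   = n
eval (Add a b) = eval a + eval b
```

Here `eval (Add (Lit 2) (Lit 2))` computes `4`; the equation $2 + 2 = 4$ is a statement *about* this process, made after the fact.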

Another word for computation is *manipulation of objects*. Even in traditional logic we must admit that before logic itself comes manipulation of syntax. One has to be able not only to build and recognize syntactic objects, but also manipulate them in non-trivial ways by performing substitution. Once substitution is on the table we’re only a step away from $\lambda$-calculus.
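To make the point that substitution is already a non-trivial manipulation, here is a sketch (my own formulation, with illustrative names) of capture-avoiding substitution for untyped $\lambda$-terms:

```haskell
-- A minimal sketch of capture-avoiding substitution in the untyped
-- lambda calculus. The subtlety: a binder must sometimes be renamed
-- so that it does not capture free variables of the substituted term.
data Term = Var String | Lam String Term | App Term Term
  deriving (Eq, Show)

freeVars :: Term -> [String]
freeVars (Var x)   = [x]
freeVars (Lam x t) = filter (/= x) (freeVars t)
freeVars (App s t) = freeVars s ++ freeVars t

-- Pick a variant of x that does not clash with any name in `used`.
fresh :: [String] -> String -> String
fresh used x =
  head [x' | x' <- x : map (\n -> x ++ show n) [0 :: Int ..], x' `notElem` used]

-- subst x s t computes t[s/x]
subst :: String -> Term -> Term -> Term
subst x s (Var y)
  | x == y    = s
  | otherwise = Var y
subst x s (App t u) = App (subst x s t) (subst x s u)
subst x s (Lam y t)
  | y == x              = Lam y t          -- x is shadowed; nothing to do
  | y `elem` freeVars s =                  -- rename the binder to avoid capture
      let y' = fresh (x : freeVars s ++ freeVars t) y
      in Lam y' (subst x s (subst y (Var y') t))
  | otherwise           = Lam y (subst x s t)
```

For example, substituting `y` for `x` in $\lambda y.\, x$ must rename the binder, yielding $\lambda y_0.\, y$ rather than the wrong $\lambda y.\, y$.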

The over-emphasis on formal derivations is making difficult certain discussions and design decisions about computer-verified mathematics. Some insist that formal derivations must be accessible, either explicitly as objects stored in memory or implicitly through applications of structural recursion, for independent proof-checking or proof transformations. I think this is fine as far as derivations and constructions go, but let us not forget computation. It is a design error to encode computations as chains of equations glued together by applications of transitivity. An independent verification of a computation involves independently re-running the computation – not verifying someone else’s trace of it encoded as a derivation. A transformation of a computation is not a transformation of a chain of equations – it is something else, but what? I am not sure.

Once computation is recognized as essential, irreducible and fundamental, we can start asking the right questions:

- *What is computation in general?*
- *What form of computation should be allowed in proof checkers?*
- *How do we specify computation in proof objects so that it can be independently verified by proof checkers?*

We have a pretty good idea about the answer to the first question.

A good answer to the second question seems difficult to accept. Several modern proof assistants encode computation in terms of normalization of terms, which shows that they have not quite freed themselves from the traditional view that computation is about proving equalities. If we really do believe that computation is basic then proof checkers should allow *general* and *explicit* computation inside the trusted core. After all, if you do not trust your own computer to compute correctly, why would you trust it to verify proofs?

The third question is about design. Coq has Mtac, HOL and Andromeda essentially *are* meta-level programming languages, and Agda has rewrite rules. I suppose I do not have to explain my view here: there is little reason to make users jump through hoops by having them encode computation as normalization, or application of tactics, or whatnot. Just give them a programming language!

Lest someone misunderstand me, let me conclude with a couple of disclaimers.

First, I am *not* saying that anything was wrong with 20th-century logic. It was amazing, it was a revolution, a pinnacle of human achievement. It’s just that the current century (and possibly all the subsequent ones) belongs to computers. The 20th-century logicians thought about what *can be formally proved in principle*, while we need to think about *how to formally prove in practice*.

Second, I am *not* advocating untrusted or insecure proof checkers. I am advocating *flexible* trusted proof checkers that allow users a *direct expression* of their mathematical activity, which is not possible as long as we stick to the traditional notion of formal derivation.

**Supplemental:** I think I should explain a bit more precisely how I imagine basic computations would be performed in a trusted kernel. A traditional kernel checks that the given certificate is valid evidence of derivability of some judgment. (Note: I did not say that kernels check formal derivations because they do not do that in practice. Not a single one I know.) For instance, in Martin-Löf type theory a typing judgment $\Gamma \vdash e : A$ contains enough information to decide whether it is derivable, so it can be used as a certificate for its own derivability. Now, sometimes it makes sense to compute parts of the judgment on the fly (typically $e$) instead of giving it explicitly, for various reasons (efficiency, modularity, automation). In such cases it should be possible to provide a program $p$ which computes those parts, and the kernel should know how to run $p$. (It is irrelevant whether $p$ is total, but that is a separate discussion.) There is of course the question of how we can trust computations. There are in fact several such questions:

- **Can we trust the kernel to faithfully execute programs?** For instance, if the kernel uses the CPU to compute sums of 64-bit integers, can that be trusted? And what if the language interpreter has a bug? This is the same sort of trust as general trust in the kernel, so it is not really new: in order to know that the kernel works correctly we need to certify all the components it depends on (the CPU, the operating system, the compiler used to compile the kernel, the source code of the kernel, etc.).
- **Can the programs executed by the kernel perform illegal instructions that corrupt it or trick it into doing something bad?** This is a standard question about programming languages that is addressed by safety theorems.
- **Can we trust that the given program $p$ actually computes the intended object?** In some situations this question is irrelevant because the evidence will be checked later on anyway. An example would be a program which computes (parts of) a witness $(a,b)$ for a statement $\sum_{x : A} B(x)$. We do not care where $(a,b)$ came from because the kernel is going to use it as a certificate of $\sum_{x : A} B(x)$ and discover potential problems anyhow. In other situations we are very much interested in knowing that the program does the right thing, but this is a standard situation as well: if you need to know that your program works correctly, state and prove the correctness criterion.

So I think there’s nothing new or fishy about trust and correctness in what I am proposing. The important thing is that we let the kernel run arbitrary programs, expressed directly by the user the way programs are normally written in a general-purpose programming language. Insisting that computation take on a particular form (a chain of equations tied together by transitivity, Prolog-like proof search, a confluent and terminating normalization procedure) is ultimately limiting.
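To make the shape of the proposal concrete, here is an entirely hypothetical sketch (all names are my own) of a kernel that validates a part of a judgment the same way whether it is given explicitly or computed on the fly by a user-supplied program:

```haskell
-- Hypothetical sketch: a part of a judgment may be given outright or
-- produced by a program that the kernel knows how to run. Either way,
-- the result is validated by the same trusted check, so it does not
-- matter where the computed part came from.
data Part e = Explicit e          -- the part is given explicitly
            | Computed (() -> e)  -- a program that computes the part

checkWith :: (e -> Bool) -> Part e -> Bool
checkWith valid (Explicit e) = valid e
checkWith valid (Computed p) = valid (p ())  -- run the program, then check
```

Whether the program is trusted is irrelevant here: its output passes through exactly the same validity check as an explicitly supplied part would.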


Like Mike, I am discussing here formal proofs from the point of view of proof assistants, i.e., what criteria the things we call “formal proofs” need to satisfy in order to serve their intended purpose, which is to convince machines (and indirectly humans) of mathematical truths. Like Mike, I shall call a (formal) proof a *complete* derivation tree in a formal system, such as type theory or first-order logic.

What Mike calls an *argument* I would prefer to call a *proof representation*. This can be any kind of concrete representation of the actual formal proof. The representation may be very indirect and may require a lot of effort to reconstruct the original proof. Unless we deal with an extremely simple formal system, there is always the possibility of *invalid representations*, i.e., data of the correct datatype which nevertheless does not represent a proof.

I am guaranteed to reinvent the wheel here, at least partially, since many people before me thought of the problem, but here I go anyway. Here are (some) criteria that formal proofs should satisfy:

- **Reproducibility:** it should be possible to replicate and communicate proofs. If I have a proof, it ought to be possible for me to send you a copy of it.
- **Objectivity:** all copies of the same proof should represent the same piece of information, and there should be no question what is being represented.
- **Verifiability:** it should be possible to recognize the fact that something is a proof.

There is another plausible requirement:

- **Falsifiability:** it should be possible to recognize the fact that something is *not* a proof.

Unlike the other three requirements, I find falsifiability questionable. I have received too many messages from amateur mathematicians who could not be convinced that their proofs were wrong. Also, mathematics is a cooperative activity in which mistakes (both honest and dishonest) are easily dealt with: once we exhaust the resources allocated to verifying a proof, we simply give up. An adversarial situation, such as proof-carrying code, is a different story with a different set of requirements.

The requirements impose conditions on how formal proofs in a proof assistant might be designed. Reproducibility dictates that proofs should be easily accessible and communicable. That is, they should be pieces of digital information that are commonly handled by computers. They should not be prohibitively large, or if they are, they need to be suitably compressed, lest storage and communication become infeasible. Objectivity is almost automatic in the era of crisp digital data. We will worry about Planck-scale proof objects later. Verifiability can be ensured by developing and implementing algorithms that recognize correct representations of proofs.

This post grew out of a comment that I wanted to make about a particular statement in Mike’s post. He says:

“… for a proof assistant to honestly call itself an *implementation* of that formal system, it ought to include, somewhere in its internals, some data structure that represents those proofs reasonably faithfully.”

This requirement is too stringent. I think Mike is shooting for some combination of reproducibility and verifiability, but explicit storage of proofs in raw form is only one way to achieve them. What we need instead is *efficient communication* and *verification* of (communicated) proofs. These can both be achieved without storage of proofs in explicit form.

Proofs may be stored and communicated in implicit form, and proof assistants such as Coq and Agda do this. Do not be fooled into thinking that Coq gives you the “proof terms”, or that Agda aficionados write down actual complete proofs. Those are not the derivation trees, because they are missing large subtrees of equality reasoning. Complete proofs are too big to be communicated or stored in memory (or some day they will be), and little or nothing is gained by storing them or re-verifying their complete forms. Instead, it is better to devise compact representations of proofs which get *elaborated* or *evaluated* into actual proofs on the fly. Mike comments on this and explains that Coq and Agda both involve a large amount of elaboration, but let me point out that even the elaborated stuff is still only a shadow of the actual derivation tree. The data that gets stored in the Coq .vo file is really a bunch of instructions for the proof checker to easily reconstruct the proof using a specific algorithm. The *actual* derivation tree is implicit in the execution trace of the proof checker, stored in the space-time continuum and inaccessible with pre-Star Trek technology. It does not matter that we cannot get to it, because the whole process is replicable. If we feel like going through the derivation tree again, we can just run the proof assistant again.
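Here is a toy sketch (my own formulation, for a tiny fragment of minimal logic) of the idea: a compact proof script is checked against a goal without the derivation tree ever being materialized; the tree exists only implicitly, in the recursion of the checker.

```haskell
-- A compact proof "script" for implicational formulas. Checking it
-- reconstructs the derivation implicitly in the recursion; no
-- derivation tree is ever stored.
data Formula = Atom String | Imp Formula Formula
  deriving (Eq, Show)

data Script = Assume           -- close the branch with a hypothesis
            | IntroThen Script -- implication introduction, then continue

check :: [Formula] -> Script -> Formula -> Bool
check ctx Assume        goal       = goal `elem` ctx
check ctx (IntroThen s) (Imp a b)  = check (a : ctx) s b
check _   (IntroThen _) _          = False
```

The script `IntroThen Assume` proves $p \to p$; the corresponding two-node derivation tree lives only in the execution trace of `check`.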

I am aware of the fact that people strongly advocate some points which I am arguing against, two of which might be:

- Proof assistants must provide proofs that can be independently checked.
- Proof checking must be *decidable*, not just *semi-decidable*.

As far as I can tell, nobody actually subscribes to these in practice. (Now that the angry Haskell mob has subsided, I feel like I can take a hit from an angry proof assistant mob, which the following three paragraphs are intended to attract. What I *really* want the angry mob to think about deeply is how their professed beliefs match up with their practice.)

First, nobody downloads compiled .vo files that contain the proof certificates; we all download other people’s original .v files and compile them ourselves. So the .vo files and proof certificates are a double illusion: they do not contain actual proofs but half-digested stuff that may still require a lot of work to verify, and nobody uses them to communicate or verify proofs anyhow. They are just an optimization technique for faster loading of libraries. The *real* representations of proofs are in the .v files, and those can only be *semi*-checked for correctness.

Second, in practice it is irrelevant whether checking a proof is decidable because the elaboration phase and the various proof search techniques are possibly non-terminating anyhow. If there are a couple of possibly non-terminating layers on top of the trusted kernel, we might as well let the kernel be possibly non-terminating, too, and instead squeeze some extra expressivity and efficiency from it.

Third, and still staying with decidability of proof checking, what actually *is* annoying are uncontrollable or unidentifiable sources of inefficiency. Have you ever danced a little dance around Coq or Agda to cajole its *terminating* normalization procedure into finishing before getting run over by Andromeda? Bow to the gods of decidable proof checking.

It is far more important that *cooperating* parties be able to communicate and verify proofs efficiently than it is to be able to tell whether an *adversary* is wasting our time. Therefore, proofs should be, and in practice are, communicated in the most flexible manner possible, as programs. LCF-style proof assistants embrace this idea, while others move slowly towards it by giving the user ever greater control over the internal mechanisms of the proof assistant (for instance, witness Coq’s recent developments such as partial user control over the universe mechanism, or Agda’s rewriting hackery). In adversarial situations, such as proof-carrying code, the design requirements for formalized proofs are completely different from the situation we are considering.

We do not expect humans to memorize every proof of every mathematical statement they ever use, nor do we imagine that knowledge of a mathematical fact is the same thing as the proof of it. Humans actually memorize “proof ideas” which allow them to replicate the proofs whenever they need to. Proof assistants operate in much the same way, for good reasons.


Let us look at the matter a bit closer. The Haskell wiki page on Hask says:

> The objects of Hask are Haskell types, and the morphisms from objects `A` to `B` are Haskell functions of type `A -> B`. The identity morphism for object `A` is `id :: A -> A`, and the composition of morphisms `f` and `g` is `f . g = \x -> f (g x)`.

Presumably “function” here means “closed expression”. It is then immediately noticed that there is a problem because the supposed identity morphisms do not actually work correctly: `seq undefined () = undefined` and `seq (undefined . id) () = ()`, therefore we do not have `undefined . id = undefined`.
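The difference can be witnessed directly. This is a small sketch using `evaluate` and `try` from `Control.Exception` to observe whether forcing with `seq` hits bottom:

```haskell
import Control.Exception (SomeException, evaluate, try)

-- `undefined . id` is a partial application of (.), hence already in weak
-- head normal form, so seq does not force the bottom hidden inside it.
-- `undefined` itself IS bottom, so forcing it raises an exception. Thus
-- seq separates two "morphisms" that the identity law would equate.
main :: IO ()
main = do
  ok  <- try (evaluate (seq ((undefined :: Int -> Int) . id) "fine"))
           :: IO (Either SomeException String)
  bad <- try (evaluate (seq (undefined :: Int -> Int) "fine"))
           :: IO (Either SomeException String)
  print (either (const "bottom") id ok)   -- "fine"
  print (either (const "bottom") id bad)  -- "bottom"
```

So `f . id = f` fails for `f = undefined` as soon as `seq` can observe the difference, which is exactly why Hask's identity morphisms are in trouble.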

The proposed solution is to equate `f :: A -> B` and `g :: A -> B` when `f x = g x` for all `x :: A`. Again, we may presume that here `x` ranges over all closed expressions of type `A`. But this begs the question: *what does `f x = g x` mean?* Obviously, it cannot mean “syntactically equal expressions”. If we had a notion of observational or contextual equivalence then we could use that, but there is no such thing until somebody provides an operational semantics of Haskell. Written down, in detail, in standard form.

The wiki page gives two references. One is about the denotational semantics of Haskell, which is just a certain category of continuous posets. That is all fine, but such a category is not the syntactic category we are looking for. The other paper is a fine piece of work that uses denotational semantics to prove cool things, but does not speak of any syntactic category for Haskell.

There are several ways in which we could resolve the problem:

- If we define a notion of observational or contextual equivalence for Haskell, then we will know what it means for two expressions to be indistinguishable. We can then use this notion to equate indistinguishable morphisms.
- We could try to define the equality relation more carefully. The wiki page takes a first step by specifying that at a function type equality is extensional equality. Similarly, we could define that two pairs are equal if their components are equal, etc. But there are a lot of type constructors (including recursive types), and you would have to go through all of them and define a notion of equality on each. And after that, you need to show that this notion of equality actually gives a category. All the while, there will be a nagging doubt as to what it all means, since there is no operational semantics of Haskell.
- We could import a category-theoretic structure from a denotational semantics. It seems that a denotational semantics of Haskell actually exists and is some sort of category of domains. However, this would just mean we are restricting attention to a subcategory of the semantic category on the definable objects and morphisms. There is little to no advantage in doing so, and it is better to just stick with the semantic category.

Until someone actually does some work, **there is no Hask**! I would be delighted to be wrong, but I have not seen a complete construction of such a category yet.

Perhaps you think it is OK to pretend that something is a category when it is not. In that case, you would also pretend that the Haskell monads are actual category-theoretic monads. I recall a story from one of my math professors: when she was still a doctoral student she participated as “math support” in the construction of a small experimental nuclear reactor in Slovenia. One of the physicists asked her to estimate the value of the harmonic series $1 + 1/2 + 1/3 + \cdots$ to four decimals. When she tried to explain that the series diverges, he said “that’s ok, let’s just pretend it converges”.
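For the record, the partial sums really do grow without bound, though only logarithmically, so the requested four decimals do not exist; a two-line computation makes the point:

```haskell
-- Partial sums of the harmonic series grow roughly like log n + gamma,
-- so no number of terms pins down "the value to four decimals".
harmonic :: Int -> Double
harmonic n = sum [1 / fromIntegral k | k <- [1 .. n]]
-- harmonic 1000 is roughly 7.49; harmonic 1000000 is roughly 14.39;
-- and the sums keep growing without bound.
```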

**Supplemental:** Of the three solutions mentioned above I like best the one where we give Haskell an operational semantics. It is more or less clear how we would do this; after all, Haskell is more or less a glorified PCF. However, the thing that worries me is `seq`. Because of it, `undefined` and `undefined . id` are *not* observationally equivalent, which means that we cannot use observational equivalence for equality of morphisms. We could try the wiki definition: `f :: A -> B` and `g :: A -> B` represent the same morphism if `f x` and `g x` are observationally equivalent for all closed expressions `x :: A`. But then we need to prove something after that to know that we really have a category. For instance, I do not find it obvious anymore that programs which involve `seq` behave nicely. And what happens with higher-order functions, where observational equivalence and extensional equality get mixed up: is everything still holding water? There are just too many questions to be answered before we have a category.

**Supplemental II:** Now that the mob is here, I can see certain patterns in the comments, so I will allow myself to reply to them en masse by supplementing the post. I hope you all will notice this. Let me be clear that I am not arguing against the usefulness of category-theoretic thinking in programming. In fact, I support programming that is informed by abstraction, as it often leads to new insights and helps get things done correctly. (And anyone who knows my work should find this completely obvious.)

Nor am I objecting to the “fast & loose” mode of thinking while investigating a new idea in Haskell; that is obviously quite useful as well. I am objecting to:

- The fact that the Haskell wiki claims there is such a thing as “the category Hask” and pretends that everything is OK.
- The fact that some people find it acceptable to defend broken mathematics on the grounds that it is useful. Non-broken mathematics is also useful, as well as correct. Good engineers do not rationalize broken math by saying “life is tough”.

Anyhow, we do not need the Hask category. There already are other categories into which we can map Haskell, and they explain things quite well. It is ok to say “you can think of Haskell as a sort of category, but beware, there are technical details which break this idea, so you need to be a bit careful”. It is not ok to write on the Haskell wiki “Hask is a category”. Which is why I put up this blog post, so when people Google for Hask they’ll hopefully find the truth behind it.

**Supplemental III**: On Twitter people have suggested some references that provide an operational semantics of Haskell:

- John Launchbury: A natural semantics for lazy evaluation
- Alan Jeffrey: A fully abstract semantics for concurrent graph reduction

Can we use these to define a suitable notion of equality of morphisms? (And let us forget about `seq` for the time being.)