Recently I reviewed a paper in which most proofs were done in a proof assistant. Yes, the machine guaranteed that the proofs were correct, but I still had to make sure that the authors correctly formulated their definitions and theorems, that the code did not contain hidden assumptions, that there were no unfinished proofs, and so on.

In a typical situation an author submits a paper accompanied with some source code which contains the formalized parts of the work. Sometimes the code is enclosed with the paper, and sometimes it is available for download somewhere. ** It is easy to ignore the code! **The journal finds it difficult to archive the code, the editor naturally focuses on the paper itself, the reviewer trusts the authors and the proof assistant, and the authors are tempted not to mention dirty little secrets about their code. If the proponents of formalized mathematics want to avert a disaster that could destroy their efforts in a single blow, they must adopt a set of rules that will ensure high standards. There is much more to trusting a piece of formalized mathematics than just running it through a proof checker.

Before I say anything about reviewing formalized mathematics, let me just point out that being anonymous does not give the referee the right to be impolite or even abusive. A good guiding principle is to never write anything in a review that you would not say to the author’s face. You can be harsh and you can criticize, but do it politely and ground your opinions in arguments. After all, you expect no less of the author.

Let us imagine a submission to a journal in which the author claims to have checked proofs using a computer proof assistant or some other such tool. Almost everything I write below follows from the simple observation that * the code contains proofs and that proofs are an essential part of the paper*. If as a reviewer or an editor you are ever in doubt, just imagine how you would act if the formalized part were actually an appendix written informally as ordinary math.

Here is a set of guidelines that I think should be followed when formalized mathematics is reviewed.

### The rules for the author

*Enclose the code with the paper submission.**Provide information on how to independently verify the correctness of the code.**License the code so that anyone who wants to check the correctness of the proofs is free to do so.**Provide explicit information on what parts of the code correspond to what parts of the paper.*

Comments:

- It is not acceptable to just provide a link where the code is available for download, for several reasons:
- When the anonymous reviewer downloads the code, he will give away his location and therefore very likely his identity. The anonymity of the reviewer must be respected. While there are methods that allow the reviewer to download the code anonymously, it is not for him to worry about such things.
- There is no guarantee that the code will be available from the given link in the future. Even if the code is on Github or some other such established service, in the long run the published paper is likely going to outlive the service.
- It must be perfectly clear which version of the code goes with the paper. Code that is available for download is likely going to be updated and change, which will put it out of sync with the paper. The author is of course always free to mention that the code is
*also*available on some web site.

- Without instructions on how to verify correctness of the code, the reviewer and the readers may have a very hard time figuring things out. The information provided must:
- List the prerequisites: which proof assistant the code works with and which libraries it depends on, with exact version information for all of them.
- Include step-by-step instructions on how to compile the code.
- Provide an outline of how the code is organized.

- Formalized mathematics is a form of software. I am not a copyright expert, but I suspect that the rules for software are not the same as those for published papers. Therefore, the code should be licenced separately. I strongly urge everybody to release their code under an open source license, otherwise the evil journals will think of ways to hide the code from the readers, or to charge extra for access to the code.
- The reviewer must check that all theorems stated in the paper have actually been proved in the code. To make his job possible the author should make it clear how to pair up the paper theorems with the machine proofs. It is
*not*easy for the reviewer to wade through the code and try to figure out what is what. Imagine a paper in which all proofs were put in an appendix but they were not numbered, so that the reader would have to figure out which theorem goes with which proof.

### The rules for the reviewer

*Review the paper part according to established standards.**Verify that the code compiles as described in the provided instructions.**Check that the code correctly formulates all the definitions.**Check that the code proves each theorem stated in the paper and that the machine version of the theorem is the same as the paper version.**Check that the code does not contain unfinished proofs or hypotheses that are not stated in the paper.**Review the code for quality.*

Comments:

- Of course the reviewer should not forget the traditional part of his job.
- It is absolutely critical that the reviewer independently compile the code. This may require some effort, but skipping this step is like not reading proofs.
- Because the work is presented in two separate parts, the paper and the code, there is potential for mismatch. It is the job of the reviewer to make sure that the two parts fit together. The reviewer can reject the paper if he cannot easily figure out which part of the code corresponds to which part of the paper.
- The code is part of the paper and is therefore subject to reviewing. Just because a piece of code is accepted by a proof checker, that does not automatically make it worthy. Again, think how you would react to a convoluted paper proof which were nevertheless correct. You would most definitely comment on it and ask for an improvement.

### The rules for the journal

*The journal must archive the code and make it permanently available with the paper, under exactly the same access permissions as the paper itself.*

This is an extremely difficult thing to accomplish, so the journal should do whatever it can. Here are just two things to worry about:

- It is unacceptable to make the code less accessible than the paper because the code
*is*the paper. - The printed version of the journal should have the code enclosed on a medium that lasts as long as paper.
- If the code is placed on a web site, it is easy for it do disappear in the future when the site is re-organized.

### The rules for the editor

*The editor must find a reviewer who is not only competent to judge the math, but can also verify that the code is as advertised.**The editor must make sure that the author, the journal, and the reviewer follow the rules.*

Comments:

- It may be quite hard to find a reviewer that both knows the math and can deal with the code. In such as a case the best strategy might be to find two reviewers whose joint abilities cover all tasks. But it is a very bad idea for the two reviewers to work independently, for who is going to check that the paper theorems really correspond to the computer proofs? It is not enough to just have a reviewer run the code and report back as “it compiles”.
- In particular, the editor must:
- insist that the code be enclosed with the paper
- convince the journal that the code be archived appropriately
- require that the reviewer explicitly describe to what extent he verified the code: did he run it, did he check it corresponds to the paper theorems, etc? (See the list under “The rules for the reviewer”.)

### Can we trust formalized mathematics?

I think the answer is *not without a diligent reviewing process.* While a computer verified proof can instill quite a bit of confidence in a mathematical result, there are things that the machine simply cannot check. So even though the reviewer need not check the proofs themselves, there is still a lot for him to do. Trust is in the domain of humans. Let us not replace it with blind belief in the power of machines.

This is a lot of work to do, on both sides, and I guarantee nobody’s going to work this way. The idea, however, is of course correct. Here are some UI suggestions, based on your observation that we’re talking about code here.

In the world of non-formalized-mathematics software projects, there are known practices such as continuous integration servers and cloud deployment and virtual machines, to remove the burden of all these administrative tasks done by hand. For example, it is standard practice to do something like set up a GitHub project and link it to Travis. This is just the tip of an iceberg, of course. Basically, if code for research papers is to be taken seriously, the entire process of developing it and sharing it and replicating it should be taken seriously, as seriously as developing a commercial Web app.

Well said, and useful!

@Franklin: I agree that the burden of independently compiling the code could be made a bit lighter by good infrastructure. But I still think that the reviewer needs to see with his own eyes that the code compiled. And you say that’s a lot of work. Well, reviewing

isa lot of work, when done honestly. All in all, I would say the total amount of work in checking formalized mathematics will be smaller than with checking informal mathematics. After all, you just need to check that they correctly formulated the definitions and theorems (something you do automatically when reviewing informal mathematics). On the author side, is it really too much to ask the author to tell you where you can find the proofs of the theorems?There is a good answer to the problem of preservation of the availability of the code: open archives sites. arxiv.org for example is dedicated to the long-term preservation of scientific articles, including software and mechanized proofs that are part of the work. If you publish your article and accompanying code to arxiv, it will remain accessible in the future.

Note that I am not sure this is a problem that arises at review-time. What is the issue with hosting one’s code on an institutional webpage (that is bound to become dead in a few years) for the time of the review? I recently had to consider a similar question (for software rather than a mechanical proof), and decided to wait for potential acceptance of the article before submitting the short version to arxiv (there is an accompanying research report that is already available with the code).

I think questions such as “will that piece of code remain available at the future?” become more pertinent after the article is accepted for publication — so after the work of the reviewer itself.

I’m also not quite sure about the quality standards that must be applied to the proofs in the mechanized version. You are not really precise about that point in your requirements, but you say that they “should be reviewed just as proofs on paper”, which might suggest similar clarity standards. In my experience this has been a target very difficult to meet; that could be distinguishing quality of an outstanding formalization work, but requiring that constantly could have the adverse effect of people reverting to pen&paper proofs. Something that I found works better is to preserve an informal explanation of the mechanized proof, for example in an annex of the article, that has roughly the level of detail of a good “proof sketch” of a pen&paper proof.

This last remark is also driven by pragmatics of the current publication process in communities that actively use mechanized proofs: conferences with short deadline, short page number limits, etc. This may change; I understand you’ve been careful to talk about reviewing and editing in an absolute sense, not necessarily tied to how today’s actors do things. In that perspective, I’d agree your rules would be a target to attempt to reach — but that requires process changes not only on the author’s part. In any case I think the main point, that the mechanized sources be available along with the article, is essential and must already be required today.

I did not discuss at all how to make sure that the mechanized proofs can still be checked a couple of decades from now. I think that is a separate (and important) issue that must be solved as well. Putting things on arxiv.org is of course a good idea, but I focused on the review process, so that is kind of irrelevant for this post.

But I do insist that a reviewer cannot simply trust the authors that they mechanized the proofs correctly. Are you saying that at conferences such as POPL people review papers and just blindly trust the authors when they claim the proofs have been formalized?

Well said. The point that forcing the reviewer to download the code from a web site compromises his or her anonymity is important, and I suspect very under-appreciated. I do wonder, though, how many academics have access to the server logs for their institutional web sites.

One interesting possibility to consider is the use of “literate programming”, where the same source file(s) can be run through LaTeX to produce the published paper and also through the proof checker to verify the proofs. Agda has this capability now, although I don’t think that Coq does.

@Mike: Yes, literate programming is an option, but it may be hard to convince the publisher that there should be gobs of code in the text.

You write that mechanized proofs must be subject to as rigorous review as pen-and-paper proof. But just how carefully are the latter currently being reviewed? In informal discussions, I have heard several computer scientists say that when they review they often don’t have time to check the proofs and just take the theorem statements on faith…

At venues like POPL the common practice is to put all the proofs into technical reports (which the reviewers usually have to track down themselves if they wish to read them), and the instructions for submission say that reviewers are only obliged to look at the paper, not the supporting material.

Well, hopefully the increasing use of mechanized proofs should be helpful here. As you say, even when the compilation trouble is factored in, it’s a lot less work to verify the correctness of a mechanized proof than a pen-and-paper proof.

Well, I understand that things are different at conferences, where computer scientists have to review too many papers in too short a time, and mathematicians seem to accept everybody and their mother because conferences are devoted more to “current work”. But I am still troubled by what you (@Vilhelm) describe, namely a systematic deterioration of importance of proofs. If proofs are hidden in code and technical reports and people do not check them, in what sense is it still mathematics? There is no math without proofs. Proofs are sacred!

Am I asking for too much, say for POPL? At least the authors shoudl enclose the code, and at least the reviewer should make sure it compiles, and have a glance at what the code is doing. With current behavior we deseve our own Sokal fiasco.

Andrej, I realize the case of formalized mathematics is a unique one, but you might want to check out the ongoing struggles in the natural science communities in which many have urged and tried to provide a process for data and code to go hand in hand with published research. For example, this report from Software Carpentry might be of interest to you: http://software-carpentry.org/blog/2013/08/intecol13-good-code.html

Here is a related question on which you might like to give as your thoughts in a new blog post:

If a narrative paper is accompanied by a machine-verified formal proof, what is level of detail of the proofs is it appropriate to give in the narrative?

I am arguably guilty of spelling out proofs in excessive detail in my own papers. On the other hand, I have had too many experiences of blunders that resulted from not giving every detail of every proof.

A possible answer to this is that a mathematical paper should be like a tourist guide: the audience want to get a general picture and can go into whatever level of detail they wish, but the guide must actually have been down all of the alleyways him/herself.

Formalised and mechanised mathematics will change the nature of peer review. To check the most difficult proof for correctness you don’t need to understand maths, just to know how to install and run Coq (or Agda etc). A failed formal mechanical proof can happen only through the gross violation of the proof assistant protocol, basically tantamount to cheating.

The role of the reviewer may be to value the originality and the significance of the result. Perhaps the originality can be also automated using tools like Turnitin. So all the reviewer can contribute is her highly dubious highly biased judgment regarding the ‘significance’ of the result or the beauty of the prose surrounding the maths.

But I am not sure we need this kind of judgement at all. We can simply publish on the Arxiv or on personal blogs (all code attached) and simply let the readers be the judges of what they read.

And hiring and promotion committee will just need to invent different heuristics by which to judge how good someone is. Or perhaps they would need to read their actual papers and judge for themselves if they are any good rather than relying on proxies.

Formalised and mechanised maths will render peer review obsolete.

I think Andrej did make clear that a big part of the work he did was to verify the correspondence between the human-language description of the problem, and the formalized problem. It’s all very well to have a program that compiles and runs without crashing, but we also want to be sure that the way we first construed its purpose is reliably what it actually does. That part of the review cannot be machine-verifiable, because it is dependent upon what is not mechanical.

Jesse, not necessarily. Using literate programming the distinction between the code and the paper disappears. Agda/Coq code can be rendered nicely (and mechanically) into LaTeX. Here is an example:

M.H. Escardo. Continuity of Godel’s system T functionals via effectful forcing. (MFPS’2013)

http://www.cs.bham.ac.uk/~mhe/dialogue/dialogue.pdf

Excuse me, Dan, but when you say “not necessarily”, what part of my contention are you contending with? Will a computer read the beautifully typset English comments and construe the maths better because of it?

[I realize this is over two years old.] At first I agreed with Andrej. Then reading Dan Ghica’s replies changed my mind. My first reaction to his replies was that the post addressed Dan’s concerns and clearly gave motivation, but as I read and thought more, I realized Dan was right. Andrej is working from an incorrect premise that may be true only transitionally. In fact, in my opinion, Dan underestimates the reality. Literate programming is irrelevant.

Proofs will be (and always were) software. What’s new is they will start having real clients. “Proofs are sacred!” is nice to say, but it was never actually practiced. Mathematicians cared about pretty and persuasive arguments which often referred to things labelled “proofs”. Only now are proofs actually being created and they aren’t sacred, they’re

valuable. Programmers are going to be using these things to machine check that their airfoil simulation code actually accurately simulates the mathematical model. Other mathematicians are going to be using Coq et al and are going to be using exact real libraries. They aren’t going to rewrite that stuff every time. Proofs are going to be upgraded. You will have Heine-Borel v1.3.4. There’s little need to preserve Heine-Borel v0.1. Virtually no one is going to care about a published paper. The proof will be in use before any paper is published. Papers will still be published, just as papers are published about software. But no one reads those papers to decide if the code actually works. I don’t care if your proofs prove what your paper says they do; I care if they prove the premise of my theorem. The Millenium problems will be propositions awaiting their proof terms. Indeed, often the theorems to be proved will be fixed before some would-be author even decides to attempt to prove it. They’ll be Feature Requests.As a minor reprieve, elegance/beauty will still be valued, but probably much less so… A less minor reprieve is mathematics will probably get a lot more funding and interest.