Here are the slides with speaker notes and the video recording of the talk.
Abstract:
It has often been said that all of mathematics can in principle be formalized in a suitably chosen foundation, such as first-order logic with set theory, higher-order logic, or type theory. When one attempts to actually do so on a large scale, the true meaning of the qualifier “in principle” is revealed: mathematical practice consists not only of text written on paper, however detailed they might be, but also of unspoken conventions and techniques that enable efficient communication and understanding of mathematical texts. While students may be able to learn these through observation and imitation, the same cannot be expected of computers, yet.
In this talk we will first review some of the informal mathematical practices and relate them to corresponding techniques in proof assistants, such as implicit arguments, type classes, and tactics. We shall then ask more generally whether these need be just a bag of tricks, or can they be organized into a proper mathematical theory.
]]>Katja Berčič made a super cool logo for my talk:
Thank you, Katja! If you are a Trekkie you should figure it out.
Abstract:
In the 19th century Carl Friedrich Gauss, Nikolai Lobachevsky, and János Bolyai discovered geometries that violated the parallel postulate. Initially these were considered inferior to Euclid's geometry, which was generally recognized as the true geometry of physical space. Subsequently, the work of Bernhard Riemann, Albert Einstein, and others, liberated geometry from the shackles of dogma, and allowed it to flourish beyond anything that the inventors of non-euclidean geometry could imagine.
A century later history repeated itself, this time with entire worlds of mathematics at stake. The ideal of one true mathematics was challenged by the schism between intuitionstic and classical mathematics, as personified in the story of rivalry between L.E.J. Brouwer and David Hilbert. Not long afterwards, Kurt Gödel's work in logic implied the inevitability of a multitude of worlds of mathematics. These could hardly be dismissed as logical sophistry, as they provided answers to fundamental questions about set theory and foundations of mathematics. The second half of the 20th century brought gave us many more worlds of mathematics: Cohen's set-theoretic forcing, Alexander Grothnedieck's sheaves, F. William Lawvere's and Myles Tierney's elementary toposes, Martin Hyland's effective topos, and a plethora of others.
We shall explore but a small corner of the vast multiverse of mathematics, observing in each the quintessential mathematical object, the field of real numbers. There is a universe in which the reals contain Leibniz's infinitesimals, in another they are all computable, there is one in which they are cannot be separated into two disjoint subsets, and one in which all subsets are measurable. There is even a universe in which the reals are countable. The spectrum of possibilities is bewildering, but also inspiring. It leads to the idea of synthetic mathematics: just like geometers and physicists choose a geometry that is best for the situation at hand, mathematicians can choose to work in a mathematical universe made to order, or synthesized, that best captures the essence and nature of the topic of interest.
]]>On the occasion Steve Awodey assembled selected works by Dana Scott at CMU-HoTT/scott
repository. It is an amazing collection of papers that had deep impact on logic, set theory, computation, and programming languages. I hope in the future we can extend it and possibly present it in better format.
As a special treat, I recount here the story the invention of the famous $D_\infty$ model of the untyped $\lambda$-calculus. I heard it first when I was Dana's student. In 2008 I asked Dana to recount it in the form of a short interview.
These days domain theory is a mature branch of mathematics. It has had profound influence on the theory and practice of programming languages. When did you start working on it and why?
Dana Scott: I was in Amsterdam in 1968/69 with my family. I met Strachey at IFIP WG2.2 in summer of 1969. I arranged leave from Princeton to work with him in the fall of 1969 in Oxford. I was trying to convince Strachey to use a type theory based on domains.
One of your famous results is the construction of a domain $D_\infty$ which is isomorphic to its own continuous function space $D_\infty \to D_\infty$. How did you invent it?
D. S.: $D_\infty$ did not come until later. I remember it was a quiet Saturday in November 1969 at home. I had proved that if domains $D$ and $E$ have a countable basis of finite elements, then so does the continuous function space $D \to E$. In understanding how often the basis for $D \to E$ was more complicated than the bases for $D$ and $E$, I then thought, “Oh, no, there must exist a bad $D$ with a basis so 'dense' that the basis for $D \to D$ is just as complicated – in fact, isomorphic.” But I never proved the existence of models exactly that way because I soon saw that the iteration of $X \mapsto (X \to X)$ constructed a suitable basis in the limit. That was the actual $D_\infty$ construction.
Why do you say “oh no”? It was an important discovery!
D. S.: Since, I had claimed for years that the type-free $\lambda$-calculus has no “mathematical” models (as distinguished from term models), I said to myself, “Oh, no, now I will have to eat my own words!”
The existence of term models is guaranteed by the Church-Rosser theorem from 1936 which implies that the untyped lambda calculus is consistent?
D. S.: Yes.
The domain $D_\infty$ is an involved construction which gives a model for the calculus with both $\beta$- and $\eta$-rules. Is it easier to give a model which satisfies the $\beta$-rule only?
D. S.: Since the powerset of natural numbers $P\omega$ (with suitable topology) is universal for countably-based $T_0$-spaces, and since a continuous lattice is a retract of every superspace, it follows that $P\omega \to P\omega$ is a retract of $P\omega$. This gives a non-$\eta$ model without any infinity-limit constructions. But continuous lattices had not yet been invented in 1969 – that I knew of.
Where can the interested readers read more about this topic?
D.S.: I would recommend these two:
Thank you very much!
Dana Scott: You are welcome.
]]>Abstract: The raw syntax of a type theory, or more generally of a formal system with binding constructs, involves not only free and bound variables, but also meta-variables, which feature in inference rules. Each notion of variable has an associated notion of substitution. A syntactic translation from one type theory to another brings in one more level of substitutions, this time mapping type-theoretic constructors to terms. Working with three levels of substitution, each depending on the previous one, is cumbersome and repetitive. One gets the feeling that there should be a better way to deal with syntax.
In this talk I will present a relative monad capturing higher-rank syntax which takes care of all notions of substitution and binding-preserving syntactic transformations in one fell swoop. The categorical structure of the monad corresponds precisely to the desirable syntactic properties of binding and substitution. Special cases of syntax, such as ordinary first-order variables, or second-order syntax with variables and meta-variables, are obtained easily by precomposition of the relative monad with a suitable inclusion of restricted variable contexts into the general ones. The meta-theoretic properties of syntax transfer along the inclusion.
The relative monad is sufficiently expressive to give a notion of intrinsic syntax for simply typed theories. It remains to be seen how one could refine the monad to account for intrinsic syntax of dependent type theories.
Talk notes: Here are the hand-written talk notes, which cover more than I could say during the talk.
Formalization:
I have the beginning of a formalization of the higher-rank syntax, but it hits a problem, see below. Can someone suggest a solution? (You can download Syntax.agda
.)
{-
An attempt at formalization of (raw) higher-rank syntax.
We define a notion of syntax which allows for higher-rank binders,
variables and substitutions. Ordinary notions of variables are
special cases:
* order 1: ordinary variables and substitutions, for example those of
λ-calculus
* order 2: meta-variables and their instantiations
* order 3: symbols (term formers) in dependent type theory, such as
Π, Σ, W, and syntactic transformations between theories
The syntax is parameterized by a type Class of syntactic classes. For
example, in dependent type theory there might be two syntactic
classes, ty and tm, corresponding to type and term expressions.
-}
module Syntax (Class : Set) where
{- Shapes can also be called “syntactic variable contexts”, as they assign to
each variable its syntactic arity, but no typing information.
An arity is a binding shape with a syntactic class. The shape specifies
how many arguments the variable takes and how it binds the argument's variables.
The class specifies the syntactic class of the variable, and therefore of the
expression formed by it.
We model shapes as binary trees so that it is easy to concatenate
two of them. A more traditional approach models shapes as lists, in
which case one has to append lists.
-}
infixl 6 _⊕_
data Shape : Set where
𝟘 : Shape -- the empty shape
[_,_] : ∀ (γ : Shape) (cl : Class) → Shape -- the shape with precisely one variable
_⊕_ : ∀ (γ : Shape) (δ : Shape) → Shape -- disjoint sum of shapes
infix 5 [_,_]∈_
{- The de Bruijn indices are binary numbers because shapes are binary
trees. [ δ , cl ]∈ γ is the set of variable indices in γ whose arity
is (δ, cl). -}
data [_,_]∈_ : Shape → Class → Shape → Set where
var-here : ∀ {θ} {cl} → [ θ , cl ]∈ [ θ , cl ]
var-left : ∀ {θ} {cl} {γ} {δ} → [ θ , cl ]∈ γ → [ θ , cl ]∈ γ ⊕ δ
var-right : ∀ {θ} {cl} {γ} {δ} → [ θ , cl ]∈ δ → [ θ , cl ]∈ γ ⊕ δ
{- Examples:
postulate ty : Class -- type class
postulate tm : Class -- term class
ordinary-variable-arity : Class → Shape
ordinary-variable-arity c = [ 𝟘 , c ]
binary-type-metavariable-arity : Shape
binary-type-metavariable-arity = [ [ 𝟘 , tm ] ⊕ [ 𝟘 , tm ] , ty ]
Π-arity : Shape
Π-arity = [ [ 𝟘 , ty ] ⊕ [ [ 𝟘 , tm ] , ty ] , ty ]
-}
{- Because everything is a variable, even symbols, there is a single
expression constructor _`_ which forms and expression by applying
the variable x to arguments ts. -}
-- Expressions
infix 9 _`_
data Expr : Shape → Class → Set where
_`_ : ∀ {γ} {δ} {cl} (x : [ δ , cl ]∈ γ) →
(ts : ∀ {θ} {B} (y : [ θ , B ]∈ δ) → Expr (γ ⊕ θ) B) → Expr γ cl
-- Renamings
infix 5 _→ʳ_
_→ʳ_ : Shape → Shape → Set
γ →ʳ δ = ∀ {θ} {cl} (x : [ θ , cl ]∈ γ) → [ θ , cl ]∈ δ
-- identity renaming
𝟙ʳ : ∀ {γ} → γ →ʳ γ
𝟙ʳ x = x
-- composition of renamings
infixl 7 _∘ʳ_
_∘ʳ_ : ∀ {γ} {δ} {η} → (δ →ʳ η) → (γ →ʳ δ) → (γ →ʳ η)
(r ∘ʳ s) x = r (s x)
-- renaming extension
⇑ʳ : ∀ {γ} {δ} {Θ} → (γ →ʳ δ) → (γ ⊕ Θ →ʳ δ ⊕ Θ)
⇑ʳ r (var-left x) = var-left (r x)
⇑ʳ r (var-right y) = var-right y
-- the action of a renaming on an expression
infixr 6 [_]ʳ_
[_]ʳ_ : ∀ {γ} {δ} {cl} (r : γ →ʳ δ) → Expr γ cl → Expr δ cl
[ r ]ʳ (x ` ts) = r x ` λ { y → [ ⇑ʳ r ]ʳ ts y }
-- substitution
infix 5 _→ˢ_
_→ˢ_ : Shape → Shape → Set
γ →ˢ δ = ∀ {Θ} {cl} (x : [ Θ , cl ]∈ γ) → Expr (δ ⊕ Θ) cl
-- side-remark: notice that the ts in the definition of Expr is just a substituition
-- We now hit a problem when trying to define the identity substitution in a naive
-- fashion. Agda rejects the definition, as it is not structurally recursive.
-- {-# TERMINATING #-}
𝟙ˢ : ∀ {γ} → γ →ˢ γ
𝟙ˢ x = var-left x ` λ y → [ ⇑ʳ var-right ]ʳ 𝟙ˢ y
{- What is the best way to deal with the non-termination problem? I have tried:
1. sized types: got mixed results, perhaps I don't know how to use them
2. well-founded recursion: it gets messy and unpleasant to use
3. reorganizing the above definitions, but non-structural recursion always sneeks in
A solution which makes the identity substitition compute is highly preferred.
The problem persists with other operations on substitutions, such as composition
and the action of a substitution.
-}
Philipp's thesis An Effective Metatheory for Type Theory has three parts:
A formulation and a study of the notion of finitary type theories and standard type theories. These are closely related to the general type theories that were developed with Peter Lumsdaine, but are tailored for implementation.
A formulation and the study of context-free finitary type theories, which are type theories without explicit contexts. Instead, the variables are annotated with their types. Philipp shows that one can pass between the two versions of type theory.
A novel effectful meta-language Andromeda meta-language (AML) for proof assistants which uses algebraic effects and handlers to allow flexible interaction between a generic proof assistant and the user.
Anja's thesis Meta-analysis of type theories with an application to the design of formal proofs also has three parts:
A formulation and a study of transformations of finitary type theories with an associated category of finitary type theories.
A user-extensible equality checking algorithm for standard type theories which specializes to several existing equality checking algorithms for specific type theories.
A general elaboration theorem in which the transformation of type theories are used to prove that every finitary type theory (not necessarily fully annotated) can be elaborated to a standard type theory (fully annotated one).
In addition, Philipp has done a great amount of work on implementing context-free type theories and the effective meta-language in Andromeda 2, and Anja implemented the generic equality checking algorithm. In the final push to get the theses out the implementation suffered a little bit and is lagging behind. I hope we can bring it up to speed and make it usable. Anja has ideas on how to implement transformations of type theories in a proof assistant.
Of course, I am very happy with the particular results, but I am even happier with the fact that Philipp and Anja made an important step in the development of type theory as a branch of mathematics and computer science: they did not study a particular type theory or a narrow family of them, as has hitherto been the norm, but dependent type theories in general. Their theses contain interesting non-trivial meta-theorems that apply to large classes of type theories, and can no doubt be generalized even further. There is lots of low-hanging fruit out there.
]]>We are going to work in pure Martin-Löf type theory and the straightforward propostions-as-types interpretation of logic, so no univalence, propostional truncation and other goodies are available. Our primary objects of interest are setoids, and Agda's setoids in particular. The content of the post has been formalized in this gist. I am not going to bother to reproduce here the careful tracking of universe levels that the formalization carries out (because it must).
In general, a type, set, or an object $X$ of some sort is said to satisfy choice when every total relation $R \subseteq X \times Y$ has a choice function: $$(\forall x \in X . \exists y \in Y . R(x,y)) \Rightarrow \exists f : X \to Y . \forall x \in X . R(x, f\,x). \tag{AC}$$ In Agda this is transliterated for a setoid $A$ as follows:
satisfies-choice : ∀ c' ℓ' r → Set (c ⊔ ℓ ⊔ suc c' ⊔ suc ℓ' ⊔ suc r)
satisfies-choice c' ℓ' r = ∀ (B : Setoid c' ℓ') (R : SetoidRelation r A B) →
(∀ x → Σ (Setoid.Carrier B) (rel R x)) → Σ (A ⟶ B) (λ f → ∀ x → rel R x (f ⟨$⟩ x))
Note the long arrow in A ⟶ B
which denotes setoid maps, i.e., the choice map $f$ must respect the setoid equivalence relations $\sim_A$ and $\sim_B$.
A category theorist would instead prefer to say that $A$ satisfies choice if every epi $e : B \to A$ splits: $$(\forall B . \forall e : B \to A . \text{$e$ epi} \Rightarrow \exists s : A \to B . e \circ s = \mathrm{id}_A. \tag{PR}.$$ Such objects are known as projective. The Agda code for this is
surjective : ∀ {c₁ ℓ₁ c₂ ℓ₂} {A : Setoid c₁ ℓ₁} {B : Setoid c₂ ℓ₂} → A ⟶ B → Set (c₁ ⊔ c₂ ⊔ ℓ₂)
surjective {B = B} f = ∀ y → Σ _ (λ x → Setoid._≈_ B (f ⟨$⟩ x) y)
split : ∀ {c₁ ℓ₁ c₂ ℓ₂} {A : Setoid c₁ ℓ₁} {B : Setoid c₂ ℓ₂} → A ⟶ B → Set (c₁ ⊔ ℓ₁ ⊔ c₂ ⊔ ℓ₂)
split {A = A} {B = B} f = Σ (B ⟶ A) (λ g → ∀ y → Setoid._≈_ B (f ⟨$⟩ (g ⟨$⟩ y)) y)
projective : ∀ c' ℓ' → Set (c ⊔ ℓ ⊔ suc c' ⊔ suc ℓ')
projective c' ℓ' = ∀ (B : Setoid c' ℓ') (f : B ⟶ A) → surjective f → split f
(If anyone can advise me how to to avoid the ugly Setoid._≈_ B
above using just what is available in the standard library, please do. I know how to introduce my own notation, but why should I?)
Actually, the above code uses surjectivity in place of being epimorphic, so we should verify that the two notions coincide in setoids, which is done in Epimorphism.agda
. The human proof goes as follows, where we write $=_A$ or just $=$ for the equivalence relation on a setoid $A$.
Theorem: A setoid morphism $f : A \to B$ is epi if, and only if, $\Pi (y : B) . \Sigma (x : A) . f \, x =_B y$.
Proof. (⇒) I wrote up the proof on MathOverflow. That one works for toposes, but is easy to transliterate to setoids, just replace the subobject classifier $\Omega$ with the setoid of propositions $(\mathrm{Type}, {\leftrightarrow})$.
(⇐) Suppose $\sigma : \Pi (y : B) . \Sigma (x : A) . f \, x =_B y$ and $g \circ f = h \circ f$ for some $g, h : B \to C$. Given any $y : B$ we have $$g(y) =_C g(f(\mathrm{fst}(\sigma\, y))) =_C h(f(\mathrm{fst}(\sigma\, y))) =_C h(y).$$ QED.
Every type $T$ may be construed as a setoid $\Delta T = (T, \mathrm{Id}_T)$, which is setoid
in Agda.
Say that a setoid $A$ has canonical elements when there is a map $c : A \to A$ such that $x =_A y$ implies $\mathrm{Id}_A(c\,x , c\,y)$, and $c\, x =_A x$ for all $x : A$. In other words, the map $c$ takes each element to a canonical representative of its equivalence class. In Agda:
record canonical-elements : Set (c ⊔ ℓ) where
field
canon : Carrier → Carrier
canon-≈ : ∀ x → canon x ≈ x
canon-≡ : ∀ x y → x ≈ y → canon x ≡ canon y
Based on my experience with realizability models, I always thought that the following were equivalent:
But there is a snag! The implication (2 ⇒ 3) seemingly requires extra conditions that I do not know how to get rid of. Before discussing these, let me just point out that SetoidChoice.agda
formalizes (1 ⇔ 2) and (3 ⇒ 4 ⇒ 1) unconditionally. In particular any $\Delta T$ is projective.
The implication (2 ⇒ 3) I could prove under the additional assumption that the underlying type of $A$ is an h-set. Let us take a closer look. Suppose $(A, {=_A})$ is a projective setoid. How could we get a type $T$ such that $A \cong \Delta T$? The following construction suggests itself. The setoid map
\begin{align}
r &: (A, \mathrm{Id}_A) \to (A, {=_A}) \notag \\\
r &: x \mapsto x \notag
\end{align}
is surjective, therefore epi. Because $A$ is projective, the map splits, so we have a setoid morphism $s : (A, {=_A}) \to (A, \mathrm{Id}_A)$ such that $r \circ s = \mathrm{id}$. The endomap $s \circ r : A \to A$ is a choice of canonical representatives of equivalence classes of $(A, {=_A})$, so we expect $(A, {=_A})$ to be isomorphic to $\Delta T$ where $$T = \Sigma (x : A) . \mathrm{Id}_A(s (r \, x), x).$$ The mediating isomorphisms are
\begin{align}
i &: A \to T & j &: T \to A \notag \\\
i &: x \mapsto (s (r \, x), \zeta \, x) & j &: (x, \xi) \mapsto x \notag
\end{align}
where $\zeta \, x : \mathrm{Id}(s (r (s (r \, x))), s (r \, x)))$ is constructed from the proof that $s$ splits $r$. This almost works! It is easy to verify that $j (i \, x) =_A x$, but then I got stuck on showing that $\mathrm{Id}_T(i (j (x, \xi), (x, \xi))$, which amounts to inhabiting $$ \mathrm{Id}_T((x, \zeta x), (x, \xi)). \tag{1} $$ There is no a priori reason why $\zeta x$ and $\xi$ would be equal. If $A$ is an h-set then we are done because they will be equal by fiat. But what do to in general? I do not know and I leave you with an open problem:
Egbert Rijke and I spent one tea-time thinking about producing a counter-example by using circles and other HoTT gadgets, but we failed. Just a word of warning: in HoTT/UF the map $1 \to S^1$ from the unit type to the circle is onto (in the HoTT sense) but $\Delta 1 \to \Delta S^1$ is not epi in setoids, because that would split $1 \to S^1$.
Here is an obvious try: use the propositional truncation and define $$ T = \Sigma (x : A) . \|\mathrm{Id}_A(s (r \, x), x) \|. $$ Now (1) does not pose a problem anymore. However, in order for $\Delta T$ to be isomorphic to $(A, {=_A})$ we will need to know that $x =_A y$ is an h-proposition for all $x, y : A$.
This is as far as I wish to descend into the setoid hell.
]]>I have therefore created a proposal for a new “Proof assistants” StackExchange site. I believe that such a site would complement very well various Zulips dedicated to specific proof assistants. If you favor the idea, please support it by visiting the proposal and
To pass the first stage, we need 60 followers and 40 questions with at least 10 votes to proceed to the next stage.
]]>Well, it was a few seconds for the computer in steps (3)-(4), but three years for us in steps (1)-(2).
Most people formalize mathematics (in Automath, NuPrl, Coq, Agda, Lean, ...) to get confidence in the correctness of mathematics - or so they claim. The reality is that formalizing mathematics is intellectually fun.
Entertaining considerations aside, my initial motivation for computer formalization, about 10 years ago, was to write algorithms derived from work on game theory with Paulo Oliva. In particular, this had applications to proof theory, such as getting programs from classical proofs. Our first version of a (manually) extracted program from a classical proof was written in Haskell, in a train journey coming back from a visit to our collaborators Monika Seisenberger and Ulrich Berger in Swansea. The train journey was long enough for us to be able to complete the program. But when we ran it, it didn't work. I had been learning Agda for about one year by then, and I told Paulo that it would be easier to write the mathematics in Agda, and hence be sure it will work before we ran it, than to debug the Haskell program. And that was the case.
Before then I was the kind of person who dismissed formalization, and would say so to people who did formalization (it is probably too late to apologize now). I trusted my own mathematics, and if I wanted to derive programs from my mathematical work, I would just write them manually. Since then, my attitude has changed considerably.
I now use Agda as a "blackboard" to develop my work. For example, the following were conceived and developed directly in Agda before they were written in mathematical vernacular: Injective types in univalent mathematics, Compact, totally separated and well-ordered types in univalent mathematics, The Cantor-Schröder-Bernstein Theorem for ∞-groupoids, The Burali-Forti argument in HoTT/UF and Continuity of Gödel's system T functionals via effectful forcing.
Other people will have different reasons to formalize. For example, wouldn't it be wonderful if the whole undergraduate mathematical curriculum were formalized? Wouldn't it be wonderful to archive all mathematical knowledge not just as text but in a more structured way, so that it can be used by both people and computers? Wouldn't it be wonderful if when we submit a paper, the referee didn't need to check correctness, but only novelty, significance and so on? Did you ever woke up in the middle of the night after you submitted a paper, with doubts about the crucial lemma? Or worse, after it was published?
But for the purposes of this post, I will concentrate on only one aspect of formalization: a formalized piece of constructive mathematics is automatically a computer program that you can run in practice.
Constructive mathematics begins by removing the principle of excluded middle, and therefore the axiom of choice, because choice implies excluded middle. But why would anybody do such an outrageous thing?
I particularly like the analogy with Euclidean geometry. If we remove the parallel postulate, we get absolute geometry, also known as neutral geometry. If after we remove the parallel postulate, we add a suitable axiom, we get hyperbolic geometry, but if we instead add a different suitable axiom we get elliptic geometry. Every theorem of neutral geometry is a theorem of these three geometries, and more geometries. So a neutral proof is more general.
When I say that I am interested in constructive mathematics, most of the time I mean that I am interested in neutral mathematics, so that we simply remove excluded middle and choice, and we don't add anything to replace them. So my constructive definitions and theorems are also definitions and theorems of classical mathematics.
Occasionally, I flirt with axioms that contradict the principle of excluded middle, such as Brouwerian intuitionistic axioms that imply that "all functions $(\mathbb{N} \to 2) → \mathbb{N}$ are uniformly continuous", when we equip the set $2$ with the discrete topology and $\mathbb{N} \to 2$ with the product topology, so that we get the Cantor space. The contradiction with classical logic, of course, is that using excluded middle we can define non-continuous functions by cases. Brouwerian intuitionistic mathematics is analogous to hyperbolic or elliptic geometry in this respect. The "constructive" mathematics I am talking about in this post is like neutral geometry, and I would rather call it "neutral mathematics", but then nobody would know what I am talking about. That's not to say that the majority of mathematicians will know what I am talking about if I just say "constructive mathematics".
But it is not (only) the generality of neutral mathematics that I find attractive. Somehow magically, constructions and proofs that don't use excluded middle or choice are automatically programs. The only way to define non-computable things is to use excluded middle or choice. There is no other way. At least not in the underlying type theories of proof assistants such as NuPrl, Coq, Agda and Lean. We don't need to consider Turing machines to establish computability. What is a computable sheaf, anyway? I don't want to pause to consider this question in order to use a sheaf topos to compute a number. We only need to consider sheaves in the usual mathematical sense.
Sometimes people ask me whether I believe in the principle of excluded middle. That would be like asking me whether I believe in the parallel postulate. It is clearly true in Euclidean geometry, clearly false in elliptic and in hyperbolic geometries, and deliberately undecided in neutral geometry. Not only that, in the same way as the parallel postulate defines Euclidean geometry, the principle of excluded middle and the axiom of choice define classical mathematics.
The undecidedness of excluded middle in my neutral mathematics allows me to prove, for example, "if excluded middle holds, then the Cantor-Schröder-Bernstein Theorem for ∞-groupoids holds". If excluded middle were false, I would be proving a counter-factual - I would be proving that an implication is true simply because its premise is false. But this is not what I am doing. What I am really proving is that the CSB theorem holds for the objects of boolean ∞-toposes. And why did I use excluded middle? Because somebody else showed that there is no other way. But also sometimes I use excluded middle or choice when I don't know whether there is another way (in fact, I believe that more than half of my publications use classical logic).
So, am I a constructivist? There is only one mathematics, of which classical and constructive mathematics are particular branches. I enjoy exploring the whole landscape. I am particularly fond of constructive mathematics, and I wouldn't practice it, however useful it may be for applications, if I didn't enjoy it. But this is probably my bad taste.
Toposes are generalized (sober) spaces. But also toposes can be considered as provinces of the mathematical world.
Hyland's effective topos is a province where "everything is computable". This is an elementary topos, which is not a Grothendieck topos, built from classical ingredients: we use excluded middle and choice, with Turing machines to talk about computability. But, as it turns out, although everybody agrees which functions $\mathbb{N} \to \mathbb{N}$ are computable and which ones aren't, there is disagreement among classical mathematicians working on computability theory about what counts as "computable" for more general mathematical objects, such as functions $(\mathbb{N} \to \mathbb{N}) \to \mathbb{N}$. No problem. Just consider other provinces, called realizability toposes, which include the effective topos as an example.
Johnstone's topological topos is a topos of spaces. It fully embeds a large category of topological spaces, where the objects outside the image of the embedding can be considered as generalized spaces (which include the Kuratowski limit spaces and more). In this province of the mathematical world, "all functions are continuous".
There are also provinces where there are infinitesimals and "all functions are smooth".
A more boring, but important, province, is the topos of classical sets. This is where classical mathematics takes place.
These provinces of mathematics have an internal language. We use a certain subobject classifier to collect the things that count as truth values in the province, and we devise a kind of type theory whose types are interpreted as objects and whose mathematical statements are interpreted as truth values in the province. Then a mathematical statement in this type theory is true in some toposes, false in other toposes, and undecided in yet other toposes. This internal language, or type theory, is very rich. Starting from natural numbers we can construct the integers, the rationals, the real numbers, free groups etc., and then do e.g. analysis and group theory and so on. The internal language of the free topos can be considered as a type theory for neutral mathematics: whatever we prove in the free type theory is true in all mathematical provinces, including classical set theory.
In the above first three provinces, the principle of excluded middle fails, but for different reasons, with respectively computability, continuity and infinitesimals to blame.
Now our plot has a twist: we work within neutral mathematics to build a province of "biased" constructive mathematics where Brouwerian principles hold, such as "all functions $(\mathbb{N} \to 2) → \mathbb{N}$ are uniformly continuous".
Johnstone's topological topos (1979) would do the job, except that it is built using classical ingredients. This topos has siblings by Mike Fourman (1982) and van der Hoeven and Moerdijk (1984) with aims similar to ours, as explained in our own 2013 and 2016 papers, which give a third sibling.
Johnstone's topological topos is very easy to describe: take the monoid of continuous endomaps of the one-point compactification of the discrete natural numbers, considered as a category, then take sheaves for the canonical coverage. Van der Hoeven and Moerdijk's topos is similar: this time take the monoid of continuous endomaps of the Baire space, with the "open-cover coverage". Fourman's topos is constructed from a site of formal spaces or locales, with a similar coverage.
Our topos is also similar: we take the monoid of uniformly continuous endomaps of the Cantor space. Because it is not provable in neutral mathematics that continuous functions on the Cantor space are automatically uniformly continuous, we explicitly ask for uniform continuity rather than just continuity. As for our coverage, we initially considered coverings of finitely many jointly surjective maps. But an equivalent, smaller coverage makes the mathematics (and the formalization) simpler: for each natural number $n$ we consider a cover with $2^n$ functions, namely the concatenation maps $(\mathbb{N} \to 2) \to (\mathbb{N} \to 2)$ defined by $\alpha \mapsto s \alpha$ for each finite binary sequence $s$ of length $n$. These functions are jointly surjective, and, moreover, have disjoint images, considerably simplifying the checking of the sheaf condition. Moreover, the coverage axiom is not only satisfied, but also is equivalent to the fact that the morphisms in our site are uniformly continuous functions. So this is a sort of "uniform-continuity coverage". Our slides (2014) illustrate these ideas with pictures and examples.
The details of the mathematics can be found in the above paper, and the Agda formalization can be found at Chuangjie's page. A few years later, we ported part of this formalization to Cubical Agda to deal properly with function extensionality (which we originally dealt with in ad hoc ways).
After we construct the sheaf topos, we define a simple type theory and we interpret it in the topos. We define a "function" $(\mathbb{N} \to 2) \to \mathbb{N}$ in this type theory, without proving that it is uniformly continuous, and apply the interpretation map to get a morphism of the topos, which amounts to a uniformly continuous function. From this morphism we get the modulus of uniform continuity, which is the integer we are interested in. The interested reader can find the details in the above paper and Agda code for the paper or the substantially more comprehensive Agda code for Chuangjie's thesis.
]]>We use the Burali-Forti argument to show that, in homotopy type theory and univalent foundations, the embedding $$ \mathcal{U} \to \mathcal{U}^+$$ of a universe $\mathcal{U}$ into its successor $\mathcal{U}^+$ is not an equivalence. We also establish this for the types of sets, magmas, monoids and groups. The arguments in this post are also written in Agda.
The Burali-Forti paradox is about the collection of all ordinals. In set theory, this collection cannot be a set, because it is too big, and this is what the Burali-Forti argument shows. This collection is a proper class in set theory.
In univalent type theory, we can collect all ordinals of a universe $\mathcal{U}$ in a type $\operatorname{Ordinal}\,\mathcal{U}$ that lives in the successor universe $\mathcal{U}^+$: $$ \operatorname{Ordinal}\,\mathcal{U} : \mathcal{U}^+.$$ See Chapter 10.3 of the HoTT book, which uses univalence to show that this type is a set in the sense of univalent foundations (meaning that its equality is proposition valued).
The analogue in type theory of the notion of proper class in set theory is that of large type, that is, a type in a successor universe $\mathcal{U}^+$ that doesn't have a copy in the universe $\mathcal{U}$. In this post we show that the type of ordinals is large and derive some consequences from this.
We have two further uses of univalence, at least:
to adapt the Burali-Forti argument from set theory to our type theory, and
to resize down the values of the order relation of the ordinal of ordinals, to conclude that the ordinal of ordinals is large.
There are also a number of uses of univalence via functional and propositional extensionality.
Propositional resizing rules or axioms are not needed, thanks to (2).
An ordinal in a universe $\mathcal{U}$ is a type $X : \mathcal{U}$ equipped with a relation $$ - \prec - : X \to X \to \mathcal{U}$$
required to be
proposition valued,
transitive,
extensional (any two points with same lower set are the same),
well founded (every element is accessible, or, equivalently, the principle of transfinite induction holds).
The HoTT book additionally requires $X$ to be a set, but this follows automatically from the above requirements for the order.
The underlying type of an ordinal $\alpha$ is denoted by $\langle \alpha \rangle$ and its order relation is denoted by $\prec_{\alpha}$ or simply $\prec$ when we believe the reader will be able to infer the missing subscript.
Equivalence of ordinals in universes $\mathcal{U}$ and $\mathcal{V}$, $$ -\simeq_o- : \operatorname{Ordinal}\,\mathcal{U} \to \operatorname{Ordinal}\,\mathcal{V} \to \mathcal{U} \sqcup \mathcal{V},$$ means that there is an equivalence of the underlying types that preserves and reflects order. Here we denote by $\mathcal{U} \sqcup \mathcal{V}$ the least upper bound of the two universes $\mathcal{U}$ and $\mathcal{V}$. The precise definition of the type theory we adopt here, including the handling of universes, can be found in Section 2 of this paper and also in our Midlands Graduate School 2019 lecture notes in Agda form.
For ordinals $\alpha$ and $\beta$ in the same universe, their identity type $\alpha = \beta$ is canonically equivalent to the ordinal-equivalence type $\alpha \simeq_o \beta$, by univalence.
The lower set of a point $x : \langle \alpha \rangle$ is written $\alpha \downarrow x$, and is itself an ordinal under the inherited order. The ordinals in a universe $\mathcal{U}$ form an ordinal in the successor universe $\mathcal{U}^+$, denoted by $$ \operatorname{OO}\,\mathcal{U} : \operatorname{Ordinal}\,\mathcal{U}^+,$$ for ordinal of ordinals.
Its underlying type is $\operatorname{Ordinal}\,\mathcal{U}$ and its order relation is denoted by $-\triangleleft-$ and is defined by $$\alpha \triangleleft \beta = \Sigma b : \langle \beta \rangle , \alpha = (\beta \downarrow b).$$
This order has type $$-\triangleleft- : \operatorname{Ordinal}\,\mathcal{U} \to \operatorname{Ordinal}\,\mathcal{U} \to \mathcal{U}^+,$$ as required for it to make the type $\operatorname{\operatorname{Ordinal}} \mathcal{U}$ into an ordinal in the next universe.
By univalence, this order is equivalent to the order defined by $$\alpha \triangleleft^- \beta = \Sigma b : \langle \beta \rangle , \alpha \simeq_o (\beta \downarrow b).$$ This has the more general type $$ -\triangleleft^-- : \operatorname{\operatorname{Ordinal}}\,\mathcal{U} \to \operatorname{\operatorname{Ordinal}}\,\mathcal{V} \to \mathcal{U} \sqcup \mathcal{V},$$ so that we can compare ordinals in different universes. But also when the universes $\mathcal{U}$ and $\mathcal{V}$ are the same, this order has values in $\mathcal{U}$ rather than $\mathcal{U}^+$. The existence of such a resized-down order is crucial for our corollaries of Burali-Forti, but not for Burali-Forti itself.
For any $\alpha : \operatorname{Ordinal}\,\mathcal{U}$ we have $$ \alpha \simeq_o (\operatorname{OO}\,\mathcal{U} \downarrow \alpha),$$ so that $\alpha$ is an initial segment of the ordinal of ordinals, and hence $$ \alpha \triangleleft^- \operatorname{OO}\,\mathcal{U}.$$
We adapt the original formulation and argument from set theory.
Theorem. No ordinal in a universe $\mathcal{U}$ can be equivalent to the ordinal of all ordinals in $\mathcal{U}$.
Proof. Suppose, for the sake of deriving absurdity, that there is an ordinal $\alpha \simeq_o \operatorname{OO}\,\mathcal{U}$ in the universe $\mathcal{U}$. By the above discussion, $\alpha \simeq_o \operatorname{OO}\,\mathcal{U} \downarrow \alpha$, and, hence, by symmetry and transitivity, $\operatorname{OO}\,\mathcal{U} \simeq_o \operatorname{OO}\,\mathcal{U} \downarrow \alpha$. Therefore, by univalence, $\operatorname{OO}\,\mathcal{U} = \operatorname{OO}\,\mathcal{U} \downarrow \alpha$. But this means that $\operatorname{OO}\,\mathcal{U} \triangleleft \operatorname{OO}\,\mathcal{U}$, which is impossible as any accessible relation is irreflexive. $\square$
Some corollaries follow.
We say that a type in the successor universe $\mathcal{U}^+$ is small if it is equivalent to some type in the universe $\mathcal{U}$, and large otherwise.
Theorem. The type of ordinals of any universe is large.
Proof. Suppose the type of ordinals in the universe $\mathcal{U}$ is small, so that there is a type $X : \mathcal{U}$ equivalent to the type $\operatorname{Ordinal}\, \mathcal{U} : \mathcal{U}^+$. We can then transport the ordinal structure from the type $\operatorname{Ordinal}\, \mathcal{U}$ to $X$ along this equivalence to get an ordinal in $\mathcal{U}$ equivalent to the ordinal of ordinals in $\mathcal{U}$, which is impossible by the Burali-Forti theorem.
But the proof is not concluded yet, because we have to say how we transport the ordinal structure. At first sight we should be able to simply apply univalence. However, this is not possible because the types $X : \mathcal{U}$ and $\operatorname{Ordinal}\,\mathcal{U} :\mathcal{U}^+$ live in different universes. The problem is that only elements of the same type can be compared for equality.
In the cumulative universe hierarchy of the HoTT book, we automatically have that $X : \mathcal{U}^+$ and hence, being equivalent to the type $\operatorname{Ordinal}\,\mathcal{U} : \mathcal{U}^+$, the type $X$ is equal to the type $\operatorname{Ordinal}\,\mathcal{U}$ by univalence. But this equality is an element of an identity type of the universe $\mathcal{U}^+$. Therefore when we transport the ordinal structure on the type $\operatorname{Ordinal}\,\mathcal{U}$ to the type $X$ along this equality and equip $X$ with it, we get an ordinal in the successor universe $\mathcal{U}^+$. But, in order to get the desired contradiction, we need to get an ordinal in $\mathcal{U}$.
In the non-cumulative universe hierarchy we adopt here, we face essentially the same difficulty. We cannot assert that $X : \mathcal{U}^+$ but we can promote $X$ to an equivalent type in the universe $\mathcal{U}^+$, and from this point on we reach the same obstacle as in the cumulative case.
So we have to transfer the ordinal structure from $\operatorname{Ordinal}\,\mathcal{U}$ to $X$ manually along the given equivalence, call it $$f : X \to \operatorname{Ordinal}\,\mathcal{U}.$$ We define the order of $X$ from that of $\operatorname{Ordinal}\,\mathcal{U}$ by $$ x \prec y = f(x) \triangleleft f(y). $$ It is laborious but not hard to see that this order satisfies the required axioms for making $X$ into an ordinal, except that it has values in $\mathcal{U}^+$ rather than $\mathcal{U}$. But this problem is solved by instead using the resized-down relation $\triangleleft^-$ discussed above, which is equivalent to $\triangleleft$ by univalence. $\square$
By a universe embedding we mean a map $$f : \mathcal{U} \to \mathcal{V}$$ of universes such that, for all $X : \mathcal{U}$, $$f(X) \simeq X.$$ Of course, any two universe embeddings of $\mathcal{U}$ into $\mathcal{V}$ are equal, by univalence, so that there is at most one universe embedding between any two universes. Moreover, universe embeddings are automatically type embeddings (meaning that they have propositional fibers).
So the following says that the universe $\mathcal{U}^+$ is strictly larger than the universe $\mathcal{U}$:
Theorem. The universe embedding $\mathcal{U} \to \mathcal{U}^+$ doesn't have a section and therefore is not an equivalence.
Proof. A section would give a type in the universe $\mathcal{U}$ equivalent to the type of ordinals in $\mathcal{U}$, but we have seen that there is no such type. $\square$
(However, by Theorem 29 of Injective types in univalent mathematics, if propositional resizing holds then the universe embedding $\mathcal{U} \to \mathcal{U}^+$ is a section.)
The same argument of the above theorem shows that there are more sets in $\mathcal{U}^+$ than in $\mathcal{U}$, because the type of ordinals is a set. For a universe $\mathcal{U}$ define the type $$\operatorname{hSet}\,\mathcal{U} : \mathcal{U}^+$$ by $$ \operatorname{hSet}\,\mathcal{U} = \Sigma A : \mathcal{U} , \text{$A$ is a set}.$$ By an hSet embedding we mean a map $$f : \operatorname{hSet}\,\mathcal{U} → \operatorname{hSet}\,\mathcal{V}$$ such that the underlying type of $f(\mathbb{X})$ is equivalent to the underlying type of $\mathbb{X}$ for every $\mathbb{X} : \operatorname{hSet}\,\mathcal{U}$, that is, $$ \operatorname{pr_1} (f (\mathbb{X})) ≃ \operatorname{pr_1}(\mathbb{X}). $$ Again there is at most one hSet-embedding between any two universes, hSet-embeddings are type embeddings, and we have:
Theorem. The hSet-embedding $\operatorname{hSet}\,\mathcal{U} \to \operatorname{hSet}\,\mathcal{U}^+$ doesn't have a section and therefore is not an equivalence.
This is because the type of ordinals is a monoid under addition with the ordinal zero as its neutral element, and hence also a magma. If the inclusion of the type of magmas (respectively monoids) of one universe into that of the next were an equivalence, then we would have a small copy of the type of ordinals.
Theorem. The canonical embeddings $\operatorname{Magma}\,\mathcal{U} → \operatorname{Magma}\,\mathcal{U}^+$ and $\operatorname{Monoid}\,\mathcal{U} → \operatorname{Monoid}\,\mathcal{U}^+$ don't have sections and hence are not equivalences.
This case is more interesting.
The axiom of choice is equivalent to the statement that any non-empty set can be given the structure of a group. So if we assumed the axiom of choice we would be done. But we are brave and work without assuming excluded middle, and hence without choice, so that our results hold in any $\infty$-topos.
It is also the case that the type of propositions can be given the structure of a group if and only if the principle of excluded middle holds. And the type of propositions is a retract of the type of ordinals, which makes it unlikely that the type of ordinals can be given the structure of a group without excluded middle.
So our strategy is to embed the type of ordinals into a group, and the free group does the job.
First we need to show that the inclusion of generators, or the universal map into the free group, is an embedding.
But having a large type $X$ embedded into a type $Y$ is not enough to conclude that $Y$ is also large. For example, if $P$ is a proposition then the unique map $P \to \mathbb{1}$ is an embedding, and the unit type $\mathbb{1}$ is small but $P$ may be large.
So more work is needed to show that the group freely generated by the type of ordinals is large. We say that a map is small if each of its fibers is small, and large otherwise. De Jong and Escardo showed that if a map $X \to Y$ is small and the type $Y$ is small, then so is the type $X$, and hence if $X$ is large then so is $Y$. Therefore our approach is to show that the universal map into the free group is small. To do this, we exploit the fact that the type of ordinals is locally small (its identity types are all equivalent to small types).
But we want to be as general as possible, and hence work with a spartan univalent type theory which doesn't include higher inductive types other than propositional truncation. We include the empty type, the unit type, natural numbers, list types (which can actually be constructed from the other type formers), coproduct types, $\Sigma$-types, $\Pi$-types, identity types and a sequence of universes. We also assume the univalence axiom (from which we automatically get functional and propositional extensionality) and the axiom of existence of propositional truncations.
We construct the free group as a quotient of a type of words following Mines, Richman and Ruitenburg. To prove that the universal map is an embedding, one first proves a Church-Rosser property for the equivalence relation on words. It is remarkable that this can be done without assuming that the set of generators has decidable equality.
Quotients can be constructed from propositional truncation. This construction increases the universe level by one, but eliminates into any universe.
To resize back the quotient used to construct the group freely generated by the type of ordinals to the original universe, we exploit the fact that the type of ordinals is locally small.
As above, we have to transfer manually group structures between equivalent types of different universes, because univalence can't be applied.
Putting the above together, and leaving many steps to the Agda code, we get the following in our spartan univalent type theory.
Theorem. For any large, locally small set, the free group is also large and locally small.
Corollary. In any successor universe $\mathcal{U}^+$ there is a group which is not isomorphic to any group in the universe $\mathcal{U}$.
Corollary. The canonical embedding $\operatorname{Group}\,\mathcal{U} → \operatorname{Group}\,\mathcal{U}^+$ doesn't have a section and hence is not an equivalence.
Can we formulate and prove a general theorem of this kind that specializes to a wide variety of mathematical structures that occur in practice?
]]>