Mathematicians are often confused about the meaning of variables. I hear them say “a free variable is implicitly universally quantified”, by which they mean that it is ok to equate a formula $\phi$ with a free variable $x$ with its universal closure $\forall x \,.\, \phi$. I am addressing this post to those who share this opinion.
I will give several reasons, which are all essentially the same, why “there is no difference between $\phi$ and $\forall x \,.\, \phi$” is a really bad opinion to have.
Reason 1: you wouldn’t equate a function with its definite integral
You would not claim that a real-valued function $f : \mathbb{R} \to \mathbb{R}$ is “the same thing” as its definite integral $\int_{\mathbb{R}} f(x) \, d x$, would you? One is a real function, the other is a real number. Likewise, $\phi$ is a truth
Reason 2: functions are not their own values
To be quite precise, the expression $\phi$ by itself is not a function, just like the expression $x + \sin x$ is not a function. To make it into a function we must first fun x -> x +. sin x. In logic we indicate the fact that $\phi$ is a function by putting it in a
Why is all this nit-picking necessary? Try answering these questions with “yes” and “no” consistently:
- Is $x + \sin x$ a function in variable $x$?
- Is $x + \sin x$ a function in variables $x$ and $y$?
- Is $y – y + x + \sin x$ a function in variables $x$ and $y$?
- Is $x + \sin x = y – y + x + \sin x$?
A similar sort of mistake happens in algebra where people think that polynomials are functions. They are not. They are elements of a certain freely generated ring.
Reason 3: They are not logically equivalent
It is absurd to claim that $\phi$ and $\forall x \in \mathbb{R} \,.\, \phi$ are logically equivalent statements. Suppose $\forall x \in \mathbb{R} \,.\, x > 2$ were equivalent to $x > 2$. Then I could replace one by the other in any formula I wish. So I choose the formula $\exists x \in \mathbb{R} \,.\, x > 2$. It must be equivalent to $\exists x \in \mathbb{R} \,.\, \forall x \in \mathbb{R} \,.\, x > 2$, but since $\forall x \in \mathbb{R} \,.\, x > 2$ is false, we get $\exists x \in \mathbb{R} \,.\, \bot$, which is false. We proved that there is no number larger than 2.
Reason 4: They are not inter-derivable
If you can tell the difference between an implication and logical entailment, perhaps you might try to counter reason 3 by pointing out that $\phi$ and $\forall x \,.\, \phi$ are either both derivable, or both not derivable. That is to say, we can prove one if, and only if, we can prove the other. But again, this is not the case. We can prove $\forall x \in \emptyset \,.\, \bot$ but we cannot prove $\bot$.
Reason 5: Bound variables can be renamed but free variables cannot
The formula $x > 2$ is obviously not the same thing as the formula $y > 2$. But the formula $\forall x \in \mathbb{R} . x > 2$ is actually the same as $\forall y \in \mathbb{R} . y > 2$. If you find this confusing it is because you were never taught properly how to handle free and bound variables.
Reason 6: You cannot prove $\forall x \,.\, \phi$ without allowing $x$ to become free
Perhaps we can just forbid free variables altogether and
“Consider any $x \in \mathbb{R}$. Then bla bla bla, therefore $\phi$.”
is now forbidden because the first sentence introduces $x$ as a free variable.
We can abolish variables altogether if we wish, by resorting to combinators, but it makes no sense to keep variables and make them all bound all the time.
Epilogue: so in what sense are they the same?
There is a theorem in model theory:
Let $\phi$ be a formula in context $x_1, \ldots, x_n$ and $M$ a structure in which we can interpret $\phi$. The following are equivalent:
- the universal closure $\forall x_1, \ldots, x_n \,.\, \phi$ is valid in $M$,
- for every valuation $\nu : \lbrace x_1, \ldots, x_n \rbrace \to M$, $\phi[\nu]$ is valid in $M$.
This is sometimes abbreviated (quite inaccurately) as “a formula and its universal closure are semantically equivalent”. This theorem is causing a lot of harm because mathematicians interpret it as “free variables are implicitly universally bound”. But the theorem itself clearly distinguishes a formula from its universal closure. It has a limited range of applications in model theory. It is not a general reasoning principle that would allow you to dispose of thinking about free variables.
You are in good company. Philosophers have thought about free variables for millennia, although they phrase the problem in the language of universals and particulars. They wonder whether “dog” is the same thing as the set of all dogs, or perhaps there is an ideal dog which is “pure dogness”, but then do we need two ideal dogs to make ideal pups, etc. The answer is simple: a free variable is a projection from a cartesian product.

I think two things lead people astray on this, and are things that working with type theory sets you straight on.
The primary thing is that most mathematicians work in situations where all variables range over a single, non-empty domain of discourse, and bounded quantification is secondary. This means you don’t have to worry about #4, say. Two is that, because every variable ranges over the same domain, they’re just not put in the context (assuming you even work with a notion of context at all). This leads you to wonder what the status of free variables are, when free variables are ‘obviously’ bound by the context (and then their semantics become pretty clear, at least in the categorical treatment).
This also gets you the weird (to a type theorist): $(\forall x. P(x)) \Rightarrow \exists x . P(x)$. “What if you have an empty model?” “We’ve ruled those out, just because.” The type theorist says instead: $y:A \vdash (\forall x:A . P(x)) \Rightarrow \exists x:A . P(x)$.
(Oh, and to cover myself, those $A$’s and $P$’s are metavariables in a schema, not free variables; I think.)
@Dan: I fixed your math and took the liberty to delete the second comment, as it is not obsolete. You can just write LaTeX code, with dollar signs at all. The blog uses MathJax.
That “a function is an expression in a context” from point 2 makes me wonder why lambda calculus has lambdas at all rather than reinterpret function applications as suspended substitutions.
The representations of LC with explicit substitutions that I remember still have lambdas though, I guess for higher-order cases it’s more convenient to have proper function types, especially when polymorphism is involved.
My guess is that people feel the need to bind variables in the way that you describe because they have an instinct for logical syntax without the training, and $\forall$ (and $\int$) are the only binding operators that they know. If the variables are not bound then they are loose, just as the threads in a piece of cloth that is not bound with a hem are loose and will become frayed.
A careful analysis of syntax uses an expression tree, which is like a piece of cloth in that it has threads. But when weaving cloth it is more useful to keep the warp on the loom so that it can be extended. Likewise, we keep hold of the free variables in an expression tree rather than binding them so that we can extend the expression by substituting for (and into) them.
So the answers to your first two questions are no, but
$$ x:X \quad \vdash\quad x+\sin x $$
is a function of (expression in) $x$, whilst
$$ x:X,\ y:Y \quad \vdash\quad x+\sin x $$
is an expression in $x$ and $y$.
These may be seen as examples of morphisms $X\to R$ and $X\times Y\to R$ in the category of contexts and substitutions.
I’m afraid that Dan is wrong in saying that Type Theory gets this straight, because for them $X\to R$ is another type and not a hom or a sequent. This is like confusing the first expression above with
$$ \lambda x.X+\sin x. $$
This is the same confusion as the one between $\phi(x)$ and $\forall x.\phi(x)$ about which Andrej has written, but it is still present in what he has written.
So, passing from arithmetic to logical examples, there is a three-way confusion amongst
$$ \Gamma,\ x:X \quad\vdash\quad \phi x, $$
$$ \Gamma \quad\vdash\quad \lambda x.\phi x $$
and
$$ \Gamma \quad\vdash\quad \forall x.\phi x, $$
by each of which I mean that the formula on the right is asserted to be well formed rather than true.
Does this matter? What does $\lambda$-abstraction actually do?
If you only have one (“arithmetical”) type, such as $R$, so $R\to R$ can only mean a hom and is not a type in itself, then $\lambda x.f(x)$ is either ill-formed or just means $x:X\vdash f(x)$. This is like working in a category with products but not exponentials, or a logic with conjuction and no implication.
It is only when we introduce these higher types that there is a distinction.
For the terms, if the only thing you can do with $\lambda x.f(x)$ is apply it then this is the same as substituting into $x:X\vdash f(x)$. Only when this function is reified (made into a thing in itself) as the value of some higher-type function is there any difference.
Returning to the examples from “school calculus”, when does the distinction between an expression and its $\lambda$-abstraction become significant? I was going to say, only when we study the Calculus of Variations, but it does come in earlier, with the calculation of the volume of a figure of revolution, given the function that provides its profile.
Recently, in the work of someone who definitely does understand these syntactic distinctions, I saw the commutative law written as
$$ \forall x, y. x \star y = y \star x. $$
Now, I consider that this is wrong and that it should be written instead as
$$ x,\ y :X \quad\vdash\quad x\star y = y\star x. $$
However, as I have explained, the reasons for my opinion are quite subtle.
I think there are a couple of reasons for this.
1) Some presentations of logic actually have that, in certain circumstances. I can point to the discussion in chapter 8 of GEB as a specific example:
“Rule of Generalization: Suppose x is a theorem in which u, a variable, occurs free. Then, ?u: x is a theorem. (Restrinction: No generalization is allowed in a fantasy on any variable which appeared free in the fantasy’s premise.”
2) I’d guess that if you asked them about particular examples, they’d agree that particular examples weren’t universally quantified, or that the variables are implicitly bound. The issue here is that mathematical english doesn’t have a linear translation into explicit logical syntax, so the scope of bound variables isn’t necesarilt obvious.
A couple of comments on some of your specific points.
1) “x+sin x” can certainly be the notation for a function, even if it is somewhat sloppy and ambiguous. And I am sure it causes endless confusion for undergraduates. But that doesn’t change the fact that in the appropriate context (i.e, when talking about functions on the complex plain), refering to the function “x + sin x” is mostly unambiguous
2) I wouldn’t say that “Consider any x??. Then bla bla bla, therefore ?.” introduces a free variable, it just moves it to the context. Certainly, it occurs in formulas as free, but it isn’t globally free.
How exactly does type theory get this wrong? It has (conceivably; depending on what system we’re talking about) all three of these things:
$\Gamma, x:S \vdash \phi \, x : T$
$\Gamma \vdash (\lambda x : S. \phi \, x) : S \rightarrow T$
$\Gamma \vdash \forall x : S. \phi \, x type$
Depending on what $\phi$ is. And they’re all different (though related by inference rules). The first two correspond to something like:
$\Gamma \times S \to T$
$\Gamma \to T^S$
And the last is a rather different animal. It’s written differently in the category theory, but it’s the same distinction.
If you’re referring to the syntax people use in something like Coq or Agda, a lot of this is elided because it can be reconstructed by a machine, and would be a burden to write down explicitly. People do tend to be less precise about the distinction when working in this format, but the vast majority of papers trying to be at all precise use the more formal proof trees with contextual judgments and whatnot, and I’d expect people studying type theory to be familiar with them.
Dan, I’m not saying that Type Theory makes a major error, just that the arrow syntax elides a distinction that is in fact more or less the same one that is the subject of Andrej’s original posting,
If you want to say that type theorists write $A\to B$ where categorists have $B^A$ and $x:A\vdash f(x):B$ where categorist write $f:A\to B$ then I suppose that settles the question.
Dear Paul,
Speaking as a type theorist who occasionally uses a little category theory, I do something pretty close to what you suggest. When I use categories, I tend to give the types of morphisms with the single-arrow $A \to B$, and use the double-arrow $A \Rightarrow B$ to write the exponential. This is mainly because the superscript notation $B^A$ gets unwieldy when $A$ is a syntactically large expression.
I do agree that axioms are typically better given in the form:
$$x, y: A \vdash P(x,y)$$
where $P(x,y)$ is an atomic proposition. There are good proof-theoretic reasons for this, in that cut-elimination fails if you give the axiom as:
$$\cdot \vdash \forall x,y:A.\; P(x,y)$$
since this introduces a forall without using the forall-introduction rule. I don’t quite know how to render this fact categorically, since it seems to lead to modelling contexts with multicategories, and I’ve never had the stamina for dealing with them.
I am a math/CS nub, so please forgive me if what I am about to say is really stupid, BUT… Is it not easier to simply declare free variables evil, and force every variable to be bound to either a definition (an existential proof) or an universal quantifier? Would this not make the language simpler to understand?
@Eduardo: it is possible to do away with all variables (look up “combinatory alegbra”), but humans are not good at using that sort of thing. As I said in the post, once you decide to have bound variables, you are more or less forced to have free variables as well. This is so because inside a universal quantifier $\forall x . \phi$ the variable $x$ is free. Or to put it in terms of computer science: a local variable is not visible outside of its scope, but inside its scope it acts as a free variable (one that is not “obsucred” and is “available”). You seem to think that free variables somehow complicate matters. They do not, bad math education and ignorance create confusion, and that complicates matters. Free variables are a completely natural concept. They reflect how humans speak and think. When you say “suppose $x$ is a real number”, and then you proceed to argue about it, you have used a free variable. Mathematicians do that all the time.
The Tortoise had something to say to Achilles about this, as narrated by Lewis Carroll.
I’m having the same questions as Eduardo and Tom Prince.
All of the presentations I have seen of first order logic include the derivation rule of Universal generalization. As someone who strongly agrees with this post, this rule has always seemed blatantly false to me.
I thought that the right way to fix it is to ban free variables from formal derivations. Of course, in math as it is normally written, one writes “Suppose $x$ is a real number …” and continues from there. My proposal would be that this “suppose” means that every single sentence which follows begins with an implicit $\forall_{x \in \mathbb{R}} \cdots$ and that, when giving formal derivations, we should actually write out the $\forall_{x \in \mathbb{R}} \cdots$ on every line. This doesn’t mean that $x^2 \geq 0$ isn’t a grammatical phrase, just that it is not the sort of thing which can appear by itself as a line in a proof.
But I haven’t seen any references which do this, and you seem to be saying it is a bad idea for some reason. If you could flesh out why this is a bad idea, and help me understand the alternative, I’d appreciate it.
Regarding the Universal Generalisation rule as it appears on the Wikipedia page to which you link, I am inclined to say that the reason why you consider that it is blatantly false is that it is false. However, I cannot say this because too many traditional authors in logic use the $\vdash$ notation in this way.
Just as one deduces $\vdash Q\Rightarrow P$ from $Q\vdash P$, so one should obtain $\vdash \forall x\in Q.P(x)$ from $x:Q\vdash P(x)$ and not from $\vdash P(x)$.
I am not sure how generally you propose to prepend $\forall x$ to predicates with a free variable $x$, cf. Andrej’s original comments. It’s technically correct in some cases. However, it is idiomatially wrong: this is not how we write mathematics. Specialists sometimes need to do things in idiosyncratic ways, but logicians have been notorious for doing this, to the extent that they end up talking about angels dancing on pinheads instead of the real (mathematical) world.
All of the difficulties with free variables, including the notorious Empty Domain fallacy ($\forall x.P(x)\vdash\exists x.P(x)$), are solved by just keeping track of them, in a context (explicit listing), along with the propositional hypotheses.
This does not mean to say that we have to retain all of this accountancy while writing ordinary mathematical arguments. Programming languages have developed good ways of indicating which variables are free (in scope) in any particular part of a program. Similarly, there is the box notation for doing it in logical proofs. However, there are also longstanding conventions for doing this in mathematical arguments written in natural language.
In the last paragraph/section, isn’t the “theorem” quoted actually the definition of $M \models \phi(x_1, \ldots,x_n$? Or, more precisely, one clause in the mutually recursive definition of what $M \models \phi$ means for any $\phi$.
@Ian: Well, yes, that just makes the theorem easier to prove… Also, with non-empty domains (an assumption often made in logic textbooks), one can then conclude from this and the completeness theorem that a formula is provable if, and only if, its universal closure is.
Paul, I really like your loom analogy! I wish there was a way to integrate a loom *into* the weave so the the analogy would be complete (and I could explain $l$-calculus to my mother).
I confess that I only thought of the loom analogy when I wrote my first post, so I will leave you to develop it.
I’m trying to understand Reason 2 in the original post: I would answer all of the questions 1. 2. 3. & 4. with yes (possibly adding something like the context $x,y\in \mathbb{R}$). Why is that inconsistent?
@Michael: it is precisely the missing context, or to be more precise, the missing abstraction, which is the problem. The expressions $x + \sin x$ denotes a number. This number depends on another number $x$, but so what? To get a function out of this you have to write something like $x \mapsto x + \sin x$ or $(x,y) \mapsto x + \sin x$, at which point it becomes clear what this function maps from (in the first case from $\mathbb{R}$ and in the second $\mathbb{R^2}$). Mahtematicians are doing themselves a great disservice by saying things like “$n + 3 k$ is an odd number” at one point, and then “$n + 3 k$ is a monotone function” at another. They should say “$n + 3 k$ is an odd number$” but $n \mapsto n + 3 k$” is a monotone function”.