Mathematics and Computation

A blog about mathematics for computers

Representations of uncomputable and uncountable sets

Occasionally I hear claims that uncountable and uncomputable sets cannot be represented on computers. More generally, there are all sorts of misguided opinions about representations of data on computers, especially infinite data of mathematical nature. Here is a quick tutorial on the matter whose main point is:

It is meaningless to discuss representations of a set by a datatype without also considering operations that we want to perform on the set.

Unless you tell me what it is that you would like to compute about the set, I can represent it whichever way I want. You will never look at my representation because you will never do anything with the elements of your set. By this line of reasoning you can represent any set $X$ simply by the unit type, which has a single element $()$. Naturally, you will object that this makes little sense because all elements of $X$ would have the same representation. But that means you do care about the structure of $X$: you are interested in the inequality relation on $X$.

For a less trivial but still silly example, consider the set of all Turing machines. The famous halting problem asks for an algorithm that decides, given a description of a Turing machine $T$, whether $T$ halts when we run it on the empty tape. Such an algorithm is easy to obtain if all we care about is the halting problem. Simply represent each Turing machine $T$ with a bit which is $1$ if $T$ halts and $0$ if it does not. This way it is really easy to decide whether $T$ halts. Silly? Yes, but it makes a point: it is not the set we really want to represent but a set together with relevant operations on it. What are the relevant operations on Turing machines? Essentially, we want a representation which makes the u-t-m and s-m-n theorems true. The former says that the representation should be such that we can compute what happens when we apply a Turing machine to an input, and the latter that the currying operation should be computable. One can prove that any such representation has an undecidable Halting problem. That is much better.

In general, we can make any function $f : X \to Y$ whatsoever computable if we are allowed to change the representation of $X$: represent elements $x in X$ as pairs $(a, b)$ where $a$ represents $x$ (in the original representation of $X$) and $b$ represents $f(x)$ in the representation of $Y$. The function $f$ then becomes easily computable as the first projection: given a pair $(a,b)$ representing $x$, the value $f(x)$ is represented by $b$.

Sometimes it is quite hard to discover which operations on a set are “relevant”. What would you say is the relevant structure of real numbers $\mathbb{R}$? Specialists in computable mathematics disagree about the answer. Everyone wants computable arithmetic operations $+$, $-$, $\times$, $/$ and computable constants $0$ and $1$, but what else? If you say that the relation $<$ should be computable as a function from $\mathbb{R}$ to boolean truth values, you are in good company, although you have opted for a rather unrealistic model of real number computation. It is much more reasonable to require that $<$ be computable as a function from $\mathbb{R}$ to semidecidable truth values.
In essence, the relevant structure of the reals is that of a Dedekind complete Archimedean ordered field which is also overt, Hausdorff, and locally compact. No, wait! The relevant structure is that of a Cauchy complete Archimedean ordered field! Oh no, I disagree with myself!

At least the good news is that once we have settled the relevant mathematical structure of a set, we can automatically compute a description of its computer representation.

Remember then, when you see someone claiming that such-and-such set is not representable by a computer, you should always ask “but what operations on the set do you want to compute with?”

Representing uncountable sets

There are two popular arguments which allegedly show that we cannot represent uncountable sets by computers:

  1. Computers represent everything by finite sequences of 0’s and 1’s. There are only countably many such sequences, therefore, we cannot represent an uncountable set.
  2. Even if we allow infinite binary streams, we will be able to actually use only the computable ones, and there are only countably many of those.

It is true that there are only countably many finite sequences of 0’s and 1’s. To be more precise, there are computably countably many such sequences because there is a program that enumerates them. But did you know that a computably countable set may contain subsets which fail to be computably countable? For example, the set of all Turing machines is computably countable, but the subset of those Turing machines which do not halt is not. The first argument relies on the fact that we are talking about arbitrary enumerations, including non-computable ones.

In fact, if we take computability seriously, the second argument immediately fails because the set of computable infinite streams of 0’s and 1’s is computably uncountable by a famous diagonal argument. (Note that “computably uncountable” is more than “not computably countable”: the former means that given any enumeration we can compute an element outside the enumeration, while the latter just says that there is no enumeration of all elements.) The second argument stands only if we mix “computable” and “non-computable” ingredients in just the right way.

Even if we disregard the theoretical issues about computable vs. non-computable enumerations, we are still left with the question whether real computers “contain” non-computable sequences. Do they? Does the universe? Or do you claim that all streams of 0’s and 1’s that physicists are able to feed into computers are definitely computable? Perhaps physicists should devise an experiment that would tell us something about existence of non-computable streams. I think we would quickly discover that such an experiment is in principle impossible because we may only ever observe finite prefixes of a binary stream. According to the Verification principle this makes the question meaningless and any position about its truth an article of faith.

To conclude, let us then say that computers can represent all infinite binary streams of our universe, because computers are equipped with I/O devices that let them interact with the universe. Whether there are uncountably many binary streams in the universe is a matter of faith if you are a human physicist trained in classical logic. If you are a computer the matter is clear: there are computably uncountably many binary streams.

A much more interesting question is whether the space of binary streams is compact, but that is another story.


[...] Andrej had a nice post on representing uncountable sets. [...]


I'm not sure it's fair to say that physicists actually would, on the basis of something like the verification principle, take the question of the possibility of physical generation of non-computable streams to be meaningless. The fact that we can only observe finite prefixes of a binary stream leaves us in a similar position regarding "There exist binary streams with only finitely many 0s" as with "There exist uncomputable binary streams", does it not? And yet, every physicist would say the former question is settled (by the inductive grounds that lead us to believe certain streams are all 1s).

Similarly, we could at least potentially imagine concluding, on inductive grounds, that some physical process actually yields, say, a binary stream whose nth value differs from that of the nth program on itself whenever such computation halts. And thus be led to conclude the physical existence of uncomputable binary streams. I don't think such a situation is terribly likely, but only because I, for reasons that are probably more superstition than logic, doubt the world works that way, and thus don't think the requisite sort of empirical evidence will ever be found. However, it seems clear that the methods of scientific practice themselves present no necessary barrier to drawing such a conclusion.

(Where I wrote above "nth program on itself", I meant of course "nth program on input n")

You are quite correct that I was a bit hasty in dragging in the Verification principle. What actually matters is whether the laws of physics predict existence or non-existence of computable streams, not what we can verify in finite time. Also note that there might exist funky experiments that test for existence of non-computable streams in a probabilistic way without actually exhibiting a non-computable stream.

However, there does seem to be a difference between "There exist binary streams with only finitely many 0s" and "There exist uncomputable binary streams", namely that I can produce a mechanism which outputs an infinite stream od 1s and prove from currently accepted laws of physics that it does not contain any 0s (modulo idealizations such as "the mechanism will not break or run out of energy and time"). Is it possible to build a mechanism which outputs a binary stream and prove, using currently accepted laws of physics, that it outputs a non-computable stream? Is it possible to prove from currently accepted laws of physics that no such mechanism exists? I suspect the current laws of physics do not say much about either option.

> but what operations on the set do you want to compute with?

I have two questions regarding this condition:

  1. Why is it important from pragmatic point of view? For a computational task all we need is to be able to feed the input and read the output (maybe in a uniform way), we don't need to be able to compute any function on representation.

  2. Is this a sufficient condition for accepting a representation(/coding/numbering/naming)? As I wrote in 1, what is important is that we should be able to communicate with the computing machine. Does the operations grantee this?

Kaveh: it is meaningless to communicate data if the receiving party does not analyze it in any way. To know what given data communicates/represents/names is to be able to do something reasonable with the data, i.e., you want to perform some operations on the data. So this is quite pragmatic, really.

To give you an example (that is essentially the same as what I wrote above): suppose we are two banks that communicate credit card numbers to each other. We would like to agree on a protocol which allows us to "feed the input" and "read the output" but we impose no requirement on what we should be able to do with the credit card numbers. Then the following (bizarre) protocol is ok: whenever you want to communicate a credit-card number to me, just send me the string "hi". You have fed the input and I have read the output.

As soon as we want to do something with the credit card numbers, this protocol will suck. And that is the whole point of my post: first figure out what it is that you want to do (such as "I want to be able to tell the digits of the credit-card number"), then you choose a representation.

You also ask about sufficient conditions that make a representation acceptable. This is a question that was studied by various people. For example, in Type Two Computation, "good representations" are characterized topologically and go under the name admissible. Another possibility is to express the abstract mathematical properties that characterize your data and then compute the representation from that using the realizability interpretation of logic and type theory. That is what RZ does.

Steven Elliott

"let us then say that computers can represent all infinite binary streams of our universe"

You forgot to mention that it is also a matter of faith whether or not there exists Infinite binary streams in our universe. I'm pretty sure there are none, and all binary streams are finite.

@Steven: The assumption "computers can represent all infinite binary streams of our universe" does not presume anything about existence of such streams. So I do not see why your remark is relevant at all. Moreover, I think that discussions of the kind "Is there an infinite stream in our universe?" are a bit pointless unless the concepts involved are very well defined, otherwise one can always defend his position by saying "that depends on what you mean by 'is'".

andrej> The assumption “computers can represent all infinite binary streams of our universe” does not presume anything about existence of such streams.

Well, you are quite right to call Steven in on his inference, but this does presuppose (since here we no doubt presuppose Church--Turing) that infinite streams are at least recursive, which is a nontrivial claim. Consider the incompatible claim that there are infinite streams in nature, but that these all are nondeterministic. It does seem to me to be defensible.

@Charles: I reiterate; I did not presume anything about computability or existence of infinite binary streams of our universe when I said "computers can represent all infinite binary streams of our universe". Also, because the proof that there are uncountably many infinite binary streams is constructive, it works no matter what we assume about computability of infinite binary streams.

[…] Bauer wrote a blog post in which he discusses the importance of representation. He gives two examples. In the first, he […]

Post a comment:
Write your comment using Markdown. Use $⋯$ for inline and $$⋯$$ for display LaTeX formulas, and <pre>⋯</pre> for display code. Your E-mail address is only used to compute your Gravatar and is not stored anywhere. Comments are moderated through pull requests.