Sometimes all functions are continuous

You may have heard at times that there are mathematicians who think that all functions are continuous. One way of explaining this is to show that all computable functions are continuous. The point not appreciated by many (even experts) is that the truth of this claim depends on what programming language we use.

Surely they are not all continuous!?

You must be thinking to yourself: how can anyone in their right mind claim that all functions are continuous – here’s one that isn’t:
$$\mathrm{sgn}(x)=\begin{cases}
-1 & x < 0 \\
0 & x = 0 \\
1 & x > 0
\end{cases}$$
At $x=0$ the sign function jumps from $-1$ to $0$ to $1$, which is a nice discontinuity. As crazy as it seems, it makes sense to refuse to admit that $\mathrm{sgn}$ is a legitimate function!

The official definition nowadays is that a function $f : A \to B$ is the same thing as a functional relation on $A \times B$. Recall that a relation $R$ on $A \times B$ is just a subset of $A \times B$. It is functional when it is

  • single-valued: if $(x,y) \in R$ and $(x,z) \in R$ then $y = z$.
  • total: for every $x \in A$ there is $y \in B$ such that $(x,y) \in R$.

If you are a programmer or a computer-sciency mathematician, you will probably recite the above definition when asked by a judge what a function is. Alas, your own brain secretly thinks that functions are the same thing as procedures that you can implement on your computer. Perhaps you will not admit it, but a careful psychoanalysis of your mind would reveal, among other things, that you never ever concern yourself with non-computable functions. (Note: if you are a physicist, your mind is of a different sort. I shall address your psychology on another day.)

There are no discontinuous ones

So for a moment let us shake off the prejudices of classical training and accept the following definition, which is accepted in certain brands of constructive mathematics.

Definition: A function $f : A \to B$ is a computational procedure which takes as an input $x \in A$ and yields as output a uniquely determined $y \in B$.

We have left out many details here, but the basic intuition is clear: “It ain’t a function if it can’t be computed”. Now we return to the sign function defined above. Is it a function? In order to compute $\mathrm{sgn}(x)$, we must, for any given real number $x$, discover whether it is negative, zero, or positive. So suppose we inspect the digits of $x$ for a while and discover, say, that the first billion of them are all zero. Then $\mathrm{sgn}(x)$ is either $0$ or $1$, but we cannot tell which. In fact, no finite amount of computation will guarantee that we will be able to tell whether $x=0$ or $x>0$. Since an infinite amount of computation is impossible in finite time (please do not drag in quantum computing, it does not help), the sign function cannot be computed. It is not a function!
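
To see the obstruction concretely, here is a sketch in OCaml. Purely for this illustration I pick one of the standard representations: a real number is a function producing approximations, with $x\,n$ within $2^{-n}$ of the true value. The evident search for the sign then diverges on zero:

type real = int -> float  (* assumed for this sketch: |x n - true value| <= 2^(-n) *)

let rec sgn_attempt (x : real) n =
  let a = x n in
  let eps = 2.0 ** float_of_int (-n) in
  if a > eps then 1                (* the approximation proves x > 0 *)
  else if a < -.eps then -1        (* the approximation proves x < 0 *)
  else sgn_attempt x (n + 1)       (* inconclusive: look closer, possibly forever *)

(* On zero, represented by (fun _ -> 0.0), every approximation is
   inconclusive and sgn_attempt never terminates. *)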

An argument similar to the one above shows that we cannot hope to compute a function $f : \mathbb{R} \to \mathbb{R}$ at a point of discontinuity. With a more formal treatment we could prove:

Theorem: There are no discontinuous functions.

As long as we stick to functions that can actually be computed, we will never witness a discontinuity. Classically, there is no difference between the above theorem and

Theorem (classically equivalent form): All functions are continuous.

When constructive mathematicians say that “all functions are continuous” they have something even better in mind. They are telling you that all functions are computably continuous. This is where interesting stuff begins to happen.

They might be all continuous

To simplify the argument, let me switch from real-valued functions $\mathbb{R} \to \mathbb{R}$ to something that is more readily handled in a programming language. Instead of the reals, let us consider Baire space $B$, which is the space of all infinite number sequences,
$$B = \mathbb{N} \to \mathbb{N}.$$
Programmers will recognize in $B$ the datatype of infinite streams of non-negative integers, or the total elements of type nat -> nat. A function $f : B \to \lbrace 0,1 \rbrace$ is said to be classically continuous if the value $f(x)$ depends only on finitely many terms of the input sequence $x$. A function $f$ is computably continuous if we can actually compute an upper bound on how many terms of $x$ are needed to determine $f(x)$. (This characterization of continuity of $f : B \to \lbrace 0,1 \rbrace$ comes from taking the product topology on $B$ and the discrete topology on $\lbrace 0,1 \rbrace$.) Since we are concerned with computable continuity, we shall drop the qualifier “computable” and simply call it continuity.
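
As a concrete illustration (a toy example of my own), here is a continuous $f : B \to \lbrace 0,1 \rbrace$ whose value depends only on the first two terms of its input:

(* f is continuous: its value depends only on terms 0 and 1 of x,
   so any y agreeing with x at indices 0 and 1 satisfies f y = f x. *)
let f (x : int -> int) = if x 0 = x 1 then 1 else 0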

Theorem: All functions $B \to \lbrace 0,1 \rbrace$ are continuous.

Without going into too many details, let me just state that the proof of the above theorem comes down to the following programming exercise.

Exercise: Write a program $m : (B \to \lbrace 0,1 \rbrace) \to B \to \mathbb{N}$ such that, for any $f : B \to \lbrace 0,1 \rbrace$ and $x \in B$ the value of $f(x)$ depends at most on the first $m(f)(x)$ terms of $x$. More precisely, if $M = m(f)(x)$, then for every $y \in B$, if $x_0 = y_0$, …, $x_M = y_M$ then $f(x) = f(y)$.

The program $m$ is called a modulus of continuity functional because it gives us information about how $f$ is continuous: any input $y \in B$ which agrees with $x$ in the first $m(f)(x)$ terms gives the same output $f(y)$ as does $f(x)$. Incidentally, such an $m$ is an order 3 functional.

A strategy for computing $m(f)(x)$ is clear enough: given $f$ and $x$, inspect the computation of $f(x)$ to discover how much of the input $x$ is actually needed to arrive at the answer. Since the computation of $f(x)$ takes only finitely many steps, a finite prefix of the sequence $x$ will suffice.

Let us write $m$ in a real programming language. How can we do that? The trick is rather obvious, once you think of it. Instead of computing $f(x)$, we compute $f(y)$ where $y$ is special: it gives the same answers as $x$ but it also records the highest index $n$ which was fed to it. In OCaml we get something like this:

let m f x =
  let k = ref 0 in
  (* y behaves like x, but records the largest index queried so far *)
  let y n = (k := max !k n; x n) in
  ignore (f y); !k

Function $y$ behaves like $x$, except that it stores in the reference $k$ the highest value of $n$ for which $y n$ has been computed so far. To compute $m(f)(x)$ we evaluate $f y$. After it returns, we look at $k$ to see how many terms $f$ used.
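
For instance, with a made-up $f$ that inspects exactly the first three terms of its input:

let f x = if x 0 + x 1 + x 2 > 0 then 1 else 0  (* queries indices 0, 1, 2 *)
let x n = n                                     (* the sequence 0, 1, 2, 3, ... *)
let () = assert (m f x = 2)                     (* the highest index queried is 2 *)

Any sequence agreeing with $x$ on the first three terms gives the same value of $f$.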

Another possibility is to use exceptions instead of mutable store. This time, we feed $f$ a $y$ which behaves like $x$, except that if $f$ attempts to compute $y n$ for $n \geq k$, we throw an exception. Here $k$ is a variable parameter. First we try to compute $f y$ with $k = 0$. If we get an exception, we catch it and try $k = 1$. As long as we keep getting exceptions, we increase the value of $k$. At some point no exception happens and we know that $f$ looked only at those values $y n$ for which $n < k$. The code in OCaml is this:

exception Abort

let m f x =
  let rec search k =
    try
      (* y behaves like x below index k and aborts on any larger index *)
      let y n = (if n < k then x n else raise Abort) in
      ignore (f y); k
    with Abort -> search (k + 1)
  in
  search 0

We now have two solutions, one using mutable store, another using exceptions. Could we use some other feature instead? Yes, we could program $m$ using continuations, which I leave as an exercise.

An interesting question comes to mind: which programming features, apart from the ones we mentioned above, allow us to program $m$? In terms of Haskell, the same question is which monads allow us to program $m$. Can we do it without any extra features and use just “pure” functional programming?

If by pure functional programming we understand a functional programming language with natural numbers, booleans and recursive definitions, also known as PCF, the answer is no. The proof of this uses denotational semantics and domain theory, and I will not go into it now. You may entertain yourself by trying and failing to define $m$ in pure Haskell or ML, i.e., no side effects, no exceptions, no mutable store, no parallelism, no monads.

As far as I know, nobody has studied systematically which programming features allow us to program $m$. So I pose it as a question:

What can be said about those programming features X for which PCF+X causes all functions to be computably continuous?

We know mutable store, exceptions and continuations give us continuity of all functions. Some other candidates to think about are: parallelism, non-determinism, communication channels, timeouts (interrupting computations that take too long) and unquoting (disassembling the code at run-time).

Type II Computability

Very likely “Type II Computability” experts are lurking around. This comment is for them. In Type II Computability we do not work with a programming language but directly with (Type II) Turing machines. There is a well-known proof that in this setting the modulus of continuity $m$ is computable. The proof uses the fact that functions $B \to \lbrace 0,1 \rbrace$ are represented in such a way that the value of $m(f)(x)$ can be read off the tape directly.

Type II Turing machines are not special. They are just another programming language in disguise. In comparison with pure functional programming, a Type II Turing machine can do the following special thing: it can do a thing for a while, and if it takes too long, it just stops doing it and does something else instead. The programming feature that this corresponds to is timeout, which works as follows: $\mathtt{timeout}\, k\, x\, y$ evaluates $x$ for at most $k$ steps. If $x$ evaluates successfully within $k$ steps, its value is returned. If the evaluation of $x$ is interrupted, $y$ is evaluated instead. I have heard John Longley claim that Type II Turing machines are equivalent to PCF+timeout+catch (where “catch” is a form of local exceptions), but I think the question was never settled. It would be interesting to know, because then we could replace arguments using Turing machines with source code written in ML+timeout+catch.
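
To make the feature concrete, here is one way $\mathtt{timeout}$ could be modeled in OCaml (a fuel-counting encoding of my own devising, not a claim about how Type II machines or Longley’s catch actually work): a computation takes a step budget and either finishes or reports that it was cut off.

type 'a comp = int -> 'a option  (* run with a step budget; None = ran out of steps *)

let timeout (k : int) (x : 'a comp) (y : unit -> 'a) : 'a =
  match x k with
  | Some v -> v    (* x evaluated successfully within k steps *)
  | None -> y ()   (* x was interrupted: evaluate y instead *)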

Is there a lesson?

The lesson is for those “experts” who “know” that all reasonable models of computation are equivalent to Turing machines. This is true if one looks just at functions from $\mathbb{N}$ to $\mathbb{N}$. However, at higher types, such as the type of our function $m$, questions of representation become important, and it does matter which model of computation is used.

30 thoughts on “Sometimes all functions are continuous”

  1. “In order to compute sgn(x), we must, for any given real number x, discover whether it is negative, zero, or positive. So suppose we inspect the digits of x for a while and discover, say, that the first billion of them are all zero. Then sgn(x) is either 0 or 1, but we cannot tell which. In fact, no finite amount of computation will guarantee that we will be able to tell whether x=0 or x>0. Since an infinite amount of computation is impossible in finite time (please do not drag in quantum computing, it does not help), the sign function cannot be computed. It is not a function!”

    By this logic, wouldn’t all functions be non-computable? For all functions, we need to know what x is to determine what the function output should be.

  2. Of course not. The special thing about `sgn(x)` is that you need to “know” `x` to infinite precision in order to produce just one bit of output. But there are many functions where, in order to compute `n` bits of output you only need to know some finite amount of input bits. For example, if we took the function `f(x) = 2x`, then in order to get `n` bits of `f(x)` correct, we need to know only the first `n+1` bits of `x`.

    I think we need to be careful about what “to know `x`” means. It makes no sense to say “a machine knows the number `x`” — a machine might be given `x` as an infinite sequence of digits on an input tape, but it does not “know” `x` in the sense of it being aware of `x`.

  3. This might sound naive…but why not use an encoding for x that allows computing sign in finite time? (use first two bits to encode whether the number is positive, negative, or zero)

  4. It’s not naive but it does not work because it destroys computability of basic operations. Suppose we did that. Then every operation on reals, e.g. addition and subtraction, would have to compute not only the magnitude of the result, but also its sign. In particular, we would have to compute the sign of `x - y`, given `x`, `y` and their signs. But this is impossible, as it would allow us to compare real numbers (compute `x - y` and look at the sign to see whether `x = y`), from which we could solve the Halting Problem.

    So, if you’re willing to sacrifice computability of addition and subtraction, you can do a thing like that. But then you are not representing “real” real numbers anymore.

  5. Can the proof be done by simply observing that this functional m does not preserve extensionality of functions, and hence cannot be definable in the pure lambda calculus? Anyway, this is a nice example.

    I’m interested in the continuation implementation because I thought that continuations could be implemented in the pure lambda calculus. At least they seem to be done in such a way in Haskell’s Control.Monad.Cont.

  6. Indeed, the proof that you cannot write modulus of continuity in pure `lambda`-calculus has to do with the fact that modulus seems to be non-extensional. But do you really think it is just a matter of “observation” that the functional `m` is necessarily non-extensional?

    And here is functional `m` with continuations. I used SML/NJ continuations. If there is a nice person who is proficient in Haskell, please rewrite it in Haskell for roconnor.

    open SMLofNJ.Cont ;
    fun m f x =
      let
        fun search n =
          callcc (fn k => (f (fn j => if j < n then x j else throw k (search (n+1))) ; n))
      in
        search 0
      end
    
  7. Well, I think it is obviously true, but probably hard to prove. ;) In the following, newf transforms f into an extensionally equivalent function that accesses at least the 100th element.

    newf f b = (f b)+(b 100)-(b 100)

    or even worse newf' accesses some element greater than f does.

    newf' f b = (f b)+(b ((m f b)+1))-(b ((m f b)+1))

    Of course, to prove it requires one to actually define the semantics of evaluation, to determine which elements are accessed. Then proving all this works is probably not trivial. For instance, I suppose it is possible to have some strange semantics that replaces e-e directly with 0. Then my example doesn’t work.

  8. No, it is not crazy. By the same reasoning, we should all stop doing mathematics because “most problems are unsolvable”, right? But clearly mathematics has had an enormous impact on our society (a good one). Similarly, real numbers have played an important role in the history of science. Without classical analysis, which is based on real numbers, physics probably would not have made the progress it did.

    Even if it turns out that in the future we will not use real numbers at all (I find this probable, in fact), this does not mean that today we should dismiss real numbers based on speculation by a couple of physicists and Chaitin’s fine piece of popular science writing.

    If you are claiming that it is crazy to talk about computations on real numbers because somehow computation on non-computable reals is an ill-defined concept, then you are just flat out mistaken.

  9. I ported the code to Haskell, and now I clearly see what is going on. Although it is essentially written in the pure lambda calculus one is still forced inside the continuation monad (of course). This means the type of m becomes

    ((a -> Cont a b) -> Cont a t) -> (a -> Cont a b) -> a

    where a is something like Integer.

    So the input are functions and functionals that are Kleisli arrows rather than pure function arrows.
    This type is basically a nice way of writing

    ((a -> (b->a)->a) -> (t->a)->a) -> (a -> (b->a)->a) -> a

    Continuations are a nice way of bumping up the order of the type. ;)

    For those interested, here is the code:

    import Control.Monad.Cont
    m f x = runCont (search 0) id
      where
        search n =
          callCC (\k -> do { f (\j -> if j < n then x j else k =<< search (n + 1))
                           ; return n })
    
  10. Hi! In comment 4, you write: “But this is impossible, as it would allow us to compare real numbers [...], from which we could solve the Halting Problem.”

    Are there formal systems in which the Halting Problem is solvable? If so, does this have any consequences on the cited passage? (Sorry if my questions are naive or FAQs. Pointers to papers would totally suffice.)

  11. I am not quite sure you asked what you meant to ask. To solve the Halting Problem one has to provide a program which does a certain thing. Such a program does not exist. The proof that the required program does not exist only uses logical principles that everybody agrees on, i.e., it is valid both intuitionistically and classically, so one would have to come up with a rather bizarre logic to invalidate the proof. Are you asking whether it is possible to change the meaning of the word “program” so that there will be one that solves the Halting Problem?

    The Halting Problem also manifests itself in logical form as the question whether the following holds:

    `forall f : NN -> {0,1}. (forall n : NN. f n = 0) or not (forall n : NN. f n = 0)`

    Trivially, the statement holds if we use classical logic, as it is just a special case of the law of excluded middle. The statement cannot be proved in constructive logic. There are formal systems strictly weaker than classical logic in which the statement holds. All of this has little to do with what can actually be computed (with the usual understanding of the word “program”). I hope I am answering your question, whatever it is.

  12. Thank you very much, you have not only answered my question, but also gave me new food for thought, namely the logical form of the Halting Problem. :) Thanks!

    (For reference, my question was answered in “so one would have to come up with a rather bizarre logic to invalidate the proof”. I thought that maybe there’d exist some known logic which is a) not too bizarre and b) invalidates the proof.)

  13. I just found your site, by way of a link to this post. Good stuff!

    WRT your example of sgn, here’s a simpler version, with a question:

    sg n = if n < 0 then -1 else 1

    From your argument for the continuity of sgn above, it appears that you’re using a two’s complement representation, scanning the bits from the “left” (i.e. your statement “… the first billion of them are all zero. Then sgn(x) is either 0 or 1 …”).

    I don’t see how a similar argument applies to sg. Isn’t the first bit (again, in two’s comp. representation) enough to distinguish negative from non-negative?

  14. I’m struggling with ‘All functions B→{0,1} are continuous’, yet it seems I can make a function over B, such as: ‘if the number stream contains a 1, return 1, otherwise return 0’, which should be rejected for the same reason ‘sgn’ is rejected.

  15. Actually, I need to adjust my previous comment and ask ‘If ‘f’ has no upper bound, is m a function?’.

  16. Hopefully last update (sorry, slow on the B uptake)… not if ‘the number stream contains a 1, but f sees a 1 only after a very large number of other values’ (understanding B as sets of sets of N). If the Type II proof orders the values on the tape, I think that’s cheating; if they don’t, then it seems like there needs to be some number of ‘timeouts’ above which the machine says ‘not a function’. If so, they should have the same problem as non-Type II machines on random input-value tapes.

  17. Reply to comment 15 by jn: First, I did not make an argument “for continuity” of sgn, but rather “against continuity”. Whether your sg is continuous (computable) depends on how you represent real numbers. Assuming a standard representation of reals (one in which addition is computable, for example), your sg is not computable and is not continuous as a map between representations. I do not know what “two’s complement representation” is for real numbers (including negative ones). Please provide more detail about your representation. Is addition computable in your representation?

  18. Reply to comments 16, 17 by NotQuiteFunctional: I am afraid I do not understand you very well.

    Reply to 16: the function you are suggesting is not computable, therefore not a function for the purposes of this discussion.

    Reply to 17: yes, but I suspect what you wrote is not what you meant to ask.

  19. If you can write down a base-10 representation of a real number then it must follow that that number is rational and finite, because irrational numbers have an infinite numeric representation and it is impossible to write down an infinite string and all finite base-10 numerals represent finite rational numbers.

    So if you can write a number down, I can implement a sgn function that will tell you its sign using effort at most proportional to what it took you to write the number down.

    If you get to manipulate real numbers containing an unbounded amount of information, then I get to have functions that process an infinite amount of information in a finite time. No finite tools can work with infinite bundles of information.

  20. @NovaDenizen: you misunderstand what it means to inspect the digits of a number, or what it means to have a real number given as a sequence of digits. Namely, the real number is not given by an infinite sequence that is actually written down anywhere, but rather as an algorithm, or a black box, which accepts a number `n` and outputs the `n`-th digit. At no point do I manipulate an infinite amount of information, yet I can look at any digit I want. Of course, I cannot look at all digits in a finite time, which is sort of the point of the post.

  21. Isn’t it very easy to come up with functions that can be computed in finite, nigh constant time that aren’t continuous? For instance, take the function in base 10:

    FirstDigit(x)= 0 if the first digit of x is 0,
    1 if the first digit of x is 1,
    etc,

    This function is obviously not continuous and should be computable for any real number, since it’s only needed to look at one digit.

    I can only assume there’s something obvious that I’m missing.

  22. @Jacob: your function is of course continuous as a map from the streams of digits to digits. However, this function does not represent a function between real numbers because it does not respect equality of reals. Let me call your function `f` for easier notation. Then we have `f(0.99999…) = 0` and `f(1.00000…) = 1`, however the streams of digits `0.99999…` and `1.0000…` both represent the number one. One way to get around this problem is to forbid streams which end with `99999…`. If we do so, your function `f` will be well-defined as a map from the real numbers to digits. However, the topology that is induced on the reals with this representation is not the usual Euclidean topology but rather something that goes under the name Fine metric. And in this topology `f` is continuous again!

    In general, when we compute with the points of a space `X` we do not actually use the points themselves but their representations in terms of datatypes (such as streams of digits). It is important that we take as the topology of `X` the one induced by the representation, otherwise we can make any function continuous or discontinuous at will just by changing the topology of `X`.

  23. I’m coming to this discussion very late. I wanted to make two comments. First, thank you for helping me to understand, at last, the sense of the apparently obviously false statement that every function is continuous. Secondly, I thought I’d just point out that I was confused by the definition you gave of a function in the post and reached my eventual understanding only after reading your answer to the first question above. I think it would be helpful to make it more explicit what computation over the real numbers means. That is, a real number is something like an oracle that tells you, for each n, what the nth digit is. A function from the reals to the reals is a computational procedure that can convert an oracle for x into an oracle for f(x). The classical function sgn is not a function in this sense because, as you point out, if x happens to be 0 then you need to know infinitely many digits of x to be sure of even the first digit of sgn(x).

    Actually, I now see that that’s not quite the right way of putting it, since then x goes to 3x wouldn’t be a function. (Proof: you need to know infinitely many digits of 1/3 to know whether its image begins with 0.9 or 1.0.) So to make sure we’re talking about the right topology we have to talk about knowing f(x) to arbitrary precision rather than about knowing digits.

    But if you define a function to be a computable procedure that can output f(x) to arbitrary precision given x to sufficient precision, it feels almost tautologous to say that a function is continuous.

  24. @gowers: Precisely, it is almost tautologous that computable implies continuous when you get the representations and topologies right. This might be why some people think that “all functions are continuous” is a reasonable axiom. As always, the difference between classical and constructive math is not in what is true, but in what words mean. Constructive mathematicians mean something else by “function” than classical mathematicians do.
