Induction and AI
I explore how pattern recognition relates to computation, how both connect to logic through induction and deduction, and why universal pattern recognition defies the simplicity of universal computation.
What's the next entry in this list?
$$ \text{B123A, A231B, B312A, A123B, } \ldots $$
If you feel very strongly that the next string should be $\text{B231A}$, then this is because you've recognized a pattern, even though this is most likely the first time you've seen this sequence. Once you've found the pattern, you then used it to predict the next string in the sequence.
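If you found the same pattern I did (the outer letters alternate while the inner digits rotate left one position each step), the whole sequence can be generated mechanically. A minimal Python sketch; the function name and encoding are my own:

```python
def term(n):
    # inner digits rotate left by one each step: 123 -> 231 -> 312 -> 123 ...
    digits = "123"
    rotated = digits[n % 3:] + digits[:n % 3]
    # outer letters alternate: B...A on even steps, A...B on odd steps
    outer = ("B", "A") if n % 2 == 0 else ("A", "B")
    return outer[0] + rotated + outer[1]

print([term(n) for n in range(5)])
# → ['B123A', 'A231B', 'B312A', 'A123B', 'B231A']
```

Once the pattern is captured as a program, prediction is just running the program one step further.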
Like many, I believe that this ability to recognize patterns is the hallmark of intelligence. In fact, most of the other abilities commonly associated with intelligence such as learning, planning, analogical reasoning, and so on, can be seen as either directly connected to or dependent upon our capacity for pattern recognition.
I would have preferred to use the term AI instead of pattern recognition, but many disagree that the latter is sufficient for AI.
Pattern recognition and computation
Pattern recognition and computation are intimately related to each other.
Computation is any calculation process that is well defined. It can be thought of as the process that takes a program $p$ as input and then mechanically generates an output $x$ that is the result of the calculation described in $p$. Thus, if you can express unambiguously how to do a task, then computation will perform the exact task you've described. The Church-Turing thesis proposes that these mechanical processes are precisely those computable by a Turing machine (or any formalism that is equivalent to it, such as lambda calculus, Post machines, etc.).
Pattern recognition can be regarded as the inverse process: given an input $x$ it finds a program $p$ (or a set of programs) that would mechanically compute said sequence. In other words,
$$ \text{PatternRecognition}(x) = \text{Computation}^{-1}(x). $$
In this sense, a pattern is a program, and pattern recognition is program synthesis. Programs are the most general way of describing formal relationships among the parts of a whole. They encompass symmetries (and hence the physical laws), arithmetic sequences, fractals, and even the geometric patterns found in your carpet.
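To make the identification concrete, here is a toy program-synthesis sketch: it enumerates tiny arithmetic expressions over a made-up mini-DSL, shortest candidates first, and returns the first one that reproduces the observed data. The DSL and enumeration order are my own illustrative choices:

```python
from itertools import product

# toy DSL: expressions over n built from these pieces
ATOMS = ["n", "1", "2", "3"]
OPS = ["+", "*"]

def candidate_programs():
    """Enumerate tiny expression strings, shortest first."""
    yield from ATOMS
    for a, op, b in product(ATOMS, OPS, ATOMS):
        yield f"({a} {op} {b})"

def synthesize(data):
    """Return the first (shortest) program reproducing `data`."""
    for prog in candidate_programs():
        # eval is safe here only because the DSL is a fixed toy grammar
        if all(eval(prog, {"n": n}) == x for n, x in enumerate(data)):
            return prog
    return None

print(synthesize([0, 2, 4, 6]))   # → '(n + n)', i.e. 2n
```

Preferring shorter programs is the usual Occam-style tie-breaker among the many programs consistent with the data.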
Computation is deduction, pattern recognition is induction
There's a connection between the aforementioned processes and logic. Logicians identify two distinct reasoning methods: deduction and induction.
Deduction is the method which yields inevitable conclusions based on given premises and rules. The archetypal example is that if you know “all humans are mortal” (premise) and “Socrates is a human” (premise), then you can unequivocally conclude that “Socrates is mortal” (necessary conclusion) through the use of a categorical syllogism (rule).
Induction is the method that produces general hypotheses (rules and premises) from specific observations. An example is observing that the sun has risen every day in recorded history and concluding that the sun will rise every day (a general rule, though not an absolute certainty). Another example is observing that “Socrates is mortal” and concluding that this is because “all humans are mortal” and “Socrates is a human”, although the credence we attribute to this hypothesis is arguably quite low given the anecdotal data. Induction is often criticized because, unlike deduction, it does not lead to a single conclusion but to multiple possible explanations.
Logicians sometimes acknowledge other forms of reasoning, for instance abduction. I won't explore them separately here, as I believe they're subsumed by deduction and induction for the purposes of our discussion.
If we identify rules and premises with the information contained in a program, computation appears to correspond to a type of deduction restricted to processes that can be carried out through mechanical means. Induction, on the other hand, can be regarded as a process that finds rules and premises from conclusions. This suggests the following parallels between logic and computer science:
| Term in Logic | Term in CS |
|---|---|
| Deduction | Computation |
| Induction | Pattern Recognition |
That is, if we limit ourselves to processes that can be executed through mechanical means, deduction becomes computation. Would it be right to say that pattern recognition is just induction restricted to mechanical processes?
Universal computation vs universal pattern recognition: a key distinction
Universal computation, that is, the concept of computation in itself, is intriguingly also computable. This means that it can be described with a program (i.e. a finite amount of information) despite its vast scope. This profound insight is a cornerstone of computer science: Alan Turing's discovery of the Universal Turing Machine (UTM) showed that a single program could simulate any other program. In today's terms, this is reflected in programming language interpreters, which can execute any valid code, making computation elegantly self-contained within a finite, describable set of rules.
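The self-contained nature of computation can be illustrated with a toy interpreter: a single Python function that evaluates any program written in a small stack language, much as a UTM simulates any machine. The language and its encoding are my own invention for this sketch:

```python
def interpret(program, stack=None):
    """A minimal 'universal' evaluator for a toy stack language.
    It runs any program expressible in the language, mirroring how a
    Universal Turing Machine simulates any other machine."""
    stack = list(stack or [])
    for op in program:
        if isinstance(op, int):
            stack.append(op)              # push a literal
        elif op == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)           # add top two values
        elif op == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)           # multiply top two values
    return stack

print(interpret([2, 3, "+", 4, "*"]))   # (2 + 3) * 4 → [20]
```

The interpreter itself is a finite, fixed program, yet its input ranges over every program in the language; that is the sense in which universal computation is "elegantly self-contained."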
But universal pattern recognition—the inverse of universal computation—tells a different story. Unlike computation, recognizing patterns from observations cannot be fully captured by a program. This is because determining whether a given program will halt (Turing's halting problem) is undecidable, and hence determining whether a candidate program will halt and produce the desired output is undecidable too. (For those familiar with it, this is essentially why Solomonoff induction is incomputable.)
A naive attempt to encapsulate universal pattern recognition could involve a cheat list: an infinite binary sequence whose $n$th bit indicates whether the $n$th program terminates. But this sequence contains incompressible information, so the resulting description of universal pattern recognition would involve an infinite number of bits, which by definition isn't a program.
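A sketch of the obstacle: without a halting oracle, the best a mechanical recognizer can do is run each candidate under a step budget, and within that budget a slow program and a non-halting one look identical. The toy "programs" below are Python generators of my own devising:

```python
def loops_forever():
    while True:
        yield None                      # never produces an output

def slow_42():
    for _ in range(10):
        yield None                      # "computes" for 10 steps
    yield 42

def recognize_with_budget(candidates, target, budget=100):
    """Try each candidate for at most `budget` steps. Within the
    budget, a program that never halts and one that merely halts
    late are indistinguishable -- the cutoff the halting problem
    forces on any mechanical recognizer."""
    for name, make in candidates:
        gen = make()
        for _ in range(budget):
            out = next(gen)
            if out is not None:
                if out == target:
                    return name
                break                   # halted with the wrong output
    return None

print(recognize_with_budget([("loop", loops_forever), ("slow", slow_42)], 42))
# → 'slow'; with budget=5 the answer would be missed
```

Any finite budget misses some halting programs, which is why no single program captures universal pattern recognition.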
Thus, while universal computation is simple, the infinite complexity of universal pattern recognition defies any finite rule set.
If we give up universality, then pattern recognition is computable
This doesn’t mean pattern recognition is impossible. In fact, Shane Legg demonstrated in 2006 that perfect pattern recognition can be achieved within a restricted domain.
If we limit the set of candidate programs that we want to use as an explanation to a finite class, we can create a universal program capable of recognizing patterns within that class. In other words, given any input generated by a program in this class, the pattern recognition program will identify it. However, the trade-off is that the recognition program must be at least as complex (i.e. long) as the most complex program in the class.
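A sketch of the finite-class case (the class and its members are my own toy examples, not Legg's construction): a recognizer that hard-codes the whole class and returns every member consistent with the observed prefix:

```python
# a finite class of candidate "programs" (here, sequence generators)
CLASS = {
    "evens":   lambda n: 2 * n,
    "squares": lambda n: n * n,
    "powers":  lambda n: 2 ** n,
}

def recognize(observed):
    """Universal recognizer for CLASS: return every program in the
    class consistent with the observed prefix. The recognizer must
    literally contain the whole class, so it is at least as complex
    as its most complex member."""
    return sorted(
        name for name, f in CLASS.items()
        if all(f(n) == x for n, x in enumerate(observed))
    )

print(recognize([0, 1, 4, 9]))   # → ['squares']
print(recognize([0]))            # → ['evens', 'squares'] (still ambiguous)
```

Note how short prefixes leave several candidates standing, illustrating induction's multiplicity of explanations within even a tiny class.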
This also tells us that there isn't a single pattern recognition algorithm that dominates all others; rather, the search for pattern recognition programs is open-ended: for any given pattern recognition program there is always another whose class strictly contains the class of the former. In fact, the set of pattern recognition programs forms a partial order.
In addition, the fact that for any finite class of programs there exists a pattern recognition program that is universal for the class shows how induction can be transformed into deduction. This is related to the deductive-nomological model of scientific explanation, which was proposed as an alternative to induction.
Induction and the computational complexity classes
The above results apply to the limit case when we have unlimited resources to run and invert programs. But similar results hold when we take into account computational complexity. In fact, the relationship between deterministic and nondeterministic complexity classes mirrors the divide between deduction and induction.
The P vs NP problem can be thought of as a question about pattern recognition. In P, you're given an input and tasked with computing the output efficiently—in polynomial time. In NP, it's the reverse: you're given a desired output and must find an input (a “program”) that produces it, with the guarantee only that any candidate solution can be verified in polynomial time.
For example, with NP problems like the Boolean satisfiability problem (SAT), verifying if a given solution works is easy (in P), but finding that solution is hard, much like trying to reverse-engineer a program to figure out what input created a specific result.
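A small illustration of this asymmetry, using a toy CNF formula of my own: verification touches each clause once, while the search here falls back to trying all $2^n$ assignments:

```python
from itertools import product

# CNF as a list of clauses; positive int = variable, negative = its negation
FORMULA = [[1, -2], [-1, 2], [2, 3]]   # (x1 ∨ ¬x2) ∧ (¬x1 ∨ x2) ∧ (x2 ∨ x3)

def verify(formula, assignment):
    """Check a candidate assignment -- polynomial time (the easy side)."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

def search(formula, num_vars):
    """Find a satisfying assignment by brute force -- exponential in num_vars."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if verify(assignment=assignment, formula=formula):
            return assignment
    return None

print(search(FORMULA, 3))   # → {1: False, 2: False, 3: True}
```

The gap between the two functions' running times is exactly the gap the P vs NP question asks about.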
In this sense, NP is about solving problems by inverting the process—finding the input that leads to the output. If P = NP, it would mean that finding this input (or inverting the program) is as easy as verifying it, making program inversion computationally feasible. But if P ≠ NP, as many believe, it suggests that reversing the computation (finding the right input) is much harder than simply running it forward.
Conclusions
Pattern recognition as core to AI and intelligence: I've highlighted that pattern recognition is fundamental to intelligence, driving key AI capabilities like learning and reasoning. Many AI applications, from machine learning to natural language processing, depend on recognizing patterns in data. However, while humans excel at this across diverse contexts, AI struggles to generalize beyond specific tasks, underscoring the difficulty of pattern recognition.
Universal pattern recognition is incomputable, but feasible in restricted domains: Though universal pattern recognition is incomputable, it's possible within finite limits. Perfect pattern recognition works when restricted to a finite set of candidate programs. This means that while AI can solve problems within specific, bounded domains (e.g., diagnosing diseases or navigating traffic), building better AIs will forever remain open-ended.
The challenge of inverting computation and the general structure of AI algorithms: The P vs NP problem fundamentally revolves around the difficulty of pattern recognition. In class P, given an input, computing the output is efficient. In class NP, however, the challenge is reversed: you need to find the input (or “program”) that produces a specific output, a task that is much harder. This highlights a general framework for pattern recognition algorithms: first, analyze the data, then iteratively hypothesize potential programs, verifying at each step whether a given program generates the observed data.
AI as a core branch of computer science: AI isn't just an engineering branch about practical problem-solving—it's deeply rooted in computer science theory. Concepts like computation, logic, and complexity are essential for understanding and advancing AI, though their importance is often underplayed. Strengthening these links could unlock new AI breakthroughs by helping researchers better understand the limits and possibilities of computation and pattern recognition. AI's future lies in its deep integration with computer science, focusing more on foundational concepts like algorithmic complexity.