A recent Apple paper highlighting limitations in reasoning by Large Language Models (LLMs) has sparked considerable discussion and debate, which is excellent news. Admittedly, the discourse hasn't always reached the standard we might ideally want, but that's fine. These conversations are vital and will no doubt continue for a long time.
Many critiques have targeted the supposed inability of LLMs to reason at all, suggesting that advanced thinking is uniquely reserved for humans. These claims, however, reflect a fundamental misunderstanding of what formal reasoning actually entails: the construction of formal proofs. It was Kurt Gödel who decisively clarified this issue in his seminal work on the Incompleteness Theorems, achieved at the impressive age of just 25. Central to Gödel's work is the concept of Gödel numbering, an encoding scheme that maps every formal expression, including theorems and proofs, to a unique natural number. The specifics of the encoding matter less than the key insight: every formula of a formal system can be enumerated.
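To make the idea concrete, here is a minimal sketch of a prime-power encoding in the spirit of Gödel numbering. The toy symbol table and helper names are my own illustration, not Gödel's original scheme: each symbol gets a positive integer code, and a formula is encoded as a product of consecutive primes raised to those codes.

```python
# A toy Gödel numbering: symbols get positive integer codes, and a formula
# (a sequence of symbols) is encoded as 2^c1 * 3^c2 * 5^c3 * ...
# The symbol table is illustrative; any injective assignment works.

SYMBOLS = {"0": 1, "S": 2, "=": 3, "+": 4, "(": 5, ")": 6, "x": 7}
INVERSE = {code: sym for sym, code in SYMBOLS.items()}

def primes():
    """Yield 2, 3, 5, ... by trial division (fine for short formulas)."""
    found, n = [], 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def goedel_number(formula):
    """Encode a sequence of symbols as a single natural number."""
    g = 1
    for p, sym in zip(primes(), formula):
        g *= p ** SYMBOLS[sym]
    return g

def decode(g):
    """Recover the symbol sequence by reading off prime exponents."""
    formula = []
    for p in primes():
        if g == 1:
            break
        exp = 0
        while g % p == 0:
            g //= p
            exp += 1
        formula.append(INVERSE[exp])
    return formula

n = goedel_number(["S", "0", "=", "0"])   # the (false) toy formula "S0 = 0"
print(n, decode(n))                        # 10500 ['S', '0', '=', '0']
```

The encoding is injective, so every formula, and by extension every finite proof, corresponds to a distinct natural number, which is all the enumeration argument needs.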
An immediate consequence of this seemingly modest observation is that all formulae can be enumerated mechanically and deterministically, a task machines can evidently perform. Such enumerations feature prominently in many mathematical proofs. Crucially, this dismantles the myth that some exclusively human, quasi-divine intuition or mystical depth underlies core reasoning processes. Admittedly, exhaustive enumeration is not particularly efficient: both humans and machines rely heavily on heuristics, necessarily incomplete and approximate, to speed reasoning up.
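To show what mechanical enumeration looks like in practice, the sketch below lists every string over a small toy alphabet in order of increasing length and filters it with a stand-in syntax check. Both the alphabet and the is_well_formed filter are placeholders I am assuming for illustration; a real system would pair the enumeration with a genuine well-formedness or proof checker.

```python
# Enumerate every finite string over a toy alphabet, shortest first, and keep
# only the ones passing a (deliberately simplistic) syntax check.

from itertools import count, product

ALPHABET = ["0", "S", "=", "+", "(", ")", "x"]   # toy symbol set

def all_strings():
    """Yield every finite string over ALPHABET in order of increasing length."""
    for length in count(1):
        for symbols in product(ALPHABET, repeat=length):
            yield "".join(symbols)

def is_well_formed(s):
    """Stand-in syntax check: here, just balanced parentheses."""
    depth = 0
    for ch in s:
        depth += (ch == "(") - (ch == ")")
        if depth < 0:
            return False
    return depth == 0

# Print the first ten strings that pass the check; every well-formed string
# is reached eventually, which is the whole point of the enumeration.
gen = (s for s in all_strings() if is_well_formed(s))
for _ in range(10):
    print(next(gen))
```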
It is undeniable that exceptional mathematicians, logicians, philosophers, scientists, composers, writers, and artists lean heavily on creativity and intuition to produce groundbreaking ideas. Nevertheless, this does not rule out alternative shortcuts that allow machines to discover similarly innovative concepts. Conceptually, such a shortcut could be as simple as a nearest-neighbor search in an abstract vector space, a foundational operation of modern LLMs.
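To illustrate the kind of shortcut I have in mind, here is a toy nearest-neighbor search by cosine similarity. The items and their three-dimensional vectors are invented for this example; real model embeddings are learned and have hundreds or thousands of dimensions.

```python
# A toy nearest-neighbor search: rank stored items by cosine similarity to a
# query vector.  The vectors below are made up purely for illustration.

import numpy as np

ITEMS = {
    "induction proof":   np.array([0.9, 0.1, 0.0]),
    "recursive program": np.array([0.8, 0.3, 0.1]),
    "sonnet":            np.array([0.1, 0.2, 0.9]),
}

def nearest(query, items, k=2):
    """Return the k item names closest to the query by cosine similarity."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(items, key=lambda name: cosine(query, items[name]), reverse=True)[:k]

query = np.array([0.85, 0.2, 0.05])   # something "proof-like"
print(nearest(query, ITEMS))           # ['induction proof', 'recursive program']
```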
One major argument in the Apple paper is a sharp drop in LLM reasoning performance beyond a certain complexity threshold. Many have hastily extrapolated this finding to conclude that LLMs are mere "stochastic parrots", mindlessly generating tokens. Such critiques, however, fail to acknowledge that formal systems themselves operate over finite vocabularies and structured inference rules; generating symbols from a finite vocabulary is not, by itself, evidence against reasoning.
There is, in fact, no deep mystery in this perceived limitation. It strongly suggests that LLMs largely memorize and cache answers already present in their vast training data, and respond by recalling that precomputed information. This points to a fundamental challenge in inductive reasoning: LLMs struggle to produce genuinely general hypotheses and instead reproduce the finite set of instances they have seen before, such as the Towers of Hanoi problem discussed explicitly in the paper.
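The Towers of Hanoi makes the distinction vivid: the fully general solution is a tiny recursive rule covering every instance, which is quite different from recalling the move sequences for the handful of sizes seen in training. A minimal sketch of that general rule:

```python
# The Towers of Hanoi has a short, fully general recursive solution: a finite
# rule that covers every number of disks, as opposed to a memorized table of
# move sequences for a few small cases.

def hanoi(n, source="A", target="C", spare="B"):
    """Yield the optimal 2**n - 1 moves that shift n disks from source to target."""
    if n == 0:
        return
    yield from hanoi(n - 1, source, spare, target)   # clear the n-1 smaller disks
    yield (source, target)                            # move the largest disk
    yield from hanoi(n - 1, spare, target, source)   # rebuild on top of it

for move in hanoi(3):
    print(move)   # 7 moves for 3 disks; 2**n - 1 in general
```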
Predictably, critics of LLMs rush to generalize this limitation (ironically, by induction) and declare categorically that LLMs cannot engage in authentic reasoning. This argument overlooks a critical nuance: general inductive propositions and their proofs are themselves finite objects and are, in principle, discoverable by machines through systematic enumeration, potentially accelerated by techniques such as vector-space nearest-neighbor search.
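As a toy illustration of induction as search, the sketch below enumerates a small, hand-picked space of candidate rules and keeps the first one consistent with all observed examples. The templates and constant ranges are assumptions made purely for this example, not a serious hypothesis language.

```python
# Induction as enumeration: search a tiny space of candidate rules for one
# that fits every observed (n, f(n)) pair.  Here the hidden rule is 2**n - 1.

from itertools import product

EXAMPLES = [(1, 1), (2, 3), (3, 7), (4, 15)]

# Candidate rules: a few templates with small integer constants a and b.
TEMPLATES = [
    ("a*n + b",    lambda a, b: (lambda n: a * n + b)),
    ("a*n**2 + b", lambda a, b: (lambda n: a * n ** 2 + b)),
    ("a**n + b",   lambda a, b: (lambda n: a ** n + b)),
]

def induce(examples, max_const=3):
    """Return the first enumerated rule consistent with every example, if any."""
    constants = range(-max_const, max_const + 1)
    for (name, make), a, b in product(TEMPLATES, constants, constants):
        rule = make(a, b)
        if all(rule(n) == y for n, y in examples):
            return name.replace("a", str(a)).replace("b", str(b))
    return None

print(induce(EXAMPLES))   # "2**n + -1", i.e. f(n) = 2**n - 1
```

The point is not efficiency, which is hopeless at scale, but possibility: nothing about forming a general hypothesis from finitely many instances is beyond a machine in principle.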
I am going to stop here, but there is much more to come in this exciting journey :)