
François Chollet: Redefining AGI for Autonomous Invention with ARC
François Chollet argues that simply scaling up current AI models will not lead to Artificial General Intelligence (AGI), because it confuses memorized skill with true fluid intelligence. He redefines intelligence as the efficiency with which a system adapts to novelty, and proposes the ARC (Abstraction and Reasoning Corpus) benchmarks to measure this dynamic adaptation. Chollet advocates a new AI paradigm that combines intuitive deep learning with rigorous discrete program search to achieve autonomous invention, the ultimate goal of AGI.
Critique of Scaling Laws & Redefining Intelligence
- Scaling Limitations: Although compute costs have fallen and deep learning has advanced through ever-larger models (e.g., LLM training), scaling models up to GPT-4.5 (a roughly 50,000x scale-up) did not yield fluid general intelligence; base-model accuracy on ARC remained near 0%.
- Intelligence Redefined: Intelligence is an "efficiency ratio" – how efficiently past information is operationalized to deal with future novelty and uncertainty, focusing on building "new roads" (invention) over navigating existing ones (automation).
- Misleading Benchmarks: Traditional benchmarks measure task-specific skill and knowledge, leading to the "shortcut rule" where targets are hit but the true "point" (learning about intelligence) is missed, as seen with the Netflix Prize or early AI chess.
- AGI as Autonomous Invention: True AGI should be capable of "autonomous invention" and accelerating scientific progress, rather than merely automating known tasks.
The ARC Benchmarks and the Shift to Test Adaptation
- ARC's Purpose: François Chollet created ARC1 (2019) as an "IQ test for machines" to highlight the difference between static skills and fluid intelligence, requiring on-the-fly problem-solving for unique tasks.
- Test Adaptation Pivot: In 2024, AI research pivoted to "test adaptation," where models dynamically modify their own state at test time to adapt to new situations, finally showing significant progress on ARC.
- Evolving ARC Series: ARC1 was a binary test for minimal fluid intelligence; ARC2 (March 2025) challenges reasoning systems with more sophisticated compositional tasks; ARC3 (early 2026, developer preview July 2025) will assess "agency" – the ability to explore, learn interactively, and autonomously achieve goals in unique environments.
- Human vs. AI Performance: Humans consistently score high (95%+ on ARC1, and 100% of ARC2 tasks solved under majority voting, with no prior training), while even advanced models like OpenAI's o3 (fine-tuned for ARC) are not yet human-level on ARC2, and base models like GPT-4.5 score 0%.
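The public ARC datasets make the "unique tasks" framing concrete: each task is a JSON object with a few "train" demonstration pairs and held-out "test" pairs, where grids are lists of rows of color indices 0-9. A minimal sketch of that format and the pass criterion (the toy task and the `flip_vertical` rule below are my own illustration, not an actual ARC task):

```python
def solves_task(program, task: dict) -> bool:
    """A candidate `program` (a grid -> grid function) counts as a
    solution only if it reproduces every demonstration output exactly
    and also generalizes to the held-out test pairs."""
    pairs = task["train"] + task["test"]
    return all(program(p["input"]) == p["output"] for p in pairs)

# Toy task in ARC's JSON shape; the hidden rule is "flip upside down".
task = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 0], [1, 0]]},
        {"input": [[0, 2], [0, 0]], "output": [[0, 0], [0, 2]]},
    ],
    "test": [
        {"input": [[3, 0], [0, 4]], "output": [[0, 4], [3, 0]]},
    ],
}

flip_vertical = lambda grid: grid[::-1]  # reverse the row order
assert solves_task(flip_vertical, task)
assert not solves_task(lambda grid: grid, task)  # identity fails
```

Because each task's rule is unique and shown only through a handful of demonstrations, a solver must infer the transformation on the fly rather than retrieve a memorized skill.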
Abstraction, Cognition, and the Future of AI
- The Kaleidoscope Hypothesis: The world's apparent novelty is a recombination of a small number of "unique atoms of meaning" (abstractions), and intelligence is the ability to identify and reuse these invariant structures.
- Two Types of Abstraction:
  - Type 1 (Value-centric): Continuous domain; perception, intuition, pattern recognition (deep learning/Transformers excel here).
  - Type 2 (Program-centric): Discrete domain; exact structure matching, reasoning, planning (Transformers struggle here).
- Integrated Path to AGI: Achieving AGI requires combining both Type 1 (deep learning/intuition) and Type 2 (discrete program search/reasoning) to overcome combinatorial explosion.
- "Programmer-like Metalearner": Chollet proposes an AI system (under development at his new lab, Ndea) that synthesizes programs on the fly, blending deep learning modules for Type 1 sub-problems with algorithmic modules for Type 2, guided by deep learning intuition and continuously growing a library of reusable abstractions.
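The combination of Type 1 intuition and Type 2 search can be sketched as a best-first program search whose candidate ordering comes from a learned prior. This is an illustrative toy only, not Ndea's actual system: the three grid primitives form a made-up mini-DSL, and the fixed `PRIOR` dict stands in for what would really be a neural network's predictions.

```python
import heapq
import itertools
import math

# A made-up mini-DSL of grid operations (stand-ins for reusable abstractions).
PRIMITIVES = {
    "flip_v":    lambda g: g[::-1],                   # reverse row order
    "flip_h":    lambda g: [row[::-1] for row in g],  # reverse each row
    "transpose": lambda g: [list(r) for r in zip(*g)],
}

# Stand-in for Type 1 intuition: a prior over which op is likely useful.
PRIOR = {"flip_v": 0.5, "flip_h": 0.3, "transpose": 0.2}

def run(ops, grid):
    """Apply a sequence of primitive ops to a grid."""
    for op in ops:
        grid = PRIMITIVES[op](grid)
    return grid

def search(pairs, max_depth=3):
    """Type 2 discrete search, guided by the Type 1 prior: candidates
    with the lowest total negative log-probability are expanded first,
    taming the combinatorial explosion of blind enumeration."""
    tie = itertools.count()  # tie-breaker so the heap never compares op lists
    frontier = [(0.0, next(tie), [])]
    while frontier:
        cost, _, ops = heapq.heappop(frontier)
        if all(run(ops, p["input"]) == p["output"] for p in pairs):
            return ops  # first program consistent with every demonstration
        if len(ops) < max_depth:
            for name, prior in PRIOR.items():
                heapq.heappush(frontier,
                               (cost - math.log(prior), next(tie), ops + [name]))
    return None

# Hidden rule: rotate the grid 180 degrees (= flip_v composed with flip_h).
pairs = [{"input": [[1, 2], [3, 4]], "output": [[4, 3], [2, 1]]}]
print(search(pairs))  # finds a 2-op composition of the primitives
```

In a full system, programs found this way would feed back into the abstraction library, so future searches start from richer primitives; here the library is frozen for brevity.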