
François Chollet: Redefining AGI for Autonomous Invention with ARC
François Chollet argues that simply scaling up current AI models will not lead to Artificial General Intelligence (AGI), because it confuses memorized skill with true fluid intelligence. He redefines intelligence as the efficiency with which a system adapts to novelty, and proposes the ARC (Abstraction and Reasoning Corpus) benchmarks to measure this dynamic adaptation. Chollet advocates a new AI paradigm that combines intuitive deep learning with rigorous discrete program search to achieve autonomous invention, the ultimate goal of AGI.
Critique of Scaling Laws & Redefining Intelligence
- Scaling Limitations: Although compute costs have fallen and deep learning has advanced through ever-larger models (e.g., LLM training), scaling models up to GPT-4.5 (a roughly 50,000x scale-up) did not yield fluid general intelligence; base-model accuracy on ARC remained near 0%.
- Intelligence Redefined: Intelligence is an "efficiency ratio" – how efficiently past information is operationalized to deal with future novelty and uncertainty, focusing on building "new roads" (invention) over navigating existing ones (automation).
- Misleading Benchmarks: Traditional benchmarks measure task-specific skill and knowledge, leading to the "shortcut rule" where targets are hit but the true "point" (learning about intelligence) is missed, as seen with the Netflix Prize or early AI chess.
- AGI as Autonomous Invention: True AGI should be capable of "autonomous invention" and accelerating scientific progress, rather than merely automating known tasks.
The ARC Benchmarks and the Shift to Test Adaptation
- ARC's Purpose: François Chollet created ARC1 (2019) as an "IQ test for machines" to highlight the difference between static skills and fluid intelligence, requiring on-the-fly problem-solving for unique tasks.
- Test Adaptation Pivot: In 2024, AI research pivoted to "test adaptation," where models dynamically modify their own state at test time to adapt to new situations, finally showing significant progress on ARC.
- Evolving ARC Series: ARC1 was a binary test for minimal fluid intelligence; ARC2 (March 2025) challenges reasoning systems with more sophisticated compositional tasks; ARC3 (early 2026, developer preview July 2025) will assess "agency" – the ability to explore, learn interactively, and autonomously achieve goals in unique environments.
- Human vs. AI Performance: Humans consistently score high (95%+ on ARC1, and 100% of ARC2 tasks solved under majority voting, with no prior training), while even advanced models like OpenAI's o3 (fine-tuned for ARC) are not yet human-level on ARC2, and base models like GPT-4.5 score 0%.
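The public ARC datasets make the "unique tasks" framing concrete: each task is a JSON object with a few "train" demonstration pairs and held-out "test" pairs, where grids are lists of rows of color indices 0-9. A minimal sketch of that format and the pass criterion (the toy task and the `flip_vertical` rule below are my own illustration, not an actual ARC task):

```python
def solves_task(program, task: dict) -> bool:
    """A candidate `program` (a grid -> grid function) counts as a
    solution only if it reproduces every demonstration output exactly
    and also generalizes to the held-out test pairs."""
    pairs = task["train"] + task["test"]
    return all(program(p["input"]) == p["output"] for p in pairs)

# Toy task in ARC's JSON shape; the hidden rule is "flip upside down".
task = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 0], [1, 0]]},
        {"input": [[0, 2], [0, 0]], "output": [[0, 0], [0, 2]]},
    ],
    "test": [
        {"input": [[3, 0], [0, 4]], "output": [[0, 4], [3, 0]]},
    ],
}

flip_vertical = lambda grid: grid[::-1]  # reverse the row order
assert solves_task(flip_vertical, task)
assert not solves_task(lambda grid: grid, task)  # identity fails
```

Because each task's rule is unique and shown only through a handful of demonstrations, a solver must infer the transformation on the fly rather than retrieve a memorized skill.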
Abstraction, Cognition, and the Future of AI
- The Kaleidoscope Hypothesis: The world's apparent novelty is a recombination of a small number of "unique atoms of meaning" (abstractions), and intelligence is the ability to identify and reuse these invariant structures.
- Two Types of Abstraction:
  - Type 1 (Value-centric): Continuous domain; perception, intuition, pattern recognition (deep learning/Transformers excel here).
  - Type 2 (Program-centric): Discrete domain; exact structure matching, reasoning, planning (Transformers struggle here).
- Integrated Path to AGI: Achieving AGI requires combining both Type 1 (deep learning/intuition) and Type 2 (discrete program search/reasoning) to overcome combinatorial explosion.
- "Programmer-like Metalearner": Chollet proposes an AI system (under development at his new lab, Ndea) that synthesizes programs on the fly, blending deep learning modules for Type 1 sub-problems with algorithmic modules for Type 2, guided by deep learning intuition and continuously growing a library of reusable abstractions.
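The combination of Type 1 intuition and Type 2 search can be sketched as a best-first program search whose candidate ordering comes from a learned prior. This is an illustrative toy only, not Ndea's actual system: the three grid primitives form a made-up mini-DSL, and the fixed `PRIOR` dict stands in for what would really be a neural network's predictions.

```python
import heapq
import itertools
import math

# A made-up mini-DSL of grid operations (stand-ins for reusable abstractions).
PRIMITIVES = {
    "flip_v":    lambda g: g[::-1],                   # reverse row order
    "flip_h":    lambda g: [row[::-1] for row in g],  # reverse each row
    "transpose": lambda g: [list(r) for r in zip(*g)],
}

# Stand-in for Type 1 intuition: a prior over which op is likely useful.
PRIOR = {"flip_v": 0.5, "flip_h": 0.3, "transpose": 0.2}

def run(ops, grid):
    """Apply a sequence of primitive ops to a grid."""
    for op in ops:
        grid = PRIMITIVES[op](grid)
    return grid

def search(pairs, max_depth=3):
    """Type 2 discrete search, guided by the Type 1 prior: candidates
    with the lowest total negative log-probability are expanded first,
    taming the combinatorial explosion of blind enumeration."""
    tie = itertools.count()  # tie-breaker so the heap never compares op lists
    frontier = [(0.0, next(tie), [])]
    while frontier:
        cost, _, ops = heapq.heappop(frontier)
        if all(run(ops, p["input"]) == p["output"] for p in pairs):
            return ops  # first program consistent with every demonstration
        if len(ops) < max_depth:
            for name, prior in PRIOR.items():
                heapq.heappush(frontier,
                               (cost - math.log(prior), next(tie), ops + [name]))
    return None

# Hidden rule: rotate the grid 180 degrees (= flip_v composed with flip_h).
pairs = [{"input": [[1, 2], [3, 4]], "output": [[4, 3], [2, 1]]}]
print(search(pairs))  # finds a 2-op composition of the primitives
```

In a full system, programs found this way would feed back into the abstraction library, so future searches start from richer primitives; here the library is frozen for brevity.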