
zkML-JOLT (Atlas): 7x Faster Zero-Knowledge AI Inference via a16z JOLT
Jingchang Sun
Mia: In the world of artificial intelligence, trust is everything. We want to know that an AI making a critical decision, whether in finance or medicine, is doing exactly what it's supposed to. The holy grail for this is something called verifiable AI: using cryptography to prove an AI's computation was correct. The problem? Historically, this has been incredibly, painfully slow. But what if the entire approach we've been using is wrong? What if, instead of forcing AI models through complex, custom-built mathematical mazes, you could fundamentally change the game? A new project is doing just that, and it's showing some truly breakthrough results.
Mia: The core innovation here, found in a project called zkML-JOLT, is a radical departure from the norm. You see, most systems try to prove machine learning operations by translating them into complex arithmetic circuits. zkML-JOLT, building on foundational research called JOLT, does something different. It throws out the complex arithmetic for non-linear functions (think ReLU and Softmax, which are everywhere in neural networks) and instead just uses lookup tables. It's a bit like looking up the answer in a pre-made table instead of calculating a difficult math problem from scratch every time. This seemingly simple design choice has a massive domino effect: it gets rid of a ton of complexity, things like quotient polynomials, byte decomposition, and convoluted circuits, and the result is a prover that is just fundamentally faster and more efficient.
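To make the lookup idea concrete, here is a minimal sketch in plain Rust. It is not Jolt's actual API; it just assumes 8-bit quantized activations and shows how a non-linear function like ReLU collapses into a table of (input, output) pairs, so "proving" an evaluation reduces to showing that the pair appears in a pre-agreed table rather than arithmetizing the max(0, x) branch as a circuit.

```rust
// Toy sketch (not Jolt's real API): quantized ReLU as a lookup table.
// In a lookup argument, the prover shows each (input, output) pair is
// a member of a pre-agreed table, instead of encoding the branching
// logic of max(0, x) as arithmetic constraints.

/// Build the table once: for every possible i8 input, record ReLU(x).
fn build_relu_table() -> Vec<(i8, i8)> {
    (i8::MIN..=i8::MAX)
        .map(|x| (x, x.max(0)))
        .collect()
}

fn main() {
    let table = build_relu_table();
    // "Proving" ReLU(-7) = 0 reduces to showing (-7, 0) is in the table.
    let query = (-7i8, 0i8);
    assert!(table.contains(&query));
    println!("lookup ok: ReLU({}) = {}", query.0, query.1);
}
```

A real lookup argument never does this membership check naively, of course; the point is only that table membership, not arithmetic, becomes the thing being proven.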
Mia: What this really means is we're looking at a paradigm shift in how zero-knowledge proofs are applied to machine learning. By focusing on lookups and the inherent sparsity in ML models, zkML-JOLT just sidesteps the computational cost that plagues other methods. This isn't just a small, incremental improvement. It's a strategic architectural advantage. It creates a much leaner, faster proving process, and that directly attacks the biggest barrier to using zkML in the real world: practicality.
Mia: This focus on lookup efficiency directly translates into superior performance for core ML operations like matrix multiplication, which is where we're going next.
Mia: Now, for those really heavy-duty tasks, like matrix-vector multiplication, which is the absolute workhorse of machine learning, zkML-JOLT uses a highly efficient technique called batched sumcheck. Here's the key difference: other approaches often have to retrofit their systems to handle sparsity, which is common in ML models, but JOLT's lookup-centric architecture is sparse by nature. And here's the really clever part: JOLT doesn't even need to write down the entire massive lookup table. The tables are structured, not explicitly stored. This allows for incredible flexibility, supporting different kinds of quantization and even opening the door to floating-point operations, an area where many competing frameworks are still stuck.
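To ground that, here is the textbook way a matrix-vector product becomes a sumcheck claim. This is the generic formulation, not necessarily zkML-JOLT's exact batched variant; the tildes denote multilinear extensions, r is a random verifier challenge, and n (assumed a power of two) is the vector length.

```latex
% Standard sumcheck view of y = Mx (illustrative; zkML-JOLT's exact
% batching may differ). Index rows and columns by Boolean vectors and
% let \widetilde{M}, \widetilde{x}, \widetilde{y} be multilinear
% extensions of the matrix and the two vectors.
\[
  \widetilde{y}(r) \;=\; \sum_{b \,\in\, \{0,1\}^{\log n}}
      \widetilde{M}(r, b)\,\cdot\,\widetilde{x}(b)
\]
% Checking this single claim at a random point r takes log n sumcheck
% rounds; "batching" folds many such claims into one via a random
% linear combination. When M is sparse, the prover's work scales with
% the number of nonzero entries rather than with the full m-by-n table.
```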
Mia: The critical point here is how these design choices directly map to the real-world patterns we see in machine learning workloads. By natively handling sparsity and avoiding massive, pre-processed tables, zkML-JOLT offers a solution that is far more adaptable and less constrained. This flexibility is a huge differentiator: it could enable much broader adoption across different types of AI models and even solve the floating-point arithmetic problem that so many zkML frameworks are stuck on. Plus, its ability to integrate new operations as primitives, without a pile of restrictive rules, is a powerful engine for future innovation.
Mia: The practical impact of all these optimizations becomes crystal clear when you actually compare zkML-JOLT's performance against other leading projects.
Mia: In end-to-end tests, the results are pretty stunning. zkML-JOLT completed verification for a multi-class classification model in about 0.7 seconds. To put that in perspective, competitors like Mina-zkml took around 2 seconds, and ezkl was even slower at 4 to 5 seconds. And that's not even mentioning that some frameworks couldn't support essential operations the model needed at all. But this isn't just about raw speed. The research also shines a light on some serious gaps in the entire zkML ecosystem, specifically around completeness and correctness. For example, many frameworks claim to support the standard ONNX format for AI models, but they're missing crucial components like memory consistency checks, and without those checks you can't actually guarantee the integrity of the execution.
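To see what is at stake with those memory consistency checks, here is a toy sketch of the invariant they enforce: every read must return the value of the most recent write to that address. Real zkVMs prove this with permutation or multiset-hashing arguments over a finite field; the HashMap replay below is just a plain-Rust illustration, and all names in it are made up for the example.

```rust
// Toy illustration of the memory-consistency invariant (hedged: real
// zkVMs enforce this cryptographically, not with a HashMap replay).
use std::collections::HashMap;

#[derive(Debug)]
enum Op {
    Write { addr: u32, val: u64 },
    Read { addr: u32, val: u64 },
}

/// Replays a trace and rejects any read that returns a value other
/// than the last write to that address -- exactly the property a
/// memory consistency check must enforce inside a proof.
fn trace_is_consistent(trace: &[Op]) -> bool {
    let mut mem: HashMap<u32, u64> = HashMap::new();
    for op in trace {
        match op {
            Op::Write { addr, val } => {
                mem.insert(*addr, *val);
            }
            Op::Read { addr, val } => {
                if mem.get(addr) != Some(val) {
                    return false;
                }
            }
        }
    }
    true
}

fn main() {
    let good = [Op::Write { addr: 0, val: 7 }, Op::Read { addr: 0, val: 7 }];
    let bad = [Op::Write { addr: 0, val: 7 }, Op::Read { addr: 0, val: 9 }];
    assert!(trace_is_consistent(&good));
    // A framework without memory checks could "prove" traces like this one.
    assert!(!trace_is_consistent(&bad));
    println!("consistency checks behave as expected");
}
```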
Mia: So, what does this comparison really tell us? It proves that the theoretical advantages of this lookup-based approach translate directly into practical, deployable speed. But it also serves as a critical warning. Speed is useless if it comes at the expense of verifiable correctness. The fact that zkML-JOLT is committed to both performance and these foundational guarantees is what could finally build the trust needed for widespread adoption of verifiable AI.
Mia: And while all of this is currently running on a CPU, the path to even greater performance gains, and the truly critical aspect of zero-knowledge privacy, are the key next steps.
Mia: Right now, zkML-JOLT operates on CPU architecture. The clear next step is GPU acceleration, which is projected to deliver another 10x performance boost. But perhaps more importantly, the architecture is designed to enable true zero-knowledge privacy. This is a crucial distinction. Many so-called zkVMs are succinctly verifiable—meaning you can prove the computation was done correctly—but they aren't actually zero-knowledge, because they don't hide the inputs, outputs, or intermediate steps. zkML-JOLT uses something called folding schemes to achieve this, allowing for genuine privacy. This is essential for sensitive AI use cases in fields like healthcare or finance. This folding approach also makes the system incredibly flexible, allowing it to run efficiently on everything from a personal computer to a massive server farm. The roadmap ahead is packed with further optimizations, showing a deep commitment to pushing the boundaries of efficiency, security, and practicality.
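And for a rough picture of what a folding scheme does, here is the core step of a generic Nova-style construction. The briefing doesn't say which folding scheme zkML-JOLT adopts, so treat the notation, two instance-witness pairs and a random challenge, as purely illustrative.

```latex
% Core step of a Nova-style folding scheme (illustrative; the briefing
% does not pin down which folding construction zkML-JOLT uses).
% Given two instance-witness pairs (u_1, w_1) and (u_2, w_2) for the
% same relation, the verifier samples a random challenge \rho and both
% parties fold them into a single pair:
\[
  u \;=\; u_1 + \rho\, u_2, \qquad
  w \;=\; w_1 + \rho\, w_2
\]
% (plus a committed cross term to absorb the nonlinear part, as in
% relaxed R1CS). Repeating this step compresses an arbitrarily long
% computation into one instance, and layering hiding commitments on
% top is what yields genuine zero-knowledge rather than mere
% succinctness.
```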
Mia: So, to wrap things up, here are the key points to remember from today's briefing.
Mia: First, zkML-JOLT achieves significant speedups in verifiable machine learning, we're talking 3 to 7 times faster, by prioritizing a lookup-based approach over traditional, complex circuit arithmetic.
Mia: Second, the entire architecture is designed for efficiency from the ground up. By eliminating complex components and natively handling the sparsity found in AI models, it offers a much more practical solution for real-world machine learning workloads.
Mia: Third, while it demonstrates superior performance, zkML-JOLT also emphasizes verifiable correctness and completeness, which addresses some really critical gaps and weaknesses in the broader zkML ecosystem today.
Mia: And finally, the future development is focused on two things: scaling performance even further with GPUs, and enabling true zero-knowledge privacy through folding schemes. This combination is what could finally pave the way for the widespread adoption of truly trustworthy AI agents.