The AI Scaling Laws Hold
Gemini 3 is breaking down the “AI wall”
Why Everyone Thought We’d Hit the AI Wall
For most of 2025, a loud narrative took hold in tech and on X: “The scaling party is over.” Bigger models no longer seemed to deliver the same clear, step-function jumps we saw from GPT-3 to GPT-4. Improvements felt incremental, benchmarks moved by inches instead of miles, and many concluded that simply throwing more compute at pre-training had run into a hard wall. Investors started asking if we were early-2000s telecom all over again: massive capex, questionable returns.
That story made intuitive sense because costs were exploding while headline capabilities looked to be slowing. GPU prices soared, datacenter power became a board-level constraint, and leading labs got more secretive about parameter counts and training FLOPs. Without a clean, public “wow” moment, it was easy to believe that pre-training had exhausted its gains and that only exotic post-training tricks or specialized agents could move the needle from here.
What Scaling Laws Actually Say
The original “Scaling Laws for Neural Language Models” work from Kaplan and colleagues showed something surprisingly simple: as you increase three knobs - model parameters, training data, and compute - the loss falls along a smooth power-law curve across many orders of magnitude. In plain language, bigger models trained on more data with more compute get predictably better. Crucially, those laws never said you must always increase parameter count; they describe a relationship between all three variables.
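To make that concrete, here is a minimal Python sketch of the single-knob power laws from that paper, with constants only roughly in line with the published fits. The exact numbers are not the point; the smooth, predictable decline is:

```python
# A minimal sketch of Kaplan-style power laws, assuming the functional form
# L(N) = (N_c / N)**alpha_N from "Scaling Laws for Neural Language Models".
# The constants are only approximately the published fits and are shown purely
# to illustrate the smooth, predictable shape of the curve.

N_C, ALPHA_N = 8.8e13, 0.076   # parameter-count law (approximate published fit)
D_C, ALPHA_D = 5.4e13, 0.095   # dataset-size law (approximate published fit)

def loss_vs_params(n_params: float) -> float:
    """Test loss as a power law in parameters, when data is not the bottleneck."""
    return (N_C / n_params) ** ALPHA_N

def loss_vs_tokens(n_tokens: float) -> float:
    """Test loss as a power law in training tokens, when size is not the bottleneck."""
    return (D_C / n_tokens) ** ALPHA_D

# Each 10x jump in parameters shaves a similar, predictable slice off the loss.
for n in (1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> loss ~ {loss_vs_params(n):.2f}")
```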
Later work on “compute-optimal” training (the Chinchilla-style results) sharpened that picture. It showed that many large models were actually under-trained for their size: too many parameters, not enough tokens or compute. A smaller model trained longer and on more data (with the same total compute) beat much larger predecessors across benchmarks. The lesson: scaling laws aren’t “just add parameters,” they’re “spend your compute wisely across size, data, and training.”
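As a rough illustration of what “spend your compute wisely” means, here is a back-of-the-envelope sketch assuming two common approximations: training FLOPs C ≈ 6·N·D, and the Chinchilla-era rule of thumb of roughly 20 training tokens per parameter. Real lab recipes are more nuanced; this only shows how a fixed budget splits between size and data:

```python
# A minimal sketch of compute-optimal allocation, assuming C ~= 6 * N * D and
# roughly 20 training tokens per parameter. Illustrative only; real recipes vary.

def compute_optimal_split(flops_budget: float, tokens_per_param: float = 20.0):
    """Return an approximately compute-optimal (params, tokens) pair for a FLOP budget."""
    # C = 6 * N * D with D = r * N  =>  N = sqrt(C / (6 * r)), D = r * N
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

# The same budget spent on a "too big, under-trained" model buys less capability.
for c in (1e23, 1e24, 1e25):
    n, d = compute_optimal_split(c)
    print(f"{c:.0e} FLOPs -> ~{n:.1e} params trained on ~{d:.1e} tokens")
```

Plugging in roughly the Chinchilla training budget (~6e23 FLOPs) recovers the familiar “~70B parameters on ~1.4T tokens” shape, which is why the rule of thumb stuck.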
This matters because it reframes what “scaling” means. You can scale by adding parameters, by feeding more high-quality data, or by training more efficiently on better hardware and software stacks. When people said “scaling hit a wall,” they were mostly looking at one axis - parameter count - while the real game had quietly shifted to compute, data, and algorithmic efficiency.
Gemini 3: The Model That Shattered the Wall Myth
Enter Google’s Gemini 3. In Google’s own words, Gemini 3 Pro is their “most intelligent” model yet, with clear gains in reasoning, multimodal understanding, and coding. Early users describe it as a genuine step up: it reclaims top spots on community leaderboards, beats its own predecessor Gemini 2.5 Pro, and competes aggressively with other frontier models like Grok 4.1 and the latest GPT-family releases. For the first time in months, the internet didn’t just debate UX features - it debated a real capability jump.
The crucial twist is how Gemini 3 seems to achieve this. Public details are sparse, but informed speculation and rumor point toward Gemini 3 being a huge sparse Mixture-of-Experts (MoE) model: potentially on the order of 5 trillion parameters in total, while only a fraction of those are active for any given token. In other words, its capacity may have exploded, but its active footprint per token is kept in the same ballpark as prior models. That is exactly the sort of clever scaling that the “AI wall” narrative failed to anticipate.
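To see why “huge total, modest active” is plausible rather than paradoxical, here is some purely hypothetical arithmetic. The 5-trillion figure is the rumored one from above and is unconfirmed; the expert count, shared-parameter share, and top-2 routing are assumptions chosen only to illustrate the mechanics:

```python
# Purely hypothetical arithmetic: how total capacity and per-token footprint can
# decouple in a sparse MoE. None of these numbers are confirmed for Gemini 3;
# only the 5e12 total echoes the rumor, the rest are illustrative assumptions.

total_params      = 5e12    # rumored total capacity (unconfirmed)
shared_params     = 0.5e12  # assumed dense share: attention, embeddings, routers
num_experts       = 128     # assumed number of experts (simplified, pooled view)
experts_per_token = 2       # assumed top-k routing

expert_params = (total_params - shared_params) / num_experts
active_params = shared_params + experts_per_token * expert_params

print(f"Active per token: ~{active_params:.2e} params "
      f"({active_params / total_params:.0%} of {total_params:.0e} total)")
```

Under those made-up assumptions, only a little over a tenth of the model does work on any given token - which is the whole trick.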
Way Smarter: How Is That Possible?
At first glance, “5 trillion parameters” sounds like a brute-force repudiation of the idea that we were hitting limits. But the way MoE models work makes the story more subtle - and more interesting. In a sparse MoE, you have many “experts” inside the model, and a small router network chooses just a couple of them for each token or segment. Total capacity can be enormous, but only a slice of it is used at inference time. You pay to store and train the full set of experts, but on any single forward pass only the handful of active experts do real compute.
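Here is a deliberately minimal sketch of that routing idea, with toy dimensions, simple linear experts, and none of the load-balancing or capacity tricks a production system needs. It is not a description of Gemini 3’s internals - just the top-k mechanism in miniature:

```python
# Minimal top-k sparse MoE routing sketch (NumPy). Toy sizes, linear experts,
# no load balancing or capacity limits - a teaching sketch, not a real system.
import numpy as np

rng = np.random.default_rng(0)
d_model, num_experts, top_k = 64, 8, 2

router_w  = rng.normal(size=(d_model, num_experts))           # router projection
expert_ws = rng.normal(size=(num_experts, d_model, d_model))  # one weight matrix per expert

def moe_layer(x: np.ndarray) -> np.ndarray:
    """For each token, pick the top-k experts and mix their outputs by router weight."""
    logits  = x @ router_w                              # (tokens, num_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]   # indices of the chosen experts
    out = np.zeros_like(x)
    for t, token in enumerate(x):                       # explicit loop for clarity, not speed
        chosen  = top_idx[t]
        weights = np.exp(logits[t, chosen])
        weights /= weights.sum()                        # softmax over the chosen experts only
        for w, e in zip(weights, chosen):
            out[t] += w * (token @ expert_ws[e])        # only k of the 8 experts do any work
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)  # (4, 64): full capacity on tap, 2 of 8 experts per token
```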
If Gemini 3 is indeed a multi-trillion-parameter MoE with roughly “Gemini 2.5–ish” active parameters per token, it fits perfectly into the updated scaling-law picture. Rather than endlessly scaling a single dense network, Google appears to have scaled capacity via MoE, compute via better hardware and training pipelines, and data via vast, highly conversational corpora, including large licensed web and social datasets. The result is a model that feels much “smarter” without feeling proportionally more expensive to run.
Another lever is architecture and training strategy beyond MoE. Gemini 3 is described as an advanced multimodal system with improved tool use, long-context reasoning, and better integrated vision and code capabilities. Rumors and technical sleuthing suggest more efficient routing, higher-quality pretraining data, and deeper post-training all contributed. In effect, Gemini 3 isn’t just a bigger brain; it’s a better-organized one, with far more specialized regions that can be recruited on demand, running on faster, more coherent hardware pipelines.
The Science Signal: Scaling Laws Still Work
This is why many observers are now arguing that “the scaling wall was a mirage.” The fear that pre-training had stopped paying off looks, in hindsight, like a misread of a temporary plateau. For a couple of model generations, we saw under-trained dense models and messy transitions in hardware and infrastructure. Gemini 3 arrives after that turbulence with a cleaner recipe: smarter architecture (MoE), more compute, better data, and more mature training pipelines.
Seen through the lens of the scaling-law papers, the story hangs together. Performance continues to improve smoothly when you push the right combination of model size, data, and compute - even if the raw active parameter count per token only inches up while the total parameter count explodes via MoE. Gemini 3 doesn’t break the laws; it obeys them in a more sophisticated regime, where sparse capacity, training data, and system-level efficiency do more of the heavy lifting than brute-force dense width.
For investors, this is the key insight: the “laws of motion” for AI progress are intact, just operating in a more complex, capital-intensive regime. The path forward is not “stop scaling,” it’s “scale smarter” with mixtures of experts, carefully chosen data, and huge, tightly coupled compute clusters.
From Chips to Clusters: Where the Value Accrues
If scaling laws still hold - and MoE lets you keep pushing total capacity into the multi-trillion-parameter range - the next obvious question for any tech investor is: who gets paid? Start with the hardware layer. NVIDIA’s Blackwell architecture and Google’s own next-generation TPUs were designed explicitly for massive AI workloads and reasoning-heavy inference. Gemini 3’s success is a powerful demand signal for accelerators, high-bandwidth memory, and ultra-fast interconnect fabrics that can feed gigantic MoE models without choking.
But the story doesn’t stop at the chip. The real magic happens in coherent clusters: thousands of GPUs or TPUs wired together with enough bandwidth and low latency to act like one giant computer. MoE makes those clusters even more important, because you’re juggling many experts across devices and racks. Google is already doing this internally for Gemini 3 on its TPU-based infrastructure, and is exposing more of that capability to developers through Gemini APIs, Vertex AI, and agent-building tools. Neoclouds like CoreWeave, and the hyperscalers like Google Cloud, AWS, and Microsoft Azure, sit right at this junction between chips and training runs, monetizing customers’ hunger for large, coherent compute fabrics.
Gemini 3’s Second-Order Effect: Token Economics and Who Leads
On top of raw capability, Gemini 3 changes the economics of tokens - the fundamental “unit of work” for AI. With better reasoning, more specialized experts, and stronger tool-calling, each token can do more useful work for end-users: a page of output is no longer just autocomplete but planning, coding, running multi-step agents, or orchestrating other tools and services. That increases the value per token and makes it rational for enterprises to pay for higher-end models, as long as cost-per-token keeps trending down.
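A heavily simplified calculation shows why sparse activation matters for that trend. It assumes the common estimate of about 2 inference FLOPs per active parameter per token and an entirely hypothetical blended compute price; the only real takeaway is that cost tracks active parameters, not total capacity:

```python
# Back-of-the-envelope token economics. The ~2 FLOPs/active-param/token figure is
# a standard rough estimate; the price per exaFLOP is hypothetical and exists only
# to show the ratio between dense and sparse activation, not real unit costs.

FLOPS_PER_PARAM_TOKEN = 2.0
PRICE_PER_EXAFLOP     = 15.0   # hypothetical $ per 1e18 FLOPs of delivered inference

def compute_cost_per_million_tokens(active_params: float) -> float:
    flops = FLOPS_PER_PARAM_TOKEN * active_params * 1e6
    return flops / 1e18 * PRICE_PER_EXAFLOP

print(f"If all 5T params were active:   ~${compute_cost_per_million_tokens(5e12):.2f} per 1M tokens")
print(f"If only ~0.6T are active (MoE): ~${compute_cost_per_million_tokens(0.6e12):.2f} per 1M tokens")
```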
This dynamic tends to favor players with both strong models and deep infrastructure. Google (Gemini + TPUs + cloud) is suddenly in a stronger position: it has a frontier MoE model, a vertically integrated hardware stack, and a massive global footprint of datacenters to run it. OpenAI (with o-series reasoning models on Azure), Anthropic (Claude on AWS), and xAI (Grok with tight hardware and automotive adjacency) also benefit from the same logic, but Gemini 3’s MoE-driven leap is a proof point that Google’s “train huge capacity, route sparsely” strategy works. In a world where scaling laws still pay, the companies that can fund and operate trillion-parameter MoE stacks will shape the frontier.
Why This Is Bullish for the Whole AI Stack
If you step back, Gemini 3 and the survival of scaling laws are good news for almost every layer of the AI stack. Hardware manufacturers like NVIDIA - and, on the TPU side, Google itself - get clearer justification for their next-generation designs and fabs. Their chips are not just keeping models afloat; they’re the enablers of a new capacity regime where 5+ trillion-parameter MoEs are practical. Cloud and neocloud providers get a longer runway of demand for large, coherent clusters and the power, cooling, and networking to support them.
Model labs see that pre-training is still a growth engine, not a dead end. The recipe is evolving - from dense giants to sparse, expert-rich models - but the underlying principle is intact: more thoughtfully deployed compute still buys better intelligence. Even application companies win because the playbook now looks familiar: better infrastructure → more capable models → richer products → more user data and usage → further model improvement. With Gemini 3 showing progress in reasoning and agentic workflows, we are closer to AI systems that can handle meaningful chunks of real work, not just chat replies. That expands the TAM for AI from “assistant in a browser tab” to “co-pilot for entire workflows,” raising the ceiling for everyone.
A Clearer Map of the Next Decade
The most important thing Gemini 3 does is restore a sense of continuity. We’re not in a mysterious post-scale phase where progress is random; we’re in a more demanding phase of the same curve, where progress comes from better allocation of compute, smarter architectures like MoE, and more integrated systems. The underlying scaling laws - the idea that more thoughtfully applied compute yields smoother, predictable gains - still hold, even when the models quietly jump from hundreds of billions to trillions of parameters under the hood.
For the average tech-savvy reader or investor, that should be both grounding and exciting. Grounding, because it suggests this is not a hype-only story: there is a recognizable scientific backbone - scaling laws, compute-optimal training, and sparse expert architectures - underneath the marketing noise. Exciting, because Gemini 3 looks less like a climax and more like a milestone on a long, upward curve. If this is what “MoE capacity in the trillions, better compute and training” can do, imagine what happens as Blackwell-class systems and their successors are fully brought to bear.
The wall wasn’t real. The hill just got steeper - and the companies with the best chips, the densest clusters, the most capable MoE architectures, and the smartest training regimes are about to climb faster than ever.
Disclaimer: This is not investing advice.


