AI Infrastructure · Geopolitics

The Homegrown Model: How China Trained a Trillion-Parameter AI Without a Single Nvidia Chip


Meituan says its 1.6-trillion-parameter LongCat-2.0 is the first frontier-scale model ever pretrained and run entirely on domestic Chinese silicon. If the claim holds up, it is the clearest answer yet to years of American export controls.

July 2, 2026 · Lisa Pedrosa · ~10 min read

On the night of June 30, 2026, somewhere in a data center whose exact address Meituan has never disclosed, a training run finished without a single Nvidia logo anywhere in the rack. Fifty thousand domestic AI accelerators, wired together on a homegrown communications library, had spent weeks chewing through more than thirty trillion tokens to produce a 1.6-trillion-parameter model called LongCat-2.0. Then Meituan did something almost more startling than the chip story itself: it gave the model away, publishing the weights on Hugging Face and GitHub for anyone to download, inspect, and run.

I want to be precise about why the hardware is the story and not just a footnote to it. Big model releases happen constantly now; a new trillion-parameter system barely raises an eyebrow anymore. What makes LongCat-2.0 different is the hardware underneath it, and the specific, almost bureaucratic-sounding claim Meituan is making: that this is the first trillion-parameter model to complete both pretraining and inference on domestic Chinese computing infrastructure. Not one or the other. Both. That distinction, which sounds like engineering trivia, is actually the whole plot.

What LongCat-2.0 actually is

Strip away the geopolitics for a moment and LongCat-2.0 is, on its own technical merits, a serious piece of engineering. It's a Mixture-of-Experts model, meaning it doesn't fire all 1.6 trillion parameters for every request. Instead, an internal router selects a subset of specialized "experts" for each token, activating somewhere between 33 and 56 billion parameters depending on the task — reportedly around 48 billion on average. That sparsity is what makes a model this large practically usable: you get the capacity of a trillion-plus-parameter system with the running cost closer to a much smaller dense model.
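To make the routing idea concrete, here is a minimal sketch of how a Mixture-of-Experts router might pick experts for one token. It is an illustration under stated assumptions, not Meituan's implementation: the expert count (`NUM_EXPERTS`), the fixed `TOP_K`, and the random router scores are all hypothetical, and a fixed top-k scheme would not by itself reproduce the variable 33-to-56-billion active range the company reports. Only the 1.6-trillion total and the 33, 48 and 56 billion active figures come from the article.

```python
import math
import random

def softmax(logits):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]

# ASSUMPTION: toy sizes for illustration, not LongCat-2.0's real configuration.
NUM_EXPERTS = 8
TOP_K = 2

random.seed(0)
router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route_token(router_logits, TOP_K)
print("experts used for this token:", chosen)

# Figures from the article: 33B to 56B of 1.6T parameters active per token.
TOTAL_PARAMS = 1.6e12
for active in (33e9, 48e9, 56e9):
    print(f"{active / 1e9:.0f}B active = {active / TOTAL_PARAMS:.1%} of the model")
```

Only the chosen experts do any work for that token, which is why the reported active share stays at a few percent of the full parameter count.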

It also ships with a native one-million-token context window, and it was purpose-built for agentic coding and tool use — the kind of long-horizon, multi-step tasks where an AI has to hold an entire codebase or a sprawling instruction set in mind at once. Meituan says the model was trained on more than 30 trillion tokens of data, and it was quietly powering something the developer community had already noticed: an anonymous model called "Owl Alpha" had spent roughly two months topping community leaderboards on OpenRouter, a marketplace where developers route requests across dozens of competing AI models. When Meituan lifted the curtain on LongCat-2.0, it confirmed that Owl Alpha had been this model all along, already ranked near the top of coding-agent charts before anyone knew whose it was or what it ran on.

1.6T Total Parameters
50,000 Domestic AI Chips
1M Token Context Window
33–56B Active Params / Token

On benchmarks Meituan has published itself, LongCat-2.0 scores 59.5 on SWE-bench Pro, a test of real-world software engineering tasks — nosing ahead of GPT-5.5's reported 58.6 and well clear of Gemini 3.1 Pro's 54.2, though it still trails Anthropic's Claude Opus models. It also posts a 77.3 on SWE-bench Multilingual and a 70.8 on Terminal-Bench 2.1, a benchmark for command-line agent tasks. Those are strong numbers. They are also, importantly, numbers Meituan is reporting about itself, a caveat I'll come back to.

Why "pretraining and inference" is the real headline

Here is the part that took me a moment to fully appreciate, because training a model and running one sound like a single job but are really two very different engineering problems. Inference, meaning running an already-trained model to answer a user's question, is comparatively forgiving. The computation is predictable, the workload is bursty but well understood, and chips with narrower capabilities can be made to handle it. Pretraining is a different animal entirely: it means synchronizing tens of thousands of chips over weeks, with the entire run vulnerable to a single failed node or a communication bottleneck. It demands enormous, sustained memory bandwidth and interconnect speed. It is, in short, the hard part.
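The "single failed node" risk can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes chip failures are independent and uses a made-up per-chip daily failure probability (`P_FAIL_PER_CHIP_PER_DAY`), which is not a figure from Meituan or Huawei. Only the cluster sizes, roughly 1,000 and 50,000 chips, come from the reporting in this article.

```python
def p_any_failure(n_chips: int, p_fail_per_chip: float) -> float:
    """Probability that at least one of n_chips fails, assuming independence."""
    return 1.0 - (1.0 - p_fail_per_chip) ** n_chips

# ASSUMPTION: hypothetical chance that any one chip fails on a given day.
P_FAIL_PER_CHIP_PER_DAY = 1e-4

for n_chips in (1_000, 50_000):
    p = p_any_failure(n_chips, P_FAIL_PER_CHIP_PER_DAY)
    print(f"{n_chips:>6} chips: {p:.1%} chance of at least one failure per day")
```

Under this assumed rate, the smaller cluster sees a failure on roughly one day in ten, while the larger one should expect a failure almost every day. A weeks-long run at that scale has to be engineered to survive failures, not to avoid them.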

Chinese AI labs have been chipping away at the easier side of that divide for a while. Back in April, researchers used a cluster of roughly 1,000 Huawei Ascend 910C chips to handle post-training work on DeepSeek's V4-Pro model. That was a real milestone, but one that still leaned on Nvidia hardware, or Nvidia-adjacent low-precision formats, for the heaviest lifting of pretraining itself. It showed domestic silicon could handle the easier problems, not the hard one.

The distinction that matters: Meituan says LongCat-2.0 is the first trillion-parameter model in which the full pipeline — pretraining and inference — ran end-to-end on roughly 50,000 domestic AI accelerators, using Huawei's HCCL communication library, the homegrown counterpart to Nvidia's NCCL. If verified, that's the first time a frontier-scale model has been born, not just served, entirely on Chinese-designed chips.

Meituan hasn't officially named the chip vendor in its release materials, but the infrastructure fingerprint — the HCCL library, the scale of the cluster, the "Atlas" naming pattern that has surfaced in independent reporting — points strongly toward Huawei's Atlas-950 SuperPods, the same architecture Huawei has been positioning as its answer to Nvidia's GB200 systems. Lehigh University researcher Hanchi Sun, one of the analysts who reviewed the release, called it the first model trained to near-frontier performance on 50,000 Chinese domestic accelerators — a description that, notably, treats the achievement as real even while the vendor identity remains something reporters are inferring rather than confirming outright.

The chokepoint, and the counter-move

It's worth stating plainly what backdrop this lands against, because without it the story loses half its meaning. For several years now, Washington has treated advanced AI chips as a chokepoint — a single, defensible bottleneck through which China's entire AI ambition had to pass, and one the U.S. could squeeze. Export controls tightened around Nvidia's most advanced chips, and even after a partial policy reversal in late 2025 — when the Commerce Department shifted from presumptive denial to case-by-case licensing for chips like the H200, complete with volume caps and a 25 percent tariff — the practical result has been that almost none of that hardware has actually shipped. As of this spring, U.S. regulators had cleared roughly ten Chinese firms to buy H200 chips, capped at 75,000 units per customer, and not one chip had been delivered, caught in a mix of legal limbo and Beijing's own new supply-chain rules.

LongCat-2.0 is the counter-move to that entire narrative. It doesn't argue with the chokepoint framing — it just demonstrates a workaround. If a 1.6-trillion-parameter, near-frontier model really can be pretrained end-to-end on domestic accelerators, then the premise that advanced AI requires Nvidia hardware, that the chokepoint is a hard ceiling rather than a temporary one, gets a lot shakier. Nvidia's own language in its most recent annual report is remarkably blunt about this: the company describes itself as having been "effectively foreclosed from competing in China's data centre computing market," and adds that this foreclosure "helped our competitors build larger developer and customer ecosystems that can challenge us globally." That's not spin from a critic. That's Nvidia's own filing.

"As of the end of fiscal year 2026, we were effectively foreclosed from competing in China's data centre computing market, and our effective foreclosure from the China market helped our competitors build larger developer and customer ecosystems that can challenge us globally." — Nvidia, FY2026 annual report filing

CEO Jensen Huang has been similarly candid in public remarks, telling investors bluntly to "expect nothing" in terms of near-term approval to sell advanced chips back into China. Meanwhile Huawei's AI chip revenue is projected to climb 60 percent year-on-year to around $12 billion in 2026, and Cambricon, a smaller Chinese chip designer, posted profits up more than 4,000 percent year-on-year on the back of surging domestic demand. Goldman Sachs expects Cambricon's shipments alone to grow from 143,000 units in 2025 to 2.1 million by 2030. None of that growth depends on Nvidia's cooperation.

What this signals about the Nvidia moat

For years, Nvidia's advantage wasn't just faster chips. It was the software ecosystem wrapped around them, CUDA above all, plus the sheer difficulty of orchestrating tens of thousands of accelerators without everything falling over. That combination was the moat. What LongCat-2.0 suggests, cautiously, is that the moat has a bridge across it now, at least for one very well-funded, very determined lab with access to Huawei's most advanced hardware and its own systems engineering talent. It doesn't mean the moat is gone. Huawei's chips still reportedly lag Nvidia's best on raw performance per chip, which is part of why the cluster needed 50,000 of them. It means the moat can be crossed, at significant cost and engineering effort, and is no longer impassable.

That has consequences beyond China. Counterpoint Research and other analysts tracking China's chip self-sufficiency push have noted a target of roughly 70 to 80 percent domestic wafer and GPU self-sufficiency by 2030 — an ambition that looked more aspirational a year ago than it does today. Every model like LongCat-2.0 that ships successfully is a proof point Beijing can point to, both domestically and to potential customers of Chinese AI infrastructure abroad.

The caveats worth holding onto

I try not to let a good story outrun its evidence, and this one comes with real, unresolved question marks. Meituan has not officially confirmed the chip vendor; everything pointing to Huawei's Atlas-950 SuperPods is strong circumstantial reporting, not an on-the-record disclosure. The benchmark numbers — the SWE-bench Pro score, the Terminal-Bench results — are self-reported by Meituan, and independent evaluators like Artificial Analysis had not, as of this writing, published their own third-party comparative assessments. Nobody outside Meituan has confirmed wall-clock training time, total energy consumption, or cost versus an equivalent Nvidia-based run, which are exactly the numbers that would let us judge whether domestic chips are merely capable or actually competitive on efficiency.

That gap matters. "It can be done" and "it can be done at a cost that makes commercial sense" are very different claims, and right now we only have solid evidence for the first one.

[Figure: Domestic-chip milestones. 2023–24: inference only. Apr 2026: DeepSeek V4-Pro, ~1,000 chips, post-training. Jun 2026: LongCat-2.0, 50,000 chips, full pretraining and inference.]
From serving models to building them: the widening role of domestic chips, 2023–2026

The wider pattern: robots, sovereign models, and a world hedging its bets

LongCat-2.0 doesn't exist in isolation. It lands in the same season as two other data points about the world's growing appetite for AI independence. China's Ministry of Industry and Information Technology, together with its state-asset regulator, has ordered local governments and state-owned enterprises to help push more than 10,000 humanoid robots into commercial use by the end of 2026 — deployments already underway in logistics centers run by China Post and SF Express, and on assembly lines at electronics manufacturers. Separately, in June 2026 the European Commission selected the Domyn-led EUROPA consortium to build its own open-source frontier model, targeting more than 400 billion parameters trained on European supercomputing capacity and covering all 24 official EU languages — Europe's own attempt to avoid permanent dependence on American or Chinese AI infrastructure.

Different regions, different reasons, same underlying instinct: nobody wants their AI future gated by a supply chain, a licensing regime, or a single company's roadmap that they don't control. LongCat-2.0 is simply the most concrete, chip-level expression of that instinct so far.

Where this goes next

If Meituan's claim survives independent scrutiny, meaning outside benchmarking confirms the performance numbers and someone eventually nails down the real cost and duration of that training run, LongCat-2.0 will mark the moment U.S. chip export controls could no longer be counted on to guarantee a multi-year head start. It won't make Nvidia irrelevant; the company's chips remain faster per unit, and the software ecosystem around them remains enormous. But it changes the shape of the bet Washington placed. Export controls were always a wager that China's AI ambitions would stall out waiting for hardware. LongCat-2.0 suggests that instead of stalling, Chinese labs are learning to build frontier AI with more chips of a lesser kind rather than fewer chips of a better one, and that at 50,000 units and climbing, "lesser" is looking like a temporary label rather than a permanent one. The next year will tell us whether this was a one-off feat of engineering brute force, or the first data point in a curve that keeps climbing.
