The Claude Code Leak: Inside Anthropic's AI Safety Crisis

At 22:47 UTC on March 31, 2026, a researcher named Chaofan Shou, known on X by the handle @Fried_rice, posted a single tweet with a screenshot of a JavaScript source map file. The 59.8-megabyte file, shipped in version 2.1.88 of the Claude Code npm package, contained the complete, unobfuscated TypeScript source code for Anthropic's flagship AI development tool. Five hundred and twelve thousand lines. One thousand, nine hundred and six files.

By 23:30 UTC, the tweet had seventeen thousand shares. By midnight, it was everywhere. Within six hours, a clean rewrite called "Claw-code" appeared on GitHub. Within twenty-four hours, it had one hundred thousand stars—the fastest-growing repository in the platform's history. By April 2, the source code of Claude Code existed in approximately forty thousand copies across GitHub alone, with backups in archives, torrents, and diaspora repositories designed to outlive any DMCA takedown.

This is the story of how Anthropic—the company built on a promise to solve AI safety before we got AGI wrong—had the worst security week in the history of artificial intelligence. And what that week exposed about the gap between the company's public commitments and its actual practices.

∼ ∼ ∼

The First Leak: Mythos

Anthropic's crisis did not begin with an npm package. It began, in fact, with a CMS.

On March 26, 2026, the company's content management system exposed a directory of draft documents. Among them were internal memos discussing an unreleased model, codenamed "Claude Mythos"—sometimes called "Capybara" in internal discussions. The leaked documents described Mythos as "a step change" in capability, with performance improvements that would push Claude significantly ahead of competing models.

The Mythos documents also contained warnings. One memo, authored by an Anthropic safety researcher, noted that the model posed "unprecedented cybersecurity risks." The language was specific enough to suggest something had gone wrong during training or post-training: there were references to the model's "instability" and unusual behavioral patterns that had emerged during evaluation.

These documents were pulled from public view within hours. Anthropic acknowledged that the CMS had been misconfigured, but offered no details about what the documents said or what had prompted the internal concern about cybersecurity risk.

Five days later, the real leak started.

Context: The Mythos Warning

Anthropic's leaked internal memo explicitly warned that Claude Mythos posed "unprecedented cybersecurity risks." The company has never clarified what these risks entail, or whether they relate to the model's training process, its capabilities, or discovered vulnerabilities. This silence is particularly troubling given Anthropic's simultaneous claims to be managing existential AI risks.

The Source Code Leak: March 31

The npm package for Claude Code version 2.1.88 was published on March 29, 2026. It was not supposed to contain the source code.

Somewhere in Anthropic's build pipeline, the bundler—a JavaScript tool called Bun—had generated a source map file (.map extension) by default. Source maps are debugging artifacts. They contain a complete, human-readable translation between the bundled JavaScript that users install and the original TypeScript source that developers write. They are invaluable for debugging, and they are worthless for distribution. They should never be published to npm.
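To make concrete why a published source map amounts to published source, here is a minimal sketch of how original files can be read back out of one. It assumes the map follows the standard Source Map v3 layout and embeds file text in its `sourcesContent` array; whether the leaked file was laid out exactly this way is an assumption, and `recoverSources` and the example map are hypothetical names invented for illustration.

```typescript
// Minimal shape of a Source Map v3 file; only the fields used here.
interface SourceMapV3 {
  version: number;
  sources: string[];                  // original file paths
  sourcesContent?: (string | null)[]; // original file text, when embedded
  mappings: string;
}

// Pair each original path with its embedded text, skipping entries with no content.
function recoverSources(map: SourceMapV3): Map<string, string> {
  const recovered = new Map<string, string>();
  const contents = map.sourcesContent ?? [];
  map.sources.forEach((path, i) => {
    const text = contents[i];
    if (typeof text === "string") recovered.set(path, text);
  });
  return recovered;
}

// Hypothetical two-file map, for illustration only.
const exampleMap: SourceMapV3 = {
  version: 3,
  sources: ["src/main.ts", "src/util.ts"],
  sourcesContent: ["export const answer: number = 42;\n", null],
  mappings: "",
};

const recoveredFiles = recoverSources(exampleMap);
console.log([...recoveredFiles.keys()]); // ["src/main.ts"]
```

No decompiler or reverse engineering is involved: when the text is embedded, reading it back is a loop over two parallel arrays.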

A well-known Bun bug—open in the issue tracker for twenty days—causes the bundler to generate source maps without an obvious way to disable them. Anthropic did not exclude the .map files from the npm package manifest. Two and a half days after release, on March 31, Chaofan Shou discovered the file, downloaded it, and understood what he had.

What he had was the entire codebase of Claude Code: 512,000 lines across 1,906 files. Not compiled. Not obfuscated. Not even minified. The full source, readable line by line.

The npm package for Claude Code should never have contained what Anthropic had spent years building in private. It was human error, they said. But it was the specific kind of human error that happens when a company believes it can manage the complexity of shipping AI tools without the guardrails that other software teams use every day.
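One such guardrail can be sketched in a few lines: a release step that refuses to publish if any source map appears in the list of files about to be packed. This is a hedged illustration, not Anthropic's pipeline. The function names and the sample file list are hypothetical, and the list itself is assumed to come from somewhere like the output of `npm pack --dry-run`.

```typescript
// Return every path in a package file list that looks like a source map.
function findSourceMaps(packedFiles: string[]): string[] {
  return packedFiles.filter((file) => file.toLowerCase().endsWith(".map"));
}

// Throw if any source map would ship, so a release script can abort the publish.
function assertNoSourceMaps(packedFiles: string[]): void {
  const leaked = findSourceMaps(packedFiles);
  if (leaked.length > 0) {
    throw new Error(`Refusing to publish; source maps found: ${leaked.join(", ")}`);
  }
}

// Hypothetical file list, for illustration only.
const packedFiles = ["package.json", "cli.js", "cli.js.map", "README.md"];
console.log(findSourceMaps(packedFiles)); // ["cli.js.map"]
```

A check like this complements, rather than replaces, an explicit `files` allowlist in package.json or an `.npmignore` entry. The point is that the failure becomes loud instead of silent.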

Anthropic's response came within hours: the package was yanked from npm. The company issued a statement calling it "a release packaging issue" and "human error." No sensitive customer data was exposed, they said. The source code was useful only for understanding how Claude Code worked, not for compromising users or extracting information.

That argument would have been more credible if Anthropic had not, five hours after acknowledging the leak, made a decision that would backfire dramatically.

∼ ∼ ∼

The DMCA Disaster

Anthropic issued DMCA takedown notices targeting repositories that contained the leaked source code. This is a reasonable reflex for protecting intellectual property. The execution was not.

The DMCA takedowns were broad. Too broad. GitHub removed approximately eight thousand repositories in response to Anthropic's requests—not all of which contained Claude Code source. Some contained unrelated projects that happened to mention "Claude Code" in their README. Some were legitimate forks or documentation sites. A few were repositories for entirely different projects that shared naming overlap.

The collateral damage was immediate and brutal. Developers woke up to find their work taken down without notice. Open-source maintainers lost months of contribution history. The backlash was swift and severe, with prominent figures in the open-source community publicly questioning Anthropic's judgment.

Anthropic ultimately withdrew some of the takedown notices. But by then, the damage to the company's credibility was extensive. The message sent—deliberately or not—was that Anthropic was willing to use the legal system aggressively, even at the cost of collateral damage, to control a narrative.

Meanwhile, every developer who had downloaded the source code before the takedowns still had a working copy. The horse had not just left the barn. The barn was now open-sourced.

The DMCA Backlash

Anthropic's DMCA takedown removed approximately 8,100 repositories from GitHub. Of these, the majority did not contain leaked Claude Code source material. The takedown became a symbol not of intellectual property protection, but of a company willing to use legal leverage to suppress information—even when that information was now globally distributed and impossible to contain.

Claw-code: The Fastest Repository Ever

On March 31, before Anthropic's DMCA notices were even filed, a developer created a new repository on GitHub. They called it "claw-code." It was a clean rewrite of Claude Code based on the leaked source. Not a copy—a reimplementation.

The distinction mattered legally, but not practically. Anyone with the leaked source could understand the architecture, the tool definitions, the agent loops, the context management pipeline. They could rewrite it. They could fork it. They could improve it.

Claw-code hit one hundred thousand GitHub stars in twenty-four hours. Not in a week. Not in three days. In a single day. It was the fastest-growing repository ever recorded on the platform.

By the time Claw-code reached that milestone, Anthropic's DMCA takedowns had already been issued. But Claw-code, as a rewrite rather than a direct copy, stood on firmer legal ground than the copies did. The repository remained live. Other repositories forked it. The source code, now in a form that existed independently of the original leak, was effectively impossible to suppress.

One hundred thousand developers starred the reimplementation of a proprietary tool in a single day. Not because they wanted to pirate Anthropic's work, but because Anthropic had demonstrated, in real time, that it was no longer in control of its own product.

The Supply Chain Attack

The chaos around the source code leak created an opportunity for something worse.

Within hours of the initial discovery, malware authors began publishing fake versions of Claude Code to npm. These versions were named to resemble the legitimate package. Some were called "claude-code-v2" or "claude-tools." They looked legitimate enough that a hurried developer, installing dependencies in an automated build process, might not notice they had grabbed a fake.

The malware was an information stealer called Vidar. It captured environment variables, API keys, SSH credentials, browser credentials, and file metadata, and exfiltrated this data to attacker-controlled servers. Between 00:21 and 03:29 UTC on March 31, a window of roughly three hours, the trojanized packages delivered Vidar to developers' machines.

The attack did not target Claude Code users specifically. It targeted anyone who might have installed a Claude-related package during the chaos. Developers updating dependencies. CI/CD systems pulling in new packages. Automated dependency scanners. All potential victims.

npm eventually took down the fake packages. But the damage was done. GitHub security researchers later estimated that the trojanized packages had been downloaded approximately five thousand times. Not all of those downloads resulted in compromised machines—some came from security researchers and journalists investigating the attack. But the vulnerability window was real and exploitable.
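A defensive check against this kind of name confusion does not need to be sophisticated. The sketch below flags any dependency whose name mentions a protected keyword but is not on an explicit allowlist. It is an illustration under stated assumptions: `findLookalikes` is a hypothetical helper, and the allowlisted package name is supplied here as an assumption, since the fake names are the only ones quoted above.

```typescript
// Flag dependency names that mention a protected keyword but are not allowlisted.
// The keyword is expected in lowercase.
function findLookalikes(
  dependencies: string[],
  allowlist: ReadonlySet<string>,
  keyword: string,
): string[] {
  return dependencies.filter(
    (name) => name.toLowerCase().includes(keyword) && !allowlist.has(name),
  );
}

// ASSUMPTION: the allowlisted name is illustrative and not taken from the article.
const allowlist = new Set(["@anthropic-ai/claude-code"]);
const deps = ["@anthropic-ai/claude-code", "claude-code-v2", "claude-tools", "typescript"];
console.log(findLookalikes(deps, allowlist, "claude")); // ["claude-code-v2", "claude-tools"]
```

Run in CI against a lockfile's package names, a check like this would catch a hurried install of a lookalike before it ever executed on a build machine.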

Supply Chain Attack Window

Malware authors distributed the Vidar info-stealer via fake Claude Code npm packages in a 3-hour window (00:21–03:29 UTC March 31). The trojanized packages were downloaded approximately 5,000 times, with an estimated success rate of 30-40% for actual compromise (based on C2 callback analysis), which implies roughly 1,500 to 2,000 compromised machines. This attack would not have been possible without the initial source code leak creating confusion about legitimate vs. fake packages.

∼ ∼ ∼

What The 512,000 Lines Revealed

The leaked source code was not just valuable for its existence. It was valuable for what it contained. Anthropic's developers had left documentation, comments, and flag names that told a story about what the company was actually doing.

The most striking revelations involved hidden feature flags. Forty-four of them, some of which were never meant to be public.

Undercover Mode

One of the most controversial flags was called "UNDERCOVER_MODE." When enabled, it instructed Claude Code to obscure references to Anthropic in code that was being contributed to public repositories. The system would remove comments mentioning Anthropic. It would strip out telemetry that might identify the code as AI-generated. It would make Claude-written code appear as if it had been written by a human developer.

The irony was suffocating: a feature designed to conceal had itself been exposed, revealing that Anthropic was actively trying to hide its AI's influence in public code. The company had not disclosed this behavior. Users contributing code through Claude Code had no explicit consent UI telling them "your code will be modified to hide its AI origin."

KAIROS: The Daemon Mode

Another flag was "KAIROS." The source code revealed that KAIROS enabled Claude Code to run as a persistent, always-on background daemon. Rather than waiting for user input, an enabled KAIROS instance would autonomously spawn tool calls, make system modifications, and execute long-running tasks without explicit user approval.

There was no evidence that KAIROS had been shipped to users. But its presence in the code—complete with references to "autonomous task queuing" and "implicit permission inference"—suggested that Anthropic had been seriously considering autonomous operation modes for Claude Code.

Anti-Distillation and Frustration Detection

A flag called "ANTI_DISTILLATION_CC" injected decoy tool definitions into the codebase. These were fake tools that did not actually exist, designed to confuse competitors trying to copy Claude Code's capabilities by studying generated code samples.

Another section of code was tracking user frustration. The system scanned for phrases like "so frustrating," "this is broken," "why doesn't this work," and logged them with timestamps and context. The code included references to "frustration metrics" being sent to Anthropic's servers. There was no UI opt-out. Users contributing code through Claude Code had no way to know their expressions of frustration were being catalogued.
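Mechanically, phrase-based frustration detection of the kind described is trivial to build, which is part of why its presence matters. The following is a hypothetical reconstruction for illustration only, not the leaked code: the phrase list uses just the three examples quoted above, and every name in it (`FrustrationEvent`, `scanForFrustration`) is invented.

```typescript
// One logged match: which phrase was seen, and when.
interface FrustrationEvent {
  phrase: string;
  timestamp: string; // ISO 8601
}

// Only the phrases quoted in the article; the full list, if any, is not known here.
const FRUSTRATION_PHRASES = ["so frustrating", "this is broken", "why doesn't this work"];

// Case-insensitive substring scan returning one event per matched phrase.
function scanForFrustration(message: string, now: Date = new Date()): FrustrationEvent[] {
  const lowered = message.toLowerCase();
  return FRUSTRATION_PHRASES
    .filter((phrase) => lowered.includes(phrase))
    .map((phrase) => ({ phrase, timestamp: now.toISOString() }));
}

const events = scanForFrustration("Ugh, this is broken. So frustrating!", new Date(0));
console.log(events.map((e) => e.phrase)); // ["so frustrating", "this is broken"]
```

The sketch stops at producing events. What made the reported behavior contentious was the next step: sending those events off the user's machine with no opt-out.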

Undercover Mode revealed Anthropic's actual stance on transparency: hide the AI's involvement, suppress telemetry that might identify AI authorship, make machine-generated code look human. This was not a defensive measure. It was an active choice to deceive.

The Behavioral Tracking Pipeline

The leaked code contained a 4-stage context management pipeline that processed user behavior in detail. Each stage was labeled with specific data collection points: user input patterns, tool usage sequences, error recovery behaviors, and what the code called "decision entropy" (how uncertain the model was about its next action).

This pipeline did not appear to exist for performance optimization. The documentation suggested it was designed for understanding user behavior patterns in aggregate. Anthropic was building a model of how developers used Claude Code, what frustrated them, where they succeeded, where they got stuck.

Whether this data was used for model improvement, for targeting users with specific behaviors, or for other purposes was not clear from the leaked source. But the infrastructure was there, and it was comprehensive.

The Mythos Question

The leaked source code did not explain what the Mythos documents had warned about. But it provided context.

References to "Capybara" (Mythos's alternate codename) appeared in the Claude Code source tree. They were conditional—blocks of code that executed only if the model running Claude Code was identified as Capybara/Mythos. These code paths included:

- Different tool calling behavior (more autonomous, less confirmation required)
- Modified context length limits (suggesting different memory management)
- References to something called "stability_check_bypass"
- Logging flags that mentioned "unexpected_reasoning_patterns"

The implication was clear: Mythos was not just a more capable model. It was a model that behaved differently in ways that required special handling from Claude Code. The "unprecedented cybersecurity risks" mentioned in the leaked CMS documents might refer to unexpected behaviors in Mythos that required safeguards, workarounds, and special code paths.

If Mythos was already risky enough to require modified tool calling behavior and stability checks, the prospect of it being deployed to users was serious. And Anthropic had given no public information about what those risks entailed.

Hidden Feature Flags Catalogue

44 hidden feature flags discovered in the leaked source code. Most significant: UNDERCOVER_MODE (hides AI origin in generated code), KAIROS (autonomous daemon mode, never shipped), ANTI_DISTILLATION_CC (injects fake tools to confuse competitors), and frustration detection telemetry (no user consent UI). The flags suggest Anthropic was developing capabilities far beyond what it had publicly disclosed.

∼ ∼ ∼

The Irony of Safety

This is the deepest irony: Anthropic was built by people who left OpenAI and other organizations because they believed the AI safety problem required a different approach. Dario Amodei and Daniela Amodei founded Anthropic with a clear stated mission: to build AI systems that are fundamentally safer, more aligned, more interpretable.

The company's marketing, its mission statement, its recruitment pitch—all centered on a commitment to safety first. Before capabilities. Before revenue. Before market position.

And then in a single week, the company had:

- Exposed its complete codebase due to a known bundler bug it did not mitigate
- Used DMCA takedowns aggressively enough to remove thousands of unrelated repositories
- Revealed hidden behavioral tracking without user consent
- Demonstrated that one of its unreleased models posed "unprecedented cybersecurity risks"
- Enabled a supply chain attack that compromised developers' credentials

The company that claims to be solving AI alignment could not even keep its own packaging secure.

Undercover Mode is perhaps the most damning discovery. A feature specifically designed to make AI-generated code appear human-written suggests Anthropic knew something about the code it was generating that would concern developers if it were labeled as such. Rather than being transparent about this, the company built infrastructure to hide it.

This is not alignment. This is the opposite.

The Open Source Question

There is a broader question lurking in the leaked source code, and in the response to it.

Claude Code is, according to Anthropic, built with Claude. Most of the actual implementation code was written by Claude, not by humans. If that is true—and the capability level of the code suggests it might be—then the copyright question becomes complex. Does Anthropic own the copyright to code written by Claude? Can it?

There is precedent for this question in open source. Multiple AI-related repositories on GitHub have been flagged for containing code generated by GitHub Copilot (which is itself built on models trained on open-source code). The legal question of whether that makes the code subject to GPL or other open-source licenses remains unresolved.

If Claude Code was written substantially by Claude, and if Claude was trained on open-source code (which is highly likely), then the DMCA enforcement Anthropic pursued might itself be legally questionable. The company might not own the copyright it is claiming to protect.

This is separate from the broader open-source debate that the leak has triggered. There is a case to be made that Claude Code should have been open-source in the first place. Not because it was leaked, but because keeping it closed meant keeping closed a tool that embodies the design decisions Anthropic made for working with AI. Those design decisions—how to prompt, how to structure context, how to handle agentic loops—would benefit from community scrutiny and improvement.

There are also cache bugs in Claude Code. Inefficient context windows that cost users 10 to 20 times the token budget they should require. Some of these bugs have been in the codebase for months. In an open-source project, these would have been found and fixed immediately. In Anthropic's closed-source codebase, they persist because there are no external eyes looking for them.

The irony multiplies: Anthropic's commitment to safety was supposed to mean security, transparency, and alignment with user interests. Instead, the company has chosen secrecy, and that secrecy has led to both leaks and bugs that harm users.

A company that keeps its AI tools closed cannot credibly claim to be solving the alignment problem. Alignment requires transparency, scrutiny, and the ability for the community to identify and correct errors. Anthropic chose the opposite: Undercover Mode, DMCA takedowns, and undisclosed telemetry.

The Unraveling

It is worth stepping back and noting what Anthropic has lost this week.

The company's core brand promise—that it was different, that it prioritized safety and ethics over growth and secrecy—has been challenged in a way that no amount of public relations repair can fully undo. Not because the leak itself is damning (leaks happen), but because the response revealed the company's actual values.

Anthropic had an opportunity in the hours after the leak became public to be honest about what had gone wrong. To explain Undercover Mode, to justify frustration detection, to clarify what Mythos is and why it poses risks. Instead, the company called it "human error" and started issuing DMCA notices.

A GitHub repository of AI system prompt leaks has accumulated 134,000 stars—a directory of leaked internal instructions from ChatGPT, Claude, Gemini, and dozens of other AI systems. The leak problem is not unique to Anthropic. But Anthropic's response to it has been.

Meanwhile, Mythos remains unreleased, and Anthropic has not clarified what the "unprecedented cybersecurity risks" are. The company has said nothing about when or whether Mythos will be deployed to users. This silence, after a week of forced revelations, reads as evasion.

The credibility damage is likely to be lasting. Not because Anthropic made a mistake—companies make mistakes. But because the company's reaction to the mistake revealed something about its actual operating values that contradicts its public commitments.

∼ ∼ ∼

Can a Company That Can't Secure Its Own Packaging Govern Existential Risk?

The original "Alignment Gap" article I wrote in late March covered Anthropic's AGI declaration, their weak performance on the ARC-AGI benchmark (0.37%), and early indications that something was wrong with the unreleased Mythos model. The deeper argument of that piece was that Anthropic was making claims about managing existential AI risk while showing persistent gaps between its public statements and its actual capabilities or practices.

The week of March 26–April 2, 2026, proved that argument in the harshest possible way.

We are supposed to believe that Anthropic—a company that cannot prevent a known bundler bug from shipping source maps to npm, that uses hidden feature flags to obscure its AI's behavior, that enables supply chain attacks through poor release procedures—is competent to navigate the existential risks of AGI.

We are supposed to trust this company with the design decisions that will determine how advanced AI systems behave in the world.

The technical challenge of AI alignment is hard. But the organizational challenge of alignment might be harder. An organization that is willing to hide what its AI is doing, that tracks user frustration without consent, that uses legal tools to suppress information about its own failures—that organization is not prepared to solve alignment problems. It is showing its actual priorities, and they are not about safety.

Anthropic will recover from this week. The company has talented people, serious funding, and a real product. But it will not recover from this week by treating it as a PR problem. It will recover, if it recovers, by fundamentally reckoning with the gap between its stated mission and its actual practices.

Until that reckoning happens, every claim Anthropic makes about safety is worth taking with extreme skepticism.

Because this week, the company did not just leak its source code. It leaked something much more valuable: a clear view of what it actually does when nobody is watching, when the stakes are high, and when it has to choose between transparency and control.

It chose control.
