Field Notes · AI & Biology

The Cell, Simulated


A half-billion-dollar bet says we can build a working model of a living cell inside a computer, and read disease before it begins.

June 15, 2026 · Lisa Pedrosa · 9 min read · AI · Medicine

For all our triumphs in biology, we still treat the cell the way medieval cartographers treated the ocean—mapping its coastlines, naming its monsters, but unable to say what will happen if we sail into the middle of it. We can list a cell's parts. We cannot reliably predict what it will do when we change one of them. A new and enormously funded effort is betting that artificial intelligence can finally close that gap by building something biology has never had: a working simulation of life itself.

On April 29, 2026, Mark Zuckerberg and Priscilla Chan announced that their nonprofit research organization, the Chan Zuckerberg Biohub, would commit $500 million over five years to a program called the Virtual Biology Initiative. Its goal sounds almost like science fiction: to build AI-powered predictive models of human cells—"virtual cells"—that simulate how a cell works, how it responds to its environment, how it breaks down under disease, and how it might be repaired.

If that sounds abstract, consider what it would mean to do biology without it. Today, testing whether a drug will help or harm a particular cell type means running an experiment, waiting, and hoping. A virtual cell that could be trusted to predict the answer would compress years of trial and error into computation—and it would let researchers ask questions no laboratory can practically run.

What a Virtual Cell Actually Is

A virtual cell is not a cartoon animation of organelles bobbing around. It is a computational system trained to predict, simulate, and ultimately program cellular processes across different scales and types of data at once. The raw material is the explosion of biological measurement over the past fifteen years: nucleotide sequences, single-cell transcriptomes that record which genes are active in individual cells, multi-omic profiles, and spatial data that captures where everything sits inside a tissue.

The bet borrows directly from the playbook that produced large language models. Train a sufficiently large model on a sufficiently vast and varied corpus, and it begins to capture the hidden grammar underneath—not the words, in this case, but the logic of how a living system behaves. Researchers can now train foundation models directly on these biological corpora, and from that training the virtual cell is meant to emerge as a generative, reasoning framework rather than a hand-built equation.

$500M: Biohub commitment over five years
$400M: internal tech (imaging, engineering, data infrastructure)
$100M: external research and global data generation
1,024: NVIDIA H100 GPUs in the dedicated SuperPOD

The funding split is itself revealing. Of the half-billion dollars, roughly $400 million is earmarked for internal technology development—next-generation imaging, molecular engineering, and the data infrastructure to feed the models—while about $100 million funds external research and a coordinated, global data-generation effort. The computing muscle behind it includes a dedicated cluster of 1,024 NVIDIA H100 GPUs. And the collaborators read like a roll call of modern biology: the Broad Institute of MIT and Harvard, the Wellcome Sanger Institute, the Allen Institute, the Arc Institute, NVIDIA, and international consortia including the Human Cell Atlas and the Human Protein Atlas.

The initiative's most radical choice may be its least technical one: every dataset and technology it produces is to be made freely available to the global scientific community. A predictive model of life, built as a commons.

The Data Problem Underneath It All

Here is the catch, and no amount of compute can buy a way around it. Language models had the internet: trillions of words, already written, already digitized. Biology has nothing comparable. The measurements that exist were collected by thousands of labs using different methods, different instruments, and incompatible standards. Much of the data needed to train a truly predictive virtual cell does not yet exist, and the data that does is scattered and hard to compare.

This is why the Virtual Biology Initiative is, at its core, a data-generation project as much as a modeling one. The same logic drove a parallel announcement: an effort to galvanize a global push to build the open data foundation that AI-accelerated biology will require. The bottleneck is no longer imagination or algorithms. It is the slow, expensive, unglamorous work of generating enough high-quality, standardized biological measurement to teach a machine what a cell is.

"Virtual cells" are AI systems trained to simulate how cells work, how they respond, how they malfunction under disease—and how they might be reprogrammed or treated.
— Chan Zuckerberg Biohub, Virtual Biology Initiative

How Far Along Are We?

Honestly assessed, the field is early—and the scientists building it know it. In April 2026, researchers published a pointed self-examination asking whether current AI virtual cell models are actually useful for scientific discovery yet. The verdict was sobering: many of today's models are impressive at reproducing patterns they have already seen, but still unreliable when asked to predict the genuinely novel—the perturbation no one has tried, the cell state no one has measured. That is precisely the gap that matters, because prediction of the unknown is the entire point.
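The gap between familiar and novel perturbations can be made concrete with a toy evaluation. The sketch below is illustrative only: the data are synthetic, and `make_toy_data`, `LookupModel`, and `evaluate` are hypothetical names, not anything taken from the published assessment. It holds out entire perturbations, rather than just individual cells, and compares prediction error on the two groups.

```python
import random
from statistics import mean


def make_toy_data(n_perturbations=20, cells_per_perturbation=30, seed=0):
    """Synthetic stand-in: perturbation p shifts one readout by a fixed effect."""
    rng = random.Random(seed)
    data = {}
    for p in range(n_perturbations):
        # Effects are evenly spaced between -2 and 2 (an arbitrary assumption).
        effect = -2.0 + 4.0 * p / (n_perturbations - 1)
        data[p] = [effect + rng.gauss(0.0, 0.3) for _ in range(cells_per_perturbation)]
    return data


class LookupModel:
    """Memorizes the mean response of each perturbation seen in training.

    For a perturbation it has never seen, it can only guess the global mean.
    """

    def fit(self, train):
        self.per_perturbation = {p: mean(values) for p, values in train.items()}
        self.global_mean = mean(x for values in train.values() for x in values)
        return self

    def predict(self, perturbation):
        return self.per_perturbation.get(perturbation, self.global_mean)


def mean_abs_error(model, dataset):
    """Average absolute gap between the prediction and each measured cell."""
    return mean(
        abs(model.predict(p) - x) for p, values in dataset.items() for x in values
    )


def evaluate(data, hold_out_every=4):
    """Return (error on seen perturbations, error on held-out perturbations)."""
    held_out = [p for p in data if p % hold_out_every == 0]
    seen = [p for p in data if p % hold_out_every != 0]
    half = len(next(iter(data.values()))) // 2
    # Seen perturbations: train on half the cells, test on the other half.
    train = {p: data[p][:half] for p in seen}
    test_seen = {p: data[p][half:] for p in seen}
    # Held-out perturbations never appear in training at all.
    test_unseen = {p: data[p] for p in held_out}
    model = LookupModel().fit(train)
    return mean_abs_error(model, test_seen), mean_abs_error(model, test_unseen)


seen_error, unseen_error = evaluate(make_toy_data())
print(f"error on seen perturbations:   {seen_error:.2f}")
print(f"error on unseen perturbations: {unseen_error:.2f}")
```

A pure memorizer is the extreme case, but it shows why the split matters: a score computed only on perturbations the model has already seen says little about the perturbation no one has tried.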

Promising directions are emerging fast. Autonomous systems that pair an AI coding agent with biological foundation models can now construct perturbation-response models on their own, searching across architectures and training pipelines with minimal human guidance. The vocabulary is expanding too—researchers now write about "world models" of the virtual cell and even predictive virtual embryos—signaling ambitions that reach well beyond a single cell type.

[Figure: timeline, "The Road to a Simulated Cell: A Five-Year Horizon." 2023: single-cell atlases scale. 2024–25: first cell foundation models. Apr 2026: $500M Virtual Biology Initiative. 2031: predictive virtual cells?]
From cataloguing cells to predicting them: the arc the initiative hopes to complete.

The Language-Model Lesson, and Its Limits

The intellectual confidence behind virtual cells comes from a genuine surprise of the last few years: scale works in ways no one fully predicted. Train a large enough model on a large enough corpus and capabilities emerge that were never explicitly programmed. The Virtual Biology Initiative is, at heart, a wager that the same phenomenon will appear in biology—that a model fed enough measurements of cells will internalize the rules of cellular behavior the way a language model internalized grammar.

But biology is not language, and the disanalogies matter. A sentence is its own ground truth; a cell's behavior is not. Two cells with identical genomes can behave differently depending on their history, their neighbors, and a fog of stochastic noise that biology seems to tolerate and even exploit. The "tokens" of biology are also far more expensive to generate than words—each requires a wet-lab experiment, reagents, instruments, time. And unlike text, biological data carries silent biases from the methods used to collect it, so a model can learn the quirks of an assay rather than the truth of a cell.
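A small worked example shows how a collection method can masquerade as biology. The numbers below are invented for illustration: two hypothetical labs measure the same treatment, one instrument reads 3 units higher than the other, and the labs happened to sample different mixes of treated and control cells. Pooling everything inflates the apparent treatment effect, while comparing within each lab recovers the value assumed in the setup.

```python
from statistics import mean

# All values here are assumptions made up for the illustration.
TRUE_EFFECT = 1.0                           # real shift caused by the treatment
LAB_OFFSET = {"lab_a": 0.0, "lab_b": 3.0}   # instrument bias per lab
BASELINE = 5.0


def build_measurements():
    """Return rows of (lab, condition, measured value) with an unbalanced design."""
    design = {
        "lab_a": {"control": 8, "treated": 2},  # lab_a measured mostly controls
        "lab_b": {"control": 2, "treated": 8},  # lab_b measured mostly treated cells
    }
    rows = []
    for lab, counts in design.items():
        for condition, n in counts.items():
            shift = TRUE_EFFECT if condition == "treated" else 0.0
            value = BASELINE + LAB_OFFSET[lab] + shift
            rows.extend((lab, condition, value) for _ in range(n))
    return rows


def pooled_effect(rows):
    """Treated minus control, ignoring which lab produced each row."""
    treated = [v for _, condition, v in rows if condition == "treated"]
    control = [v for _, condition, v in rows if condition == "control"]
    return mean(treated) - mean(control)


def within_lab_effect(rows):
    """Treated minus control computed inside each lab, then averaged."""
    labs = sorted({lab for lab, _, _ in rows})
    return mean(pooled_effect([r for r in rows if r[0] == lab]) for lab in labs)


rows = build_measurements()
print(f"pooled estimate:     {pooled_effect(rows):.2f}")      # 2.80
print(f"within-lab estimate: {within_lab_effect(rows):.2f}")  # 1.00
```

The pooled figure is nearly three times the assumed effect, and most of it is instrument offset. A model trained on the pooled rows would have no way to tell the two apart unless the lab of origin is recorded and accounted for.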

This is why the people building virtual cells talk less like triumphant disruptors and more like cautious engineers. The field has internalized a hard-won humility: impressive demonstrations on familiar cell types are not the same as reliable prediction in the wild. The honest framing is that the Initiative is buying a five-year option on a possibility, not announcing a finished tool—and that the binding constraint is data quality and coverage, not raw model size.

Why This Could Change Everything—Or Disappoint

The upside is genuinely vast. A trustworthy virtual cell would let researchers screen drug candidates against simulated patients before touching a pipette, identify which cell types a therapy will hit and which it will harm, and watch disease unfold from its very first molecular missteps. The Initiative frames its purpose precisely this way: understanding how disease begins, at the level of individual cells, so it can be stopped earlier than medicine has ever managed.

We have spent a century learning the parts of the cell. The next step is learning to predict it—and prediction is the difference between describing life and steering it.
— On the promise of virtual biology

The risk is equally real, and it is not a risk of catastrophe but of overpromise. Biology has humbled grand computational ambitions before. A virtual cell that works for a handful of well-studied cell types but fails on the messy, variable reality of human tissue would be a useful tool, not a revolution. The honest answer to "will this work?" is that we do not yet know—and that the people spending half a billion dollars are betting on a five-year horizon precisely because the science is not finished.

What makes this moment worth watching is the convergence: the same architectural ideas that gave us language models, the same abundance of GPUs, and now a deliberate, well-funded campaign to manufacture the missing data. If it succeeds, the cell stops being an ocean we map from the shore and becomes a system we can finally predict. That would not be the end of biology's mysteries. It would be the beginning of a biology that can see around corners.

Sources

  1. Chan Zuckerberg Biohub. "Virtual Biology Initiative." biohub.org
  2. PR Newswire. "Biohub Launches the Virtual Biology Initiative." prnewswire.com
  3. Axios. "Zuckerberg Chan Biohub gives $500 million to AI biology." axios.com
  4. The Rundown AI. "Zuckerberg's $500M AI biology swing." therundown.ai
  5. EdTech Innovation Hub. "Biohub puts $500 million behind AI biology push with MIT, Harvard and NVIDIA." edtechinnovationhub.com
  6. ION Genomics. "Biohub Leads $500M Push to Generate Biological Data for AI Models." iongenomics.bio
  7. bioRxiv. "Harnessing AI to Build Virtual Cells." biorxiv.org
  8. bioRxiv. "Are Current AI Virtual Cell Models Useful for Scientific Discovery?" biorxiv.org
  9. arXiv. "Large Language Models Meet Virtual Cell: A Survey." arxiv.org
  10. arXiv. "A path towards AI-scale, interoperable biological data." arxiv.org
  11. Phys.org. "AI foundation model aims to make stem cell therapies more predictable." phys.org
  12. OmicsML. "Awesome Foundation Model Single-Cell Papers." github.com