What are the Co-Scientist and Robin AI systems?

Co-Scientist and Robin are AI systems that can generate scientific hypotheses, design experiments, and make real discoveries. Co-Scientist, from Google DeepMind, runs on Gemini 2.0, while Robin, from FutureHouse, uses Anthropic's Claude 3.7 and OpenAI's o4-mini. Both are multi-agent systems in which specialized AI agents are coordinated to execute different steps of the research process.

What drug discoveries did AI make in the Nature papers?

Co-Scientist identified Vorinostat as a liver fibrosis treatment candidate and found novel combination therapies for acute myeloid leukemia. Robin discovered that ripasudil, a glaucoma drug, could treat dry macular degeneration. All findings were validated in laboratory experiments, though none of the candidates has entered human trials for these uses yet.

How do AI scientific discovery systems work?

Both systems decompose scientific research into specialized steps handled by different AI agents: literature review, hypothesis formation, experimental design, data analysis, and interpretation. A supervising layer coordinates these agents. Human researchers still perform the physical laboratory experiments, but the AI provides the intellectual scaffolding for what to test and why.

What are the limitations of AI scientists?

AI scientists face three main limitations: they communicate through natural language, which lacks the precision needed for exact scientific quantities; they struggle in domains with sparse literature; and they cannot perform physical laboratory experiments. Because they reason primarily through text, they cannot natively represent complex biological structures or statistical distributions.

When were the AI scientist papers published in Nature?

Two papers on AI scientific discovery systems were published back-to-back in Nature on May 19, 2026. They came from two independent research teams that arrived at the same conclusion in the same week: AI can function as a genuine scientific co-investigator.

The Telescope Moment — AI Scientists Arrive in Nature

When Galileo pointed a telescope at Jupiter in January 1610 and saw four small points of light that moved relative to the planet night after night, he wasn't just observing more clearly. He had a new instrument — one that made a whole category of discovery possible that had been structurally impossible before. On May 19, 2026, two papers published back-to-back in Nature suggested that something comparably significant may be happening to science right now. Two independent research teams, working separately, arrived at the same conclusion in the same week: AI can now function as a genuine scientific co-investigator.

What Happened

The papers that landed the same week

The two systems are architecturally similar but independently built. Co-Scientist, from Google DeepMind, is a multi-agent system built on Gemini 2.0. Robin, from FutureHouse — a nonprofit with the explicit mission of automating scientific discovery — runs on a combination of Anthropic's Claude 3.7 and OpenAI's o4-mini. Both are multi-agent systems: collections of specialised AI agents coordinated by a supervising layer to execute different steps of the research process.

Both teams published not just descriptions of their systems, but experimental results. Not thought experiments or benchmark scores — real laboratory validation of AI-generated hypotheses.

Co-Scientist · AML

Novel combination therapies for acute myeloid leukaemia

AI-proposed drug-repurposing candidates were confirmed to reduce tumour cell viability in multiple AML cell lines at clinically relevant concentrations.

Co-Scientist · Liver fibrosis

Vorinostat as an anti-fibrotic agent

Co-Scientist identified the FDA-approved anti-cancer drug Vorinostat as a liver fibrosis candidate. In hepatic organoid tests: 91% reduction in TGFβ-induced chromatin structural change.

Robin · dAMD

Ripasudil as a novel treatment for dry macular degeneration

Robin identified a drug already used in ophthalmology — the glaucoma treatment ripasudil — as a novel candidate for dAMD, a condition it had never been proposed for. Lab-validated in RPE cell culture.

Co-Scientist · AMR

Mechanisms of antimicrobial resistance gene transfer

Co-Scientist generated novel hypotheses about the evolutionary mechanisms by which resistance genes spread between bacteria — relevant to the global antimicrobial resistance crisis.

None of these are clinical results. None of the drugs have entered human trials for these indications. The researchers from both teams are careful to note that further validation is necessary before any therapeutic claim can be made. What is validated is not the drug but the process: that AI can generate scientifically useful hypotheses, propose and guide experiments to test them, and identify candidates that had not occurred to human researchers working in the field.

How They Work

The architecture of a machine scientist

Both systems share a fundamental design philosophy: the discovery process is decomposed into specialised steps, each handled by a different agent, coordinated by a supervisor. The decomposition mirrors the actual structure of scientific research — literature review, hypothesis formation, experimental design, data analysis, interpretation, refinement — and assigns each step to a component optimised for it.
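The decomposition described above can be pictured with a minimal Python sketch. It is not the code of either system: the names `ResearchState`, `PIPELINE`, and `supervise` are hypothetical, and each "agent" is a plain function standing in for a model-backed component. It only illustrates the pattern of a supervisor running specialised steps in order and passing each step's output to the next.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ResearchState:
    """Shared record that each agent reads from and writes to (hypothetical)."""
    question: str
    notes: dict[str, str] = field(default_factory=dict)


# An agent is modelled as a function from the shared state to a text output.
Agent = Callable[[ResearchState], str]


def literature_review(state: ResearchState) -> str:
    return f"summary of prior work on: {state.question}"


def hypothesis_formation(state: ResearchState) -> str:
    return f"hypothesis grounded in: {state.notes['literature_review']}"


def experimental_design(state: ResearchState) -> str:
    return f"assay plan to test: {state.notes['hypothesis_formation']}"


# Ordered steps; a real system would also loop back for refinement.
PIPELINE: list[tuple[str, Agent]] = [
    ("literature_review", literature_review),
    ("hypothesis_formation", hypothesis_formation),
    ("experimental_design", experimental_design),
]


def supervise(question: str) -> ResearchState:
    """Run each step in order, storing its output for the steps that follow."""
    state = ResearchState(question=question)
    for name, agent in PIPELINE:
        state.notes[name] = agent(state)
    return state


result = supervise("candidate treatments for dry AMD")
for step, output in result.notes.items():
    print(f"{step}: {output}")
```

The design choice worth noticing is the shared state: each step depends only on what earlier steps recorded, which is what lets a supervisor swap, repeat, or audit individual agents.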

Co-Scientist

General purpose across disciplines. Uses a "hypothesis tournament" — an internal review board that tests each proposed hypothesis against the existing scientific literature before it advances. This addresses hallucination: a proposal that contradicts established evidence is challenged internally before it becomes an experimental directive.
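As a rough illustration of the idea, and not Co-Scientist's actual mechanism, the sketch below screens toy hypotheses against a toy "literature" and ranks the survivors by pairwise comparison. Everything in it is an assumption made for the example: the statement sets `ESTABLISHED` and `REFUTED`, the function names, and the use of simple set lookups where the real system would rely on model-based review.

```python
from __future__ import annotations

from itertools import combinations

# Toy stand-in for the literature (hypothetical statements, not real findings).
ESTABLISHED = {
    "drug_a inhibits pathway_x",
    "pathway_x drives fibrosis",
}
REFUTED = {"drug_b inhibits pathway_x"}


def contradicts_literature(premises: set[str]) -> bool:
    """A hypothesis is challenged if any premise is refuted by the toy literature."""
    return bool(premises & REFUTED)


def support(premises: set[str]) -> int:
    """Count how many premises the toy literature already supports."""
    return len(premises & ESTABLISHED)


def run_tournament(hypotheses: dict[str, set[str]]) -> list[str]:
    """Drop hypotheses with refuted premises, then rank survivors by pairwise wins."""
    survivors = {
        name: premises
        for name, premises in hypotheses.items()
        if not contradicts_literature(premises)
    }
    wins = {name: 0 for name in survivors}
    for a, b in combinations(survivors, 2):
        if support(survivors[a]) > support(survivors[b]):
            wins[a] += 1
        elif support(survivors[b]) > support(survivors[a]):
            wins[b] += 1
    return sorted(wins, key=lambda name: wins[name], reverse=True)


candidates = {
    "A": {"drug_a inhibits pathway_x", "pathway_x drives fibrosis"},
    "B": {"drug_b inhibits pathway_x", "pathway_x drives fibrosis"},
    "C": {"drug_c inhibits pathway_x", "pathway_x drives fibrosis"},
}
print(run_tournament(candidates))
```

Hypothesis B is removed before any comparison because one of its premises is refuted, which is the filtering step the description above attributes to the review board.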

Robin

Drug discovery specialist. Three core agents: Crow (literature search), Falcon (candidate evaluation), Finch (data analysis — writes and executes its own Python and R code). A "built-in brake" restricts Robin to established knowledge and limits irrational leaps in logic — its version of hallucination mitigation.

The human researchers in both cases executed the physical laboratory experiments — culturing cells, running assays, preparing organoids. But the intellectual scaffolding — what to test, why, what to do with the results, what to try next — came from the AI. All hypotheses, experimental choices, data analyses, and figures in Robin's dAMD paper were generated autonomously. The paper was written by AI. Human researchers validated it.

"These systems are designed to collaborate with researchers, and a scientist would always be in the loop. The real-world demonstrations from both groups provide examples of what the future of scientific research with AI agents might look like."

— Nature Press Release, May 19, 2026

The Limits

What the same papers reveal

The more honest the science, the more it reports what failed alongside what succeeded. Both Nature papers document limitations clearly — and the same limitations have been documented in independent analysis of the results.

The most fundamental is structural: both systems communicate through natural language. Language is the medium that makes AI scientists accessible to human researchers. It is also a medium with inherent imprecision. Scientific communication requires exact quantities, precise units, and unambiguous descriptions of experimental conditions. Natural language approximates all of these. "Increase phagocytosis significantly" is a hypothesis. "Increase phagocytosis by ≥40% at 10 μM concentration in ARPE-19 cells within 24 hours of treatment" is a scientific claim. The distance between those two formulations is the distance between promising and reproducible.
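One way to see the gap between the two formulations is to write the second one as a structured record that can be checked mechanically. The sketch below is an assumption-laden illustration: the class `QuantifiedClaim`, its field names, and the observation values in the usage lines are all hypothetical, and the threshold, dose, cell line, and time window are simply the ones from the example sentence above, not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantifiedClaim:
    """A claim with explicit threshold, dose, cell line, and time window."""
    readout: str
    min_change_pct: float
    dose_um: float
    cell_line: str
    window_h: float

    def is_supported(
        self,
        observed_change_pct: float,
        dose_um: float,
        cell_line: str,
        elapsed_h: float,
    ) -> bool:
        """True only if conditions match the claim and the threshold is met."""
        same_conditions = (
            dose_um == self.dose_um
            and cell_line == self.cell_line
            and elapsed_h <= self.window_h
        )
        return same_conditions and observed_change_pct >= self.min_change_pct


# The quantified formulation from the text, expressed as data.
claim = QuantifiedClaim(
    readout="phagocytosis",
    min_change_pct=40.0,
    dose_um=10.0,
    cell_line="ARPE-19",
    window_h=24.0,
)

# Illustrative observations only; these numbers are made up for the example.
print(claim.is_supported(45.0, 10.0, "ARPE-19", 24.0))
print(claim.is_supported(45.0, 10.0, "HeLa", 24.0))
```

The vague formulation has no equivalent record: "significantly" gives no threshold, dose, cell line, or time window to compare an observation against.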

The Language Problem

Language-based AI systems face a structural limit in biology: the quantitative complexity of biological systems — dose-response curves, off-target effects, cell-line-specific behaviour, time dependence — cannot be fully encoded in natural language prompts and responses. A model that reasons primarily through text cannot natively represent a three-dimensional protein binding site, a pharmacokinetic equation, or the statistical distribution of results across a cell population.
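To make "dose-response curve" concrete, here is a small sketch of the standard Hill equation, a common textbook model for such curves. Neither paper is claimed to use this exact form; the function name and parameter names are chosen for illustration, and the numbers in the usage lines are arbitrary.

```python
def hill_response(conc_um: float, e_max: float, ec50_um: float, hill_n: float) -> float:
    """Hill equation: effect = e_max * c^n / (ec50^n + c^n).

    conc_um  -- drug concentration in micromolar
    e_max    -- maximal effect (same units as the returned effect)
    ec50_um  -- concentration giving half of e_max
    hill_n   -- Hill coefficient controlling the steepness of the curve
    """
    if conc_um < 0 or ec50_um <= 0:
        raise ValueError("concentration must be >= 0 and EC50 must be > 0")
    return e_max * conc_um**hill_n / (ec50_um**hill_n + conc_um**hill_n)


# Arbitrary illustrative parameters: the effect at several doses.
for dose in (0.0, 1.0, 10.0, 100.0):
    print(dose, round(hill_response(dose, e_max=100.0, ec50_um=10.0, hill_n=1.0), 2))
```

Four numbers fix the whole curve here, whereas a phrase such as "works at clinically relevant doses" fixes none of them, which is the kind of loss the paragraph above describes.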

Both teams took care to limit hallucination — confident, false assertions. Co-Scientist uses an internal review board; Robin uses a literature-grounding brake. These are meaningful mitigations. They do not resolve the deeper issue: language is not biology's native language.

A second limitation is domain breadth. Robin, applied to a well-studied disease with a rich literature (dAMD), performed remarkably. It is less clear how the same system would operate in a less-characterised domain — a disease with sparse literature, contradictory findings, or research primarily published in non-English languages. The systems are powerful precisely because they have enormous amounts of literature to synthesise. In domains where that literature is thin, their advantage narrows.

A third limitation, acknowledged in the Nature editorial accompanying both papers, is the boundary between hypothesis generation and experimental verification. AI systems currently cannot perform wet lab experiments. They cannot observe that an assay result looks anomalous, cannot smell that a reagent has degraded, cannot notice that a cell culture behaved differently than expected in a way not captured by the data file. The human researcher's embodied presence in the laboratory is not yet replaceable, and may not be for a long time.

The Wider Picture

A new instrument, not a replacement

Researchers at Stanford HAI's AI+Science conference in May 2026 reached for the same historical analogy independently: the telescope, the microscope. What those instruments had in common was not that they replaced scientists, but that they made a category of discovery possible that had been structurally impossible before. Stars were always there. Cells were always there. The knowledge was latent, waiting for an instrument sensitive enough to reveal it.

What is latent in the scientific literature today — in the 50 million biomedical papers that no single human researcher can read, in the cross-domain connections that disciplinary specialisation makes invisible, in the pattern that emerges only when you can hold the entire literature in working memory at once — is unknown. The argument both Nature papers make, implicitly, is that some of it is accessible now. Robin identified ripasudil for dAMD not because it reasoned better than an ophthalmologist, but because it read more widely, without the disciplinary borders that constrain expert thinking.

The rate at which scientific knowledge is produced has been accelerating for decades. The rate at which any single researcher can absorb and synthesise that knowledge has not. AI scientists, at their best, do not outthink human researchers. They out-read them.

Whether this constitutes a telescope moment — whether future scientists will look back at May 2026 as the month the instrument arrived — depends on how these systems perform as they scale, diversify, and encounter harder problems. The results so far are a proof of concept, not a proof of general capability. But the structure of both papers — independent teams, different architectures, same week, same conclusion, peer-reviewed in Nature — is the structure of a real signal.

The question worth sitting with is this: if AI can already traverse 50 million papers and identify a drug nobody thought to try — what else is in there?

Primary Sources

Gottweis J, Weng WH, Daryin A et al. Accelerating scientific discovery with Co-Scientist. Nature (2026). doi.org/10.1038/s41586-026-10644-y
Ghareeb AE, Chang B, Mitchener L et al. A multi-agent system for automating scientific discovery. Nature (2026). doi.org/10.1038/s41586-026-10652-y
Nature Editorial. Why AI cannot do good science without humans. nature.com
Nature News. Teams of AI agents boost speed of research. nature.com
The Conversation. New AI scientists are improving — but reveal their fundamental limits. theconversation.com
TechTimes. Google Co-Scientist Reaches Nature: Hypothesis Agents Validated in Lab. techtimes.com
Stanford HAI. AI+Science: Accelerating Discovery conference. May 5, 2026. hai.stanford.edu
FutureHouse. Demonstrating end-to-end scientific discovery with Robin. futurehouse.org
