[Figure: AI molecular discovery visualization. A vast chemical space of scattered molecular points converges on a single glowing compound, synthecin, at the centre. Labels: 46B compounds explored; MRSA-active in vivo; McMaster · Molecular Systems Biology · 2026.]

AI & Scientific Discovery  ·  Medicine & Drug Discovery

The 46 Billion Molecule Search

An AI explored a chemical universe no human lab could ever screen — and came back with a working antibiotic against drug-resistant staph.

The biggest laboratory screen in pharmaceutical history can test roughly one million compounds. That sounds enormous until you consider that the chemical space of small molecules (the universe of all possible drug-like compounds) contains an estimated 10^60 candidates. The gap between what we can search and what exists is so vast it barely has a name. For decades, that gap has been one of the core reasons antibiotic discovery has slowed to a near-standstill. A new AI system from McMaster University just started closing it.

The Discovery

One compound from seventy-nine

In April 2026, researchers in Jon Stokes' lab at McMaster University published a paper in Molecular Systems Biology — selected for the cover of the journal's June issue — describing a new generative AI model they call SyntheMol-RL. The model was designed to do something no previous drug-discovery AI had managed cleanly: generate novel antibiotic candidates that are not only effective at killing bacteria, but also water-soluble enough to actually work inside a human body.

To test it, the team gave SyntheMol-RL a specific target: design water-soluble compounds capable of killing Staphylococcus aureus (staph), the bacterium responsible for wound infections, pneumonia, and the notoriously hard-to-treat MRSA (methicillin-resistant S. aureus), which kills tens of thousands of people annually despite being detectable with routine diagnostics.

The model generated 79 candidate compounds. The team synthesised them in the lab and tested each one. Thirteen showed potent antibacterial activity in cell culture. Seven of those thirteen were structurally novel — meaning no compound like them had ever been documented in the scientific literature. And one stood apart.

They named it synthecin.

Synthecin was formulated as a topical cream and tested on a mouse model of MRSA wound infection — the same class of infection that causes serious complications in surgical patients and burn victims. The result was unambiguous: the infection was controlled. "Synthecin was highly effective at controlling the infection," said Denise Catacutan, the graduate student who led the wet lab work. "It worked extremely well as a topical drug, and also shows early promise as something that could be applied or optimized for systemic use in the future."

46B: molecular compounds in chemical space searched
13 / 79: generated candidates with potent in vitro activity
1: clinical lead (synthecin), effective against MRSA in vivo

That hit rate — 13 active compounds from 79 generated, with one already validated in a living system — is extraordinary by the standards of conventional drug screening, where identifying even a single viable lead from millions of compounds is considered a success.
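The arithmetic behind that comparison can be sketched in a few lines of Python. The counts come from the article; the one-in-a-million figure for a conventional screen is an assumption made here purely for illustration, reading "a single viable lead from millions" as roughly one lead per million compounds screened.

```python
# Counts quoted in the article.
generated = 79
active_in_vitro = 13
structurally_novel = 7

hit_rate = active_in_vitro / generated
novel_share = structurally_novel / active_in_vitro

# Assumption for illustration only: treat "a single viable lead from
# millions" as about one lead per one million screened compounds.
assumed_screen_size = 1_000_000
lead_rate_generated = 1 / generated
lead_rate_screen = 1 / assumed_screen_size
lead_rate_ratio = lead_rate_generated / lead_rate_screen

print(f"in vitro hit rate: {hit_rate:.1%}")
print(f"structurally novel share of hits: {novel_share:.1%}")
print(f"lead-rate ratio vs assumed screen: {lead_rate_ratio:,.0f}x")
```

The ratio is not a like-for-like comparison: the 79 compounds were already the model's picks from a far larger space, so the figure mostly shows how much of the search moved from the bench to the computer.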

How It Works

Molecular Lego, reinforced

To understand what SyntheMol-RL does, it helps to understand why drug discovery is so hard — and why earlier versions of AI haven't fully solved it.

Most AI drug discovery tools are trained to predict whether a given molecule will be biologically active. That is useful: given a database of candidate compounds, a trained model can triage the ones most likely to work, reducing how much wet lab testing is required. But these tools only select from known chemical space. SyntheMol-RL builds from scratch.

The model works like a constrained combinatorial engine. It draws on a library of approximately 150,000 molecular building blocks — smaller chemical fragments — and a set of 50 known chemical synthesis reactions. Using reinforcement learning, it learns to assemble those fragments in sequences that maximise a reward signal tuned to the properties the researchers care about: antibacterial activity against S. aureus, and water solubility.
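To make the "building blocks plus reactions" idea concrete, here is a minimal Python sketch of how a combinatorial space arises. Everything in it is a toy assumption: the `Block` and `Reaction` classes, the block names, and the two-reactant rule are invented for illustration and are not SyntheMol-RL's data model. The two reaction types named are standard chemistry, but whether they are among the model's 50 reactions is not stated in the article.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Block:
    name: str
    group: str  # toy label for the block's reactive functional group


@dataclass(frozen=True)
class Reaction:
    name: str
    left: str   # functional group required on the first reactant
    right: str  # functional group required on the second reactant


def enumerate_products(blocks, reactions):
    """Yield (reaction, block_a, block_b) for every compatible pairing."""
    for rxn in reactions:
        lefts = [b for b in blocks if b.group == rxn.left]
        rights = [b for b in blocks if b.group == rxn.right]
        for a, b in product(lefts, rights):
            yield (rxn.name, a.name, b.name)


# Hypothetical miniature library: 7 blocks, 2 reactions.
blocks = [
    Block("A1", "amine"), Block("A2", "amine"),
    Block("C1", "carboxylic_acid"), Block("C2", "carboxylic_acid"),
    Block("C3", "carboxylic_acid"),
    Block("H1", "aryl_halide"),
    Block("B1", "boronic_acid"),
]
reactions = [
    Reaction("amide_coupling", "amine", "carboxylic_acid"),
    Reaction("suzuki_coupling", "aryl_halide", "boronic_acid"),
]

candidates = list(enumerate_products(blocks, reactions))
print(len(candidates))  # 2 amines x 3 acids + 1 halide x 1 boronic acid = 7
```

Even in this toy, the number of products grows with the product of the compatible block counts, which is why a library of roughly 150,000 blocks can span a space far too large to enumerate and test exhaustively, and why a learned policy is used to decide where to look.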

Key Innovation

Earlier versions of SyntheMol were optimised for antibacterial activity alone. That produced candidates that killed bacteria in a dish but couldn't be developed into drugs — too insoluble to survive in the body, too toxic to human cells, or simply impossible to synthesise at scale.

SyntheMol-RL builds solubility directly into the generation objective. The model learns to reject chemical paths that lead to insoluble structures before they're even completed — rather than generating candidates first and filtering afterward. The result is a much higher rate of compounds that pass basic clinical viability tests alongside biological activity.
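The difference between filtering afterward and rejecting a path early can be illustrated with a small search over toy molecules. This sketch uses a hard-coded bound as a stand-in for the learned behaviour described above; it is not the model's algorithm. The block names, their solubility contributions, the additive scoring and the threshold are all hypothetical.

```python
# Hypothetical per-block solubility contributions (higher = more soluble).
CONTRIB = {"polar_a": 2, "polar_b": 1, "greasy_a": -2, "greasy_b": -3}
DEPTH = 3        # toy molecules are chains of three blocks
THRESHOLD = 3    # minimum total score to count as "soluble enough"
BEST = max(CONTRIB.values())


def assemble(prune):
    """Depth-first assembly; returns (soluble molecules, attachment steps)."""
    kept, steps = [], 0
    stack = [((), 0)]
    while stack:
        path, score = stack.pop()
        remaining = DEPTH - len(path)
        if remaining == 0:
            if score >= THRESHOLD:
                kept.append(path)
            continue
        # Early rejection: even the best possible remaining blocks
        # cannot lift this partial molecule over the threshold.
        if prune and score + remaining * BEST < THRESHOLD:
            continue
        for block, contrib in CONTRIB.items():
            steps += 1
            stack.append((path + (block,), score + contrib))
    return sorted(kept), steps


kept_filter, steps_filter = assemble(prune=False)
kept_prune, steps_prune = assemble(prune=True)

print(len(kept_filter), steps_filter)  # 8 soluble molecules, 84 steps
print(len(kept_prune), steps_prune)    # same 8 molecules, 28 steps
```

Both runs end with the same eight molecules, but the pruned run performs a third of the assembly steps. In a real pipeline each wasted step can translate into wasted synthesis, which is the cost that building solubility into the objective is meant to avoid.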

The "RL" in the name stands for reinforcement learning — the same class of algorithm that taught AlphaGo to play Go, and that underlies much of the reasoning capability in modern large language models. In SyntheMol-RL, the reinforcement signal comes from a scoring function that evaluates each newly assembled molecule on both dimensions simultaneously. The model gradually learns which types of structural choices produce high-scoring results, and biases its generation process accordingly.
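As a rough illustration of that loop, the sketch below trains a tiny policy with a gradient-bandit (REINFORCE-style) update. This is a deliberately simple stand-in and not the algorithm from the paper. The candidate names, the predicted scores, and the choice to multiply activity by solubility are assumptions made here; the product is used so that a high score on one axis cannot compensate for a near-zero score on the other.

```python
import math
import random

# Hypothetical (activity, solubility) predictions in [0, 1] for four
# structural choices. A real system would get these from trained models.
PREDICTED = {
    "potent_insoluble": (0.9, 0.1),
    "soluble_inactive": (0.1, 0.9),
    "balanced": (0.8, 0.8),
    "poor": (0.2, 0.2),
}


def reward(activity, solubility):
    """Assumed scoring rule: both properties must be high to score well."""
    return activity * solubility


def softmax(prefs):
    top = max(prefs.values())
    exps = {k: math.exp(v - top) for k, v in prefs.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def train(steps=3000, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = {name: 0.0 for name in PREDICTED}
    baseline = 0.0  # running mean reward, used to reduce variance
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        choice = rng.choices(list(probs), weights=list(probs.values()))[0]
        r = reward(*PREDICTED[choice])
        advantage = r - baseline
        # Raise the preference of the sampled choice if it beat the
        # baseline, lower it otherwise; adjust the others in proportion.
        for name in prefs:
            indicator = 1.0 if name == choice else 0.0
            prefs[name] += lr * advantage * (indicator - probs[name])
        baseline += (r - baseline) / t
    return softmax(prefs)


probs = train()
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

After training, the policy concentrates its probability on the "balanced" choice, the only one that scores well on both dimensions. That is a miniature version of the biasing the paragraph describes.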

"In the lab, we can build chemical compounds using a set of smaller chemical fragments, which can be stuck together like molecular Lego blocks," said Jon Stokes, the lab's principal investigator. "SyntheMol-RL configures those fragments in different ways, faster than humans ever could, to create new, larger chemical compounds that should — based on its knowledge — be antibacterial."

"Bleach is antibacterial — so is fire. But they obviously don't tick those other boxes. Good drug candidates must meet several different criteria, otherwise they'll never become actual medicine."
— Jon Stokes, McMaster University

The drug-likeness problem Stokes describes has killed far more promising compounds than biological failure has. A molecule that eradicates S. aureus in a petri dish is worthless if it destroys kidney cells at therapeutic doses, if it falls apart before reaching the infection site, or if it precipitates out of solution in the gut. SyntheMol-RL encodes those constraints at the generation stage — the point at which it's cheapest to enforce them.

One remaining unknown: the team has not yet determined synthecin's mechanism of action — precisely how it kills bacteria. That knowledge matters for predicting resistance pathways, assessing safety in human tissue, and guiding further optimisation. Stokes' lab is now conducting mechanism-of-action studies. The absence of this information doesn't undercut synthecin's in-vivo result, but it represents the next necessary step before clinical development begins.

Context

The emptying pipeline

Antibiotic resistance is not a future problem. The World Health Organization classifies it as one of the greatest threats to global health today. In 2019, before COVID reshaped pharmaceutical supply chains and clinical practices, antimicrobial-resistant bacterial infections were directly responsible for an estimated 1.27 million deaths worldwide and were associated with an estimated 4.95 million deaths in total. The numbers have grown since.

The fundamental issue is economic. Developing a new antibiotic takes a decade or more and costs upward of a billion dollars. Antibiotics are taken for days or weeks, not lifetimes — unlike treatments for chronic conditions such as diabetes or heart disease, which generate reliable long-term revenue. When a new antibiotic does reach the market, physicians are encouraged to use it sparingly, as a last resort, to delay resistance emergence. That rational prescribing strategy further reduces sales volume. Several companies that spent years and hundreds of millions of dollars successfully bringing new antibiotics to market have subsequently gone bankrupt.

Traditional vs. AI-augmented antibiotic discovery
Stage | Traditional approach | SyntheMol-RL approach
Compound sourcing | Screen existing compound libraries (~1M molecules max) | Generate novel compounds from 46B-compound chemical space
Solubility filtering | Test after synthesis (expensive, slow) | Built into generation objective (no wasted synthesis)
Time to first lead | Years of high-throughput screening | Days of AI generation + weeks of synthesis/testing
Structural novelty | Limited by library composition | 7 of 13 active compounds structurally novel in literature
Disease specificity | Requires separate effort per target | Model is disease-agnostic; objective function can be retuned

Into this landscape, AI has entered in waves. In 2020, MIT researchers used a deep learning model to identify halicin — a compound with a novel mechanism of action effective against drug-resistant tuberculosis and Acinetobacter baumannii, one of the WHO's Priority 1 pathogens. In 2023, the same team expanded to identify a new class of antibiotic candidates against MRSA specifically. Both discoveries used AI to repurpose or identify existing molecules, rather than to design new ones.

SyntheMol-RL represents a different posture: generative rather than selective. Instead of asking "which of these known compounds might work?", it asks "what compound, which has never existed, should we build?" That is a fundamentally different relationship between AI and chemistry — and synthecin is its first proof of concept to pass an in-vivo test.

Implications

Beyond the antibiotic

The significance of SyntheMol-RL extends well past MRSA. Stokes is explicit about this: "We used our model to design new antibiotics, but it's capable of so much more. We built it to be disease agnostic, meaning it could just as easily generate novel drug candidates for diabetes or cancer or other indications." The architecture — reinforcement learning over a combinatorial chemical space, with a reward function encoding multiple simultaneous biological constraints — is generalisable to any target for which a reliable scoring function can be built.

That generalisability matters because the bottleneck in drug discovery is not understanding which diseases to target. It's the ability to generate viable candidate molecules against those targets quickly and cheaply. If SyntheMol-RL's approach can be validated across a range of indications, it doesn't just add one new antibiotic to a depleted pipeline. It changes the cost structure of the entire early-stage drug discovery process.

What's Next

The immediate research priority is mechanism-of-action studies: understanding exactly how synthecin kills S. aureus. This is necessary for predicting what resistance mutations bacteria might evolve in response, and for evaluating safety in human tissue. Stokes' lab is actively conducting these studies.

In parallel, the team is developing an enhanced version of SyntheMol-RL, expected later in 2026, with improved generation efficiency and broader target coverage. Whether synthecin itself proceeds toward clinical development will depend on the mechanism studies and any further preclinical toxicology work.

The deeper implication is about the nature of scientific exploration. The chemical universe is unimaginably large. Human chemists, even working in large teams with sophisticated instruments, can only sample a tiny corner of it directly. For most of pharmaceutical history, that limitation has been a hard ceiling on what drugs could be discovered. AI systems like SyntheMol-RL don't eliminate that ceiling so much as replace it with something new: the ceiling imposed by the quality of our objective functions — our ability to specify precisely what we want a molecule to do.

That is a solvable problem in a way that "there aren't enough hours in the day to screen 10^60 compounds" is not. As scoring models improve, and as our ability to computationally predict toxicity, metabolic stability, cell permeability, and target binding becomes more reliable, the generative pipeline becomes more capable. The 46 billion molecules SyntheMol-RL explored this year may look modest against what its successors will search in five years.

Synthecin may or may not become a drug. The mechanism studies may reveal problems that make it unsuitable for human use. That outcome would be disappointing but not surprising: early-stage leads fail all the time. What would not be undone by that outcome is the proof of concept: that an AI, given the right constraints and enough chemical space to explore, can design a molecule that never existed before — and have it work in a living animal against one of the world's most dangerous drug-resistant pathogens.

That is a genuinely new thing in science.
