[Figure: DNA double helix with anomalous codon highlighted in violet, a single-cell protist in the foreground. Labels: TAA, TAG, LYS, GLU, TGA, STOP; Oligohymenophorea sp. PL0344; non-canonical codon reassignment; Oxford University Parks, UK.]

Frontier Biology · Research Deep Dive

Three Stop Codons.
One Organism That Ignored Two of Them.


A microscopic protist pulled from a university pond carries a genetic code that defies the most universal rule in all of biology. Only one stop codon remains. The other two have been repurposed entirely.

There are 64 codons in the standard genetic code. Sixty-one specify amino acids. Three signal the end of a gene. For more than 50 years, biologists have treated this last part as essentially fixed across all life on Earth. A ciliate found in a freshwater pond in Oxford, England has now broken that assumption in a way that had never been documented before.

Section One

The Number


The number is two. Two stop codons that should have ended genes were instead found encoding amino acids. Not the same amino acid. Different ones. That distinction is what makes this organism unusual at a level that even other known exceptions to the genetic code do not reach.

Dr. Jamie McGowan, a postdoctoral scientist at the Earlham Institute, was not looking for this. His team was field-testing a new DNA sequencing pipeline capable of processing material from a single cell. They picked a protist from a pond at Oxford University Parks almost arbitrarily. It was a convenience sample from a familiar location. The organism turned out to be an undescribed species, classified as Oligohymenophorea sp. PL0344, and when the genome was sequenced and analysed, the standard interpretation of its stop codons did not work.

In the standard genetic code, which the vast majority of organisms studied so far follow, three specific RNA sequences perform one job: they terminate protein synthesis. The sequences are UAA, UAG, and UGA. (In DNA form, those are TAA, TAG, and TGA.) They are not instructions for building anything. They are punctuation. A period at the end of every protein-coding sentence in the genome.

In Oligohymenophorea sp. PL0344, only TGA still acts as a stop codon. TAA now codes for lysine. TAG now codes for glutamic acid. Two of the three canonical stop signals have been reassigned to amino acids, and those amino acids are chemically distinct from each other. The protist has, effectively, rewritten two-thirds of the genetic code's punctuation system.

64: total codons in the standard genetic code
3: canonical stop codons (TAA, TAG, TGA)
1: stop codon remaining in PL0344's genome

Concept

A codon is a sequence of three nucleotide bases (A, T, G, or C in DNA; A, U, G, or C in RNA). The cell's protein-building machinery reads the messenger RNA copy of a gene three letters at a time, like a ticker tape. Each triplet is matched to a specific amino acid, which is added to the growing protein chain. Stop codons are the exception: no amino acid is added. Instead, release factors recognise the stop codon, the ribosome releases the completed protein, and the process ends.

Changing what a stop codon means is not a small modification. It potentially alters how every single gene in the genome ends. If the cell's ribosomes now read TAA as "add lysine" rather than "stop here," every protein in the cell is built on those terms. The organism is not merely tolerating an occasional misreading. It has permanently redefined what two of its codons mean.
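The difference between the two readings can be made concrete with a small sketch. It assumes the standard code written in the conventional TCAG ordering and a PL0344-style variant in which TAA means lysine (K) and TAG means glutamic acid (E), as described above. The function name `translate` and the toy gene are invented for illustration and are not taken from the study.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard code, listed in TCAG order; "*" marks a stop codon.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Variant described in the article: TAA -> lysine (K), TAG -> glutamic acid (E).
PL0344_CODE = {**STANDARD_CODE, "TAA": "K", "TAG": "E"}


def translate(dna: str, code: dict[str, str]) -> str:
    """Translate a DNA coding sequence, halting at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = code[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


toy_gene = "ATGAAATAAGAATAGTGA"  # invented sequence, for illustration only
print(translate(toy_gene, STANDARD_CODE))  # MK
print(translate(toy_gene, PL0344_CODE))    # MKKEE
```

Under the standard table the toy gene ends at its third codon. Under the variant table the same DNA is read through to the final TGA, so the same sequence yields a longer protein.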

Section Two

What It Took to Find It


Single-cell genomics is technically demanding. Most sequencing requires substantial amounts of DNA, which means culturing organisms in bulk or pooling large numbers of cells. McGowan's team was developing a pipeline to work with the vanishingly small amount of genetic material inside a single cell, because most microbial life in natural environments cannot be cultured in a laboratory. They needed a test subject. The pond was nearby. PL0344 was there.

The pipeline worked. But the downstream analysis produced a puzzling result: the software that interprets genomic sequence data was finding what appeared to be coding sequences running through positions normally flagged as stop codons. When you see a TAA or TAG in the middle of a protein-coding gene, standard annotation tools treat it as an error or an incomplete sequence. They are not designed to consider that TAA might mean lysine.
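To see why a standard annotation tool would flag such genes, consider a minimal check for in-frame stop codons that appear before the end of a coding sequence. This is only a sketch of the idea, not the software the team used. The function `internal_stops` and the toy sequence are hypothetical.

```python
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})
PL0344_STOPS = frozenset({"TGA"})  # per the article, only TGA still terminates


def internal_stops(cds: str, stops: frozenset[str]) -> list[tuple[int, str]]:
    """Return (codon_index, codon) for every in-frame stop before the final codon."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return [(i, c) for i, c in enumerate(codons[:-1]) if c in stops]


toy_cds = "ATGGCTTAACCGTAGGGATGA"  # invented sequence, for illustration only
print(internal_stops(toy_cds, STANDARD_STOPS))  # [(2, 'TAA'), (4, 'TAG')]
print(internal_stops(toy_cds, PL0344_STOPS))    # []
```

With the standard stop set, the toy gene looks broken in two places. With TGA as the only stop, the same gene is clean, which is the kind of signal that suggests the code, not the sequence, is what differs.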

McGowan and colleagues, including Thomas A. Richards at the University of Oxford and David Swarbreck and Neil Hall at the Earlham Institute, worked through the alternative interpretation systematically. The computational evidence pointed the same way every time: TAA and TAG were being used as sense codons, not as termination signals. They published their findings in PLOS Genetics in 2023.

"We're not aware of any other case where these stop codons are linked to two different amino acids. It breaks some of the rules we thought we knew about gene translation."
Dr. Jamie McGowan, Earlham Institute — PLOS Genetics, 2023

The organism itself belongs to Oligohymenophorea, a class within the ciliates, a group of single-celled eukaryotes characterised by hair-like cilia they use for locomotion. Ciliates like Paramecium and Tetrahymena are well-studied organisms, and several ciliate lineages are already known to deviate from the standard genetic code. But those known variants follow a specific pattern that PL0344 breaks.

The computational work required to confirm the finding was substantial. Identifying codon usage in an uncultured organism from a single cell means working with incomplete, fragmentary sequence data, and distinguishing a genuine reassignment from sequencing error or contamination demands multiple lines of corroborating evidence. The team used phylogenetic analysis, codon usage statistics, and protein homology searches across the genome to verify that the reassignment was real and consistent throughout the sequence, not a local artefact.
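One common way to test a suspected reassignment from homology is to tally which amino acid related proteins carry at the positions where the query genome has the codon in question. The sketch below illustrates that general idea only. It is not the authors' pipeline, the function `infer_codon_meanings` and its `min_support` threshold are assumptions, and the observations are invented.

```python
from collections import Counter, defaultdict


def infer_codon_meanings(aligned_pairs, min_support=3):
    """Guess each codon's meaning from the residue homologs show at the aligned position.

    aligned_pairs: iterable of (query_codon, homolog_amino_acid) tuples.
    Returns {codon: (most_common_amino_acid, fraction_of_observations)}.
    Codons seen fewer than min_support times are skipped.
    """
    tallies = defaultdict(Counter)
    for codon, amino_acid in aligned_pairs:
        tallies[codon][amino_acid] += 1

    calls = {}
    for codon, counts in tallies.items():
        total = sum(counts.values())
        if total < min_support:
            continue
        amino_acid, hits = counts.most_common(1)[0]
        calls[codon] = (amino_acid, hits / total)
    return calls


# Invented observations, for illustration only (not data from the study).
toy_pairs = (
    [("TAA", "K")] * 8 + [("TAA", "R")] * 2
    + [("TAG", "E")] * 7 + [("TAG", "D")] * 3
    + [("TGA", "W")] * 1
)
print(infer_codon_meanings(toy_pairs))
```

A real analysis would need many genes, careful alignment, and contamination checks. The point of the sketch is that a consistent majority across the genome is what separates a genuine reassignment from a local artefact.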

Section Three

Why It Was Missed


The universal genetic code has been known since the 1960s. By 1966, Marshall Nirenberg and colleagues at the NIH had cracked its full structure, establishing which codon specifies which amino acid. The code was described as universal. When exceptions appeared, the assumption held that they would be peripheral and constrained.

The first documented exceptions came in 1979, when vertebrate mitochondria were found to use UGA as a tryptophan codon rather than a stop signal. Mitochondria have their own small genomes, and those genomes had diverged under different evolutionary pressures than nuclear DNA. By the mid-1980s, further exceptions had accumulated: Mycoplasma bacteria used UGA for tryptophan; ciliated protozoa used UAA and UAG to encode glutamine.

That ciliate precedent is important context for understanding why PL0344 is different. In organisms like Tetrahymena and Paramecium, both UAA and UAG were repurposed as sense codons, but they both came to mean the same thing: glutamine. The two codons changed together, and they changed to the same amino acid. This pattern had been observed enough times that researchers began to treat UAA and UAG (TAA and TAG in DNA form) as evolutionarily coupled. If one changed, the other followed. And they went in the same direction.

Stop Codon Reassignments: Standard vs. Known Variants vs. PL0344
| Organism / Context | TAA (UAA) | TAG (UAG) | TGA (UGA) |
| --- | --- | --- | --- |
| Standard genetic code | STOP | STOP | STOP |
| Tetrahymena, Paramecium (ciliates) | Glutamine | Glutamine | STOP |
| Vertebrate mitochondria | STOP | STOP | Tryptophan |
| Mycoplasma capricolum | STOP | STOP | Tryptophan |
| Oligohymenophorea sp. PL0344 | Lysine | Glutamic acid | STOP (only) |

Figure 1 — Codon reassignment variants across selected organisms. PL0344 is the only documented case where TAA and TAG encode two distinct amino acids.
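The comparison can also be expressed as a small lookup structure, which makes the "coupling" question easy to ask programmatically. The rows below are transcribed from the comparison above. The names `CODON_VARIANTS` and `decoupled_taa_tag` are hypothetical, chosen only for this sketch.

```python
# Rows transcribed from the comparison table; "STOP" marks a termination codon.
CODON_VARIANTS = {
    "Standard genetic code": {"TAA": "STOP", "TAG": "STOP", "TGA": "STOP"},
    "Tetrahymena, Paramecium (ciliates)": {"TAA": "Gln", "TAG": "Gln", "TGA": "STOP"},
    "Vertebrate mitochondria": {"TAA": "STOP", "TAG": "STOP", "TGA": "Trp"},
    "Mycoplasma capricolum": {"TAA": "STOP", "TAG": "STOP", "TGA": "Trp"},
    "Oligohymenophorea sp. PL0344": {"TAA": "Lys", "TAG": "Glu", "TGA": "STOP"},
}


def decoupled_taa_tag(variants):
    """List entries where TAA and TAG are both sense codons with different meanings."""
    return [
        name
        for name, row in variants.items()
        if "STOP" not in (row["TAA"], row["TAG"]) and row["TAA"] != row["TAG"]
    ]


print(decoupled_taa_tag(CODON_VARIANTS))  # ['Oligohymenophorea sp. PL0344']
```

Among the rows listed here, only PL0344 passes the test. The table is a selection, so the result says nothing about lineages that are not in it.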

PL0344 breaks the coupling assumption entirely. TAA means lysine. TAG means glutamic acid. These are chemically different amino acids with different properties, different roles in protein structure, and different biosynthetic origins. No previously described pattern of code evolution predicted this outcome. The assumption that TAA and TAG would always change in tandem, and always to the same thing, had never been contradicted by an observed organism before this one was found.

Why was it missed? Partly because environmental protists are extraordinarily undersampled. Single-cell eukaryotes in freshwater environments represent some of the least-studied biodiversity on Earth. Culture-dependent methods miss most of it. Even metagenomic approaches that sequence mixed environmental samples often lack the read depth or analytical tools to detect fine-grained codon-usage deviations in poorly characterised lineages. The pipeline McGowan's team was developing exists precisely to address this gap. PL0344 was discovered accidentally, but it was discovered because someone had built a tool capable of seeing it.

Lysine and glutamic acid, the two amino acids now assigned to former stop codons, are among the most abundant amino acids in cellular proteins. They carry opposite electrical charges (lysine positive, glutamic acid negative) that govern protein folding, enzyme activity, and molecular interactions across the cell. PL0344's entire protein repertoire is built on a code where these amino acids can appear at positions where the standard code places a period.

Section Four

What Changes


A single organism is not a paradigm shift. But a single organism that breaks a constraint assumed to be universal does something more precise: it collapses the assumption. If TAA and TAG are not necessarily coupled, every ciliate lineage sampled with the right tools now becomes a candidate for novel codon assignments. The search space for genetic code variation has expanded.

For evolutionary biology, the finding raises a mechanistic question that has no current answer. How does a lineage survive while reassigning a stop codon to an amino acid? The transition problem is real. Partway through reassignment, ribosomes would sometimes insert an amino acid where a gene expects a stop, producing proteins with unwanted extensions, and sometimes stop where a gene expects an amino acid, producing truncated ones. In principle, any such intermediate state should be severely harmful. The fact that PL0344 exists (functional, swimming, alive in a pond in Oxfordshire) means the transition happened. The mechanism by which it happened is unknown.

There are competing hypotheses. One, often called codon capture, proposes that certain codons become rare enough in the genome to be nearly absent before they are reassigned, reducing the disruption of the switch. Another, the ambiguous intermediate model, invokes stop-codon read-through, where a stop codon is occasionally decoded as an amino acid anyway, creating a pool of extended proteins that natural selection could then act on. Neither hypothesis fully accounts for the double reassignment to two different amino acids seen in PL0344.
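The first hypothesis makes a measurable prediction: before reassignment, the codon in question should be close to absent from coding sequences. A rough sketch of how that rarity could be measured is an in-frame codon frequency count. The function `codon_frequencies` and the toy genes are invented for illustration and do not come from the study.

```python
from collections import Counter


def codon_frequencies(coding_sequences):
    """Relative in-frame frequency of each codon across a set of coding sequences."""
    counts = Counter()
    for cds in coding_sequences:
        counts.update(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


toy_genes = ["ATGGCTGCTTGA", "ATGAAAGCTGAATGA"]  # invented sequences
freqs = codon_frequencies(toy_genes)
print(freqs.get("TAA", 0.0))  # 0.0, TAA never appears in frame in the toy set
print(freqs["GCT"])           # 3 of the 9 codons are GCT
```

A codon with a frequency near zero across a whole genome could change meaning with little immediate cost, which is the scenario this hypothesis relies on.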

The practical implications reach into synthetic biology. Researchers working on engineered organisms with expanded genetic codes have spent years trying to introduce non-canonical amino acids at stop codons, precisely because stop codons are the only codons not already assigned to a standard amino acid. PL0344's genome shows that biology has already solved a version of this problem under genuine evolutionary pressure. Studying how it manages the reassignment (which tRNA molecules recognise the former stop codons, which release factors no longer respond to them) could inform approaches to intentional code expansion in the laboratory.
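As a small illustration of the tRNA side of that question, the anticodon that would pair with a given codon under strict Watson-Crick rules is simply its reverse complement. The sketch below ignores wobble pairing and base modifications, and it makes no claim about which tRNAs PL0344 actually carries. The function name `cognate_anticodon` is hypothetical.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def cognate_anticodon(codon: str) -> str:
    """Return the 5'->3' anticodon pairing with an RNA codon by strict Watson-Crick rules."""
    return codon.translate(RNA_COMPLEMENT)[::-1]


for codon in ("UAA", "UAG"):
    print(codon, "->", cognate_anticodon(codon))
# UAA -> UUA
# UAG -> CUA
```

A search for tRNA genes with such anticodons is one plausible starting point for asking how a former stop codon comes to be read as an amino acid.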

McGowan described the discovery as a product of chance: "It's sheer luck we chose this protist to test our sequencing pipeline, and it just shows what's out there, highlighting just how little we know about the genetics of protists." That observation carries a specific weight. If an accidental sample from a university park pond yields an organism with an unprecedented genetic code, the unsampled space of protist biodiversity very likely holds further surprises.

There are more than a million described eukaryotic species. The number of protist lineages that have been sequenced at sufficient depth to detect codon reassignment is a small fraction of that. The number that have been sampled at all from freshwater environments is smaller still. The question PL0344 opens is not whether other non-canonical codes exist. It's how many of them already exist in ponds, puddles, and soil films that no one has yet thought to look in.

The genetic code was universal. Then it was almost universal, with a small number of documented exceptions in organelles and a handful of unusual organisms. PL0344 adds a new category of exception, one that violates a constraint that even the known exceptions had preserved. The rules are not rewritten. But the boundaries of what the rules permit are now demonstrably wider than they were before this organism was found.
