What is the cap set problem in mathematics

The cap set problem asks for the largest collection of points you can place in an n-dimensional grid where no three points form a straight line. It's defined in a mathematical structure called a finite field, where three points form a line if their coordinates sum to zero in the field's arithmetic.

How did AI solve the cap set problem

FunSearch, an AI system by Google DeepMind, broke the cap set record by writing Python code that constructs cap sets rather than searching for answers directly. The system uses a large language model to generate and evolve programs, with an automated evaluator scoring each program's results.

What was the previous cap set record before FunSearch

For n=8, the previous best known cap set had 496 points and had remained unchanged for years. FunSearch found a cap set of 512 points in 2023, representing a 3.2% improvement that broke decades of stagnation.

Why is FunSearch different from other mathematical AI tools

Unlike traditional mathematical computing tools that follow explicit instructions, FunSearch explores and proposes new approaches. It searches the space of possible algorithms rather than just optimizing parameters, potentially discovering mathematical strategies that humans haven't systematically explored.

What makes FunSearch's discovery important beyond the record

FunSearch provided executable code that generates the cap set, not just a list of coordinates. This gives mathematicians an algorithm to study and understand, revealing the method behind the result rather than just the answer itself.

Lisa Pedrosa

Mathematics · Combinatorics

No Three in Line

Mathematics · Combinatorics · AI & Discovery

No Three in Line

For decades, one of mathematics' most notorious open problems defied every human attempt. In 2023, an AI named FunSearch broke the record — by writing code no mathematician had ever imagined.

Terence Tao, one of the greatest living mathematicians and winner of the Fields Medal, once called it his favorite open problem. The cap set problem sounds deceptively simple: in an n-dimensional grid, what is the largest collection of points you can place where no three form a straight line? For n=3, you can work it out on paper. For n=8, the search space contains more possibilities than atoms in the observable universe. Mathematicians had been stuck for decades. Then, in December 2023, a large language model wrote a short piece of computer code that no human had ever written, and the record fell.

The Problem

A Puzzle That Grows Without End

To understand the cap set problem, imagine a grid of points. In three dimensions, think of a 3×3×3 Rubik's cube. Each of the 27 unit cubes in that grid is a point. Your task: place as many points as you can such that no three of them are collinear—meaning they cannot all lie on a straight line that passes through the grid evenly. This constraint of "no three in line" is what mathematicians call being in general position, and it's fundamental to combinatorics.

The mathematical formulation is more precise. The cap set problem lives in a structure called a finite field, specifically (Z/3Z)^n—an n-dimensional space where each coordinate is 0, 1, or 2, and arithmetic wraps around (so 2+1 equals 0, just like a clock). In this algebra, a "line" is defined not by Euclidean geometry but by arithmetic: three points a, b, c form a line if a + b + c = 0 in the field's arithmetic. A cap set is a subset of this space with no such arithmetic progressions. The problem asks: how large can a cap set be in (Z/3Z)^n?

The problem's history is a thread running through decades of mathematics. Posed in the 1970s, it remained tantalizing because the answer seemed to shrink as n grew—but by how much? For n=3, the maximum cap set has exactly 9 points out of 27 total. For n=4, it's 20 out of 81. As n increases to 5, 6, 7, and beyond, the proportion of points you can select before violating the constraint decreases, but the rate of decrease was unknown. This gap—between what mathematicians could prove and what they could construct—became one of combinatorics' most famous open questions.

Then came a breakthrough in 2016 that shifted the conversation. Mathematicians Croot, Lev, and Pach, working independently from Ellenberg and Gijswijt, proved that cap sets cannot grow faster than exponentially slowly—bounded roughly by 2.756^n. This was a ceiling, not a floor: we knew cap sets couldn't be huge, but we didn't know where the true maximum lay. For n=8, the best known cap set had 496 points for years. The needle barely moved.

The Arithmetic Structure

The cap set problem lives in (Z/3Z)^n: an n-dimensional space where each coordinate is 0, 1, or 2, and arithmetic wraps around (2+1=0). A "line" connects three points whose coordinates sum to 0 in each dimension. The problem is elegant in its definition but merciless in its difficulty: the search space explodes combinatorially with each additional dimension.

The Machine

Searching in the Space of Functions

FunSearch (short for "searching in the space of functions") was built by Bernardo Romera-Paredes and colleagues at Google DeepMind. Published in Nature in December 2023, it represents a fundamentally different approach to automated mathematical discovery. Rather than searching for answers directly, FunSearch searches for programs that generate answers. The AI doesn't try to output a cap set directly—it writes Python functions that construct cap sets.

The architecture is elegant and evolutionary. Two components work in a feedback loop. First, a pre-trained large language model (based on Codey, Google's code-generating LLM) writes and mutates programs. This is the creative engine—the model brings outside-the-box ideas, unexpected algorithmic strategies, approaches that human programmers haven't systematically explored. Second, an automated evaluator runs each program, measures the size of the cap set it produces, and scores it numerically. Bad programs are discarded. Good programs are fed back to the LLM as examples in the prompt, encouraging further refinement and variation.

The process is evolutionary in spirit because programs compete on fitness—the size of the cap set they construct. Survivors (high-scoring programs) are used as examples for generating new programs. Mutations come from the LLM paraphrasing, combining, and extending existing code. Over thousands of iterations, the population of programs improves. Each iteration learns from the previous generation, gradually climbing toward better solutions. It's not hill-climbing in the traditional sense, because the LLM introduces true novelty, not just gradient-based tweaks.

What makes this approach different from standard optimization is that FunSearch is searching the space of algorithms, not just parameter tuning. The LLM can propose entirely new mathematical strategies—different ways to construct a cap set, different heuristics for placing points, different pruning techniques. These are ideas that exist in the space of all possible programs but were never explicitly written by a human, never appeared in a textbook or research paper. FunSearch's role is to explore that space intelligently.

512

Points in FunSearch's best cap set for n=8, a new world record

496

Previous best cap set for n=8, the human record before FunSearch

2023

Year FunSearch published the first LLM-based mathematical discovery in Nature

The Discovery

The Code That Cracked the Record

FunSearch found a cap set of size 512 for n=8, breaking the previous record of 496. On its surface, this improvement seems modest—16 additional points, a 3.2% gain. But in combinatorics, records are earned in blood. These numbers had stood for years, established through meticulous human work, and advancing them by even a handful of points is cause for celebration. The fact that an algorithm found 512 first suggests there's likely more room to climb.

What makes FunSearch's discovery historically significant is not primarily the new record itself, but how the answer was delivered: as a computer program. Previous cap set records were discovered as explicit lists of coordinates or through mathematical proofs that bounded the maximum. FunSearch did something different. It gave mathematicians executable code that generates the cap set. This program is inspectable, generalizable, and conceptually understandable in a way that a raw list of 512 grid coordinates is not.

This distinction matters enormously. When a mathematician receives a new record from FunSearch, they don't just see a number—they see an algorithm. They can read the code and ask: Why does this strategy work? What mathematical principle is the program exploiting? What insight is embedded in these lines of Python? The AI's answer contains a method, not just a result. The program reveals a way of thinking about the problem that humans hadn't yet articulated. That's the gift: not a fish, but a way of fishing.

FunSearch also found improvements for other combinatorial problems in the same paper, including bin packing—a classic optimization problem in computer science. This demonstrates the approach isn't a single-trick system tailored only to cap sets. The same search-in-function-space strategy can be applied to any problem with a measurable objective function: if you can score a solution, you can evolve programs toward better solutions.

Why Programs Matter More Than Answers

In mathematics, a constructive proof—one that explicitly builds the object being discussed—is more valuable than an existence proof. FunSearch gives mathematicians a constructive method, not merely "there exists a cap set of size 512" but "here is a specific procedure for building one." That procedure contains mathematical information: heuristics, pruning strategies, structural insights. Mathematicians can then study the code, generalize it, and prove theorems about the principles it uses. The program is a Rosetta Stone for understanding the problem itself.

The Implications

The LLM as Mathematical Explorer

FunSearch reveals something unexpected about large language models: their value in mathematics is not in reciting known results or pattern-matching against training data—the "stochastic parrot" critique that has dogged LLMs since their rise. Rather, it's in creative generation. An LLM can propose approaches that no human has explicitly tried because it's not constrained by human intuition or the path-dependency of research tradition. It explores the space of possible programs with a kind of algorithmic curiosity.

The LLM wasn't trained on cap sets or even advanced combinatorics problems. It was trained on code. What it brings to the collaboration is the ability to write programs that implement creative mathematical strategies—strategies that human programmers haven't thought of because they haven't been systematically exploring that space. The evaluator scores them. Good ones breed. The cycle continues. Over thousands of iterations, this process discovers programs that embody mathematical insights no single human had conceived.

This paradigm is fundamentally different from traditional mathematical computing tools like numerical methods or computer algebra systems. Those tools do precisely what they're instructed to do: solve equations, simplify expressions, compute values. They are faithful servants, not explorers. FunSearch, by contrast, searches. It proposes. It mutates. It evolves. It finds solutions in the dark and brings them to light.

The implications for open problems are far-reaching. There are dozens of famous unsolved combinatorial optimization problems awaiting breakthroughs: the travelling salesman problem, various packing and covering problems, graph coloring under different constraints. If FunSearch can find new records for cap sets, could the same approach advance other hard combinatorial problems? The Nature paper suggests yes and outlines a framework for applying it to any problem with a measurable objective function—a broad swath of mathematics and computer science.

Yet a caution tempers the excitement. A program's success doesn't automatically reveal a deeper mathematical truth. The algorithm that generated 512 points in an n=8 cap set might be a clever but specific trick, a local insight that doesn't generalize. Or it might embody a fundamental principle waiting to be formulated and proved. The next chapter belongs to human mathematicians. FunSearch narrows the search space; it doesn't close the argument. The real work—understanding why the algorithm works, proving upper and lower bounds, connecting the program's logic to deeper mathematical structures—still rests with human minds.

"FunSearch didn't just break a record. It handed mathematicians a new tool and said: here is how to build the thing. Now figure out why it works."

Lisa Pedrosa

The cap set problem remains open. The maximum cap set for n=8 might be 512—or it might be larger. FunSearch gave mathematics a new landmark to aim for, a new record to study. The question that shaped combinatorics for fifty years has acquired a new form: not just "what is the maximum?" but "why does this algorithm achieve 512, and what does that tell us about the structure of the problem?" Mathematics has always worked this way—each discovery opens new questions—but now those discoveries can come from a collaboration between human curiosity and algorithmic creativity.

Sources

1. Romera-Paredes et al. "Mathematical discoveries from program search with large language models." Nature (2023). https://www.nature.com/articles/s41586-023-06924-6

2. DeepMind Blog. "FunSearch: Making new discoveries in mathematical sciences using Large Language Models." https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/

3. GitHub. "google-deepmind/funsearch." https://github.com/google-deepmind/funsearch

4. Ellenberg, J., Gijswijt, D. "On large subsets of F_q^n with no three-term arithmetic progression." Annals of Mathematics (2017). https://annals.math.princeton.edu/2017/185-1/p07

5. Tao, T. "Open question: The cap set conjecture." Blog post. https://terrytao.wordpress.com/2007/02/23/open-question-the-cap-set-conjecture/

6. Quanta Magazine. "A tight new bound on the cap set problem." https://www.quantamagazine.org/a-tight-new-bound-on-the-cap-set-problem-20161004/

7. Semantic Scholar. "FunSearch paper summary." https://www.semanticscholar.org/paper/Mathematical-discoveries-from-program-search-with-Romera-Paredes-Barekatain/d32ba88571141ed0ebe7aeefbaa4ccaf8cda7be3

All articles cited to primary institutional or peer-reviewed sources