Automated Proof of a Benign Nonconvex Landscape Theorem

View on GitHub

At the start of 2026, I noticed that LLM agents are becoming more capable at solving math problems, and began to use them for theorem proving. Early April, I ran an automated theorem proving experiment on a nonconvex optimization problem from sum-of-squares optimization, an extension of of my PhD work.

The goal was to give an agent a formal theorem statement in Lean, a computational search toolkit, and a verification harness, then let it autonomously work through the whole cycle that my coauthors and I previously went through: search for counterexamples, infer structure, write a mathematical blueprint, formalize the proof, and keep going until Lean accepted the final theorem.

Over roughly three days of continuous work, the agent produced about 58k lines of Lean, 2k lines of Julia, 7.5k lines of LaTeX, and more than 200 commits before finally proving the theorem.

Read more

How to Keep Codex Running Indefinitely

I’ve been using OpenAI’s Codex for programmatically verifible tasks that can take a long time to complete, such as formalizing a proof in Lean, or using autoresearch to iteratively improve ML models. Codex tends to stop at reasonable-looking milestones even when the next step is obvious, which can be a pain when I want it to run unattended. A Stop hook that injects a continuation prompt fixes it.

Read more