Mathematicians STUNNED as o4-mini answers the world's hardest math problems...

Channel: Wes Roth | Published: June 9th, 2025 | AI Score: 100
48.8K | 1.5K | 385 | 20:53

AI Generated Summary

Airdroplet AI v0.2

Alright, let's dive into what's happening in the world of AI and math, because it's pretty wild! This video unpacks the stunning capabilities of new AI models, particularly OpenAI's o4-mini, at solving incredibly difficult math problems, and how that is pushing the boundaries of what we thought AI could do. It really makes you think about how AI's role is evolving from simple tool to genuine collaborator, and possibly even independent discoverer.

Here's the lowdown:

  • The "Secret Math Meeting" – AI vs. Human Brains: Imagine 30 of the world's best mathematicians gathering secretly to try to stump an AI. They were absolutely stunned to find that o4-mini could solve some of the hardest solvable problems. It wasn't just solving them; it was showing its reasoning in real time, even tackling open questions in number theory that would be PhD-level for a human. For instance, one mathematician, Ken Ono, watched in silent amazement as o4-mini unfurled a solution in minutes, first researching related literature, then trying a simpler "toy version" of the problem, and finally presenting a correct (and apparently "sassy") solution.
  • o4-mini's "Sass" and Confidence: The AI wasn't just smart; it had attitude! Its solutions came with confident, almost cheeky remarks, like "no citation necessary because the mystery number was computed by me." This raised a concern about how much trust humans might place in AI, since these models can present information with an air of complete confidence even when they are wrong. It means we have to be careful and always verify.
  • Tier 4 Problems and Beyond: The problems o4-mini was tackling were considered "tier four," meaning they're at the very top of human ability. The discussion at the meeting quickly shifted to "tier five" problems, the ones even the best mathematicians can't solve. The question now is what happens when AI becomes superhuman at those. It seems clear that the role of mathematicians is about to change dramatically, possibly shifting from direct problem-solving to overseeing and collaborating with AI.
  • The FrontierMath Benchmark & OpenAI's Involvement: To measure AI's progress in math, a new benchmark called "FrontierMath" was created because AI was already acing existing human-level tests. OpenAI actually commissioned Epoch AI, a nonprofit, to create 300 unpublished math questions for this benchmark. This raised some eyebrows, as it means OpenAI had access to much of the data, though 50 questions were kept as a "holdout set" that the models had never seen. That holdout set is meant to keep the testing legitimate, showing that the AI can solve problems it has never encountered.
  • Google DeepMind's AlphaProof/AlphaGeometry - Almost Gold: It's not just OpenAI; Google DeepMind's AlphaProof and AlphaGeometry also showcased incredible math skills, scoring at silver-medal level on International Mathematical Olympiad (IMO) problems, just one point shy of gold! These tests are rigorously fair, with problems kept secret until the competition day.
  • The "Flawed Reasoning" Problem: A mathematician named Jasper, who was at the secret meeting, clarified that they used o4-mini in its "high thinking mode." While it solved most problems, he noted that the AI would sometimes arrive at the correct numerical answer even though its reasoning was "occasionally incorrect." This is a known challenge in AI training: models are rewarded for correct outputs, but the internal logic isn't always verifiable or sound. It's a tough problem to fix if you're only reinforcing the final answer and not the intricate steps of reasoning.
  • AI's Strengths and Weaknesses in Math (Currently): While AI excels at gathering relevant literature and drafting initial solutions, it still struggles with deep reasoning, especially when it needs to synthesize complex ideas from different sources into a novel computational method. So, it's not yet generating completely new mathematical theories on its own. Human oversight remains crucial for verification and for pushing the boundaries of true synthesis.
  • The Future: AI as Collaborator and Independent Discoverer: The prediction is that in the next year or two, AI will transition from assisting mathematicians to collaborating with them in discovering new theories and solving open problems. Eventually, it could even work independently to push the frontiers of mathematics and other scientific fields.
  • Recursive Self-Improvement: AlphaEvolve & Darwin Gödel Machine: This is where things get really exciting. Google's AlphaEvolve, powered by Gemini, is already discovering advanced algorithms that optimize Google's own data centers, saving significant resources. It can even improve its own training, hinting at recursive self-improvement. The Darwin Gödel Machine takes this further, acting as a self-improving coding agent that uses an evolutionary search process: it generates many potential solutions, tests them, and builds upon the promising ones, eventually outperforming human-coded agents.
  • The Power of Iteration and Feedback: The presenter thinks the math symposium's results, while impressive, might have been even more "staggering" if they had incorporated the iterative, feedback-driven approach of AlphaEvolve or the Darwin Gödel Machine. These systems can generate thousands of outputs and continuously refine solutions, whereas the symposium likely only tested one output at a time. Automating the verification and synthesis steps could supercharge these AI systems.
  • The "Religion of Justaism": The video closes with a powerful point from Scott Aaronson, highlighting the common human tendency to deflate AI's achievements by saying it "just" does X (e.g., "it's just a stochastic parrot," "it's just a next token predictor"). He challenges this by asking, "What are you just a?" reminding us that we, too, could be reduced to a "bundle of neurons and synapses." This "justaism" ignores the real-world impact and capabilities of AI, which are already changing civilization.
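
To make the "generate many potential solutions, test them, and build upon the promising ones" loop from the Darwin Gödel Machine bullet a bit more concrete, here's a tiny Python sketch of an evolutionary search. It is a toy illustration only, not how AlphaEvolve or the Darwin Gödel Machine are actually implemented: the function names (`evolve`, `score`, `mutate`), the parameters, and the number-guessing task are all made up for this example, and the real systems evolve code with an LLM rather than nudging a number.

```python
import random


def evolve(score, mutate, seed, generations=200, children=20, keep=5, rng=None):
    """Toy evolutionary search: generate candidates, test them, keep the best.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    survivors = [seed]
    for _ in range(generations):
        # Generate: make new candidates by mutating randomly chosen survivors.
        candidates = [mutate(rng.choice(survivors), rng) for _ in range(children)]
        # Test and select: score everything, keep only the top `keep`.
        pool = survivors + candidates
        survivors = sorted(pool, key=score, reverse=True)[:keep]
    return survivors[0]


# Hypothetical toy task: find the x that maximizes -(x - 3)^2 (best at x = 3).
def score(x):
    return -(x - 3.0) ** 2


def mutate(x, rng):
    # Small random nudge; a real system would propose an edited program instead.
    return x + rng.gauss(0.0, 0.5)


best = evolve(score, mutate, seed=0.0)
print(f"best candidate: {best:.3f}")
```

Because the current survivors always compete alongside the new candidates, the best score never gets worse from one generation to the next. That is the "build upon the promising ones" part, and it is also why an automated `score` step (the verification the presenter talks about) matters so much: the loop is only as good as the test it optimizes against.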