Both companies’ models solved five of the six problems, achieving the result with general-purpose “reasoning” models that processed mathematical concepts in natural language, in contrast to the specialized, formal-language approaches AI firms had previously used.
While Google DeepMind worked with the IMO to have its model graded and certified by the committee, OpenAI did not officially enter the competition. The startup said on Saturday that its model had achieved a gold medal-worthy score on this year’s questions, citing grades by three external IMO medalists.
The achievement suggests AI is less than a year away from being used by mathematicians to crack unsolved research problems at the frontier of the field, according to Junehyuk Jung, a math professor at Brown University and visiting researcher in Google’s DeepMind AI unit.
“I think the moment we can solve hard reasoning problems in natural language will enable the potential for collaboration between AI and mathematicians,” Jung told Reuters.
OpenAI’s breakthrough was achieved with a new experimental model centered on massively scaling up “test-time compute.” This was done both by allowing the model to “think” for longer periods and by deploying parallel computing power to run numerous lines of reasoning simultaneously, according to Noam Brown, a researcher at OpenAI. Brown declined to say how much computing power the effort cost OpenAI, but called it “very expensive.”
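Neither company has published implementation details, but the general pattern Brown describes, sampling many independent lines of reasoning in parallel and then aggregating their answers, can be sketched in a few lines of Python. The `query_model` function below is a hypothetical placeholder rather than a real OpenAI or Google API, and majority voting is only one common aggregation strategy; real systems may score or verify candidates instead.

```python
# Hypothetical sketch of scaling test-time compute via parallel sampling.
# `query_model` stands in for any LLM API call; it is not a real OpenAI
# or Google function, and the voting scheme is a generic illustration.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def query_model(problem: str, seed: int) -> str:
    """Placeholder: one independently sampled chain of reasoning.

    In practice this would call a model API with a nonzero sampling
    temperature so each call can explore a different line of reasoning,
    returning the model's final answer.
    """
    raise NotImplementedError("replace with a real model call")

def solve_with_parallel_compute(problem: str, n_samples: int = 64) -> str:
    # Run many reasoning attempts simultaneously ("parallel computing
    # power to run numerous lines of reasoning"), then aggregate.
    with ThreadPoolExecutor(max_workers=n_samples) as pool:
        answers = list(pool.map(lambda s: query_model(problem, s),
                                range(n_samples)))
    # Simple aggregation: self-consistency, i.e. a majority vote over
    # the final answers of all sampled reasoning chains.
    return Counter(answers).most_common(1)[0][0]
```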
To OpenAI researchers, it is another clear sign that AI models can command extensive reasoning capabilities that could expand into other areas beyond math.
The optimism is shared by Google researchers, who believe AI models’ capabilities can apply to research quandaries in other fields such as physics, said Jung, who won an IMO gold medal as a student in 2003.
Of the 630 students participating in the 66th IMO on the Sunshine Coast in Queensland, Australia, 67 contestants, or about 11%, achieved gold-medal scores. Google’s DeepMind AI unit last year achieved a silver medal score using AI systems specialized for math. This year, Google used a general-purpose model called Gemini Deep Think, a version of which was previously unveiled at its annual developer conference in May.
Unlike previous AI attempts that relied on formal languages and lengthy computation, Google’s approach this year operated entirely in natural language and solved the problems within the official 4.5-hour time limit, the company said in a blog post. OpenAI, which has its own set of reasoning models, similarly built an experimental version for the competition, according to a post by researcher Alexander Wei on social media platform X. He noted that the company does not plan to release anything with this level of math capability for several months.
This year marked the first time the competition coordinated officially with some AI developers, who have for years used prominent math competitions like the IMO to test model capabilities. IMO judges certified the results of those companies, including Google, and asked that they publish their results on July 28.
“We respected the IMO Board’s original request that all AI labs share their results only after the official results had been verified by independent experts and the students had rightly received the acclamation they deserved,” Google DeepMind CEO Demis Hassabis said on X on Monday.
OpenAI, which published its results on Saturday and first claimed gold-medal status, said in an interview that it had permission from an IMO board member to do so after the closing ceremony on Saturday.
The competition on Monday allowed cooperating companies to publish results, Gregor Dolinar, president of the IMO’s board, told Reuters.