This week saw a heated exchange between an OpenAI employee and Elon Musk’s xAI project regarding Grok 3 benchmarks.
The Benchmark Battleground
The dispute centers on Grok 3's benchmark results. xAI claimed its model outperformed competitors on AIME 2025, a set of competition math problems, but OpenAI pointed out that the comparison left out its own model's AIME 2025 score at the 'cons@64' setting.
Importance of 'cons@64' in AI Benchmarks
'cons@64', short for "consensus@64", gives a model 64 attempts at each problem and takes the most frequent answer as its final answer. Because this tends to raise scores well above single-attempt results, a comparison is only meaningful when it states which models were scored this way.
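To make the difference concrete, here is a minimal Python sketch of consensus scoring (a majority vote over repeated attempts) next to single-attempt scoring. The function names, the toy answers, and the use of 5 attempts instead of 64 are all illustrative assumptions, not xAI's or OpenAI's evaluation code, and none of the numbers are real benchmark data.

```python
from collections import Counter


def consensus_answer(samples):
    """Return the most frequent answer among the sampled attempts."""
    return Counter(samples).most_common(1)[0][0]


def score_at_1(attempts_per_problem, answer_key):
    """Fraction of problems whose first attempt matches the key."""
    correct = sum(
        attempts[0] == key
        for attempts, key in zip(attempts_per_problem, answer_key)
    )
    return correct / len(answer_key)


def score_cons_at_k(attempts_per_problem, answer_key, k):
    """Fraction of problems whose majority answer over k attempts matches the key."""
    correct = sum(
        consensus_answer(attempts[:k]) == key
        for attempts, key in zip(attempts_per_problem, answer_key)
    )
    return correct / len(answer_key)


# Hypothetical toy data: 3 problems, 5 attempts each (a real run would use 64).
attempts = [
    ["204", "210", "210", "210", "113"],  # first attempt wrong, majority right
    ["33", "33", "33", "7", "33"],        # first attempt right, majority right
    ["8", "8", "9", "8", "9"],            # first attempt wrong, majority wrong
]
answer_key = ["210", "33", "9"]

print("@1:    ", score_at_1(attempts, answer_key))
print("cons@5:", score_cons_at_k(attempts, answer_key, k=5))
```

In this toy run only one of the three first attempts is correct, while two of the three majority votes are, so the same hypothetical model looks twice as good under consensus scoring. That gap is why a chart mixing the two settings can mislead.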
Decoding Grok 3's Benchmark Data
At the single-attempt ("@1") setting, where only a model's first answer to each problem counts, Grok 3 falls behind OpenAI’s model on AIME 2025. Despite this, xAI is marketing Grok 3 as the world’s smartest AI.
The Grok 3 benchmark dispute highlights the need for transparent, standardized benchmark reporting in the AI industry, and it is a reminder that headline claims deserve scrutiny in both AI and crypto.