The Fresh ARC-AGI-2 Test: A Challenge for Modern AI Models

by Giorgi Kostiuk

3 months ago


The new ARC-AGI-2 test is presented as a significant challenge for AI models, designed to measure their true adaptability and efficiency.

Why is ARC-AGI-2 Tougher for AI Models?

According to the Arc Prize Foundation's blog, reasoning models such as OpenAI's o1-pro and DeepSeek's R1 score only between 1% and 1.3% on ARC-AGI-2. Non-reasoning models, including GPT-4.5 and Claude 3.7 Sonnet, also hover around 1%. Human test-takers, by contrast, average 60% accuracy.

The test includes:
- Visual puzzle tasks that require generating the correct output grid from the patterns shown.
- An adaptability requirement: each task is a unique problem not encountered during training.
- An efficiency metric that evaluates not just whether an answer is correct, but how efficiently the solution is reached.

ARC-AGI-2 vs. ARC-AGI-1: Key Changes

François Chollet stated that ARC-AGI-2 measures a model's true intelligence better than its predecessor, ARC-AGI-1. The main difference is a shift away from rewarding heavy reliance on brute-force computation and toward rewarding genuine intelligence and efficiency. OpenAI's o3 (low) illustrates this: it scored highly on ARC-AGI-1 but dropped significantly on ARC-AGI-2.

Importance of the New AI Benchmark

The ARC-AGI-2 test arrives at a timely moment, as demand has been growing for more credible and novel benchmarks for evaluating AI progress, especially in creativity. By addressing shortcomings identified in earlier benchmarks, ARC-AGI-2 contributes to a more accurate assessment of AI capabilities.

The ARC-AGI-2 test marks a significant shift in how artificial general intelligence is measured and understood, and it underscores the challenges AI models still face in reaching human-level intelligence.

© 2020-2025. DappExpert. All rights reserved.

Important disclaimer: The information presented on the Dapp.Expert portal is intended solely for informational purposes and does not constitute an investment recommendation or a guide to action in the field of cryptocurrencies. The Dapp.Expert team is not responsible for any potential losses or missed profits associated with the use of materials published on the site. Before making investment decisions in cryptocurrencies, we recommend consulting a qualified financial advisor.