
The Fresh ARC-AGI-2 Test: A Challenge for Modern AI Models


by Giorgi Kostiuk

3 days ago


The new ARC-AGI-2 test is presented as a significant challenge for AI models, testing their true adaptability and efficiency.

Why is ARC-AGI-2 Tougher for AI Models?

According to the Arc Prize Foundation's blog, reasoning models such as OpenAI’s o1-pro and DeepSeek’s R1 score only between 1% and 1.3% on ARC-AGI-2. Non-reasoning models, including GPT-4.5 and Claude 3.7 Sonnet, also hover around 1%. Humans, by contrast, average 60% accuracy.

The test includes:
- Visual puzzle tasks requiring generating correct grids based on patterns.
- Adaptability that requires solving unique problems not encountered during training.
- An efficiency metric evaluating not just correctness, but the path to the solution.
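To make the grid-based scoring concrete, here is a minimal Python sketch of how answers to such puzzles could be checked. It assumes a grid is a list of rows of integer color codes and that an answer counts only if it matches the expected grid exactly. The names grid_matches, accuracy and mirror_rows, and the toy "mirror each row" rule, are illustrative assumptions and not the Arc Prize Foundation's actual evaluation code.

```python
from typing import List

# Assumption: a grid is a list of rows, each row a list of integer color codes.
Grid = List[List[int]]


def grid_matches(predicted: Grid, expected: Grid) -> bool:
    """Return True only if both grids have the same shape and the same cells."""
    return predicted == expected


def accuracy(predictions: List[Grid], targets: List[Grid]) -> float:
    """Fraction of puzzles whose predicted grid matches the target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    solved = sum(grid_matches(p, t) for p, t in zip(predictions, targets))
    return solved / len(targets)


def mirror_rows(grid: Grid) -> Grid:
    """Hypothetical toy rule: flip every row left to right."""
    return [list(reversed(row)) for row in grid]


if __name__ == "__main__":
    targets = [[[0, 1], [3, 2]], [[1]]]
    # The first prediction applies the right rule; the second is simply wrong.
    predictions = [mirror_rows([[1, 0], [2, 3]]), [[0]]]
    print(f"accuracy: {accuracy(predictions, targets):.0%}")
```

Under exact matching, a grid with one wrong cell scores the same as a completely wrong grid, so partially correct answers earn no credit in this sketch.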

ARC-AGI-2 vs. ARC-AGI-1: Key Changes

François Chollet stated that ARC-AGI-2 is a better measure of a model’s actual intelligence than its predecessor, ARC-AGI-1. The main change is that the new test is designed to reward genuine intelligence and efficiency rather than reliance on brute-force computation. One example is OpenAI’s o3 (low), which scored highly on ARC-AGI-1 but dropped significantly on ARC-AGI-2.

Importance of the New AI Benchmark

The ARC-AGI-2 test is timely: demand has been growing for more credible and novel benchmarks for evaluating AI's progress, especially in creativity. ARC-AGI-2 addresses shortcomings identified in earlier testing, contributing to a more accurate assessment of AI's capabilities.

The ARC-AGI-2 test represents a significant shift in the ability to measure and understand artificial general intelligence, emphasizing the ongoing challenges AI models face in reaching human-level intelligence.

© 2020-2025. DappExpert. All rights reserved.
