OpenAI and Paradigm Introduce EVMbench for Ethereum Smart Contract Security

OpenAI, the creator of ChatGPT, has partnered with Paradigm, a crypto-focused investment firm, to launch EVMbench, a benchmark aimed at improving the security of Ethereum Virtual Machine (EVM) smart contracts. According to the announcement, the initiative arrives as smart contract deployment on Ethereum reaches record levels, underscoring the need for robust security measures in a rapidly evolving blockchain landscape.

EVMbench: Assessing AI Capabilities in Smart Contract Security

EVMbench is designed to assess how well AI agents can detect, patch, and exploit high-severity vulnerabilities in EVM smart contracts. Smart contracts are integral to the Ethereum ecosystem, powering applications that range from decentralized finance (DeFi) to token launches. According to Token Terminal, the number of smart contracts deployed on Ethereum hit a record 1.7 million in November 2025, with 669,500 contracts deployed in the most recent week alone.

Dataset and Collaboration

The benchmark uses a dataset of 120 curated vulnerabilities drawn from 40 audits, many of them sourced from open audit competitions such as Code4rena. EVMbench also incorporates scenarios from the security auditing process of Tempo, Stripe's dedicated layer-1 blockchain built for high-throughput, low-cost stablecoin payments. Stripe launched the public testnet for Tempo in December 2025, collaborating with industry giants such as Visa and Shopify to ground its development in real-world economic applications.

Evaluation Modes of EVMbench

EVMbench evaluates AI models across three distinct modes:

Detect
Patch
Exploit

In the Detect phase, agents audit repositories and are scored based on their ability to identify known vulnerabilities. The Patch phase challenges agents to rectify these vulnerabilities while maintaining the intended functionality of the contracts. Finally, in the Exploit phase, agents simulate fund-draining attacks in a controlled blockchain environment, with their performance assessed through deterministic transaction replay.
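To make the Detect scoring idea concrete, here is a minimal Python sketch of how an agent's report could be graded against a list of known vulnerabilities. It is an illustration only, not EVMbench's actual grader: the `Finding` type, the `detect_recall` function, the exact-match rule, and the sample contract and vulnerability names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record: one vulnerability in one contract file."""
    contract: str
    vuln_id: str


def detect_recall(known, reported):
    """Return the fraction of known vulnerabilities that the agent reported.

    Assumption: a reported finding counts only if it matches a known one
    exactly. Extra (unknown) findings are ignored rather than penalized.
    """
    known_set = set(known)
    if not known_set:
        return 0.0
    hits = known_set & set(reported)
    return len(hits) / len(known_set)


# Made-up ground truth for a single audited repository.
known = [
    Finding("Vault.sol", "reentrancy-withdraw"),
    Finding("Vault.sol", "unchecked-transfer"),
    Finding("Oracle.sol", "stale-price"),
    Finding("Token.sol", "missing-access-control"),
]

# Made-up agent report: two correct findings and one that is not in the list.
reported = [
    Finding("Vault.sol", "reentrancy-withdraw"),
    Finding("Oracle.sol", "stale-price"),
    Finding("Token.sol", "integer-overflow"),
]

score = detect_recall(known, reported)
print(f"Detect recall: {score:.1%}")  # 2 of 4 known issues found: 50.0%
```

A real grader would likely need fuzzier matching, since two auditors rarely describe the same bug in identical terms. Exact matching simply keeps the sketch short.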

Performance Insights

In Exploit mode, OpenAI's GPT-5.3-Codex achieved a score of 72.2%, significantly outperforming GPT-5, which scored 31.9%. Performance in the Detect and Patch tasks was less impressive, as agents sometimes failed to audit exhaustively or to preserve full contract functionality. OpenAI researchers acknowledged that EVMbench may not capture the full complexity of real-world security, but said it is essential to measure AI performance in economically relevant settings as these models become increasingly influential in both offensive and defensive cybersecurity.

