OpenAI, the creator of ChatGPT, has partnered with Paradigm, a cryptocurrency investment firm, to launch EVMbench, a new benchmark for measuring how well AI agents handle the security of Ethereum Virtual Machine (EVM) smart contracts. According to the official announcement, the initiative arrives as smart contract deployment on Ethereum reaches record levels, underscoring the need for robust security measures in the rapidly evolving blockchain landscape.
EVMbench: Assessing AI Capabilities in Smart Contract Security
EVMbench is designed to assess how well AI agents can detect, patch, and exploit high-severity vulnerabilities in EVM smart contracts. Smart contracts are integral to the Ethereum ecosystem, powering applications from decentralized finance (DeFi) to token launches. According to Token Terminal, the number of smart contracts deployed on Ethereum hit a record 1.7 million in November 2025, with 669,500 contracts deployed last week alone.
Dataset and Collaboration
The benchmark uses a dataset of 120 curated vulnerabilities drawn from 40 audits, many of them sourced from open audit competitions such as Code4rena. EVMbench also incorporates scenarios from the security auditing process for Tempo, Stripe's dedicated layer-1 blockchain for high-throughput, low-cost stablecoin payments. Stripe launched Tempo's public testnet in December 2025, working with industry giants such as Visa and Shopify to ground its development in real-world economic applications.
Evaluation Modes of EVMbench
EVMbench evaluates AI models across three distinct modes:
- Detect
- Patch
- Exploit
In Detect mode, agents audit repositories and are scored on their ability to identify known vulnerabilities. Patch mode challenges agents to fix those vulnerabilities while preserving the contracts' intended functionality. Finally, in Exploit mode, agents carry out fund-draining attacks in a controlled blockchain environment, and their performance is assessed through deterministic transaction replay.
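The three modes can be illustrated with a rough Python sketch. This is not EVMbench's actual harness: the `Finding` schema, the function names, and the pass criteria are all assumptions chosen for illustration, based only on the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A vulnerability keyed by audit and identifier (hypothetical schema)."""
    audit: str
    vuln_id: str


def detect_score(reported: set[Finding], known: set[Finding]) -> float:
    """Detect: fraction of known vulnerabilities that the agent reported."""
    if not known:
        return 0.0
    return len(reported & known) / len(known)


def patch_passes(tests_pass: bool, exploit_still_works: bool) -> bool:
    """Patch: assumed to count only if the contract's existing tests still
    pass and the original exploit no longer succeeds."""
    return tests_pass and not exploit_still_works


def exploit_succeeds(balance_before: int, balance_after: int) -> bool:
    """Exploit: assumed to count if replaying the agent's transactions
    leaves the target contract with a lower balance than before."""
    return balance_after < balance_before


if __name__ == "__main__":
    known = {Finding("audit-a", "H-01"), Finding("audit-a", "H-02"),
             Finding("audit-b", "H-01"), Finding("audit-b", "H-02")}
    # One reported finding is not in the known set and earns no credit.
    reported = {Finding("audit-a", "H-01"), Finding("audit-b", "H-02"),
                Finding("audit-c", "H-09")}
    print(detect_score(reported, known))       # 0.5
    print(patch_passes(True, False))           # True
    print(exploit_succeeds(1_000, 0))          # True
```

In this sketch a spurious finding neither helps nor hurts the Detect score. Whether the real benchmark penalizes false positives is not stated in the source.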
Performance Insights
In Exploit mode, OpenAI's GPT-5.3-Codex achieved a score of 72.2%, significantly outperforming GPT-5, which scored 31.9%. Performance on the Detect and Patch tasks was less impressive: agents sometimes failed to conduct thorough audits or to preserve full contract functionality. OpenAI researchers acknowledged that EVMbench may not fully capture the complexities of real-world security, but said it is essential to measure AI performance in economically relevant contexts as these models become increasingly influential in both offensive and defensive cybersecurity.








