Is ChatGPT Good at Finding Smart Contract Security Vulnerabilities?

In the summer of 2023, the Halborn engineering team tested OpenAI’s ChatGPT on its ability to detect smart contract vulnerabilities. This article summarizes our findings, which are covered in detail in Halborn’s full report.

ChatGPT’s High Success Rate in Finding Certain Vulnerabilities

In our study, we found that ChatGPT demonstrated a remarkable proficiency in identifying various types of vulnerabilities, including (but not limited to):

Bad randomness
Use of deprecated functions
Right-to-left override control character
Integer overflow

In many cases, detecting these vulnerabilities came down to recognizing specific functions or code patterns, such as the use of tx.origin or certain pragma versions. Overall, ChatGPT achieved a detection rate of around 75% across all vulnerability types and tested smart contracts.
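To make "recognizing specific functions or code patterns" concrete, here is a minimal Python sketch of a pattern-based scan over Solidity source. It is only an illustration: the pattern table, the scan function, and the sample contract are assumptions made for this example, not the checks used in the study and not a description of how ChatGPT works internally.

```python
import re

# Hypothetical pattern table: the names and regexes are illustrative
# assumptions, not the checks ChatGPT or Halborn actually used.
PATTERNS = {
    "tx.origin usage": re.compile(r"\btx\.origin\b"),
    "deprecated function": re.compile(r"\b(?:suicide|sha3)\s*\(|\bthrow\s*;"),
    "right-to-left override character": re.compile("\u202e"),
    "pre-0.8 pragma (no built-in overflow checks)": re.compile(
        r"pragma\s+solidity\s+[\^>=<~\s]*0\.[0-7]\.\d+"
    ),
}


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, finding) pairs for every pattern hit."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


# Made-up contract used only to exercise the scanner.
SAMPLE = """pragma solidity ^0.4.24;
contract Wallet {
    address owner;
    function withdraw() public {
        require(tx.origin == owner);
        suicide(owner);
    }
}
"""

if __name__ == "__main__":
    for number, name in scan(SAMPLE):
        print(f"line {number}: {name}")
```

A scan like this only catches vulnerabilities that leave a textual fingerprint, which is consistent with the categories listed above being the ones detected most reliably.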

Boosting ChatGPT’s Accuracy with Specific Prompts

Interestingly, ChatGPT's detection accuracy rose from 76.1% to 86.6% when it was prompted to identify a specific vulnerability rather than to find all vulnerabilities in a code sample. This highlights the tool's enhanced effectiveness with targeted queries.
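The difference between a generic and a targeted query can be sketched as follows. The prompt wording and the build_prompt helper are hypothetical, written for this example only; the exact prompts used in the study are not reproduced here.

```python
from typing import Optional


def build_prompt(source: str, vulnerability: Optional[str] = None) -> str:
    """Build an audit prompt; targeted when a vulnerability class is given.

    The wording below is an assumption for illustration, not the study's prompt.
    """
    if vulnerability is None:
        task = "List every security vulnerability you can find in this contract."
    else:
        task = (
            "Does this contract contain the following vulnerability: "
            f"{vulnerability}? Answer yes or no, then explain."
        )
    return f"{task}\n\n{source}"


if __name__ == "__main__":
    contract = "contract C { /* ... */ }"
    # One targeted prompt per vulnerability class instead of one generic prompt.
    for name in ["Bad randomness", "Integer overflow"]:
        print(build_prompt(contract, name))
        print("---")
```

Running one targeted prompt per vulnerability class costs more queries than a single generic prompt, so the accuracy gain has to be weighed against that overhead.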

Limitations in ChatGPT’s Understanding of Solidity and EVM

Despite these successes, ChatGPT had trouble with the nuances of Solidity and the inner workings of the Ethereum Virtual Machine (EVM). It consistently struggled with vulnerabilities such as:

Hash collisions with multiple variable-length arguments
Reference to an external malicious contract
Abuse of global semantics
Insufficient gas griefing
Storage collisions

Furthermore, different ChatGPT versions had varying degrees of difficulty with certain vulnerabilities.
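As an illustration of why the first item in the list above is subtle, the Python sketch below mimics how Solidity's abi.encodePacked concatenates variable-length arguments without length prefixes, so different argument lists can produce identical bytes and therefore identical hashes. The helper names are made up for this example, SHA-256 stands in for keccak256 (which is not in Python's standard library), and the length-prefixed variant is a simplified stand-in for abi.encode, not its exact layout.

```python
import hashlib


def encode_packed(*args: str) -> bytes:
    """Mimic abi.encodePacked for strings: raw bytes, no length or padding."""
    return b"".join(arg.encode("utf-8") for arg in args)


def encode_with_lengths(*args: str) -> bytes:
    """Simplified stand-in for abi.encode: prefix each argument with its length."""
    parts = []
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(len(data).to_bytes(32, "big") + data)
    return b"".join(parts)


def digest(encoded: bytes) -> str:
    # SHA-256 is an assumption standing in for keccak256.
    return hashlib.sha256(encoded).hexdigest()


if __name__ == "__main__":
    # Different argument lists, same packed bytes, same hash.
    print(digest(encode_packed("ab", "c")) == digest(encode_packed("a", "bc")))
    # Length prefixes keep the two argument lists distinct.
    print(
        digest(encode_with_lengths("ab", "c"))
        == digest(encode_with_lengths("a", "bc"))
    )
```

Nothing in the source text of a contract flags this problem directly; spotting it requires reasoning about how the arguments are laid out in memory, which fits the pattern of ChatGPT struggling where EVM-level understanding is needed.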

Can ChatGPT Solve CTFs?

CTF (Capture The Flag) challenges are crucial in cybersecurity training, simulating real-world scenarios to enhance skills in system protection. Our study found that ChatGPT could completely solve CTF challenges 43.3% of the time, with an additional 20% of cases where it offered a partial solution. Success rates varied with the complexity of the challenge.

Performance in Ethernaut and Other CTFs

In the Ethernaut challenges, ChatGPT excelled at simpler tasks but struggled with more complex scenarios. Its performance also varied across challenges from other platforms, such as Capture the Ether and Damn Vulnerable DeFi.

Conclusion

Our study reveals that while ChatGPT is highly effective at identifying certain smart contract vulnerabilities, it has limitations, especially in understanding Solidity and EVM intricacies. The success rate in solving CTFs also varies based on the challenge's complexity and whether the CTF and its solutions were part of ChatGPT's training dataset.

For detailed insights and our top tips for using ChatGPT to detect smart contract vulnerabilities, read Halborn’s full report.