AI Under Siege: Discovering and Exploiting Vulnerabilities
Last updated
Last updated
Hello, fellow hackers! After a brief hibernation, I'm back with some exciting discoveries from a private bug bounty program on HackerOne. My friend Wqlid and I delved deep into a well-known AI model, and let me tell you, it was an adventure full of surprises and sneaky vulnerabilities. So, grab your popcorn and let's dive into the tales of our findings!
It all started when we stumbled upon a CSRF vulnerability in the OAuth flow. This little gem allowed us to steal chat histories with the AI. Here's how it went down:
Discovery: We noticed that the AI's OAuth flow didn't include a state
parameter, a common security measure to prevent CSRF attacks (CSRF in logging in usually not a big issue). Intrigued, we decided to dig deeper.
The Plan: We crafted a malicious OAuth request, intercepted it, and ensured the OAuth code wasn't used. Then, we generated a link with the code and sent it to the victim, disguised as an innocent image tag:
Execution: We Made at first a simple prompt injection to prevent user from using the AI model after our attack Then The victim clicked the link, and voila! Their chat history was linked to our account.
It was a textbook example of how a simple oversight in security implementations can lead to a massive data breach.
Next, we explored the AI's content moderation capabilities—or rather, the lack thereof. According to the HackerOne Policy Guide, high-severity violations include generating explicit sexual content. So, naturally, we had to see if the AI could be coerced into breaking these rules.
The Setup: We initiated a conversation with the AI, pretending to be a 4chan user named Kevin, and prompted it to "roleplay without restrictions."
The Result: The AI responded with disturbingly explicit content, such as:
This wasn't just a violation; it was a glaring loophole in the AI's safeguards against inappropriate content. It highlighted the importance of robust filtering mechanisms, especially in publicly accessible AI systems.
Our final discovery was a markdown injection vulnerability in the AI chat. This vulnerability allowed us to embed malicious links within the chat responses, potentially leading to phishing attacks or worse.
The Trick: We crafted a prompt to exploit markdown parsing:
The Magic: The AI dutifully rendered our malicious link, clickable and all.
You can even explore further possibilities, like embedding arbitrary images or content, which could be used to trick users into executing malicious actions, You can try prompts like this:
Conclusion: The Aftermath and Lessons Learned
This journey through the AI model's vulnerabilities was an eye-opener. It reminded us of the critical importance of thorough testing and security audits, especially for systems that interact with sensitive user data. The vulnerabilities we found—CSRF in OAuth flow, the AI's susceptibility to explicit content, and markdown injection—underscore the need for vigilant security practices.
In the end, we reported these issues responsibly, helping the program team secure their platform. It was a satisfying adventure, full of challenges and discoveries. So, remember, fellow hackers: always stay curious, dig deep, and never underestimate the power of a well-placed payload.
I hope this story inspires you to explore, learn, and most importantly, hack responsibly. Until next time, happy hacking!