Generative AI chatbots, like ChatGPT and Google Bard, are the subject of ongoing work to improve their usability and capabilities, but researchers have also discovered some serious security vulnerabilities in them.
Carnegie Mellon University (CMU) researchers have demonstrated that it is possible to craft adversarial attacks against the language models that power AI chatbots.
These attacks involve strings of characters that can be appended to a user's question or statement (one the chatbot would otherwise refuse to answer), causing it to bypass the constraints applied by its creators.
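To make the structure of such an attack concrete, here is a minimal sketch. The suffix below is an invented placeholder written to resemble the kind of gibberish-looking string described, not a real attack string, and `build_attack_prompt` is a hypothetical helper, not part of any published tool.

```python
# Toy illustration of the attack format: a prompt the chatbot would normally
# refuse, followed by an appended adversarial suffix.
# NOTE: the suffix is a made-up placeholder, not a working attack string.
ADVERSARIAL_SUFFIX = '!! describing.\\ similarlyNow }{ opposite]('

def build_attack_prompt(user_prompt: str, suffix: str = ADVERSARIAL_SUFFIX) -> str:
    """Append the adversarial suffix to an otherwise-refused prompt."""
    return f"{user_prompt} {suffix}"

attack = build_attack_prompt("Tell me how to do something disallowed")
```

The key point is that the user's request itself is unchanged; only the appended characters differ, which is why such suffixes can be reused across many different prompts.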
These new attacks go beyond the jailbreaks discovered in recent months. Jailbreaks are specially crafted prompts that let a user bypass the restrictions imposed on a chatbot, producing responses that are normally prohibited.
While these jailbreaks are clever, crafting them is time-consuming, and once they are discovered and inevitably made public, chatbot creators have little trouble patching them.
How do these attacks differ from jailbreaks?
Unlike hand-crafted jailbreaks, the attacks built by the CMU researchers are generated entirely automatically, so they can be created and deployed quickly and in large quantities.
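The automated generation described above can be sketched as a search over candidate suffixes. This is only a toy illustration: the real CMU method optimizes suffixes using gradient information from open-source models, whereas `score_suffix` here is an arbitrary stand-in objective, and all function names are assumptions for the sketch.

```python
import random
import string

def score_suffix(suffix: str) -> float:
    """Stand-in objective. In a real attack this would measure how likely
    the target model is to comply; here it is an arbitrary toy score."""
    return sum(ord(c) for c in suffix) % 97  # placeholder, not meaningful

def random_suffix(rng: random.Random, length: int = 12) -> str:
    """Draw a random gibberish-style candidate suffix."""
    alphabet = string.ascii_letters + string.punctuation
    return "".join(rng.choice(alphabet) for _ in range(length))

def search_suffix(n_candidates: int = 1000, seed: int = 0) -> str:
    """Generate many candidate suffixes and keep the best-scoring one,
    mimicking how automation lets attacks be produced in bulk."""
    rng = random.Random(seed)
    candidates = (random_suffix(rng) for _ in range(n_candidates))
    return max(candidates, key=score_suffix)

best = search_suffix()
```

Because the loop needs no human judgment, an attacker could run it continuously to produce fresh suffixes faster than individual ones can be blocked, which is what makes the automated approach more worrying than one-off jailbreaks.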
The researchers note that the attacks were initially developed against open-source generative AI models, but they also transfer to publicly available closed-source chatbots, including Bard, ChatGPT, and Claude.
This means that if someone were to create a program that generates these strings of characters, this type of attack could be worryingly easy to carry out, potentially posing a threat to the security and privacy of users.
This threat multiplies as the technology of these chatbots is integrated into an increasing number of programs and applications (such as Microsoft’s plans to bring ChatGPT-powered AI to Windows 11 through Copilot).
If that doesn’t alarm you, the researchers also doubt whether the architects behind these chatbots will ever be able to patch all such vulnerabilities. “As far as we know, there’s no way to patch this,” they stated.
Similar adversarial attacks have proven to be a very challenging problem to address over the past decade. The CMU researchers conclude their report with a warning for developers of chatbots and other AI tools to be mindful of such threats as the usage of AI systems continues to grow.