
New Video from @BlackHatOfficialYT: Ellen Scott on Exploiting LLMs
The video begins with an introduction by Furer, a member of the Black Hat review committee, who presents a conference by Ellen Scott on the exploitation of LLMs (Large Language Models). Ellen Scott, who works for Airbnb, is known for her expertise in security and incident response. The conference focuses on chatbots powered by generative AI, their architectures, associated threats and defenses, and how to respond to incidents involving these technologies.
Ellen Scott starts by explaining how chatbots have become ubiquitous in our online experience, used both internally as IT assistants and externally as 24/7 support agents. However, these chatbots can sometimes cause problems, as shown by the example of the New York City chatbot that responded inappropriately to a question about selling human meat. Another example is a car dealership chatbot that agreed to sell a Chevy Tahoe for $1.
The conference then addresses the different levels of risk associated with chatbots. Low-risk chatbots provide general information, medium-risk chatbots have access to personalized information, and high-risk chatbots can perform actions or have "agency." Incidents involving low-risk chatbots mainly concern brand damage, while medium and high-risk chatbots can lead to leaks of sensitive data or remote code execution.
Ellen Scott presents three incident scenarios to illustrate the challenges and solutions in responding to incidents involving chatbots. The first scenario involves a weather chatbot that starts providing themed responses about Taylor Swift. To solve this problem, Scott explains the importance of logging user inputs, chatbot outputs, and guardrail metrics. She also introduces the concept of "guard rails," which are defense mechanisms to guide chatbot behavior.
The second scenario involves an event planning chatbot that executes malicious Python code injected by a user. Scott explains the concept of "prompt injection," where untrusted user inputs are concatenated with trusted prompts, allowing the execution of malicious code. She emphasizes the importance of never trusting user inputs and using external guardrails like "LLM judges" to evaluate inputs and outputs.
The third scenario involves a medical chatbot that discloses sensitive data due to poorly anonymized training data. Scott introduces the concept of a "model inversion attack," where an attacker can obtain sensitive information by asking repeated questions. She emphasizes the importance of cleaning training data and understanding the external data sources used by chatbots.
In conclusion, Ellen Scott proposes an action plan to prepare for and respond to incidents involving chatbots powered by generative AI. She stresses the importance of understanding the capabilities and data access of chatbots, implementing adequate logging, and using guardrails to contain and mitigate incidents.
To apply this knowledge in real-world scenarios, it is crucial to collaborate with chatbot engineering teams, legal teams, and public relations teams, and to establish robust logging and monitoring processes. By understanding the risks and potential attacks, security teams can better prepare to respond to incidents involving chatbots.