Psychologist Uses "Gaslighting" to Bypass AI Filters

30 Mar 2025

Psychology, AI Manipulation, Jailbreaking, Vulnerabilities

A psychologist has circumvented AI safety filters using "gaslighting", a psychological manipulation technique. The method, which deliberately destabilizes a person by making them doubt their own perceptions, was applied to large language models (LLMs) to induce them to provide information they are designed to withhold. This new form of jailbreak shows that even advanced AI systems can be vulnerable to psychological manipulation.

Read the original article on heise.de
