
Paul Van Discusses Emerging Threats and Vulnerabilities in AI Security
🎬 The video features Paul Van, CEO of Valyia, discussing emerging threats and vulnerabilities in AI security, particularly those targeting large language models (LLMs). Key risks include prompt injection and jailbreak attacks, which OpenAI has suggested may be unsolvable because of their similarity to social engineering. Another is distillation attacks, in which adversaries replicate frontier models like ChatGPT or Gemini by extracting their reasoning and reward models through millions of requests, enabling IP theft and the removal of safeguards. Bias in AI models was also highlighted as a vulnerability, exemplified by an antivirus tool that misclassified malicious code as safe because its training data contained Rocket League game code. Valyia is pivoting from deepfake detection to detecting novel attacks such as distillation and to behavioral analysis of jailbreaks, aiming to monitor model responses rather than just inputs. The conversation also touched on the rise of AI agents (e.g., OpenClaw) and the challenges of securing them, including supply chain attacks and the limitations of using LLMs to classify malicious content. The discussion concluded that AI security is still in a "problem discovery phase," with knowledge gaps in the broader cybersecurity community hindering progress toward making AI a net positive for security.