
Black Hat 2025 Talk: Development of Dante, a Specialized LLM for Evasive Shellcode Loaders
The video presents a Black Hat 2025 talk by Kyle, an offensive security researcher at Outflank (Fortra), on the development of Dante, a 7-billion-parameter large language model (LLM) specialized in generating evasive shellcode loaders. Kyle argues that generalist LLMs (e.g., those from OpenAI and DeepSeek) are a poor fit for technical tasks such as malware development because of cost, privacy concerns, and refusal behaviors, while smaller open-source models lack the necessary performance.

He introduces Reinforcement Learning with Verifiable Rewards (RLVR), a technique in which automated verifiers (e.g., sandboxed execution with Microsoft Defender for Endpoint) score each model output, so that the model learns by trial and error and is rewarded for outputs that are both functional and evasive. Dante was trained in two phases: supervised fine-tuning (13 hours on 8 H100 GPUs, ~$250) on modified Codeforces datasets and synthetic shellcode examples, followed by RLVR (56 hours, ~$1,100). The resulting model achieves ~30% evasion success within 8 attempts.

Dante outperformed larger generalist LLMs (e.g., DeepSeek R1, Claude Sonnet) at generating functional, undetected malware, while running locally on consumer GPUs such as the RTX 3090/4090. The key takeaways are that small, task-specific LLMs are viable and that RLVR is effective in technical domains with objective verification criteria. The project is open-source and includes a demo framework that emulates traditional shellcode loader generation.
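
The reported figures can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is not from the talk; it only restates the numbers above under three assumptions that the summary does not confirm: the RLVR phase used the same 8 GPUs as the fine-tuning phase, the 8 attempts are statistically independent with an equal chance of success, and the weights are stored at 16-bit precision. The function names are illustrative.

```python
# Back-of-the-envelope checks on the figures quoted in the summary.
# Assumptions (not stated in the source): RLVR also ran on 8 GPUs,
# attempts are independent, and weights use 2 bytes per parameter.

GPUS = 8  # stated for fine-tuning; assumed for RLVR


def gpu_hour_rate(cost_usd: float, hours: float, gpus: int = GPUS) -> float:
    """Implied price per GPU-hour for a training phase."""
    return cost_usd / (hours * gpus)


def per_attempt_rate(success_at_k: float, k: int) -> float:
    """Single-attempt success rate implied by a success rate within k attempts."""
    return 1.0 - (1.0 - success_at_k) ** (1.0 / k)


def success_within(p: float, k: int) -> float:
    """Probability of at least one success in k independent attempts."""
    return 1.0 - (1.0 - p) ** k


sft_rate = gpu_hour_rate(250, 13)    # supervised fine-tuning phase
rlvr_rate = gpu_hour_rate(1100, 56)  # RLVR phase
total_cost = 250 + 1100              # combined training cost in USD
p1 = per_attempt_rate(0.30, 8)       # implied single-attempt success rate
weights_gb = 7e9 * 2 / 1e9           # 7B parameters at 2 bytes each

print(f"SFT:  ${sft_rate:.2f} per GPU-hour")
print(f"RLVR: ${rlvr_rate:.2f} per GPU-hour")
print(f"Total training cost: ${total_cost}")
print(f"Implied single-attempt success rate: {p1:.1%}")
print(f"Weights at 16-bit precision: {weights_gb:.0f} GB")
```

Under these assumptions both phases work out to roughly the same hourly rate, which suggests the two cost figures are consistent with one another. The ~30% figure within 8 attempts corresponds to a single-attempt rate of only a few percent, so the number of attempts matters when comparing models. The weight size also fits within the 24 GB of memory on an RTX 3090 or 4090, before accounting for activations and cache.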