
Researcher Huang Explores Integrating LLMs with Static Analysis Tools for Enhanced Vulnerability Detection
In this presentation, Huang, a researcher who moved from browser and cloud security to AI applications, examines how large language models (LLMs) can be integrated with static analysis tools (SAS) to improve vulnerability detection. The talk outlines three integration approaches: an AI-enhanced design, in which LLMs filter SAS outputs; an AI-explorer design, in which LLMs lead the exploration while SAS verifies the results; and an AI-native design, in which the LLM itself acts as the scanner. Each approach has its own challenges: false positives in SAS outputs (for example, misapplied control flow rules such as "strictly dominate"), the high cost of having an LLM analyze findings one by one, and coverage gaps caused by LLM hallucinations or knowledge limitations. To address rule quality, the talk introduces a closed-loop framework in which LLMs optimize CodeQL rules; constraining agent behavior and isolating contexts reduces data pollution, and the framework is reported to improve recall threefold. For runtime efficiency, a "pass segmentation" strategy abstracts code at the block level and merges similar findings to reduce redundancy, with a reported catch rate of 80%. Human auditors' domain knowledge is incorporated through "coot" (concerning operation) items, which turn static analysis into a human-in-the-loop system for root cause analysis. The work has been open-sourced to support further zero-day discovery, and it emphasizes balancing LLM capabilities against rule relaxation to improve precision and scalability.
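The idea of merging similar findings at the block level, so that the LLM is not consulted once per finding, can be illustrated with a minimal Python sketch. The summary does not describe how the talk identifies blocks or decides similarity, so everything here is an assumption made for illustration: the `Finding` fields, the grouping key (rule, file, enclosing block), the choice of the earliest finding as the representative, and the `judge` callable that stands in for an LLM call.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    rule_id: str   # static analysis rule that fired
    file: str      # source file containing the finding
    block_id: str  # enclosing code block (hypothetical identifier, here a function name)
    line: int      # line number of the reported location


GroupKey = Tuple[str, str, str]


def group_by_block(findings: List[Finding]) -> Dict[GroupKey, List[Finding]]:
    """Merge findings that share a rule, a file, and an enclosing block."""
    groups: Dict[GroupKey, List[Finding]] = defaultdict(list)
    for f in findings:
        groups[(f.rule_id, f.file, f.block_id)].append(f)
    return dict(groups)


def triage(findings: List[Finding], judge: Callable[[Finding], bool]) -> List[Finding]:
    """Ask the judge once per group and apply its verdict to every member."""
    kept: List[Finding] = []
    for members in group_by_block(findings).values():
        representative = min(members, key=lambda f: f.line)
        if judge(representative):
            kept.extend(members)
    return kept


# Stand-in for an LLM verdict (assumption): treat findings in a debug
# logging helper as false positives and everything else as real.
def fake_judge(finding: Finding) -> bool:
    return finding.block_id != "log_debug"


findings = [
    Finding("sqli", "app.py", "get_user", 10),
    Finding("sqli", "app.py", "get_user", 14),
    Finding("sqli", "app.py", "log_debug", 40),
    Finding("xss", "views.py", "render", 7),
]

calls: List[Finding] = []


def counting_judge(finding: Finding) -> bool:
    calls.append(finding)  # record each simulated LLM call
    return fake_judge(finding)


kept = triage(findings, counting_judge)
print(len(findings), len(calls), len(kept))  # prints: 4 3 3
```

In this toy input, four findings collapse into three groups, so the judge is called three times instead of four. The saving grows with the number of findings that share a block, at the risk that one verdict is wrongly applied to a dissimilar member of the same group.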