CVE-2025-66516: Critical XXE in Apache Tika (CVSS 10.0) Enables RCE via Malicious PDFs
CVE-2025-66516 is a maximum-severity (CVSS 10.0) XML External Entity (XXE) vulnerability in Apache Tika. By embedding malicious XFA content inside PDF files, unauthenticated attackers can achieve file disclosure, SSRF, and, under certain conditions, remote code execution, affecting millions of document processing systems worldwide.
On December 4, 2025, the Apache Tika project disclosed a maximum-severity vulnerability that sent shockwaves through organizations worldwide. CVE-2025-66516, rated CVSS 10.0, lets unauthenticated attackers exploit XML External Entity (XXE) injection through crafted PDF files, potentially leading to complete server compromise.
Apache Tika, used extensively in document processing pipelines, search engines (Apache Solr, Elasticsearch), compliance platforms, and content analysis systems across enterprises, is now at the center of an urgent security crisis affecting millions of systems globally.
What Is CVE-2025-66516?
CVE-2025-66516 is a critical XXE vulnerability in the Apache Tika tika-core (1.13–3.2.1), tika-pdf-module (2.0.0–3.2.1; published on Maven Central as tika-parser-pdf-module) and tika-parsers (1.13–1.28.5) modules on all platforms. It allows an attacker to carry out XML External Entity injection via a crafted XFA file inside a PDF.
Critical Characteristics:
- CVSS Score: 10.0 (Maximum Severity)
- Attack Vector: Network, no authentication required
- Attack Complexity: Low
- User Interaction: None required
- Impact: Complete confidentiality, integrity, and availability breach
Understanding Apache Tika
Apache Tika is an open-source content analysis toolkit used to extract text, metadata, and structured information from virtually any type of file. It's embedded in critical infrastructure including:
- Search Systems: Apache Solr, Elasticsearch
- Document Ingestion: Enterprise content management
- Compliance Tools: E-discovery and legal platforms
- Digital Forensics: Investigation and analysis systems
- Data Pipelines: Large-scale content processing
This widespread deployment greatly widens the blast radius of CVE-2025-66516.
The Technical Flaw: XXE via XFA in PDFs
An attacker can embed a malicious XFA file inside a PDF and trick Tika into processing external XML entities, opening a path to sensitive internal resources.
Attack Mechanism:
- Attacker crafts malicious PDF containing XFA (XML Forms Architecture) data
- Victim system processes PDF using Apache Tika (standard document workflow)
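- XML parser resolves external entities because external entity resolution is not disabled
- External resources accessed - attacker reads local files or makes the server issue requests (SSRF); command execution requires chaining with further weaknesses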
No special configuration required - default Tika installations are vulnerable.
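To make the mechanism concrete, here is a generic Java/JAXP sketch; it is not Tika's own XFA code path or its actual fix. `PAYLOAD` shows the typical shape of an XXE document: a DOCTYPE declares an external entity pointing at a local file, and a parser that resolves it splices the file's contents into the parsed data. The hypothetical `XxeDemo` class builds a `DocumentBuilder` with the widely recommended `disallow-doctype-decl` feature, so such documents are rejected before any entity is resolved.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo: shape of an XXE payload and a hardened JAXP parser. */
final class XxeDemo {

    // External entity &xxe; refers to a local file; a permissive parser
    // would substitute the file's contents where &xxe; appears.
    static final String PAYLOAD =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE xdp [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
        + "<xdp>&xxe;</xdp>";

    /** DocumentBuilder that treats any DOCTYPE as a fatal error. */
    static DocumentBuilder hardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Supported by the JDK's built-in (Xerces-derived) parser.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth: forbid fetching external DTDs and schemas.
        f.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        f.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        DocumentBuilder b = f.newDocumentBuilder();
        // DefaultHandler ignores warnings/errors and rethrows fatal errors,
        // avoiding the default handler's output on stderr.
        b.setErrorHandler(new DefaultHandler());
        return b;
    }

    /** True if the hardened parser accepts the XML, false if it rejects it. */
    static boolean acceptedByHardenedParser(String xml) {
        try {
            hardenedBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // rejected at the DOCTYPE, before any entity is resolved
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Rejecting DOCTYPEs outright is the bluntest hardening option and suits inputs that should never need a DTD. It is a pattern for your own XML handling, not a replacement for upgrading tika-core.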
Why CVSS 10.0? Maximum Severity Explained
The perfect score reflects multiple severe factors:
1. Zero Authentication: No credentials are needed; any workflow that passes an attacker-supplied PDF to Tika, such as a file upload, can trigger exploitation.
2. Trivial Exploitation: Uploading a single malicious PDF to a system that parses it with Tika is enough. Proof-of-concept exploits were published within hours of disclosure.
3. Massive Impact:
- File Disclosure: Read /etc/passwd, credentials, tokens, and configuration files
- SSRF: Access internal network resources and cloud metadata endpoints
- RCE: Under certain conditions, achieve remote code execution
- Data Exfiltration: Steal sensitive documents and data files readable by the Tika process
4. Widespread Deployment: Tika is deeply embedded in enterprise infrastructure, often without security teams knowing it is there.
Affected Versions - The Scope Problem
CVE-2025-66516 affects multiple Apache Tika components, including tika-core (1.13–3.2.1), tika-parser-pdf-module (2.0.0–3.2.1), and tika-parsers (1.13–1.28.5).
Critical Detail: Broader Than CVE-2025-54988
This CVE covers the same vulnerability as in CVE-2025-54988 but expands the scope of affected packages in two ways. First, while the entrypoint for the vulnerability was the tika-parser-pdf-module, the vulnerability and its fix were in tika-core. Users who upgraded the tika-parser-pdf-module but did not upgrade tika-core to >= 3.2.2 would still be vulnerable. Second, the original report failed to mention that in the 1.x Tika releases, the PDFParser was in the "org.apache.tika:tika-parsers" module.
In practice, organizations that addressed CVE-2025-54988 by updating only the PDF module remain vulnerable unless tika-core is also at 3.2.2 or later!
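The scoping rule above can be expressed as a small check. This sketch hard-codes the inclusive ranges quoted in this article (treating tika-pdf-module as the Maven artifact tika-parser-pdf-module) and assumes plain dotted numeric versions; `Cve202566516Check` and its methods are hypothetical names, not part of any Tika or scanner API.

```java
import java.util.Map;

/** Hypothetical checker for the CVE-2025-66516 affected ranges quoted above. */
final class Cve202566516Check {

    // Inclusive [first affected, last affected] versions per artifactId.
    private static final Map<String, String[]> AFFECTED = Map.of(
        "tika-core", new String[] {"1.13", "3.2.1"},
        "tika-parser-pdf-module", new String[] {"2.0.0", "3.2.1"},
        "tika-parsers", new String[] {"1.13", "1.28.5"});

    /** Compares dotted numeric versions, ignoring any "-qualifier" suffix. */
    static int compare(String a, String b) {
        String[] x = a.split("-")[0].split("\\.");
        String[] y = b.split("-")[0].split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** True if this artifact/version lies inside its affected range. */
    static boolean isAffected(String artifactId, String version) {
        String[] range = AFFECTED.get(artifactId);
        if (range == null) {
            return false; // artifact not named in the advisory
        }
        return compare(version, range[0]) >= 0 && compare(version, range[1]) <= 0;
    }

    /** True if ANY installed Tika artifact is affected (artifactId -> version). */
    static boolean anyAffected(Map<String, String> installed) {
        return installed.entrySet().stream()
            .anyMatch(e -> isAffected(e.getKey(), e.getValue()));
    }
}
```

`anyAffected` mirrors the CVE-2025-54988 trap: a 3.2.2 PDF module alongside a 3.2.1 tika-core is still reported as vulnerable. Versions below 1.13 fall outside this CVE's stated range, which does not make them safe to run.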
Exploitation in the Wild
As of the latest public advisories, there are no confirmed reports of active exploitation of CVE-2025-66516 in the wild. However, proof-of-concept (PoC) exploits have been published and referenced in the official Apache mailing list advisory.
Security researchers nonetheless warn that exploitation is imminent: the vulnerability's simplicity, combined with published PoCs, makes mass exploitation a matter of when, not if.
Real-World Attack Scenarios
Scenario 1: Document Upload Portal. A company's HR portal accepts resume uploads. Attacker uploads a malicious PDF → reads /etc/passwd and application credential files → uses the stolen credentials to escalate toward full system access.
Scenario 2: Email Attachment Processing. An email gateway uses Tika to scan attachments. Malicious PDF in an email → SSRF to the cloud metadata endpoint → cloud credentials stolen (on AWS, this works where IMDSv1 is still enabled; IMDSv2 requires a session token obtained via a PUT request, which an XXE-driven GET cannot supply).
Scenario 3: Search Engine Indexing. Enterprise search indexes documents automatically. Poisoned PDF → reads internal configuration files → lateral movement across the network.
Scenario 4: Compliance Platform. A legal e-discovery system processes case documents. Attacker submits crafted "evidence" → exfiltrates case files readable by the Tika process via XXE.
Immediate Mitigation: Patch Now
Fixed Versions:
- Apache Tika 3.2.2 or later (All modules)
- Apache Tika 2.x: Upgrade to 3.2.2+ (2.x is end-of-life)
- Apache Tika 1.x: Upgrade to 3.2.2+ (1.x is end-of-life; there is no patched 1.x release)
Patching Instructions
Maven users:
<!-- Pin tika-core explicitly: the vulnerable code and its fix live here -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-core</artifactId>
  <version>3.2.2</version>
</dependency>
<!-- In 2.x/3.x the bundled parsers ship as tika-parsers-standard-package;
     org.apache.tika:tika-parsers was the 1.x artifact (a POM-only parent in 2.x+) -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers-standard-package</artifactId>
  <version>3.2.2</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parser-pdf-module</artifactId>
  <version>3.2.2</version>
</dependency>
Gradle users:
implementation 'org.apache.tika:tika-core:3.2.2'
implementation 'org.apache.tika:tika-parsers-standard-package:3.2.2'
implementation 'org.apache.tika:tika-parser-pdf-module:3.2.2'
Verifying the patch:
# Maven
mvn dependency:tree | grep tika
# Gradle
gradle dependencies | grep tika
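For CI pipelines, a scripted check may be easier to enforce than reading grep output. The sketch below assumes the `groupId:artifactId:type[:classifier]:version:scope` coordinates that `mvn dependency:tree` prints and flags any `org.apache.tika` artifact older than 3.2.2; Gradle's output format differs and is not handled. `TikaTreeAudit` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical audit of `mvn dependency:tree` output for Tika artifacts below 3.2.2. */
final class TikaTreeAudit {

    /** Returns "artifactId:version" for every Tika coordinate older than 3.2.2. */
    static List<String> outdated(List<String> treeLines) {
        List<String> hits = new ArrayList<>();
        for (String line : treeLines) {
            int start = line.indexOf("org.apache.tika:");
            if (start < 0) {
                continue;
            }
            // The coordinate ends at the first whitespace; later text is annotation.
            String coord = line.substring(start).split("\\s+")[0];
            String[] parts = coord.split(":");
            if (parts.length < 5) {
                continue; // not groupId:artifactId:type:version:scope
            }
            String artifact = parts[1];
            String version = parts[parts.length - 2]; // version precedes scope
            if (olderThan(version, "3.2.2")) {
                hits.add(artifact + ":" + version);
            }
        }
        return hits;
    }

    /** Dotted numeric comparison; any "-qualifier" suffix is ignored. */
    static boolean olderThan(String v, String target) {
        String[] a = v.split("-")[0].split("\\.");
        String[] b = target.split("-")[0].split("\\.");
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            int x = i < a.length ? Integer.parseInt(a[i]) : 0;
            int y = i < b.length ? Integer.parseInt(b[i]) : 0;
            if (x != y) {
                return x < y;
            }
        }
        return false;
    }
}
```

Failing the build whenever `outdated` returns a non-empty list enforces the "all modules at 3.2.2+" rule rather than relying on someone reading the tree.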
Temporary Workarounds (If Patching Delayed)
1. Disable PDF Processing
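import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.Parser;

// Load every default parser except PDFParser, so PDFs never reach the XFA code path.
// new TikaConfig(InputStream) declares TikaException, IOException and SAXException.
String configXml =
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
    "<properties>" +
    "  <parsers>" +
    "    <parser class=\"org.apache.tika.parser.DefaultParser\">" +
    "      <parser-exclude class=\"org.apache.tika.parser.pdf.PDFParser\"/>" +
    "    </parser>" +
    "  </parsers>" +
    "</properties>";
TikaConfig config = new TikaConfig(
    new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
// Build parsers from this config; a plain new AutoDetectParser() would still include PDFParser.
Parser parser = new AutoDetectParser(config);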
2. Input Validation
- Reject PDFs with embedded XFA forms
- Scan for suspicious XML entities before processing
- Implement file type restrictions
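A best-effort pre-filter for the first two bullets might look like the sketch below: one check looks for the `/XFA` key in the raw PDF bytes, the other looks for DOCTYPE or ENTITY declarations in XML you have already extracted. Both are heuristics under stated assumptions: PDF dictionaries can sit inside compressed object streams and PDF names can be hex-escaped (for example `/#58FA`), so a clean result does not prove a file is safe. `PdfPrefilter` is a hypothetical name, and none of this replaces upgrading.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical best-effort pre-filter; not a substitute for patching. */
final class PdfPrefilter {

    // DOCTYPE or ENTITY declarations, matched case-insensitively to stay conservative.
    private static final Pattern DTD_MARKUP =
        Pattern.compile("<!(DOCTYPE|ENTITY)", Pattern.CASE_INSENSITIVE);

    /**
     * True if the raw bytes contain the literal name /XFA (the AcroForm key for XFA data).
     * Misses keys inside compressed object streams and hex-escaped names like /#58FA.
     */
    static boolean rawPdfMentionsXfa(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so no bytes are lost.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        return text.contains("/XFA");
    }

    /** True if already-extracted XML text declares a DTD or any entity. */
    static boolean declaresDtdOrEntities(String xml) {
        return DTD_MARKUP.matcher(xml).find();
    }
}
```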
3. Network Segmentation
- Isolate Tika processing servers
- Block outbound connections from document processors
- Restrict access to sensitive file systems
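Egress blocking belongs in firewalls or security groups, but application code that resolves hosts on behalf of a parser can add a second check. The hypothetical `EgressGuard` below refuses loopback, link-local (which includes the 169.254.169.254 metadata address), private, wildcard and multicast addresses using only `java.net.InetAddress` predicates. It is a sketch: `isSiteLocalAddress` does not cover IPv6 unique-local addresses (fc00::/7), and a check made separately from the actual connection does not defend against DNS rebinding.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical egress check for hosts a parser-side resolver is asked to reach. */
final class EgressGuard {

    /** True for address classes a document parser should never need to contact. */
    static boolean isInternal(InetAddress a) {
        return a.isLoopbackAddress()      // 127.0.0.0/8, ::1
            || a.isLinkLocalAddress()     // 169.254.0.0/16 (cloud metadata), fe80::/10
            || a.isSiteLocalAddress()     // 10/8, 172.16/12, 192.168/16; NOT IPv6 fc00::/7
            || a.isAnyLocalAddress()      // 0.0.0.0, ::
            || a.isMulticastAddress();
    }

    /** True only if every resolved address is external; fails closed on lookup errors. */
    static boolean allowed(String host) {
        try {
            for (InetAddress a : InetAddress.getAllByName(host)) {
                if (isInternal(a)) {
                    return false;
                }
            }
            return true;
        } catch (UnknownHostException e) {
            return false; // unresolvable: refuse rather than guess
        }
    }
}
```

In an XML stack, such a check could be called from a custom `org.xml.sax.EntityResolver` before any fetch is allowed, though refusing external entities altogether is simpler.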
4. Runtime Protection
- Deploy WAF rules blocking XXE payloads (of limited value, since the payload sits inside the PDF and may be compressed)
- Enable application-level file access monitoring
- Implement anomaly detection for unusual file operations
Detection and Indicators of Compromise
Log Analysis:
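# Check for suspicious file access (dots escaped so they match literally)
grep -E '/etc/(passwd|shadow)|config\.xml' /var/log/app.log
# Monitor for SSRF attempts against cloud metadata endpoints
grep -E '169\.254\.169\.254|metadata\.google\.internal' /var/log/app.log
# Unusual network connections from the Tika JVM (its process name is usually "java", not "tika")
TIKA_PID=$(pgrep -f tika | head -n 1)
ss -tnp | grep "pid=${TIKA_PID},"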
Key IOCs:
- Unexpected file reads from Tika processes
- Outbound connections to external IPs during PDF processing
- Access to cloud metadata endpoints (169.254.169.254)
- Unusual XML parser errors in logs
- Suspicious PDF uploads with embedded XFA data
Long-Term Security Measures
1. Dependency Management
Implement automated dependency scanning:
# OWASP Dependency Check
mvn org.owasp:dependency-check-maven:check
# Snyk scanning
snyk test
# GitHub Dependabot
# Enable in repository settings
2. Security Monitoring
- Deploy SIEM rules for XXE attack patterns
- Monitor Tika process behavior
- Alert on sensitive file access
- Track unusual network activity
3. Zero-Trust Architecture
- Sandbox document processing
- Implement least-privilege file system access
- Restrict network access for parsing services
- Verify all file inputs regardless of source
The Bigger Picture: Supply Chain Risk
CVE-2025-66516 illustrates several supply chain risks:
Hidden Dependencies: Organizations often don't realize Tika is embedded in their systems until vulnerabilities emerge.
Cascading Impact: A flaw in one library affects thousands of applications instantly.
Patching Complexity: Users who upgraded the tika-parser-pdf-module but did not upgrade tika-core to >= 3.2.2 would still be vulnerable, showing how partial patches fail.
Conclusion: Act Immediately
CVE-2025-66516 represents one of 2025's most critical vulnerabilities:
✅ CVSS 10.0 - Maximum severity
✅ Trivial exploitation - Single malicious PDF
✅ Widespread impact - Millions of systems affected
✅ PoCs published - Mass exploitation imminent
✅ No authentication - Any file upload works
Immediate Actions:
- ⚠️ Identify all Apache Tika deployments
- ⚠️ Upgrade to version 3.2.2+ immediately
- ⚠️ Verify ALL modules patched (core, parsers, PDF)
- ⚠️ Implement detection and monitoring
- ⚠️ Review recent PDF uploads for IOCs