Cybersecurity: Length Extension Attack

A length extension attack exploits a structural weakness in certain cryptographic hash functions. Given the hash of a message and the length of the hashed input, an attacker can append data and compute a valid hash for the extended message without knowing the secret key that was prepended to it. The weakness affects MD5, SHA-1, SHA-256, and SHA-512 because they use the Merkle-Damgård construction and output their full internal state. The attack breaks the integrity and authenticity guarantees of keyed hashes of the form H(secret || message): a modified message verifies as authentic.

Key Points

Vulnerable algorithms: MD5, SHA-1, SHA-256, and SHA-512 are susceptible because they are Merkle-Damgård hashes that output their full internal state
Attack requirements: Original hash value, total length of the hashed input (secret plus message, known or estimated), and knowledge of padding rules
Attack mechanism: Uses the final hash as an internal state to continue hashing with appended data
Security impact: Compromises integrity and authentication, not confidentiality
Primary defense: Use HMAC (Hash-based Message Authentication Code) or SHA-3

How the Vulnerability Works

The Merkle-Damgård construction processes data in sequential blocks, maintaining an internal state that passes from one block to the next. The final hash output is simply the internal state after processing the last block.

This design creates an exploitable weakness:

The final hash reveals the internal state of the hash function
This state can be used as a starting point to continue hashing
An attacker can append new data and produce a valid hash
Only the length of the hashed input is needed, not its content, so the secret key can remain unknown

Critical Insight: The hash function cannot distinguish between a genuine message and one that has been extended using this technique.
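
The continuation property can be seen in a toy model. The following is a minimal sketch, not a real hash function: toy_hash, compress, the all-zero IV, and the sample inputs are illustrative assumptions, and SHA-256 is borrowed only as a stand-in compression function. It shows that anyone holding the digest can resume hashing from it.

```python
import hashlib
import struct

BLOCK = 64            # toy block size in bytes
IV = b"\x00" * 32     # toy initial state (assumption, not a real IV)

def compress(state: bytes, block: bytes) -> bytes:
    # Stand-in compression function: mixes the running state with one block.
    return hashlib.sha256(state + block).digest()

def md_pad(total_len: int) -> bytes:
    # Merkle-Damgård strengthening: 0x80, zero bytes, 64-bit bit length.
    zeros = (BLOCK - 9 - total_len) % BLOCK
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len * 8)

def absorb(state: bytes, data: bytes) -> bytes:
    # Feed whole blocks through the compression function, chaining the state.
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(message: bytes) -> bytes:
    # The digest is simply the state after the last block.
    return absorb(IV, message + md_pad(len(message)))

# Defender side: the hashed input starts with a secret the attacker never sees.
hidden_input = b"k" * 16 + b"user=alice"
digest = toy_hash(hidden_input)

# Attacker side: knows only digest, len(hidden_input), and the padding rule.
glue = md_pad(len(hidden_input))
extra = b"&role=admin"
total_len = len(hidden_input) + len(glue) + len(extra)
forged = absorb(digest, extra + md_pad(total_len))

# The forged digest matches an honest hash of the extended input.
assert forged == toy_hash(hidden_input + glue + extra)
```

The final assertion passes because the honest computation reaches exactly the state `digest` after processing `hidden_input + glue`, which is the point where the attacker resumes.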

Attack Prerequisites

Hash of the Original Message

The attacker must obtain the hash value, which represents the final internal state after processing the original message. This is often publicly available or transmitted over networks.

Message Length

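The attacker needs to know or estimate the total length of the hashed input, that is, the secret key plus the original message. The message itself is usually visible to the attacker, so in practice the unknown is the key length. This can be: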

Determined from application behavior or API responses
Guessed based on common patterns (e.g., standard token formats)
Brute-forced if the range is limited

Padding Rules

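Merkle-Damgård hash functions pad every input in a fixed, publicly documented way: a single 0x80 byte, then zero bytes, then the input's length in bits, so that the padded input fills a whole number of blocks. For MD5, SHA-1, SHA-256, and SHA-512 the padding depends only on the input length, so an attacker who knows that length can reconstruct it exactly.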
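
As a rough sketch of how predictable this padding is, the helper below rebuilds it from the input length alone. The function name md_padding and its parameters are illustrative assumptions; the block sizes, length-field widths, and byte orders follow the published algorithm specifications.

```python
def md_padding(msg_len: int, *, block_size: int = 64,
               length_bytes: int = 8, byteorder: str = "big") -> bytes:
    """Padding appended to an input of msg_len bytes.

    Defaults match SHA-1 and SHA-256. MD5 uses byteorder="little";
    SHA-512 uses block_size=128 and length_bytes=16.
    """
    # Zero bytes needed so that input + 0x80 + zeros + length fills whole blocks.
    zeros = (block_size - 1 - length_bytes - msg_len) % block_size
    bit_length = (msg_len * 8).to_bytes(length_bytes, byteorder)
    return b"\x80" + b"\x00" * zeros + bit_length

# A 16-byte secret followed by the 20-byte message "user=alice&role=user".
sha256_pad = md_padding(16 + 20)
md5_pad = md_padding(16 + 20, byteorder="little")
sha512_pad = md_padding(16 + 20, block_size=128, length_bytes=16)

# Each padded input lands exactly on a block boundary.
assert (36 + len(sha256_pad)) % 64 == 0
assert (36 + len(sha512_pad)) % 128 == 0
```

Nothing in the padding depends on the secret's content, only on its length, which is why a guessed or brute-forced key length is enough.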

Step-by-Step Attack Process

Obtain the original hash - Capture the hash representing the internal state
Reconstruct the padding - Apply the same padding rules the hash function used
Initialize with the hash - Use the obtained hash as the starting internal state
Append malicious data - Add the attacker's chosen content after the padding
Continue hashing - Process the new data blocks, ending with fresh padding that encodes the total length of the extended message
Generate valid hash - Read out the resulting state as the hash of the extended message

Practical Example

Legitimate Scenario

Message: secret_key + "user=alice&role=user"
Hash: abc123def456...

The application calculates a hash to verify message integrity and authenticity.

Attack Scenario

Original: secret_key + "user=alice&role=user"
Original Hash: abc123def456...

Attacker reconstructs:
secret_key + "user=alice&role=user" + [padding]

Attacker appends: "&role=admin"

Extended message:
secret_key + "user=alice&role=user" + [padding] + "&role=admin"

New valid hash: xyz789ghi012...

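Result: The system accepts the modified message as authentic because the hash validates correctly, even though the attacker never knew the secret_key. Note that the padding bytes become part of the forged message. If the application's parser tolerates them and lets the last role value win, the attacker has escalated privileges from user to admin.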
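
The scenario above can be reproduced end to end against SHA-256. This is a sketch under stated assumptions, not a hardened tool: SECRET, sign, verify, extend, and parse_last_wins are hypothetical names, the server is assumed to compute SHA-256(secret || message), and the parser is assumed to let the last duplicate field win. Python's hashlib does not expose a way to set the internal state, so the compression function is reimplemented here, with the round constants derived from their definition rather than typed in.

```python
import hashlib
import hmac
import struct

MASK = 0xFFFFFFFF

def first_primes(n):
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n):
    # Largest integer r with r**3 <= n (exact binary search).
    lo, hi = 0, 1 << 40
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid - 1)
    return lo

# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in first_primes(64)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    # One SHA-256 compression step: 8-word state plus a 64-byte block.
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]

def md_padding(msg_len):
    # SHA-256 padding for an input of msg_len bytes.
    return b"\x80" + b"\x00" * ((55 - msg_len) % 64) + struct.pack(">Q", msg_len * 8)

def extend(orig_tag_hex, orig_msg, key_len, suffix):
    # Attacker side: uses only the tag, the visible message, and a key-length guess.
    glue = md_padding(key_len + len(orig_msg))
    forged_msg = orig_msg + glue + suffix
    state = list(struct.unpack(">8I", bytes.fromhex(orig_tag_hex)))
    tail = suffix + md_padding(key_len + len(forged_msg))
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return forged_msg, struct.pack(">8I", *state).hex()

# Hypothetical vulnerable server.
SECRET = b"hypothetical-secret-key"

def sign(msg):
    return hashlib.sha256(SECRET + msg).hexdigest()

def verify(msg, tag):
    return hmac.compare_digest(sign(msg), tag)

def parse_last_wins(msg):
    # Assumed parser behavior: a later duplicate field overwrites an earlier one.
    return dict(pair.partition(b"=")[::2] for pair in msg.split(b"&"))

orig_msg = b"user=alice&role=user"
orig_tag = sign(orig_msg)

# The key length is unknown, so try each candidate until the server accepts.
found_len = None
for key_len in range(1, 65):
    forged_msg, forged_tag = extend(orig_tag, orig_msg, key_len, b"&role=admin")
    if verify(forged_msg, forged_tag):
        found_len = key_len
        break
```

In this sketch only the server-side functions touch SECRET; extend works from the tag and the guessed length. The loop also illustrates the brute-force option for the key length: each wrong guess simply fails verification.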

Why This Attack Succeeds

The vulnerability exists because of four key factors:

Exposed internal state: The hash output directly reveals the internal state
Stateless continuation: Nothing prevents using this state to process additional blocks
Predictable padding: Standard padding rules are publicly known and easily reconstructed
No finalization: The hash function doesn't cryptographically "seal" the final state

The hash function treats these two scenarios identically:

A message that naturally ends at a certain point
A message that has been artificially extended after that point

Vulnerable Hash Functions

Hash Function	Vulnerability Status	Notes
`MD5`	Vulnerable	Also broken for collision resistance; deprecated
`SHA-1`	Vulnerable	Practical collisions demonstrated; phased out in modern systems
`SHA-256`	Vulnerable	Cryptographically strong but still susceptible
`SHA-512`	Vulnerable	Same Merkle-Damgård weakness as SHA-256
`SHA-3`	Not vulnerable	Uses Keccak sponge construction

Important: Even cryptographically strong functions like SHA-256 remain vulnerable to length extension attacks when used for message authentication without proper construction like HMAC.

Mitigation Strategies

Use HMAC (Recommended)

HMAC completely prevents length extension attacks through its double-hashing construction:

HMAC(K, M) = H((K ⊕ opad) || H((K ⊕ ipad) || M))

Why HMAC works:

The internal state after processing the message is not the final output
The secret key is mixed into both the inner hash and the outer hash
The nested structure prevents attackers from using the output to continue hashing
Standardized and widely supported across programming languages
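
To make the formula concrete, the sketch below builds HMAC-SHA256 by hand and checks it against Python's standard hmac module. The key, message, and the helper names hmac_sha256_manual and verify are illustrative assumptions; in real code only the library call should be used.

```python
import hashlib
import hmac

def hmac_sha256_manual(key: bytes, msg: bytes) -> bytes:
    # Illustration of H((K xor opad) || H((K xor ipad) || M)) for SHA-256.
    block_size = 64
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(block_size, b"\x00")     # then zero-padded to one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()

def verify(key: bytes, msg: bytes, tag: bytes) -> bool:
    # Recompute the tag and compare in constant time.
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

key = b"hypothetical-secret-key"
msg = b"user=alice&role=user"
tag = hmac.new(key, msg, hashlib.sha256).digest()

# The hand-built construction agrees with the library implementation.
assert hmac_sha256_manual(key, msg) == tag
assert verify(key, msg, tag)
```

An attacker who extends the inner hash still cannot produce the outer hash, because the outer call starts from a block derived from the key.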

Alternative Approaches

Use SHA-3: Based on the Keccak sponge construction, SHA-3 is inherently immune to length extension attacks due to its fundamentally different design.

Use authenticated encryption: Employ schemes like AES-GCM or ChaCha20-Poly1305 that provide both encryption and authentication.

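Append key at the end: Use H(M || K) instead of H(K || M). This blocks length extension because the attacker cannot append data after the key, but any collision in H then yields a forged tag. It is less secure than HMAC and not recommended for production systems.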
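
For the SHA-3 alternative listed above, a keyed hash might look like the following sketch. SECRET, sha3_tag, and sha3_verify are hypothetical names. SHA3-256 keeps a 1600-bit sponge state but outputs only 256 bits, so the digest does not hand the attacker a state to resume from.

```python
import hashlib
import hmac

SECRET = b"hypothetical-secret-key"  # illustrative value only

def sha3_tag(msg: bytes) -> str:
    # Secret-prefix construction; safe from length extension with SHA-3.
    return hashlib.sha3_256(SECRET + msg).hexdigest()

def sha3_verify(msg: bytes, tag: str) -> bool:
    # Constant-time comparison of the recomputed tag.
    return hmac.compare_digest(sha3_tag(msg), tag)

tag = sha3_tag(b"user=alice&role=user")
assert sha3_verify(b"user=alice&role=user", tag)
```

HMAC remains the more conservative default where it is available.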

Real-World Analogy

Imagine a document with a wax seal:

Secure system: The seal covers the entire document edge, making additions obvious.

Length extension attack:

You see only the final seal impression (the hash)
You can add pages to the document
You can recreate an identical seal for the extended document
The recipient believes all pages are original

The recipient has no way to distinguish between the original sealed document and your extended version because the seal itself provides the information needed to continue sealing.

Security Implications

This attack compromises:

Message integrity: Modified messages appear unaltered and valid
Authentication: Forged messages seem to come from legitimate sources
Authorization: Attackers can escalate privileges or modify permissions
API security: Signed API requests can be manipulated to perform unauthorized actions

Note: This is not a confidentiality attack. The secret key remains unknown to the attacker, and the attack reveals nothing new about the message content. The vulnerability affects integrity and authentication only.

Common Vulnerable Scenarios

API authentication: Systems using hash(secret + request_data) for API signatures

Cookie signing: Web applications using hash(secret + cookie_value) to prevent tampering

Token generation: Authentication tokens created with hash(secret + user_data)

File integrity: Systems verifying file integrity with hash(key + file_content)

Summary

Length extension attacks exploit the Merkle-Damgård construction by using the hash output as an internal state to continue processing additional data. While MD5, SHA-1, SHA-256, and SHA-512 are vulnerable when used directly for message authentication, proper use of HMAC completely neutralizes this threat. Modern applications should always use HMAC, SHA-3, or authenticated encryption schemes rather than raw hash functions for message authentication purposes.