Hashing in Cybersecurity
Hashing transforms data of any size into a fixed-length output called a hash value or digest, which acts as a digital fingerprint for that data. A cryptographic hash function is one-way, and changing even a single bit of the input produces a completely different hash, which is what makes hashes useful for verifying data integrity. Hash functions are essential for password storage, digital signatures, file verification, and blockchain technology.
Key Points
- Hash functions are one-way: you cannot reverse a hash to get the original data
- The avalanche effect ensures tiny input changes produce drastically different outputs
- Data is processed in fixed-size blocks (typically 512 bits) with padding to ensure proper alignment
- MD5 and SHA-1 are cryptographically broken and must never be used for security
- SHA-256 remains secure for most applications but requires proper implementation
- Use HMAC to prevent length extension attacks when authentication is needed
- For password hashing, use specialized functions like bcrypt, scrypt, or Argon2
How Hash Functions Work
Block-Based Processing
Hash functions divide input data into fixed-size blocks rather than processing entire messages at once. Each block undergoes multiple rounds of mathematical transformations.
| Hash Function | Block Size | Output Size | Status |
|---|---|---|---|
| MD5 | 512 bits | 128 bits | Broken |
| SHA-1 | 512 bits | 160 bits | Broken |
| SHA-256 | 512 bits | 256 bits | Secure |
| SHA-3-256 | 1088 bits | 256 bits | Secure |
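The block and output sizes in the table can be checked against Python's standard `hashlib` module. This is a minimal sketch; the helper name `sizes_in_bits` is an illustrative choice, and it assumes all four algorithms are enabled in the local build (MD5 may be disabled on FIPS-restricted systems).

```python
import hashlib

def sizes_in_bits(name):
    """Return (block size, digest size) in bits for a hashlib algorithm name."""
    h = hashlib.new(name)
    # hashlib reports both sizes in bytes, so convert to bits.
    return h.block_size * 8, h.digest_size * 8

for name in ("md5", "sha1", "sha256", "sha3_256"):
    block_bits, digest_bits = sizes_in_bits(name)
    print(f"{name:<9} block={block_bits:>4} bits  digest={digest_bits:>3} bits")
```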
Internal State Registers
Hash functions maintain internal registers that accumulate and transform data throughout the hashing process:
| Hash Function | Number of Registers | Total Internal State |
|---|---|---|
| MD5 | 4 registers | 128 bits |
| SHA-1 | 5 registers | 160 bits |
| SHA-256 | 8 registers | 256 bits |
These registers update with each processed block, creating the avalanche effect where changing one input bit affects approximately 50% of the output bits.
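The avalanche effect is easy to observe empirically. The sketch below flips one bit of an input and counts how many of the 256 output bits of SHA-256 differ; the helper names are illustrative, and the exact count depends on the input, though it should land near 128.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def avalanche(message: bytes) -> int:
    """Flip the lowest bit of the first byte and compare the SHA-256 digests."""
    flipped = bytes([message[0] ^ 0x01]) + message[1:]
    original_digest = hashlib.sha256(message).digest()
    flipped_digest = hashlib.sha256(flipped).digest()
    return bit_difference(original_digest, flipped_digest)

changed = avalanche(b"hello world")
print(f"{changed} of 256 output bits changed ({changed / 256:.0%})")
```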
The Padding Process
Padding ensures messages align to the required block size and protects data integrity.
Why Padding Is Critical
Alignment: Messages must be exactly divisible into the required block size (512 bits for SHA-256).
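Integrity Protection: The padding scheme encodes the original message length, so two messages of different lengths can never pad to the same sequence of blocks. This strengthens collision resistance, but on its own it does not stop an attacker from extending a hash (see Length Extension Attacks).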
How Padding Works
- Append a 1 bit immediately after the message ends
- Add 0 bits until the length is 64 bits short of a multiple of the block size (448 bits within a 512-bit block)
- Reserve the final 64 bits to store the original message length in binary
Padding Example
For a 72-bit message using SHA-256:
Original message: 72 bits
Add '1' bit: + 1 bit = 73 bits
Add '0' bits: + 375 bits = 448 bits
Add length (72): + 64 bits = 512 bits (complete block)
This ensures every message, regardless of original size, processes correctly through the hash function.
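The padding rule can be sketched in a few lines. The function name `sha256_pad` is an illustrative choice, and the sketch assumes the message is a whole number of bytes, so the 1 bit and the first seven 0 bits arrive together as the byte 0x80.

```python
def sha256_pad(message: bytes) -> bytes:
    """Pad a byte string the way SHA-256 does before block processing."""
    bit_length = len(message) * 8
    # 0x80 is the single 1 bit followed by seven 0 bits.
    # Then add zero bytes until 8 bytes remain in the final 64-byte block.
    zero_bytes = (55 - len(message)) % 64
    return (message + b"\x80" + b"\x00" * zero_bytes
            + bit_length.to_bytes(8, "big"))

padded = sha256_pad(b"123456789")  # 9 bytes = 72 bits, as in the example
print(len(padded) * 8, "bits after padding")
print("stored length:", int.from_bytes(padded[-8:], "big"), "bits")
```

For the 72-bit message, 7 zero bits inside the 0x80 byte plus 46 zero bytes give the 375 zero bits shown in the example.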
SHA-256 Step-by-Step
Step 1: Convert to Binary
Each character in your message is encoded as bytes of 8 bits each (one byte per character for ASCII text). A 9-character ASCII message becomes 72 bits of binary data.
Step 2: Apply Padding
The 72-bit message expands to a complete 512-bit block using the padding process described above.
Step 3: Divide Into Words
The 512-bit block splits into 16 words of 32 bits each (W[0] through W[15]).
These 16 words expand to 64 words (W[0] through W[63]) using bitwise rotations, logical shifts, and XOR operations.
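The expansion from 16 to 64 words can be sketched as follows. The rotation and shift amounts follow the SHA-256 specification (FIPS 180-4); the function names are illustrative.

```python
import struct

MASK = 0xFFFFFFFF  # keep every value within 32 bits

def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def message_schedule(block: bytes) -> list:
    """Expand one 64-byte block into the 64-word schedule W[0..63]."""
    if len(block) != 64:
        raise ValueError("SHA-256 blocks are exactly 64 bytes")
    w = list(struct.unpack(">16I", block))  # 16 big-endian 32-bit words
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w

schedule = message_schedule(b"abc" + b"\x80" + bytes(59) + b"\x18")
print(len(schedule), "words; W[0] =", hex(schedule[0]))
```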
Step 4: Initialize Working Variables
SHA-256 uses 8 predefined initial hash values (H0 through H7), taken from the first 32 bits of the fractional parts of the square roots of the first 8 prime numbers. These are copied into the working variables a, b, c, d, e, f, g, h.
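The initial values can be reproduced with integer arithmetic alone. This sketch uses `math.isqrt` (Python 3.8 or later) to take the first 32 fractional bits of each square root; the function name `initial_hash_values` is an illustrative choice.

```python
import math

def initial_hash_values() -> list:
    """Derive H0..H7 from the square roots of the first 8 primes."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19]
    # isqrt(p << 64) equals floor(sqrt(p) * 2**32); masking to 32 bits
    # drops the integer part and keeps the first 32 fractional bits.
    return [math.isqrt(p << 64) & 0xFFFFFFFF for p in primes]

for i, value in enumerate(initial_hash_values()):
    print(f"H{i} = {value:08x}")
```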
Step 5: Execute 64 Rounds
Each round performs complex operations:
- Uses word W[i] and constant K[i]
- Calculates logical functions:
  - Ch (choice): (e AND f) XOR (NOT e AND g)
  - Maj (majority): (a AND b) XOR (a AND c) XOR (b AND c)
  - Σ0 and Σ1: Complex rotation and XOR operations
- Computes temporary values and updates all working variables
This intensive bit mixing ensures the avalanche effect.
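A single round can be sketched as below. The rotation amounts for Σ0 and Σ1 come from the SHA-256 specification (FIPS 180-4); the function names and the tuple-based state are illustrative choices rather than any particular library's API.

```python
MASK = 0xFFFFFFFF  # all arithmetic is modulo 2**32

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def ch(e, f, g):
    """Choice: each bit of e selects the matching bit of f (1) or g (0)."""
    return (e & f) ^ (~e & g & MASK)

def maj(a, b, c):
    """Majority: each output bit is the value held by at least two inputs."""
    return (a & b) ^ (a & c) ^ (b & c)

def big_sigma0(a):
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)

def big_sigma1(e):
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)

def sha256_round(state, w_t, k_t):
    """Apply one round to the working variables (a..h) and return the new tuple."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k_t + w_t) & MASK
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    # Every variable shifts down one position; a and e absorb the new values.
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)
```

Only a and e receive freshly computed values in each round; the other six variables are shifted copies, which is why many rounds are needed before every input bit influences the whole state.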
Step 6: Generate Final Hash
After 64 rounds, the working variables are added back to the hash values, with each addition taken modulo 2^32:
H0 = H0 + a
H1 = H1 + b
H2 = H2 + c
...
H7 = H7 + h
Concatenating H0 through H7 produces the final 256-bit hash, typically displayed as a 64-character hexadecimal string.
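Putting the six steps together, the following is an educational from-scratch sketch, checked against `hashlib` at the end. It is not meant for production use (prefer `hashlib.sha256`). The constants are derived from prime roots instead of being hard-coded, and all helper names are illustrative.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def first_primes(count):
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found

def icbrt(n):
    """Floor of the cube root of n, by binary search on integers."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
H_INIT = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]  # square roots
K = [icbrt(p << 96) & MASK for p in PRIMES]                # cube roots

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """Process one 64-byte block and return the updated 8-word state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        choice = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + choice + K[t] + w[t]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        majority = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + majority) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    # Step 6: add the working variables back into the hash values.
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]

def sha256(message: bytes) -> bytes:
    padded = (message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
              + (len(message) * 8).to_bytes(8, "big"))
    state = list(H_INIT)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return b"".join(word.to_bytes(4, "big") for word in state)

digest = sha256(b"123456789")
print(digest.hex())
print("matches hashlib:", digest == hashlib.sha256(b"123456789").digest())
```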
Comparing Hash Functions
MD5 (Message Digest 5)
Status: Cryptographically broken
- Block size: 512 bits
- Output: 128 bits
- Rounds: 64
Critical vulnerability: Collision attacks demonstrated in 2004 allow attackers to create two different inputs with identical hashes. Never use MD5 for security purposes.
SHA-1 (Secure Hash Algorithm 1)
Status: Cryptographically broken and deprecated
- Block size: 512 bits
- Output: 160 bits
- Rounds: 80
Critical vulnerability: Practical collision attacks demonstrated in 2017 (SHAttered attack). Major browsers and security standards have deprecated SHA-1.
SHA-256 (Secure Hash Algorithm 256)
Status: Secure and recommended
- Block size: 512 bits
- Output: 256 bits
- Rounds: 64
Recommended for current use. Be aware of length extension attacks and use HMAC when message authentication is required.
SHA-3
Status: Secure and modern
- Uses a different construction (Keccak sponge function)
- Not vulnerable to length extension attacks
- Available in multiple output sizes (SHA-3-224, SHA-3-256, SHA-3-384, SHA-3-512)
Security Considerations
The Avalanche Effect
A secure hash function exhibits the avalanche effect: changing a single bit in the input changes approximately 50% of the output bits. Strong diffusion of this kind is necessary for, though it does not by itself guarantee, the three core security properties:
- Pre-image resistance: Given a hash, it is computationally infeasible to find any input that produces it
- Collision resistance: Computationally infeasible to find any two distinct inputs with the same hash
- Second pre-image resistance: Given an input, it is computationally infeasible to find a different input with the same hash
Length Extension Attacks
SHA-256 and other Merkle-Damgård construction hashes are vulnerable to length extension attacks when used improperly:
If you know: hash(secret || message)
You can compute: hash(secret || message || additional_data)
Without knowing the secret!
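To make the risk concrete, here is a sketch of the attack against a naive `hash(secret || message)` tag. It works because a SHA-256 digest is simply the internal state after the last block, so an attacker can resume hashing from it. The secret, message, and helper names are hypothetical, and the attacker is assumed to know (or guess) the secret's length.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def first_primes(count):
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found

def icbrt(n):
    """Floor of the cube root of n, by binary search on integers."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid - 1)
    return lo

K = [icbrt(p << 96) & MASK for p in first_primes(64)]  # round constants

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """SHA-256 compression of one 64-byte block from an arbitrary state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]

def md_padding(total_len):
    """Padding SHA-256 appends to a message of total_len bytes."""
    return (b"\x80" + b"\x00" * ((55 - total_len) % 64)
            + (total_len * 8).to_bytes(8, "big"))

def extend(digest, known_len, extra):
    """Forge hash(secret || message || glue || extra) from the digest alone."""
    glue = md_padding(known_len)               # padding the victim already hashed
    state = list(struct.unpack(">8I", digest)) # resume from the published digest
    tail = extra + md_padding(known_len + len(glue) + len(extra))
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return glue, b"".join(word.to_bytes(4, "big") for word in state)

secret, message = b"hypothetical-secret", b"user=alice"   # secret unknown to attacker
tag = hashlib.sha256(secret + message).digest()            # what the attacker sees
glue, forged_tag = extend(tag, len(secret) + len(message), b"&admin=true")
forged_message = message + glue + b"&admin=true"
print("forgery accepted:",
      forged_tag == hashlib.sha256(secret + forged_message).digest())
```

The `extend` function never touches `secret`; it only needs the tag and the combined length.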
Solution: Use HMAC instead of simple concatenation:
HMAC(key, message) = hash((key ⊕ opad) || hash((key ⊕ ipad) || message))
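HMAC prevents length extension attacks because the outer hash is computed over the inner digest together with a differently masked key (opad rather than ipad). An attacker who extends the inner hash still cannot recompute the outer hash without knowing the key.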
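The formula can be written out by hand and compared with Python's standard `hmac` module. This is a sketch for illustration; in practice, call `hmac.new` directly. The key and message values are hypothetical.

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """Compute HMAC-SHA-256 directly from its definition."""
    block_size = 64  # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(block_size, b"\x00")     # then zero-padded to one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()

key, message = b"hypothetical-key", b"amount=10"
manual = hmac_sha256(key, message)
library = hmac.new(key, message, hashlib.sha256).digest()
# compare_digest runs in constant time, which avoids leaking timing information.
print("match:", hmac.compare_digest(manual, library))
```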
Best Practices
- Never use MD5 or SHA-1 for security-critical applications
- Use SHA-256 or SHA-3 for general cryptographic purposes
- Implement HMAC when message authentication is required
- Use specialized password hashing functions (bcrypt, scrypt, or Argon2) instead of general-purpose hash functions
- Add unique salts to password hashes to prevent rainbow table attacks
- Migrate to stronger hash functions as cryptographic standards evolve
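As one way to follow the password-hashing advice, the sketch below uses `hashlib.scrypt` from the standard library with a random per-password salt. The cost parameters (n, r, p) are illustrative assumptions to be tuned for your hardware, and `hashlib.scrypt` requires a Python build linked against OpenSSL 1.1 or later.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes = None):
    """Return (salt, derived key) for a password using scrypt."""
    if salt is None:
        salt = os.urandom(16)  # unique random salt per password
    derived = hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14, r=8, p=1,     # assumed cost settings, about 16 MiB of memory
        dklen=32,
    )
    return salt, derived

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Store the salt alongside the derived key. The salt is not secret; its purpose is to make each stored hash unique so precomputed tables are useless.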
Common Use Cases
Data Integrity Verification
Hash functions verify file integrity during downloads or transfers:
Original file hash: a3f5b8c9d2