
New Video from @Computerphile Explains the Birthday Paradox and Hash Collisions
The video begins with an introduction to the birthday paradox, a concept that seems distant from cryptography and hash functions but is in fact closely related. The birthday paradox states that in a group of people, the probability that two of them share a birthday is much higher than intuition suggests: with only 23 people, the chance already exceeds 50%. The same principle governs the probability of hash collisions, where two different messages produce the same hash.

To understand this, it helps to review what a hash function is. A hash function takes a message of any length and produces a fixed-length hash, typically 128, 256, or 512 bits. The hash appears random but is fully determined by the original message. Hash functions have many applications, such as secure password storage and digital signatures. For example, a system can store the hash of a password rather than the password itself, which makes it difficult to recover the original password from what is stored. Similarly, digital signatures use hashes to verify the integrity and authenticity of documents.

The pigeonhole principle is then used to show why hash collisions are inevitable. Because there are more possible messages than possible hashes, it is mathematically certain that some messages share a hash. How soon a collision is likely to turn up depends on the output size: by the birthday paradox, a collision among n-bit hashes is expected after roughly 2^(n/2) attempts rather than 2^n. For a 256-bit hash that is about 2^128 attempts, which is far out of reach, while for a 128-bit hash it is about 2^64, a much smaller margin.

The video then explores the practical implications of hash collisions. If a hash function has a sufficiently small output size, finding collisions becomes feasible, which could allow digital signatures to be forged. A contrived example is given in which two house sale contracts with different prices have the same hash, so a signature on one is also valid for the other and one party can deceive the other. Although this example is theoretical, it illustrates the risk that hash collisions pose.
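The birthday paradox and its effect on small hash outputs can be tried out directly. The following is a minimal Python sketch, not code from the video: `birthday_probability`, `truncated_hash`, and `find_collision` are hypothetical helper names, and the "small" hash is simply SHA-256 cut down to its first few bits, an assumption made so that a collision turns up quickly.

```python
import hashlib
from itertools import count


def birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def truncated_hash(message: bytes, bits: int) -> int:
    """Toy small-output hash: the first `bits` bits of SHA-256(message)."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int) -> tuple[bytes, bytes]:
    """Hash distinct messages until two of them share a truncated hash."""
    seen: dict[int, bytes] = {}
    for i in count():
        message = f"message-{i}".encode()
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message  # two different messages, same hash
        seen[h] = message


if __name__ == "__main__":
    print(f"23 people: {birthday_probability(23):.3f}")  # prints 0.507
    a, b = find_collision(24)
    print(a, b, hex(truncated_hash(a, 24)))
```

With a 24-bit output there are about 16.7 million possible hashes, yet a collision typically appears after only a few thousand messages, in line with the 2^(n/2) birthday estimate. Each extra bit of output makes the search harder, which is why the same loop is hopeless against a full 256-bit hash.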
The video concludes by discussing practical measures to manage hash collisions. Although collisions are inevitable in principle, they are extremely unlikely to be found for large hash functions, and modern systems use hash functions of 256 bits or more to keep this risk negligible. However, collisions can still occur in practice and cause problems, as illustrated by Google producing two different PDFs with the same SHA-1 hash (SHA-1 produces a 160-bit output), which caused issues on GitHub. In summary, the video explains the birthday paradox and how it applies to hash functions, and highlights the importance of designing robust hash functions with large enough outputs to resist collisions and keep digital systems secure.
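To put the claim that collisions are extremely rare for large hash functions into numbers, here is a rough Python sketch. It relies on the standard birthday approximation p ≈ 1 - exp(-k² / 2^(n+1)) for k random n-bit hashes, which assumes an ideal hash function with uniformly distributed outputs; the function names are hypothetical and chosen for illustration.

```python
import math


def collision_probability(num_hashes: float, bits: int) -> float:
    """Approximate chance of at least one collision among `num_hashes`
    uniformly random `bits`-bit hashes (birthday approximation)."""
    exponent = -(num_hashes ** 2) / 2.0 ** (bits + 1)
    # -expm1(x) equals 1 - exp(x) but stays accurate for tiny probabilities.
    return -math.expm1(exponent)


def hashes_for_half_chance(bits: int) -> float:
    """Approximate number of hashes needed for a 50% collision chance."""
    return math.sqrt(2 * math.log(2)) * 2.0 ** (bits / 2)


if __name__ == "__main__":
    for bits in (128, 160, 256):
        k = hashes_for_half_chance(bits)
        print(f"{bits}-bit hash: about {k:.2e} hashes for a 50% chance")
    # Even 2^64 hashes leave a 256-bit function with a negligible risk.
    print(collision_probability(2.0 ** 64, 256))
```

These figures only describe collisions found by brute-force search. A function with structural weaknesses can be attacked with far less work than the birthday estimate suggests.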