Add some hash to your data! Explore four flavors of one of the key ingredients of effective cybersecurity in this hash algorithm comparison article. Learn about their distinct properties and characteristics, and how they can make a difference to your organization’s data security and privacy compliance.
We’re living in a time where the lines between the digital and physical worlds are blurring quickly. Knowing this, it’s more important than ever that every organization secures its sensitive data and other information against cyberattacks and data breaches.
In 2020, 86% of organizations suffered a successful cyberattack, and 69% were compromised by ransomware. But what can you do to protect your data? Secure hashing algorithms are seen as strong and invaluable components of a defense against breaches and malware infections.
Image Source: CyberEdge Group 2021 Cyberthreat Defense Report
Which hashing algorithm is the right one for you? There are so many types out there that it has become difficult to select the appropriate one for each task. That’s why we’re going to present you with the following:
- A side-by-side comparison of the most well-known or common hash algorithms,
- A more detailed overview of their key characteristics, and
- Our perspective regarding their strengths and weaknesses.
The Ultimate Hash Algorithm Comparison: MD5 vs. SHA-1 vs. SHA-2 vs. SHA-3
Before we start, let’s define what a hash algorithm is in a few simple words: A hash is a one-way mathematical function (i.e., it’s practically impossible to reverse it to recover the input) that converts the input into an unreadable data string output of a set length. For example, you could take the phrase “you are my sunshine” and an entire library of books and apply a hash algorithm to each; both will result in an output of the same size. Hash functions are widely used to validate the integrity of data and files.
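To see the fixed-size output in practice, here is a minimal sketch using Python’s standard `hashlib` module. The choice of Python and of SHA-256 is ours, for illustration only, and the roughly 11 MB of repeated filler text is just a stand-in for “an entire library of books.”

```python
import hashlib

short_input = b"you are my sunshine"
# ~11 MB of filler text standing in for "an entire library of books" (illustrative only)
long_input = b"Lorem ipsum dolor sit amet. " * 400_000

short_digest = hashlib.sha256(short_input).hexdigest()
long_digest = hashlib.sha256(long_input).hexdigest()

# Both digests have the same fixed size: 256 bits = 64 hexadecimal digits
print(len(short_digest), len(long_digest))  # 64 64
print(short_digest)
```

However large the input gets, the digest length depends only on the algorithm you pick, not on the data.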
The idea of hashing was first introduced by Hans Peter Luhn in 1953 in his article “A new method of recording and searching information.” Many things have changed since then, and several new algorithms have come to light to help us keep pace with rapidly changing technologies. Today, there are so many hash algorithms that it can sometimes be confusing or overwhelming!
This is where our hash algorithm comparison article comes into play. We’ve rounded up the best-known algorithms to date to help you understand their ins and outs and clear up your doubts in a breeze. Let’s start with a quick overview of these popular hash functions.
Hash Algorithm Comparison Table: MD5, SHA-1, SHA-2, SHA-3
|Keys for Comparison||MD5||SHA-1||SHA-2 (224 & 256/384 & 512)||SHA-3 (224/256/384/512)|
|Block Size||512 bits||512 bits||512/1024 bits||1152/1088/832/576 bits (referred to as the rate [r] for SHA-3 algorithms)|
|Hash Digest Size (Output)||128 bits (i.e., 16 bytes), or 32 hexadecimal digits||160 bits (i.e., 20 bytes), or 40 hexadecimal digits||224/256 bits (i.e., 28/32 bytes), or 56/64 hexadecimal digits; 384/512 bits (i.e., 48/64 bytes), or 96/128 hexadecimal digits||224/256/384/512 bits (i.e., 28/32/48/64 bytes), or 56/64/96/128 hexadecimal digits|
|Rounds of Operations||64||80 (4 groups of 20 rounds)||64 (SHA-224 and SHA-256)/80 (SHA-384 and SHA-512)||24|
|Collision Level||High: collisions can be found in seconds, even using an ordinary home computer.||Cheap and easy to find, as demonstrated by a 2019 study.||Low: no known collisions found to date.||Low|
|Successful Attacks||Many. Researchers showed concrete evidence in 2004.||Yes, many. The first one, called SHAttered, was announced in 2017.||SHA-256 has never been broken.||Collision attacks have been demonstrated only against reduced-round versions.|
|Vulnerabilities||Vulnerable to collisions.||Vulnerable to collisions.||Susceptible to preimage attacks.|
|Applications||Previously used in cryptographic applications such as digital signatures, MD5 is now mostly used for verifying the integrity of files against involuntary corruption.||Previously widely used in TLS and SSL. Still used for HMAC (even though moving to a more secure algorithm is recommended) and for verifying the integrity of files against involuntary corruption.||Widely used in digital signatures, SSL/TLS certificates, and Bitcoin transactions.||Used to replace SHA-2 when necessary (in specific circumstances).|
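Since “verifying the integrity of files against involuntary corruption” shows up across the table, here is a minimal sketch of how that check typically looks in Python. The helper names `compute_file_digest` and `verify_file` are hypothetical, chosen for this example; the file is read in chunks so large downloads never have to fit in memory. (On Python 3.11+, `hashlib.file_digest` can do the chunked reading for you.)

```python
import hashlib
import os
import tempfile


def compute_file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Return True if the file's digest matches a published checksum."""
    return compute_file_digest(path, algorithm) == expected_hex.lower()


# Usage: write a throwaway file, record its checksum, then simulate corruption
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"release-1.0 payload")
    path = tmp.name

checksum = compute_file_digest(path)
print(verify_file(path, checksum))  # True

with open(path, "ab") as f:
    f.write(b"\x00")  # one stray byte, e.g. from a faulty transfer
print(verify_file(path, checksum))  # False

os.remove(path)
```

Passing `"md5"` or `"sha1"` as the algorithm works the same way, but, in line with the table above, those only guard against accidental corruption, not deliberate tampering.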
Now, let’s perk it up a bit and take a look at each algorithm in more detail to help you find out which one is right for you.
1. Message-Digest Algorithm 5 (MD5)
One of the oldest widely used algorithms, MD5 is a one-way cryptographic function that accepts messages of any length and returns a fixed-size 128-bit digest, usually written as a string of 32 hexadecimal characters. An example of an MD5 hash digest output would look like this: b6c7868ea605a8f951a03f284d08415e.
How? In five steps, according to the Internet Engineering Task Force’s (IETF) RFC 1321:
1. Add padding bits to the original message. You’ll extend the original input (a single 1 bit followed by 0 bits) until its length is 64 bits short of a multiple of 512 (i.e., its length in bits is congruent to 448 mod 512).
2. Add length bits to the end of the padded message. A 64-bit representation of the original message length is appended, so the total number of bits in the message becomes a multiple of 512.
3. Initialize MD buffer to compute the message digest. This algorithm requires a 128-bit buffer with a specific initial value. The buffer is divided into four words (A, B, C, D), each of which represents a separate 32-bit register.
4. Process each data block using operations in multiple rounds. Each 512-bit block is processed in four rounds of 16 operations each, and the result is added to the buffer to form its new value.
5. Produce a final 128-bit hash value. The final buffer value is the output.
In summary, the original input is broken up into fixed-size blocks, and each one is processed by the compression function together with the output of the previous block. The final output is a 128-bit message digest.
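The five steps above can be sketched as a compact, educational MD5 implementation in Python. It’s a minimal sketch we wrote from the structure described in RFC 1321 (the names `md5_pad`, `md5_hex`, and `rotl32` are our own, for illustration), not production code, and MD5 shouldn’t be used for security anyway. You can check its output against Python’s built-in `hashlib.md5`.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Per-operation left-rotation amounts: 4 rounds x 16 operations
SHIFTS = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4
          + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)
# Sine-derived constants: T[i] = floor(2**32 * |sin(i + 1)|)
T = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    x &= MASK
    return ((x << n) | (x >> (32 - n))) & MASK


def md5_pad(message: bytes) -> bytes:
    """Steps 1-2: pad to 448 mod 512 bits, then append the 64-bit length."""
    bit_length = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"  # a single 1 bit, then 0 bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack("<Q", bit_length)  # MD5 stores it little-endian


def md5_hex(message: bytes) -> str:
    # Step 3: the 128-bit buffer as four 32-bit words A, B, C, D
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    padded = md5_pad(message)
    # Step 4: each 512-bit (64-byte) block goes through 4 rounds of 16 operations
    for offset in range(0, len(padded), 64):
        m = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + T[i] + m[g]) & MASK
            a, d, c = d, c, b
            b = (b + rotl32(f, SHIFTS[i])) & MASK
        # Add this block's result to the running buffer
        a0, b0, c0, d0 = [(x + y) & MASK for x, y in
                          zip((a0, b0, c0, d0), (a, b, c, d))]
    # Step 5: the final buffer, serialized little-endian, is the 128-bit digest
    return struct.pack("<4I", a0, b0, c0, d0).hex()


print(md5_hex(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72
print(md5_hex(b"abc") == hashlib.md5(b"abc").hexdigest())  # True
```

Note how the output of each block feeds into the next one through `a0`..`d0`: that chaining is the “compression function alongside the output of the previous block” from the summary.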
|Pros of MD5||Cons of MD5|
|Easy way to compare and store smaller hashes.||Slower than other algorithms (which can be good in certain applications), but faster than SHA-1.|
|Useful when you have to compare files or code to identify any type of change.||Much less secure and vulnerable to collisions.|
|It’s easy to obtain the same hash value for two distinct inputs.|
2. SHA-1 (Secure Hash Algorithm 1)
Developed by the NSA (National Security Agency), SHA-1 is one of several algorithms included under the umbrella of the “secure hash algorithm” family. In a nutshell, it’s a one-way cryptographic function that accepts messages of any length and returns a 160-bit hash value, usually written as a 40-digit hexadecimal number. This will look along the lines of this: 0aa12c48afc6ff95c43cd3f74259a184c34cde6d.
Its structure is similar to MD5, but the process to get the message-digest is more complex as summarized in the steps listed below:
1. Add padding bits to the original message. You’ll extend the original input (a single 1 bit followed by 0 bits) until its length is 64 bits short of a multiple of 512 (i.e., its length in bits is congruent to 448 mod 512).
2. Add length bits to the end of the padded message. A 64-bit representation of the original message length is appended, so the total number of bits in the message becomes a multiple of 512.
3. Initialize MD buffers to compute the message digest. This algorithm requires two buffers and a long sequence of 32-bit words:
- Two buffers of five 32-bit registers each (“A, B, C, D, E” and “H0, H1, H2, H3, H4”), where H0 through H4 start with specific initial values, and
- The sequence of 80 32-bit words (W0, W1, W2, …, W79).
4. Process the message in successive 512-bit blocks. Each block goes through an expansion process (its 16 words are expanded into the 80-word sequence) and 80 rounds of compression, grouped into four stages of 20 steps each. The value obtained after each compression is added to the current buffer (hash state).
5. Produce a final 160-bit hash value. After the last block is processed, the current hash state is returned as the final hash value output.
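Following the same approach as MD5, here is a minimal educational sketch of SHA-1 in Python, written by us from the algorithm as described in RFC 3174 (the names `sha1_hex` and `rotl32` are illustrative). It shows the two buffers, the 80-word expansion, and the four 20-step stages; it is not production code.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def sha1_hex(message: bytes) -> str:
    # Steps 1-2: pad to 448 mod 512 bits, then append the 64-bit big-endian length
    bit_length = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", bit_length)

    # Step 3: initial values of H0..H4
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    # Step 4: process each 512-bit block
    for offset in range(0, len(padded), 64):
        # Expansion: 16 message words grow into the 80-word sequence W0..W79
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for t in range(16, 80):
            w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        # Compression: 80 steps in four stages of 20, using registers A..E
        a, b, c, d, e = h
        for t in range(80):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl32(a, 5) + f + e + k + w[t]) & MASK
            a, b, c, d, e = temp, a, rotl32(b, 30), c, d

        # Add the compressed block to the hash state
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]

    # Step 5: H0..H4 concatenated form the 160-bit digest
    return struct.pack(">5I", *h).hex()


print(sha1_hex(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
print(sha1_hex(b"abc") == hashlib.sha1(b"abc").hexdigest())  # True
```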
SHA-1 shouldn’t be used for digital signatures or certificates anymore. Theoretically broken since 2005, it was formally deprecated by the National Institute of Standards and Technology (NIST) in 2011. In 2017, SHA-1 was officially broken (SHAttered) by researchers from Google and CWI Amsterdam, who managed to produce two different files with the same hash.
|Pros of SHA-1||Cons of SHA-1|
|It’s a slow algorithm. This characteristic made it useful for storing password hashes as it slows down brute force attacks.||Slower than other algorithms, therefore unsuitable for many purposes other than password storage (e.g., when establishing secure connections to websites or comparing files).|
|It can be used to compare files or code to identify unintentional corruption only.||Less secure, with many vulnerabilities found over the years.|
|It can replace SHA-2 in case of interoperability issues with legacy code.||Collisions are easy and cheap to find.|
|Digest length too short to resist attacks.|
3. SHA-2 (Secure Hash Algorithm 2) Family of Algorithms
The successor of SHA-1, approved and recommended by NIST, SHA-2 is a family of six algorithms with different digest sizes:
- SHA-224 (truncated version of SHA-256 computed with different initial values),
- SHA-256,
- SHA-384 (truncated version of SHA-512 computed with different initial values),
- SHA-512,
- SHA-512/224 (truncated version of SHA-512, also computed with different initial values), and
- SHA-512/256 (truncated version of SHA-512, also computed with different initial values).
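A quick way to explore the family is Python’s `hashlib`. This sketch (our own example) prints the digest sizes of the four SHA-2 variants that `hashlib` always provides, and shows that a “truncated version computed with different initial values” is not the same as simply cutting off the longer digest. SHA-512/224 and SHA-512/256 are left out because whether `hashlib` exposes them depends on the underlying OpenSSL build.

```python
import hashlib

message = b"you are my sunshine"

# Digest sizes (in bits) of the SHA-2 variants hashlib always provides
for name in ("sha224", "sha256", "sha384", "sha512"):
    digest = hashlib.new(name, message).digest()
    print(name, len(digest) * 8)  # 224, 256, 384, 512

# SHA-384 is not just the first 384 bits of SHA-512:
# its different initial values change the entire output.
sha384 = hashlib.sha384(message).hexdigest()
sha512_prefix = hashlib.sha512(message).hexdigest()[:96]
print(sha384 == sha512_prefix)  # False
```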
SHA-2 is widely used, particularly by U.S. government agencies to secure their sensitive data. Much stronger than SHA-1, the family includes the most secure hashing algorithm available at the time of writing: SHA-256, which is also used in Bitcoin transactions.
How the SHA-256 Hashing Algorithm Works
As an example, let’s have a look at how the most widely used algorithm of the family (SHA-256) works, according to the IETF’s RFC 6234. Once again, the process is similar to its predecessors, but with some added complexity. Here’s a simplified overview:
1. Add padding bits to the original message. You’ll extend the original input so its length is 64 bits short of a multiple of 512 (i.e., congruent to 448 mod 512). To do so, you append a single 1 bit followed by as many 0 bits as needed.
2. Add length bits to the end of the padded message. A 64-bit representation of the original message length is appended so that the total length becomes a multiple of 512 bits.
3. Initialize MD buffer to compute the message digest. The buffer is represented as eight 32-bit registers (A, B, C, D, E, F, G, H) with specific initial values.
4. Process the message in successive 512-bit blocks. The message is broken into 512-bit chunks; each chunk is expanded into a schedule of 64 32-bit words and goes through 64 rounds of compression. The value obtained after each compression is added to the current hash value.
5. Produce a final 256-bit hash value (512 bits for SHA-512). After the last chunk is processed, the eight 32-bit hash values are concatenated (linked together) to form the final digest.
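The five steps above can be sketched as a short educational SHA-256 implementation in Python. This is our own sketch, not production code: the names `sha256_hex`, `first_primes`, and `icbrt` are illustrative, and instead of hard-coding the 64 round constants it derives them from their definition in the SHA-2 standard (the first 32 bits of the fractional parts of the cube roots of the first 64 primes; the initial values come from the square roots of the first 8 primes), using exact integer arithmetic.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def first_primes(count: int) -> list:
    """Return the first `count` prime numbers by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n: int) -> int:
    """Integer cube root: the largest x with x**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
# Initial values: fractional parts of the square roots of the first 8 primes
H_INIT = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
# Round constants: fractional parts of the cube roots of the first 64 primes
K = [icbrt(p << 96) & MASK for p in PRIMES]


def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def sha256_hex(message: bytes) -> str:
    # Steps 1-2: a 1 bit, 0 bits up to 448 mod 512, then the 64-bit big-endian length
    bit_length = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", bit_length)

    # Step 3: eight 32-bit registers with their initial values
    state = list(H_INIT)

    # Step 4: each 512-bit chunk -> 64-word message schedule -> 64 rounds
    for offset in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for t in range(16, 64):
            s0 = rotr32(w[t - 15], 7) ^ rotr32(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = rotr32(w[t - 2], 17) ^ rotr32(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)

        a, b, c, d, e, f, g, h = state
        for t in range(64):
            big_s1 = rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25)
            ch = (e & f) ^ (~e & g)
            temp1 = (h + big_s1 + ch + K[t] + w[t]) & MASK
            big_s0 = rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            temp2 = (big_s0 + maj) & MASK
            a, b, c, d, e, f, g, h = ((temp1 + temp2) & MASK, a, b, c,
                                      (d + temp1) & MASK, e, f, g)

        state = [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

    # Step 5: the digest is H0..H7 concatenated (8 x 32 = 256 bits)
    return struct.pack(">8I", *state).hex()


print(sha256_hex(b"abc"))  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(sha256_hex(b"abc") == hashlib.sha256(b"abc").hexdigest())  # True
```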
For a closer look at the step-by-step process of SHA-256, check out this great article on SHA-256 by Qvault that breaks it all down.
|Pros of SHA-2||Cons of SHA-2|
|It’s resistant to collision, preimage, and second-preimage attacks.||SHA-256 is slower than its predecessors.|
|It addresses SHA-1’s weaknesses.||Some software may need updating to support SHA-2.|
|SHA-256 is supported by the latest browsers, OS platforms, mail clients, and mobile devices.|
|It generates a longer hash value, offering a higher security level that makes it the perfect choice for validating and signing digital security certificates and files.|
If you want to know more about the SHA-2 family, check out the official SHA-2 standard (FIPS 180-4) published by NIST.
4. SHA-3 (Secure Hash Algorithm 3) Family of Algorithms
SHA-3 is the latest addition to the SHA family. Developed through a public competition run by NIST, it carries the SHA name while being structurally completely different from MD5, SHA-1, and SHA-2.
SHA-3 is based on Keccak, which uses a new cryptographic approach called the sponge construction. Basically, the data is “absorbed” into the sponge, then the result is “squeezed” out, just like a sponge absorbs and releases water.
It’s important to note that NIST doesn’t see SHA-3 as a full replacement for SHA-2; rather, it’s a way to improve the robustness of its overall hash algorithm toolkit.
SHA-3 is a family of four hash functions with different digest sizes, plus two extendable-output functions (XOFs) that can be used for domain hashing, randomized hashing, stream encryption, and to generate message authentication codes (MACs):
- SHA3-224,
- SHA3-256,
- SHA3-384,
- SHA3-512,
- SHAKE128 (extendable-output function), and
- SHAKE256 (extendable-output function).
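The difference between a fixed-length function and an extendable-output function is easy to see with Python’s `hashlib`, which ships both `sha3_256` and `shake_128`. In this sketch (our own example), you choose how many bytes SHAKE128 should squeeze out, and asking for more output extends the same stream rather than producing a new, unrelated value.

```python
import hashlib

message = b"you are my sunshine"

# Fixed-length SHA-3 function: the output size is part of the algorithm
print(len(hashlib.sha3_256(message).digest()) * 8)  # 256

# SHAKE128 is an extendable-output function: the caller picks the length (in bytes)
short_out = hashlib.shake_128(message).hexdigest(16)  # 16 bytes = 128 bits
long_out = hashlib.shake_128(message).hexdigest(64)   # 64 bytes = 512 bits
print(len(short_out), len(long_out))  # 32 128

# Squeezing more output extends the stream rather than changing it
print(long_out.startswith(short_out))  # True
```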
How the SHA3-224 Hashing Algorithm Works
How does it work? We’ll base our example on one member of the SHA-3 family: SHA3-224.
1. Add padding bits to the original message. This way, the total length is an exact multiple of the rate of the corresponding hash function. In this case, as we’ve chosen SHA3-224, it must be a multiple of 1152 bits (144 bytes). The rest of the SHA-3 process largely falls into two main categories of actions, “absorbing” and “squeezing,” which we’ll discuss in the next steps.
2. Absorb the padded message values to start calculating the hash value. The padded message is partitioned into fixed-size blocks of 1152 bits. Each block is XORed into the first 1152 bits of a 1600-bit internal state, and the whole state then goes through the Keccak-f[1600] permutation: 24 rounds, each made of five operations.
3. Squeeze to extract the hash value. This is where the result is extracted (squeezed out). The 1600-bit state obtained with the absorption operation is divided into the rate (r = 1152 bits for SHA3-224) and the capacity (c = 1600 − 1152 = 448 bits); only the rate part is read out.
4. Produce the final hash value. Finally, the first 224 bits are extracted from the 1152-bit rate portion of the state (SHA3-224’s rate). This 224-bit value is the hash digest of the whole message.
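The absorb and squeeze steps above can be sketched in pure Python. This is our own unoptimized, educational sketch based on the SHA-3 standard (FIPS 202), not production code; the names `keccak_f1600` and `sha3_hex` are illustrative. The round constants and rotation offsets are generated from their definitions rather than typed in, and the same function covers the other fixed-length SHA-3 sizes because the capacity is always twice the output size.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rotl64(x: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    return ((x << n) | (x >> (64 - n))) & MASK64


def rc_bit(t: int) -> int:
    """One output bit of the 8-bit LFSR that generates the round constants."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171
    return r & 1


# iota round constants, one 64-bit value per round
ROUND_CONSTANTS = [sum(rc_bit(j + 7 * rnd) << ((1 << j) - 1) for j in range(7))
                   for rnd in range(24)]

# rho rotation offsets, indexed [x][y]
RHO = [[0] * 5 for _ in range(5)]
_x, _y = 1, 0
for _t in range(24):
    RHO[_x][_y] = ((_t + 1) * (_t + 2) // 2) % 64
    _x, _y = _y, (2 * _x + 3 * _y) % 5


def keccak_f1600(lanes):
    """Keccak-f[1600]: 24 rounds of theta, rho, pi, chi and iota on 5x5 lanes."""
    for rnd in range(24):
        # theta: mix each column's parity into neighboring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x - 1) % 5] ^ rotl64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho (rotate each lane) and pi (move lanes to new positions)
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = rotl64(lanes[x][y], RHO[x][y])
        # chi: the only non-linear step
        lanes = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y])
                  for y in range(5)] for x in range(5)]
        # iota: break symmetry with a round constant
        lanes[0][0] ^= ROUND_CONSTANTS[rnd]
    return lanes


def sha3_hex(message: bytes, output_bits: int = 224) -> str:
    """SHA3-224/256/384/512: capacity c = 2 * output_bits, rate r = 1600 - c."""
    rate = (1600 - 2 * output_bits) // 8  # 144 bytes (1152 bits) for SHA3-224
    # Step 1: SHA-3 domain bits plus pad10*1, up to a multiple of the rate
    padded = bytearray(message) + b"\x06"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    # Step 2: absorb each r-bit block into the 1600-bit state, then permute
    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):
        block = padded[offset:offset + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = keccak_f1600(lanes)
    # Steps 3-4: squeeze; the digest is the first output_bits of the rate part
    state = b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(25))
    return state[:output_bits // 8].hex()


print(sha3_hex(b"abc") == hashlib.sha3_224(b"abc").hexdigest())  # True
```

Because every digest size here is at most the rate, a single squeeze is enough; SHAKE128/SHAKE256 would keep permuting and reading out rate-sized chunks to produce longer outputs.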
|Pros of SHA-3||Cons of SHA-3|
|It’s flexible, as it allows variable lengths for the input and output, making it ideal for a hash function.||Collision attacks have been demonstrated, but only against reduced-round versions.|
|Its instances use a single permutation for all security strengths, cutting down implementation costs.||Much slower than SHA-2 (a software-only issue).|
|The SHA-3 family of algorithms enables performance-security trade-offs by choosing the suitable capacity-rate pair.||Lack of hardware and software support.|
|Not vulnerable to length extension attacks.|
|Much faster than its predecessors when cryptography is handled by hardware components.|
More information about the SHA-3 family is included in the official SHA-3 standard paper published by NIST.
We hope that this hash algorithm comparison article gives you a better understanding of these important functions. This way, you can choose the best tools to enhance your data protection level.
Hashing has become an essential component of cybersecurity and is used nearly everywhere: in digital signatures, password storage, signing certificates (for code, emails, and documents), and SSL/TLS certificates, to name a few. However, no algorithm will last forever, so it’s important to always stay up to date with the latest trends and hash standards. Thinking about it… what about the future?
The Future of Hash Algorithms
As technology gets more sophisticated, so do the bad guys. And the world is evolving fast. From digital currencies to augmented and virtual reality, from the expansion of IoT to the rise of the metaverse and quantum computing, these new ecosystems will need highly secure hash algorithms.
This means that the question now isn’t whether we’ll still need hash algorithms; rather, it’s whether today’s secure algorithms (e.g., SHA-256, still unbroken) will withstand future challenges.
Take quantum computing, for example: given its computational power and speed, it’s easy to imagine that sooner or later a large enough quantum computer may compromise today’s best hash algorithms. However, there is good news regarding the impact of quantum computing on hash algorithms:
- Quantum computing isn’t here yet,
- Quantum computing is thought to impact public key encryption algorithms (not hashing algorithms like SHA-256 or SHA-3), and
- SHA-256 is thought to be quantum resistant.
To be on the safe side, though, NIST is already gearing up for the migration to post-quantum cryptography.
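A common back-of-the-envelope way to see why SHA-256 is thought to hold up is Grover’s algorithm, which speeds up brute-force preimage search quadratically. This estimate is our own illustration, and it ignores the large practical overheads of real quantum hardware. For an ideal $n$-bit hash:

$$
\underbrace{2^{n}}_{\text{classical preimage search}} \;\longrightarrow\; \underbrace{\approx 2^{n/2}}_{\text{Grover's algorithm}},
\qquad n = 256:\quad 2^{256} \;\longrightarrow\; 2^{128}
$$

So even in this scenario, a 256-bit digest still leaves roughly 128-bit security against preimage attacks, a level generally considered out of reach.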
Final Thoughts on Hash Algorithm Comparison MD5, SHA-1, SHA-2 & SHA-3
While it’s obviously impossible for organizations to go back and keep the digital and physical worlds separate, there are ways to address the challenging threats coming from quickly evolving technology and this new, hybrid world.
Hash algorithms aren’t the only security solution you should have in your organization’s defense arsenal. However, they’re certainly an essential part of it. We hope that this hash algorithm comparison has helped you to better understand the secure hash algorithm world and identify the best hash functions for you.
No hash algorithm is perfect, but they’re constantly being improved to keep up with attacks from increasingly sophisticated threat actors. Don’t waste any more time: include the right hash algorithms in your security strategy and implementations. Future-proof your data and organization now!