When talking about hashing algorithms, usually people immediately think about password security. However, hashing algorithms can do much more than that — from data validation and search to file comparison to integrity checks.
With so many different applications and so many algorithms available, a key question arises: “What is the best hashing algorithm?” In this article, we’re going to talk about the numerous applications of hashing algorithms and help you identify the best hashing algorithms to meet your specific needs.
We can’t really answer the question “What is the best hashing algorithm?” without first explaining what a hashing algorithm is. Hashing is a process that takes plaintext data or files and applies a mathematical formula (i.e., a hashing algorithm) to generate a fixed-length value that looks random but is fully determined by the input.
Hashing is often presented alongside two other ways of transforming data: encoding and encryption. All three processes differ in both function and purpose. Let’s compare them in the table below:
Aspect | Encoding | Encryption | Hashing |
---|---|---|---|
What It Is | It’s a publicly available scheme that’s relatively easy to decode. | It’s a two-way function that’s reversible when the correct decryption key is applied. | It’s a one-way function that’s used for pseudonymization. |
What It Does | This process transforms data so that it can be properly consumed by a different system. | This process transforms data to keep it secret so it can be decrypted only by the intended recipient. | This process generates a fixed-length hash value (output) that acts like a fingerprint of your input data, so you can check data integrity without exposing the data itself. |
How Secure It Is | Easily reversible. | Reversible only by the intended recipient. | Non-reversible (or nearly impossible). |
Does It Require Keys? | No key required to decode data. | Key needed to decrypt data. | No decoding or decryption needed. The hashed data is compared with the stored (and hashed) one — if they match, the data in question is validated. |
Use Cases and Applications | Data compression, data transfer, storage, file conversion and more. | Secure transfer (in-transit encryption) and storage (at-rest encryption) of sensitive information, emails, private documents, contracts and more. | Search, file organization, passwords, data and software integrity validation, and more. |
Algorithm Examples | ASCII, Unicode, URL Encoding, Base64. | AES, Blowfish, RSA. | MD5, SHA-1, SHA-256. |
The ideal hashing algorithm is:
Let’s have a look at what happens to a simple text when we hash it using two different hashing algorithms (MD5 and SHA-256):
The MD5 and SHA-256 hashing algorithms are different mathematical functions that produce outputs of different lengths: MD5 generates a 128-bit hash (32 hexadecimal characters), while SHA-256 generates a 256-bit hash (64 hexadecimal characters). When even a small change is made to the input (“code” [lowercase] instead of “Code” [capital letter C]), the resulting hash value changes completely.
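The comparison described above can be reproduced with a few lines of Python. This is a minimal sketch using the standard `hashlib` module; the helper name `hex_digests` is made up for this illustration.

```python
import hashlib


def hex_digests(text: str) -> dict[str, str]:
    """Return the MD5 and SHA-256 digests of text as hexadecimal strings."""
    data = text.encode("utf-8")  # hash functions work on bytes, not on str
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Hash the two inputs from the example: same word, different capitalization.
for word in ("Code", "code"):
    for name, digest in hex_digests(word).items():
        print(f"{name:>6}({word!r}) = {digest}  [{len(digest)} hex chars]")
```

Running it prints four digests: the MD5 values are 32 hexadecimal characters long, the SHA-256 values are 64, and the digests for “Code” and “code” share no obvious resemblance.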
In the digital world, hashing is virtually everywhere. A typical user comes across different forms of hashing every day without knowing it. You don’t believe it? Let me give you a few examples of the most popular hashing applications and usages:
Did you know that credit card providers like MasterCard, American Express (AMEX), Visa, and JCB, as well as many government identification schemes, use a simple check-digit formula (the Luhn algorithm) as an easy way to validate the number you provided? It’s a very lightweight relative of hashing: it catches typing mistakes, but it isn’t designed to protect anything. The same method is also used for IMEI and SIM card numbers and Canadian Social Insurance numbers, just to name a few examples.
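To make this kind of number validation concrete, here is a small sketch of the Luhn check, the check-digit formula commonly used for payment card numbers. The paragraph above does not spell out the formula, so treat the choice of Luhn as an assumption of this example; the function name `luhn_is_valid` is also just illustrative.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn check."""
    digits = [int(ch) for ch in number if ch.isdigit()]  # ignore spaces and dashes
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4111 1111 1111 1111"))  # a widely used test card number
print(luhn_is_valid("4111 1111 1111 1112"))  # last digit altered
```

The first call reports a valid number and the second an invalid one. Note that the check only detects accidental errors; anyone can construct a number that passes it.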
A digital signature is a type of electronic signature that relies on the use of hashing algorithms to verify the authenticity of digital messages or documents. When you send a digitally signed email, you’re using a hashing algorithm as part of the digital signing process.
When you download a file from a website, you don’t know whether it’s genuine or whether it has been modified to contain a virus. How can you be sure? The best way is to check its integrity by comparing the hash value published on the download page with the hash value you compute from the file you just downloaded. If they match, the file has not been altered since the publisher generated the hash, so you can trust it as long as you trust the page that published the hash.
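A download check like this might look as follows in Python. This is only a sketch: it assumes the publisher lists a SHA-256 value in hexadecimal, and the function names are invented for the example.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Reading in chunks keeps memory use flat even for large downloads.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare the file's digest with the value copied from the download page."""
    actual = sha256_of_file(path)
    expected = published_hex.strip().lower()  # tolerate whitespace and upper case
    return hmac.compare_digest(actual, expected)
```

You would call `matches_published_hash` with the path of the downloaded file and the hash string from the website, and refuse to run the file if it returns `False`.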
Software developers and publishers use code signing certificates to digitally sign their code, scripts, and other executables. This confirms to end-users the authenticity and the integrity of the file or application available to download on a website.
The developer or publisher’s digital signature is attached to the code with a code signing certificate to provide a verifiable identity. This way, users won’t receive an “Unknown Publisher” warning message during the download or installation. Once again, this is made possible by the usage of a hashing algorithm.
However, depending on the type of code signing certificate the signer uses, the software may (or may not) still trigger a Windows Defender SmartScreen warning.
Code signing certificates are becoming a very popular instrument not only to protect users and improve their overall experience but also to boost revenues, increase user confidence, and improve brand reputation.
A few decades ago, when you needed to request a copy of your university marks or a list of the classes you enrolled in, the staff member had to look for the right papers in the physical paper-based archive. That process could take hours or even days!
Today, things have dramatically improved. Each university student has a unique number (or ID) linked to all their personal information stored in the university database (often stored in a hash table). When an authorized staff member needs to retrieve some of that information, they can do so in the blink of an eye!
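To illustrate why a lookup by student ID is so quick, here is a toy hash table in Python. It is a teaching sketch, not how a real database works: the class name, the bucket count, and the student records are all invented for the example, and Python’s built-in `dict` already does this job far better.

```python
class ToyHashTable:
    """A tiny hash table that resolves collisions by chaining."""

    def __init__(self, bucket_count=8):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key):
        # The hash of the key decides which bucket to look in,
        # so we never have to scan every record.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for index, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[index] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        raise KeyError(key)


# Hypothetical student records keyed by student ID.
records = ToyHashTable()
records.put("S1001", {"name": "Ada", "courses": ["Algebra"]})
records.put("S1002", {"name": "Grace", "courses": ["Compilers"]})
print(records.get("S1002")["name"])
```

The key point is in `_bucket_for`: the ID is hashed to pick one small bucket, and only that bucket is searched.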
Last but not least, hashing algorithms are also used for secure password storage. How? When you register on a website and create a password, the provider usually saves only the password’s hash value instead of your plaintext password. This means that when you log in to your account, usually the provider hashes the password you just typed and compares it with the one stored in its database. If the two hash values match, then you get access to your profile. Easy and much more secure, isn’t it?
However, hashing your passwords before storing them isn’t enough — you need to salt them to protect them against different types of tactics, including dictionary attacks and rainbow table attacks.
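The registration and login flow described above could be sketched like this. The example uses PBKDF2 from Python’s standard library with a random salt per user; the iteration count and function names are illustrative assumptions, not recommendations taken from this article.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative value; tune it for your own hardware


def hash_password(password, salt=None):
    """Return (salt, derived_key) for storage; the plaintext is never stored."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for every user
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def verify_password(password, salt, stored_key):
    """Hash the login attempt with the stored salt and compare the results."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_key)


# Registration: store salt and key. Login: recompute and compare.
salt, stored_key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored_key))
print(verify_password("wrong guess", salt, stored_key))
```

Only the salt and the derived key go into the database. At login the site repeats the computation with the stored salt and checks that the two values match.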
OK, now we know that hashing algorithms can help us to solve many problems, but why are hashing algorithms so important?
Hashing allows quick searches, faster than many other data retrieval methods (e.g., scanning arrays or lists), which can make a big difference when searching through millions of records. Consider a library as an example. Have you ever asked yourself how a reference librarian can find the exact location of a book in a matter of seconds when it would have taken you ages to go through all the titles available on the library shelves? It’s all thanks to a hash table!
It increases password security in databases. If you store password hashes instead of plaintext passwords, the actual password never needs to be stored, which makes it more difficult for hackers to steal it. A properly hashed password cannot simply be reversed; an attacker has to guess candidate passwords and hash each one. Considering that, based on data from Verizon’s 2021 Data Breach Investigations Report (DBIR), stolen credentials are still the top cause of data breaches, it’s easy to understand why using a hashing algorithm in password management has become paramount.
Hashing allows you to compare two files or pieces of data by their hash values alone, without going through their contents side by side, and know if they’re different. This method is often also used by file backup programs when running an incremental backup.
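As a rough sketch of how a backup tool might use this, the snippet below compares the hashes of the current files with the hashes recorded during the previous run. File contents are passed in as bytes to keep the example self-contained; the function names and sample data are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def changed_files(current: dict[str, bytes], previous: dict[str, str]) -> list[str]:
    """List files that are new or whose hash differs from the last backup."""
    return sorted(
        name
        for name, data in current.items()
        if previous.get(name) != fingerprint(data)
    )


# Hashes recorded by the previous backup run (hypothetical data).
last_backup = {
    "notes.txt": fingerprint(b"first draft"),
    "todo.txt": fingerprint(b"buy milk"),
}
# State of the files today: one edited, one untouched, one new.
today = {
    "notes.txt": b"second draft",
    "todo.txt": b"buy milk",
    "photo.raw": b"\x00\x01\x02",
}
print(changed_files(today, last_backup))
```

Only the files whose fingerprints changed, plus any new files, need to be copied again.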
Needless to say, it’s a powerful ally in code/data integrity as it certifies the originality of a code or document.
Now that we know why hashing algorithms are so important and how we can use them, let’s have a closer look at the most popular types of hashing algorithms available:
Strong hash functions are those that embody certain characteristics:
The first is pre-image resistance. This means that it should be computationally infeasible to work backward from a hash value to its original input.
The second is the avalanche effect, which is what makes every hash value look random. If you make even a tiny change to the input, the hash value output should change entirely.
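The effect of a tiny input change can be measured by counting how many output bits flip. The sketch below does this for SHA-256; the helper name `differing_bits` is invented for the example.

```python
import hashlib


def differing_bits(first: str, second: str) -> int:
    """Count the bits that differ between the SHA-256 digests of two strings."""
    digest_a = hashlib.sha256(first.encode("utf-8")).digest()
    digest_b = hashlib.sha256(second.encode("utf-8")).digest()
    # XOR leaves a 1 wherever the two digests disagree.
    return sum(bin(a ^ b).count("1") for a, b in zip(digest_a, digest_b))


print(differing_bits("Code", "code"), "of 256 bits differ")
```

For a well-behaved hash function you should see roughly half of the 256 bits change, even though the inputs differ in a single character.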
The third is collision resistance. A hash collision occurs when two different inputs result in the same output. Collision resistance means that it should be as difficult as possible to find two such inputs.
Say, for example, you run a hashing algorithm on two different documents and they generate identical hash values. MD5 and SHA-1 are known to be vulnerable to this type of attack. The underlying problem is that the number of possible inputs is unlimited while the number of possible hash outputs is fixed, so it’s impossible to guarantee that every hash output will be unique. Collisions are rare, though, and some hashing algorithms are far less risky than others.
However, hash collisions can be exploited by an attacker. How? If an attacker discovers two input strings with the same hash output (collision), they can replace a file available to download with a malicious file with the same hash. The user downloading the file will think that it’s genuine as the hash provided by the website is equal to the one included in the replaced file.
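Finding a collision in a full-strength hash is impractical on a laptop, but the principle is easy to see with a deliberately weakened one. The sketch below keeps only the first 2 bytes of SHA-256 (an artificial 16-bit hash invented for this demonstration) and searches for two inputs that share it.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes) -> bytes:
    """A deliberately weak 16-bit hash: the first 2 bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:2]


def find_collision():
    """Return two different messages with the same tiny_hash value."""
    seen = {}
    for number in count():
        message = str(number).encode("ascii")
        tag = tiny_hash(message)
        if tag in seen:
            return seen[tag], message  # same tag, different message
        seen[tag] = message


first, second = find_collision()
print(first, second, tiny_hash(first).hex())
```

With only 65,536 possible outputs, a collision is guaranteed after at most 65,537 inputs and usually turns up after a few hundred. Longer outputs push that cost out of reach, which is why the output size and the design of the algorithm matter so much.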
It’s no secret that cybercriminals are always looking for ways to crack passwords to gain unauthorized access to accounts. Let’s have a look at some of the ways that cybercriminals attack hashing algorithm weaknesses:
… And these are just a few examples. Needless to say, using a weak hashing algorithm can have a disastrous effect, not only because of the financial impact, but also because of the loss of sensitive data and the consequent reputational damage.
Just to give you an idea, PwC’s 2022 Global Digital Trust Insights shows that more than 25% of companies expect an increase of their cybersecurity expenses of up to 10% in 2022.
SpyCloud’s 2021 Annual Credential Exposure Report highlights the fact that there were 33% more breaches in 2020 compared to 2019. The company also reports that it recovered more than 1.4 billion stolen credentials.
What does all this have to do with hashing algorithms? Let’s have a look at a few examples of data breaches caused by a weak hashing algorithm in the last few years:
We could go on and on and on, but there’s not enough time for that as we have other things left to cover.
What can we do then if not even a hashing algorithm is enough to stop these attacks? The answer is… “season” your password with some salt and pepper!
A good way to make things harder for a hacker is password salting.
Let’s say that you have two users in your organization who are using the same password. When hashed, their passwords will produce the same hash value. However, if you add a randomly generated string (a salt) to each password before hashing it, the two hash values will look different even though the passwords still match. As the attacker won’t know the salts in advance, they won’t be able to precompute a lookup table, and the attack will probably fail or end up being as slow as a traditional brute force attack.
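The two-users scenario can be shown in a few lines. This sketch uses plain SHA-256 only to make the effect of the salt visible; for real password storage you would use a deliberately slow algorithm. The function names and the sample password are made up for the example.

```python
import hashlib
import os


def unsalted_hash(password: str) -> str:
    """Hash the password on its own (not recommended for storage)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_hash(password: str, salt: bytes) -> str:
    """Hash the salt together with the password."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


shared_password = "Summer2024!"  # two users happen to pick the same password
alice_salt = os.urandom(16)
bob_salt = os.urandom(16)

# Without a salt the stored values are identical, which leaks information.
print(unsalted_hash(shared_password) == unsalted_hash(shared_password))
# With per-user salts the stored values differ.
print(salted_hash(shared_password, alice_salt) == salted_hash(shared_password, bob_salt))
```

The first comparison is true and the second is false, so someone looking at the database can no longer tell that the two users share a password.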
Clever, isn’t it? But adding a salt isn’t the only tool at your disposal.
For additional security, you can also add some pepper to the same password. It’s another random string that is added to a password before hashing. However, OWASP shares that while salt is usually stored together with the hashed password in the same database, pepper is usually stored separately (such as in a hardware security module) and kept secret.
When salt and pepper are used with hashing algorithms to secure passwords, an attacker who steals the password database has to attack each salted hash individually and also needs to get hold of the pepper, which is stored somewhere else.
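One possible way to combine the two is sketched below: the pepper is mixed in with an HMAC, and the result is then run through a salted, slow hash. This arrangement, the iteration count, and the names are assumptions made for illustration; in practice the pepper would come from a secrets store or hardware security module rather than being generated in the program.

```python
import hashlib
import hmac
import os


def hash_with_salt_and_pepper(password: str, salt: bytes, pepper: bytes) -> bytes:
    """Mix in the secret pepper, then apply a salted, slow hash."""
    # Step 1: key an HMAC with the pepper (kept outside the database).
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    # Step 2: salted PBKDF2 over the peppered value (stored with the salt).
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, 100_000)


pepper = os.urandom(32)  # placeholder: load this from a separate secret store
salt = os.urandom(16)    # stored in the database next to the hash
stored = hash_with_salt_and_pepper("Summer2024!", salt, pepper)
print(len(stored), "bytes stored for this user")
```

An attacker who copies the database gets the salt and the stored value but not the pepper, so guesses cannot be checked offline without it.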
The best hashing algorithm is the one that makes it as hard as possible for attackers to find two values with the same hash output.
This also means, though, that the effectiveness of an algorithm strictly depends on how you want to use it. Different hashing speeds work best in different scenarios. For example:
As you can see, one size doesn’t fit all. Each algorithm has its own purpose and characteristics, and you should always consider how you’re going to use it in the decision-making process.
Some hashing algorithms, like MD5 and the SHA family, are mainly used for search, file comparison, and data integrity. What do they have in common? Speed.
When you do a search online, you want to view the results as soon as possible. The same goes for performing incremental backups or verifying the integrity of an application you’re about to download.
On the other hand, if you want to ensure that your passwords are secure and difficult to crack, you will opt for a deliberately slow hashing algorithm (e.g., Argon2 or bcrypt) that makes the hacker’s job very time consuming.
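The speed difference is easy to observe. Argon2 and bcrypt are not part of Python’s standard library, so this sketch substitutes PBKDF2 as the slow algorithm; the iteration count and helper name are illustrative, and the timings will vary from machine to machine.

```python
import hashlib
import os
import time


def seconds_taken(operation) -> float:
    """Run operation once and return how long it took in seconds."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start


password = b"correct horse battery staple"
salt = os.urandom(16)

# A fast, general-purpose hash: one SHA-256 call.
fast = seconds_taken(lambda: hashlib.sha256(salt + password).digest())
# A deliberately slow password hash: 200,000 PBKDF2 iterations.
slow = seconds_taken(
    lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)
)

print(f"SHA-256: {fast:.6f} s")
print(f"PBKDF2:  {slow:.6f} s")
```

A legitimate login pays the slow cost once, while an attacker pays it for every single guess, which is exactly the imbalance you want for passwords and exactly what you don’t want for search or file comparison.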
I hope this article has given you a better idea about the best hashing algorithm to choose depending on your needs.
The digital world is changing very fast, and hackers are always finding new ways to get what they want. An algorithm that is considered secure and top of the range today may be cracked and unsafe tomorrow, as happened to MD5 and SHA-1.
There are ways, though, to make attackers’ lives as difficult as possible, and hashing plays a vital role in that.
By the way, if you are still using MD5 or SHA-1 hashing algorithms, well… don’t risk it — make sure you upgrade them!