How Does Hashing Work? A Look at One-Way Cryptographic Functions
7ec70902b313dd171b99870be4e2253a. Do you want to know how a hash function transformed the foreword to the OWASP Mobile Application Security Verification Standard (MASVS) book into that short line of gibberish? Find out in simple terms, and learn how the same technique can secure your organization’s sensitive data from snoopers.
Privitar’s cybersecurity report reveals that 24% of interviewed U.S. customers have stopped doing business (or now do less business) with companies that suffered data breaches. This devastating blow to an organization’s reputation and customer relationships could happen to any organization that fails to protect its customers’ data, including yours. But what can you do to help make your data more secure and minimize your risk?
Hash functions are invaluable weapons in any company’s security arsenal. From password protection to secure authentication and message integrity checks, nearly everybody uses them for data security in one form or another. However, even if you’ve heard of hashing, you may not know how it really works. That’s where we come in.
Discover in simple terms how hash functions work — their applications, peculiarities, and how they can transform any message into a secret, irreversible code. Ready to unveil the arcane? Let’s get the ball rolling!
What’s a Hash Function? A Brief Overview
A cryptographic hash function transforms any input of any length into a fixed-size output (called a hash or digest) that is, for practical purposes, unique to that input. Consider the steps for cooking a hash brown: you take the ingredients, chop them and mix them all together to get your shredded fried potato. This gives you an output of a specific size (i.e., so many servings). This same concept applies to data when you use a hash function. Hashing takes all the data, mixes them up, grinds ‘em down, and produces a small data string of a specific size.
No matter how large the input, the output will always be the same size. This is how we managed to boil the complete foreword to OWASP’s book down to only 32 characters. Amazing, right? (Unlike compression, though, there’s no way to expand those 32 characters back into the original text.)
In this case, we’ve used the message digest 5 (MD5) hash function for the sake of ease and brevity. But there are many other hash functions out there, each of which returns a fixed-length value whose size depends on the algorithm. Some of them have shorter outputs while others return digests of 64 characters or more; some hash functions are also more secure than others:
- Message digest 5 (MD5). It produces a 32-hexadecimal character (128 bit) hash value. It’s an old algorithm that’s susceptible to vulnerabilities, but it’s still used in some instances to compute checksums to verify the integrity of signed executables.
- Secure hashing algorithm 1 (SHA-1). It produces a 40-hexadecimal digit (160 bit) digest. Researchers have demonstrated practical collision attacks against it, so it’s now deprecated.
- Secure hashing algorithm 256 (SHA-256). It’s part of the SHA-2 family of hash functions and is much more secure than its predecessor, SHA-1. Among its applications, we find digital signatures, blockchain, and (as a building block inside slower schemes such as PBKDF2) password hashing. Its output is made of 64 hex characters (256 bits).
And there are more beyond this narrow list. Hash functions are just like hash brown recipes: there are many different ones to choose from; you just have to pick the tried-and-true ones that best serve your specific needs.
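If you’d like to check those digest sizes yourself, here’s a minimal sketch using Python’s standard hashlib module. The helper name digest_lengths and the sample inputs are ours, made up for illustration, and we’re assuming Python 3.9 or later on a build where MD5 and SHA-1 are still enabled:

```python
import hashlib


def digest_lengths(message: bytes) -> dict[str, int]:
    """Return the number of hex characters each algorithm produces."""
    return {
        name: len(hashlib.new(name, message).hexdigest())
        for name in ("md5", "sha1", "sha256")
    }


# A four-byte input and a much larger one (hypothetical sample data).
short_lengths = digest_lengths(b"hash")
long_lengths = digest_lengths(b"hash browns " * 10_000)

print(short_lengths)
print(long_lengths)
```

Both dictionaries come out identical: the size of the digest depends on the algorithm, not on how much data you feed it.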
So, we now know that hashing changes your original input (a password, for example) into a gibberish code. Hmm… isn’t hashing the same as encryption? Not quite. Let’s find out why.
The Properties of Hash Functions
We know that it can be confusing, but there is a fundamental difference between hashing and encryption. True, they’re both cryptographic functions, but encryption is a two-way function (encrypting and decrypting), whereas hashing is a one-way function. This means that encrypted data can be decrypted by the right person (i.e., the secret key holder), enabling you to recover the original data. However, with strong hash functions, it’s virtually impossible to reverse the process. So, this means once you hash data, you can’t get back to the original input. Why? Because strong hash functions are:
- Virtually irreversible. Yup! They’re one-way functions. Thus, you can’t simply turn them back into their inputs. Well, theoretically, you could if you had millions of years and resources at your disposal, but it’s basically impossible, as demonstrated in this entertaining video by Matthew Weathers. Still don’t believe it? Try to convert the digest I used as an example at the beginning of this article back into its original text. Or, if you want to get even more practical, try to get your whole potato back once you’ve chopped it to prepare your hash brown.
- Collision resistant. Because inputs can be of any length while the output has a fixed size, collisions must exist in theory. With strong hash functions, however, it’s computationally infeasible to find two different inputs that end up having the same hash digest (output). In other words, for practical purposes, every input generates its own unique output.
- Reasonably fast. Some hashing algorithms are fast and others are slow, and the right speed depends on the usage. Think about website connections: you’ll need a fast hashing algorithm so that users don’t wait too long for a website to load. But if you use the hashing algorithm to protect your passwords, you’ll definitely want to go for a slower one to slow down attackers’ brute force attempts.
- Deterministic, with a fixed-length output. The same input always produces the same digest, and no matter the size of the input, the output size will remain unchanged. Not convinced? Let’s try it out. Write a sentence, copy and paste it into your favorite hash function converter (I used one of those available on GIT), and check how many digits the output has. Next, take a longer sentence, run it through the same algorithm and check the output size. If you did everything correctly, you’ll notice that both outputs will be of the same length. Oh, man! Wouldn’t it be cool if it worked with food too? No matter how much you eat, you’d always keep the same waist size and wouldn’t have to listen to another lecture from your doctor. That would be amazing!
- Very sensitive to small changes. This is often described as the “avalanche effect.” Even the smallest change to the input will result in a completely different output. Look what happens to the digest of the OWASP foreword example if I add a colon after the word “Foreword”:
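You can reproduce the effect yourself. Here’s a minimal Python sketch that uses the standard hashlib module and, as a stand-in for the full foreword text, the hypothetical inputs “Foreword” and “Foreword:”. Note that it uses SHA-256 rather than the MD5 from our example, and that the helper names are ours:

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hash a text string with SHA-256 and return the hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions where two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


original = sha256_hex("Foreword")
changed = sha256_hex("Foreword:")  # same text plus a single colon

print(original)
print(changed)
print(differing_bits(original, changed), "of 256 bits differ")
```

With a well-behaved hash function, you’d expect roughly half of the bits to flip, even though the two inputs differ by a single character.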
What a shocking difference, eh? Now that you know what hashing is, let’s move on to the juiciest part of this article: we’ll explore, step by step, how a simple input of any size (some text, a photo, a song, you name it) gives a fixed-length hexadecimal output.
How Does a Hash Function Work?
OK, now that we know what hashing is and what it does, it’s time to look at how hashing works. To make things easier to understand, why don’t we do a little experiment and transform an input of choice into a fixed-length output? For this demonstration, we’ve selected one of the most famous quotes from one of our favorite books, George Orwell’s dystopian novel, 1984: “War Is Peace, Freedom Is Slavery, Ignorance Is Strength.”
Before we go into the process itself, though, we must bear in mind that any digital content is made up of a series of zeros and ones when represented in binary. This binary code (i.e., computer code) is the language your computer translates your input into to process it. After that, through a series of calculations performed by the hash function, your input changes into the gibberish hexadecimal character codes we talked about earlier. Let’s go back to the process now.
1. The original input is divided into smaller blocks, all equal in size (512 bits, or 64 bytes, in the case of MD5, SHA-1 and SHA-256). If the last block doesn’t have enough data to reach the same size as the others, additional 1s and 0s are added to fill the gap (this is called padding).
2. Each block is run, one after the other, through the hash function’s internal calculations, and the result for one block is fed into the processing of the next. The complexity of the mathematical algorithm depends on the chosen hash function, and the result left after the final block is the hash value.
3. The resulting output is your gibberish hash. In this case, as we’ve chosen to use the SHA-256 algorithm, our
“War Is Peace, Freedom Is Slavery, Ignorance Is Strength”
has become: 6104cd1b85af07b7687b7f642700bb0925eb8ad3e5cfa8495a306440a0559ad3
Hint: If I had used MD5, the resulting output would have been:
9532a15d7ff1e95e2572e52b49c0f05e
or, with SHA-1: 7826309f4eb49876496a71a32f62d27f9f26fc84
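To make step 1 a bit more concrete, here’s a rough Python sketch of the padding and block splitting as SHA-256 does it: a single 1 bit (the byte 0x80), then zeros, then the message length in bits as an 8-byte number, so that the total is a multiple of 64 bytes. The function names pad_message and split_blocks are ours, and the sketch deliberately stops before step 2 (the actual number crunching), which we leave to hashlib:

```python
import hashlib


def pad_message(message: bytes, block_size: int = 64) -> bytes:
    """Pad a message the way SHA-256 does before splitting it into blocks."""
    bit_length = len(message) * 8
    padded = message + b"\x80"  # a single 1 bit followed by seven 0 bits
    # Zero bytes until we are exactly 8 bytes short of a block boundary.
    padded += b"\x00" * ((block_size - 8 - len(padded)) % block_size)
    # The original length in bits, as an 8-byte big-endian integer.
    padded += bit_length.to_bytes(8, "big")
    return padded


def split_blocks(data: bytes, block_size: int = 64) -> list[bytes]:
    """Cut padded data into equal-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


quote = b"War Is Peace, Freedom Is Slavery, Ignorance Is Strength"
padded = pad_message(quote)
blocks = split_blocks(padded)
print(len(quote), "bytes of input ->", len(blocks), "block(s) of 64 bytes")

# hashlib does the padding and block processing for us. Feeding the input
# piece by piece gives the same digest as hashing it in one go.
streamed = hashlib.sha256()
for piece in (b"War Is Peace, ", b"Freedom Is Slavery, ", b"Ignorance Is Strength"):
    streamed.update(piece)
print(streamed.hexdigest() == hashlib.sha256(quote).hexdigest())
```

The second half of the sketch hints at why the block-by-block design is handy: you can hash a file or a stream of any size without ever holding all of it in memory.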
Of course, these are just examples to help you understand how hashing works. For instance, the process would have been slightly different in the case of password hashing. And this takes us to our next point.
How Secure Is a Hash Function?
Are hash functions 100% secure against attacks? Not really. And, if you’ve worked in IT security long enough, you’ll know that no matter what you do or what you come up with, nothing is totally hacker-proof. Ever.
Hash functions, like everything else in the digital world, can be broken. A weak hash algorithm can enable an attacker to find out the original input purely from the digest (preimage attack). Or, it could facilitate the identification of another input that produces the same digest as a given one (second preimage attack), or of any two inputs that share a digest (collision attack). Practical collision attacks are what led to the deprecation of MD5 and SHA-1, for example. How do attackers go about it?
- Brute force attacks. The hacker tries every combination of characters until they find the right one, et voilà! Your password is gone.
- Rainbow table. Often used by password cracking software, this massive precomputed lookup table maps common passwords to their hash values; the attacker compares those values with the ones included in the hacked database.
- Birthday attack. Do you remember when we talked about collisions? This is based exactly on the chance that two different inputs produce the same output. Yes, collisions are usually rare, but it’s game over once the attacker finds one.
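To get a feel for the birthday attack, here’s a toy Python sketch. Finding a collision on a full SHA-256 digest is out of reach, so we cheat and keep only the first 6 hex characters (24 bits) of each digest. That truncation, the message format and the function names are all our own assumptions for the demo:

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, hex_chars: int = 6) -> str:
    """Keep only the first few hex characters of a SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]


def find_collision(hex_chars: int = 6) -> tuple[bytes, bytes, str, int]:
    """Hash numbered messages until two of them share a truncated digest."""
    seen: dict[str, bytes] = {}
    for attempt in count(1):
        message = f"message-{attempt}".encode()
        digest = truncated_hash(message, hex_chars)
        if digest in seen:
            return seen[digest], message, digest, attempt
        seen[digest] = message


first, second, shared_digest, attempts = find_collision()
print(first, second, shared_digest)
print("collision found after", attempts, "attempts out of", 16**6, "possible digests")
```

The collision typically shows up after a few thousand attempts, far fewer than the 16.7 million possible 24-bit values. That square-root shortcut is exactly why digest length matters so much for collision resistance.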
How can you avoid all this? Make sure you use a strong hash function, less susceptible to collisions, and that has all the characteristics we’ve listed at the beginning of this article. For example, the still-unbroken SHA-256 could be a good choice for many hashing applications.
If you want to protect your passwords, you should use a slower hashing algorithm like Argon2id, bcrypt or PBKDF2. You should also add salt and pepper to them (i.e., random values combined with the password before hashing). Yup! You read that right. Just like you add these spices to your hash brown to make it taste better, using both a salt and a pepper in cryptography adds layers to your security. But how do salting and peppering work when it comes to data? The salt is a randomly generated string, unique to each password, that is combined with the password during the hashing process and stored alongside the resulting hash. The pepper is another random string that is also added before the password is hashed but, unlike the salt, it is kept secret and is typically shared by all passwords. It is stored separately from the database, for example in a hardware security module (HSM), for extra security.
Adding salt and pepper to your stored password hashes will make the password hacking process much more labor-intensive and time-consuming, putting off potential attackers.
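Here’s one way the seasoning could look in Python, using only the standard library. Treat it as a sketch under our own assumptions: we picked PBKDF2 (via hashlib.pbkdf2_hmac), we mix the pepper in with an HMAC instead of simply appending it, the iteration count is illustrative, and the pepper is generated on the spot where a real system would load it from an HSM or secret store:

```python
import hashlib
import hmac
import secrets


def hash_password(password: str, salt: bytes, pepper: bytes,
                  iterations: int = 200_000) -> bytes:
    """Derive a slow, salted and peppered hash of a password."""
    # Pepper: a secret key mixed into the password before the slow hash.
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    # Salt: a per-password random value fed into PBKDF2 with the password.
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, iterations)


pepper = secrets.token_bytes(32)   # hypothetical: kept outside the database
salt_a = secrets.token_bytes(16)   # stored next to the first user's hash
salt_b = secrets.token_bytes(16)   # stored next to the second user's hash

# Two users who happen to pick the same password.
hash_a = hash_password("correct horse", salt_a, pepper)
hash_b = hash_password("correct horse", salt_b, pepper)

print(hash_a.hex())
print(hash_b.hex())
```

Because each salt is different, the two stored hashes have nothing in common even though the passwords are identical, which is what makes precomputed tables so much less useful to an attacker.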
Now that you know how hashing works and how you can beef up its security, let’s see why we need it.
What Do We Use Hashing For?
Since their invention in the early 1950s, hash functions have been used in many different ways and for several crucial purposes.
Password Security
Websites usually don’t store passwords in plain text. This way, if they get hacked, the attacker won’t get access to your password. So how do they store them? They hash them and store only the password hash value, not the plaintext password itself.
- When you register an account on a website, you type in a username and password.
- The website then hashes the password (often after adding salt and pepper) and stores its resulting hash digest in the database.
- The next time you log in, the website hashes the password you entered (with the same salt and pepper) and compares the result with the stored hash. If they match, you’re in.
Hint: When you hash your passwords, always remember to use a strong hash function and don’t forget your salt. You don’t want your organization to be the victim of a data breach like the one that hit OpenSubtitles in 2021.
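The registration and login steps above can be sketched in a few lines of Python. Everything here is a simplified assumption on our part: the “database” is an in-memory dictionary, the function names are made up, the hash is PBKDF2 with an illustrative iteration count, and we’ve left the pepper out to keep things short:

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-in for a database table: username -> (salt, password hash).
users: dict[str, tuple[bytes, bytes]] = {}


def derive(password: str, salt: bytes) -> bytes:
    """Slow, salted hash of the password (PBKDF2 with SHA-256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)


def register(username: str, password: str) -> None:
    """Store a fresh salt and the resulting hash, never the password itself."""
    salt = secrets.token_bytes(16)
    users[username] = (salt, derive(password, salt))


def login(username: str, password: str) -> bool:
    """Re-hash the entered password with the stored salt and compare."""
    record = users.get(username)
    if record is None:
        return False
    salt, stored_hash = record
    # compare_digest takes the same time whether or not the values match.
    return hmac.compare_digest(stored_hash, derive(password, salt))


register("alice", "s3cret-hash-browns")
print(login("alice", "s3cret-hash-browns"))
print(login("alice", "wrong-password"))
```

Notice that the website never needs to reverse the hash: it only ever repeats the same one-way calculation and checks whether the results agree.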
Secure Browsing
Have you ever noticed the small padlock and the HTTPS on your browser’s address bar? These are signs that the website is using a secure sockets layer/transport layer security (SSL/TLS) certificate, so all data transmission is encrypted. As part of the process to establish that secure connection (i.e., the SSL/TLS handshake), hashing is used to authenticate the web server to the user’s client that’s trying to connect to it. It also helps to ensure the integrity of the data by indicating whether the data has been modified. How does it work?
- When you visit an HTTPS website, your browser receives a digitally signed SSL/TLS server certificate (i.e., a leaf certificate).
- The browser decrypts the certificate’s digital signature with the issuing certificate authority’s public key and verifies the authenticity of the received hash by comparing it to the one it generates itself.
- The browser then creates and secures a unique session key, encrypting it prior to sending it to the server via the insecure channel.
- Once the session key has been decrypted and verified by the server, the HTTPS (encrypted) connection is established.
Integrity Checks
How can you be sure that the file you’ve just downloaded hasn’t been changed by a malicious third party (e.g., contains a virus or is corrupted)? Use a checksum, another useful application of hash functions:
- Download the signed code and copy its checksum.
- Generate the checksum of the file yourself, from the directory you saved it in.
- Compare the two checksums. If they match, you’re good to go. If they don’t, either your download failed or the code has been changed somehow (stay away from it).
You can find an example of the whole checksum verification process in our previous article, “How to Check an MD5 Checksum.” By the way, the same procedure can be used to check for collisions/identical files.
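If you’d rather script the comparison than eyeball it, here’s a minimal Python sketch. It uses SHA-256 instead of MD5, the function names are ours, and the “download” is just a small temporary file we create for the demo, so treat the file name and contents as placeholders:

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_checksum: str) -> bool:
    """Compare our own checksum with the one copied from the download page."""
    return file_sha256(path) == published_checksum.strip().lower()


with tempfile.TemporaryDirectory() as folder:
    download = Path(folder) / "installer.bin"  # placeholder file name
    download.write_bytes(b"pretend this is the installer")
    published = hashlib.sha256(b"pretend this is the installer").hexdigest()

    intact = matches_published(download, published)

    # Simulate a tampered or corrupted download.
    download.write_bytes(b"pretend this is the installer, plus malware")
    tampered = matches_published(download, published)

print("untouched file matches:", intact)
print("modified file matches:", tampered)
```

Keep in mind that a checksum only helps if you copy it from a source you trust; an attacker who can swap the file can often swap a checksum posted right next to it.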
Digital Signatures
Are you confirming your invitation to a job interview and want to make sure that it’s received by the recruiter unchanged? Once again, hash functions come to the rescue. How?
- Your email client converts your message into a hash value and encrypts that hash with your private key. The result is your digital signature, which is sent to the recruiter together with the email.
- The recruiter’s client generates a hash value of the received message on the back end using the same hash function. It then decrypts the signature using your public key to recover the hash you sent.
- The client compares the two hashes. If they match, it means your email hasn’t been changed during transmission. You’ll then be able to meet your recruiter at the agreed date and time and, with a bit of luck, get your new job!
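Here’s a deliberately tiny Python sketch of that hash-then-sign flow. Please read it as a toy: it uses “textbook” RSA with a key built from two well-known Mersenne primes, no padding, and a truncated SHA-256 hash so the number fits the key. All of that is our assumption for illustration, it needs Python 3.8 or later, and it’s nowhere near safe for real use (a proper cryptography library is the way to go there):

```python
import hashlib

# Toy RSA key pair built from two known Mersenne primes. Far too small and
# too predictable for real use; it only illustrates the flow.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent


def message_hash(message: bytes) -> int:
    """Hash the message; keep 16 bytes so the value is smaller than N."""
    return int.from_bytes(hashlib.sha256(message).digest()[:16], "big")


def sign(message: bytes) -> int:
    """Sender: hash the message, then transform the hash with the private key."""
    return pow(message_hash(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: recover the signed hash with the public key and compare."""
    return pow(signature, E, N) == message_hash(message)


email = b"I confirm our interview on Monday at 10:00."
signature = sign(email)

print(verify(email, signature))
print(verify(b"I confirm our interview on Friday at 17:00.", signature))
```

The first check passes and the second fails, because changing even one character of the email changes its hash, and the signature only vouches for the original hash.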
In summary, as any tiny change will result in a non-matching digest, strong hash functions can offer a good level of data integrity protection. From signing software to indexing and retrieving items in databases, from verifying data blocks in cryptocurrencies to optimizing caching in the browser, it’s impressive how many different applications hashes have nowadays.
And you? How are you planning to use hashes to protect the integrity of your emails, files, software and other executables?
Final Thoughts on How Does Hashing Work? A Look at One-Way Cryptographic Functions
Woo! You’ve just learned how hashing works and discovered some of the most common and practical uses. This is invaluable knowledge in a time when technology rapidly advances, and the need to find new ways to ensure integrity, confidentiality and authenticity of data has never been greater.
Now that you know the techniques and security mechanisms behind it, put these hashing algorithms to work within your organization and help protect your software products’ integrity. And don’t forget to make these highly versatile, one-way cryptographic functions part of your organization’s security strategy.
Incorporating secure hash functions into your data security processes helps you:
- Comply with privacy regulations,
- Protect the integrity of your sensitive data,
- Make your customers feel safer knowing your products came from you, and
- Make it too time-consuming and resource-intensive to try to fake your authentic messages and data.
What more could you want? You’ve learned it, let’s hash it!