Hash functions
You can find hash functions in use in lots of places, most notably in password verification.
When you enter your password on a website, a hash of the password is computed and compared with the hash held on the server. The passwords stored on the server are actually computed hash values of the original passwords. This prevents attackers from reading the original passwords directly if they gain access to the stored values.
These hash functions are also used in the creation of message authentication codes and digital signatures. To recap, they work by taking a data block of any length – for example an email message – and processing it through the hashing algorithm. The result is a number which is called a hash, or a message digest.
The hash has a fixed length that depends only on the algorithm used to create it. For example, if the same algorithm were used to hash a 500-byte email and then a 50 MB document, the resulting hashes would be exactly the same size in bits. However, the actual values will differ: two files that differ by even a single bit produce entirely different hashes.
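To see the fixed-length property for yourself, here is a minimal sketch using Python's standard hashlib module. The inputs are stand-ins chosen for illustration (repeated bytes rather than a real email or document, and 5 MB rather than 50 MB to keep it quick).

```python
import hashlib

# Illustrative stand-ins: a 500-byte "email" and a much larger "document".
small = b"a" * 500
large = b"a" * (5 * 1024 * 1024)

small_digest = hashlib.sha256(small).digest()
large_digest = hashlib.sha256(large).digest()

# Both digests are the same size, whatever the input size.
print(len(small_digest) * 8, len(large_digest) * 8)

# Flip a single bit of the small input and hash it again.
flipped = bytes([small[0] ^ 0x01]) + small[1:]
flipped_digest = hashlib.sha256(flipped).digest()

# Count how many bits of the two digests differ.
differing_bits = sum(
    bin(x ^ y).count("1") for x, y in zip(small_digest, flipped_digest)
)
print(differing_bits)
```

Both digests come out at 256 bits, and flipping one input bit changes a large share of the output bits.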
A good hash function uses a one-way hashing algorithm; in other words, it is computationally infeasible to convert the hash back into the original input.
Types of hash function include:
- Message Digest 5 (MD5): which has a digest length of 128 bits and is considered cryptographically weak
- SHA-1 (sometimes written simply as SHA): which has a digest length of 160 bits
- SHA-2: includes a number of hash functions, with digests from 224 to 512 bits in length. The most widely used are the 256-bit and 512-bit versions, which are usually referred to as SHA-256 and SHA-512 respectively
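The digest lengths listed above can be checked with Python's hashlib module. This is a small sketch that assumes your Python build makes all of these algorithms available (some locked-down systems disable MD5).

```python
import hashlib

# Algorithm names as hashlib spells them.
names = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")

# digest_size is reported in bytes, so multiply by 8 for bits.
digest_bits = {name: hashlib.new(name).digest_size * 8 for name in names}

for name, bits in digest_bits.items():
    print(f"{name}: {bits} bits")
```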
Returning to password verification, if we take Password123 as an example and run it through a SHA-256 hash generator, the result is:
008c70392e3abfbd0fa47bbc2ed96aa99bd49e159727fcba0f2e6abeb3a9d601
If you changed the word to password123 (no capital at the beginning), the hash would change to:
ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
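You can reproduce this kind of comparison yourself. The sketch below assumes the strings are encoded as UTF-8 with no trailing newline; online generators and shell commands sometimes add a newline, which changes the digest. The helper name sha256_hex is ours, not part of any library.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("Password123")
second = sha256_hex("password123")

print(first)
print(second)
print(first == second)  # False: one changed letter gives a different hash
```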
Properties of a hash function
As mentioned earlier, hash functions use a one-way mathematical algorithm to generate hashes. Now let’s explore some of the other properties that make up a cryptographic hash algorithm.
First, the input can be of any length, but the output has a fixed length. Although we say the input can be of any length, there are limits – but they are very large! SHA-256, for example, accepts messages of up to 2^64 - 1 bits.
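Next, they’re collision resistant (often described as ‘collision free’), meaning that it’s computationally infeasible to find two inputs that generate the same hash value. Collisions must exist, because there are far more possible inputs than outputs, but finding one should be impractical.
Note: Practical collisions have been demonstrated for both MD5 and SHA-1, which is why they’re no longer recommended for use. Most security professionals and cryptography experts recommend SHA-2 for all new applications.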
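To get a feel for why digest length matters for collisions, here is an illustrative sketch. It uses a deliberately weak "hash" of our own invention, tiny_hash, which keeps only the first 16 bits of SHA-256. With so few possible outputs, a simple search finds two different inputs with the same value almost immediately; the same search against the full 256-bit digest would be infeasible.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes) -> bytes:
    """First 2 bytes (16 bits) of SHA-256: deliberately weak, for illustration only."""
    return hashlib.sha256(data).digest()[:2]


seen = {}  # maps each tiny hash value to the first message that produced it
for i in count():
    msg = str(i).encode()
    tag = tiny_hash(msg)
    if tag in seen:
        # Two different messages share the same 16-bit value: a collision.
        first_msg, second_msg = seen[tag], msg
        break
    seen[tag] = msg

print(first_msg, second_msg, tag.hex())
```

The loop is guaranteed to stop, because there are only 65,536 possible 16-bit values.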
Finally, hash algorithms are designed to be computationally fast to run compared to other kinds of symmetric and asymmetric algorithms.
Simple use of a hash function
Figure 1: Use of a hash function
The above diagram (Figure 1) is a simplified illustration of how a cryptographic hash algorithm can be used.
- First, the sender generates a message.
- The system takes the message and processes it through the algorithm, outputting a message digest (a fixed-size numeric representation of the message).
- Then the sender sends the original message and the message digest to the recipient.
- The recipient then takes the message and calculates their own version of the message digest.
- Finally, the recipient compares the digest that was sent alongside the original message with the one they calculated. If the two message digests are the same, this verifies the integrity of the message.
If the data is changed or manipulated during transmission, the resultant hash will be different. Any change at all to the original message will result in a completely different digest being generated – even with just a single space or comma out of place.
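The steps above can be sketched in a few lines of Python. The message text and the function names digest and verify are illustrative assumptions, and SHA-256 is just one possible choice of algorithm.

```python
import hashlib


def digest(message: bytes) -> str:
    """Compute the message digest (SHA-256, as a hex string)."""
    return hashlib.sha256(message).hexdigest()


def verify(received_message: bytes, received_digest: str) -> bool:
    """Recipient side: recompute the digest and compare it with the one received."""
    return digest(received_message) == received_digest


# Sender side: create the message and its digest, then send both.
message = b"Transfer 100 GBP to account 12345"
sent_digest = digest(message)

# Recipient side: an intact message verifies, an altered one does not.
intact_ok = verify(message, sent_digest)
tampered_ok = verify(b"Transfer 900 GBP to account 12345", sent_digest)

print(intact_ok, tampered_ok)  # True False
```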
You can already see how useful hash functions can be in telling you whether something is legitimate or not. But if you apply a hash to data, does this always mean that the message can’t be altered by hackers? Sadly not. However, there are extra security measures available that help to verify data integrity.
For example, to prevent replay attacks, or the insertion of messages into a message stream, sequence numbers and timestamps can be added to the flow. If a sequence number is added to each message, starting at one, the receiver can determine whether a number is missing or repeated.
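As a rough sketch of the sequence-number idea, the hypothetical helper below takes the sequence numbers in the order they arrived and reports which are missing and which were seen more than once. Real protocols handle this in more sophisticated ways; this only shows the principle.

```python
def check_sequence(received_numbers):
    """Return (missing, repeated) for sequence numbers that should start at 1."""
    seen = set()
    repeated = []
    for number in received_numbers:
        if number in seen:
            repeated.append(number)  # seen before: possible replay
        seen.add(number)
    highest = max(seen, default=0)
    # Any number below the highest one that never arrived is missing.
    missing = [n for n in range(1, highest + 1) if n not in seen]
    return missing, repeated


missing, repeated = check_sequence([1, 2, 4, 5, 5])
print(missing, repeated)  # [3] [5]
```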
Despite this, there’s still a major problem with this simple method. Do you know what the problem could be?
- This method is NOT ‘collision free’ meaning there can be two inputs that generate the same hash value, increasing the likelihood of a breach.
- An eavesdropper could intercept the sender’s message and hash, then substitute their own message and corresponding hash.
The correct answer is the second option.
An eavesdropper could intercept the sender’s message and hash, then substitute their own message and corresponding hash. The recipient not only needs to know that the message hasn’t been altered, but they also need to know from whom it came, and they need to be able to verify the message’s authenticity.
Fortunately, hashing algorithms can also be used to provide authentication – in this case, message authentication.
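Here is a small sketch of that substitution. The names and messages are made up for illustration. Because anyone can compute a plain hash, the recipient's check passes for the forged message as long as the attacker replaces the digest too.

```python
import hashlib


def digest(message: bytes) -> str:
    return hashlib.sha256(message).hexdigest()


def recipient_accepts(message: bytes, received_digest: str) -> bool:
    """The recipient's only check: does the digest match the message?"""
    return digest(message) == received_digest


# What the sender actually sent.
original = b"Pay Alice 10"
original_digest = digest(original)

# The attacker swaps in their own message and recomputes the digest.
forged = b"Pay Mallory 1000"
forged_digest = digest(forged)

accepted = recipient_accepts(forged, forged_digest)
print(accepted)  # True: the check passes, yet the message is not the sender's
```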
Message Authentication Code
Like encryption, a message authentication code (MAC) scheme is a family of three functions. The first is a key generation function which, just like in encryption, accepts the security parameter and produces a random k-bit key. Also similar to encryption, the MAC signing function accepts an arbitrary-length message and a key. However, unlike the encryption function, it produces a short, fixed-length output called the MAC tag. The verify function accepts an arbitrary-length message, the MAC key and a MAC tag, and outputs a single bit: either yes, this message matches this MAC tag for this key, or no, it doesn’t. Let’s look at an example (Figure 2).
Figure 2: Use of a message authentication code.
- The sender generates a message.
- The sender then combines a secret key, or value, with the message.
- The system takes the resultant output and processes it with a hashing function, producing a message digest.
- Then, the sender transmits the message and message digest to the recipient.
- The recipient takes the message and secret key, which they already have, and calculates a message digest based on both inputs.
- Finally, the recipient compares the received digest with the digest they just calculated. If the two match, the message is intact and came from someone who holds the secret key.
MACs are predicated on the assumption that only the sender and recipient know the symmetric key value. It’s this assumption that allows the recipient to use the MAC function to verify that the sender was the originator of the message.
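The three functions (key generation, signing and verifying) might look like this in Python. This is a sketch rather than a definitive implementation: the function names are ours, and it uses the standard library's hmac module with SHA-256 as one possible MAC.

```python
import hashlib
import hmac
import secrets


def generate_key(num_bytes: int = 32) -> bytes:
    """Key generation: produce a random key (32 bytes = 256 bits by default)."""
    return secrets.token_bytes(num_bytes)


def sign(key: bytes, message: bytes) -> bytes:
    """Signing: produce a short, fixed-length MAC tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Verifying: yes or no, does this tag match this message for this key?"""
    # compare_digest compares in constant time to avoid leaking timing information.
    return hmac.compare_digest(sign(key, message), tag)


key = generate_key()  # shared in advance by sender and recipient
message = b"Meet at noon"
tag = sign(key, message)

print(verify(key, message, tag))           # True
print(verify(key, b"Meet at nine", tag))   # False: message was altered
print(verify(generate_key(), message, tag))  # False: wrong key
```

Unlike the plain hash, an eavesdropper who doesn't know the key can't produce a valid tag for a substituted message.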
The input data could, for example, include a sequence number, enabling the recipient to detect replayed, lost, or inserted messages.
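One way to picture this, as a sketch under our own assumptions: the sequence number is packed in front of the message before the MAC is computed, so an attacker can't change it without invalidating the tag. The Receiver class, the tag_for helper and the 8-byte encoding are all illustrative choices.

```python
import hashlib
import hmac
import secrets
import struct


def tag_for(key: bytes, seq: int, message: bytes) -> bytes:
    """MAC over an 8-byte big-endian sequence number followed by the message."""
    data = struct.pack(">Q", seq) + message
    return hmac.new(key, data, hashlib.sha256).digest()


class Receiver:
    def __init__(self, key: bytes):
        self.key = key
        self.expected_seq = 1  # sequence numbers start at one

    def accept(self, seq: int, message: bytes, tag: bytes) -> bool:
        # Reject anything whose tag doesn't match (altered or forged).
        if not hmac.compare_digest(tag_for(self.key, seq, message), tag):
            return False
        # Reject anything out of order (replayed, lost, or inserted).
        if seq != self.expected_seq:
            return False
        self.expected_seq += 1
        return True


key = secrets.token_bytes(32)
receiver = Receiver(key)
tag1 = tag_for(key, 1, b"hello")

first_time = receiver.accept(1, b"hello", tag1)
replayed = receiver.accept(1, b"hello", tag1)
print(first_time, replayed)  # True False
```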
You’ve already looked at functions that produce fixed-length output in cryptography: hash functions. In fact, you can use hash functions as a building block for message authentication codes. One thing hash functions don’t have natively is a way to incorporate a key. You can’t just take a key, stick it at the front or end of a message and hash it, because such simple constructions are open to attack (for example, length-extension attacks when the key is placed at the front). But what you can do is take the key, stick it at the front of the message and hash that, then take that result, stick the key in front of it and hash again. In practice, the key is padded and combined with a different constant for each of the two hashes. This construction creates your MAC tag (see Figure 3). This more complex and cryptographically secure version of MAC is known as a hash-based message authentication code, or HMAC.
Figure 3: Construction of a hash-based message authentication code.
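To make the construction concrete, here is a sketch of HMAC built by hand from SHA-256, checked against Python's own hmac module. The function name hmac_sha256 and the example key and message are ours; in real code you would simply use the library.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC with SHA-256, written out step by step for illustration."""
    block_size = 64  # SHA-256 processes input in 64-byte blocks

    # Keys longer than a block are hashed first; shorter keys are zero-padded.
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()
    key = key.ljust(block_size, b"\x00")

    # The padded key is XORed with a different constant for each hash.
    inner_key = bytes(b ^ 0x36 for b in key)
    outer_key = bytes(b ^ 0x5C for b in key)

    # Inner hash: key in front of the message.
    inner = hashlib.sha256(inner_key + message).digest()
    # Outer hash: key in front of the inner result.
    return hashlib.sha256(outer_key + inner).digest()


key = b"shared secret"
message = b"Meet at noon"

manual_tag = hmac_sha256(key, message)
library_tag = hmac.new(key, message, hashlib.sha256).digest()

print(manual_tag == library_tag)  # True
```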
What’s next?
Hashing is a powerful method for protecting the integrity of data (strictly speaking, it is not encryption), but you should also be aware of its flaws. Next, you’ll hear from our expert Mark for more on hashing. Plus you’ll see a demonstration of a brute force attack, which, in cryptanalysis, is one method of defeating a cryptographic scheme.
One of four primary areas of cryptography, hash functions are the focus of this course, which is designed to inform you of their characteristics, properties, and uses.