Keccak hashing algorithm (SHA-3)

What is a Hash?

The hash is one of the main building blocks underlying blockchain technology. That’s why you need to understand hashing and its related concepts—things like hash rate and function and later discuss what is new about the Keccak algorithm.

Hash is the process of taking any input data and running it through an algorithm that then produces output data of a specific and consistent size.

The output data is a hash. The algorithm used in the process is called a hash function. And finally, the rate at which you can push data through the hash function to generate a new hash is your hash rate.

All of this is crucial to blockchain technology. In large part, it’s what gives blockchain many of its unique properties.

How blockchain uses hashes

Essentially, a blockchain is built by constantly adding new data to an ever-expanding list. More specifically, when you’re processing a new batch of transactions, you begin by building on top of the data from the previous batch of transactions.

You then run the whole thing through the hash function to generate a new one. The network then adds that new output to the block. And the next batch of transactions begins on top of the newest one that you just generated.

You Can Input Anything

Importantly, you can input any conceivable string of letters and numbers into a hash function and the algorithm will give you hashes that are always the same size as one another. In fact, no matter the size of the input, you always get a fixed-length output when hashing. The hash represents a string of characters, which act as a “fingerprint” of that input. This makes it easy and quick to verify any given one. At the same time, it’s virtually impossible to figure out what the input was. So hashing is the method used for compressing data. Still, it’s not the typical compression everyone knows, like a .zip or .rar file.

This also provides security. That’s because even the slightest change in any input data will produce a dramatically different output, known as the avalanche effect, which will then have a domino effect throughout the rest of the blockchain. Basically, it makes it impossible to tamper with any data.

We all know fingerprints are small, but they contain a massive amount of data. You know, things like our names, faces, addresses, and other sensitive information. Hashing is similar – it takes an arbitrary-sized piece of data and turns it into a relatively small sequence of characters.

Cryptographic Hash Explained

When you need security and privacy, the cryptographic hash comes into play. The downside of cryptographic hashing is it’s usually slower than the other types of hashes. If you need to hash quickly and you don’t need high-level security – non-cryptographic hashing is better. For example – if you are creating an index of some non-sensitive data.

The main difference between non-cryptographic and cryptographic hashing is the latter is extremely difficult to break. Notice that it’s not impossible. Still, cryptographic hashing makes cracking a hash near-impossible.

For a hash function to be a cryptographic hash, it has to have several properties.

Property #1 – Speed

If you like fancy words – the cryptographic hash functions should be computationally efficient. That means the hashing function should be able to produce a hash in a fraction of a second.

Property #2 – the Avalanche Effect

The avalanche effect means even a minor change in the message will result in a major change in the hash value. Message Hash of the message

  Welcome to DeepRnD	938cc867d801dd7f2fe95047a07d47a6ab20572c1cab2abfff2eaaf6
  Welcome to DeepRnd	3d3beed389afac9993bad17779a7a599ad358661555825bc62127d9e

Using a lower-case “d” in the second message changes the hash code entirely. This is a simple hash function example, but you get the idea. It’s very practical and can quickly show if any data has been altered.

Property #3 – the Cryptographic Hash Function Should Be Deterministic

That means no matter how many times you use a hash function for the same input, you’ll always get the same output. This is obvious since if you got random hashes for the same message, the whole process would be meaningless.

Property #4 – Pre-Image Resistance (One-Way Function)

This means that it’s really hard to get to the input via the output. In simple words, you can’t reverse the cryptographic hash function to get to the data. Still, this doesn’t mean it’s impossible to see the message.

The pre-image resistance property of the cryptographic hash plays a significant role in the hashing vs. encryption debate - you can decrypt an encrypted message, but you can’t do the same for a cryptographic hash.

Property #5 – Collision Resistance

That means that two different messages shouldn’t be able to produce the same hash value. As hash values have a fixed length, this means there are limited output combinations. The inputs, on the other hand, are an infinite number. So, in theory, there’s a chance that two different messages could produce the same hashes.

Still, the hash function in cryptography makes the odds of a hash collision practically negligible.

All of these properties ensure the security and usability of a cryptographic hash. So it’s time to meet the different cryptographic hash functions.

Hashing in Cryptocurrencies

We’ve already explained what Bitcoin is and how the blockchain works, so we’ll go straight ahead to hashing.

The cryptographic hash function is essential to cryptocurrencies since it guarantees one of the blockchain’s most important features – immutability.

Since cryptocurrency blockchains deal with large numbers of transactions, they use hashing. This is a far more practical and secure approach than to keep every record of every single transaction in the ledger.

In Bitcoin’s case, the miners run a series of SHA-256 functions to create the hash of a block. Then the latter receives a timestamp. Once the nodes reach consensus, the block is added to the blockchain. Not only does the block have its own hash, but it also contains the hash of the previous one, thus chaining all of them together.

Because of the avalanche effect, any attempts of tampering with a block are not possible. In case someone tries to change a transaction in a block, they have to alter every consecutive one as well. Such an operation would require so much computing power and time, that it’s practically impossible.

That makes hashing a crucial feature in the blockchain’s security.

Bitcoin uses two hashing algorithms to generate a public address (key) – SHA-256 and RIPEMD-160. This was done by Satoshi Nakamoto to provide better protection for public keys and to decrease the odds of a collision.

Ethereum, on the other hand, uses the Keccak-256 hash algorithm, which is the foundation of SHA-3, of which we'll talk today.

Keccak and SHA-3

At first there was SHA-0, but it had many flaws and didn’t become widely used. SHA-1 tried to fix them, but got broken in 2005.

The SHA-2 family consists of four members – SHA-224, SHA-256, SHA-384, and SHA-512, which differ in the number of bits of their hash values. So far, there hasn’t been a successful attack on the SHA-2 cryptographic hash algorithm, it's commonly used today.

SHA-3 and its subclasses are commonly used today until SHA-3 proves itself as an even more secure function.

SHA-3, also known as Keccak (its original name before it was chosen as the winner of the NIST SHA-3 competition), is a completely new hash algorithm that has nothing to do with SHA-1 and SHA-2.

Indeed, one of the stated reasons why NIST chose Keccak over the other SHA-3 competition finalists, was its dissimilarity to the existing SHA-1/2 algorithms; it was argued that this dissimilarity makes it a better complement to the existing SHA-2 algorithms (which are still considered secure and recommended by NIST), as well as making it less likely that any future cryptanalytic breakthroughs would compromise the security of both SHA-2 and SHA-3.

For some background, the SHA-3 hash function competition was originally announced by NIST in 2007, after some new cryptanalytic attacks had called the security of SHA-1 into question. While the attacks on SHA-1 were mainly of theoretical interest back then, it was feared that further improvements on these techniques might allow practical collision-finding attacks on SHA-1, and that the same techniques might also be applied against SHA-2, which shares a similar design to SHA-1. Thus, NIST decided to hold a competition to select a successor for SHA-2, which would be named SHA-3.

However, while a real world collision attack on SHA-1 was finally demonstrated in 2017, the feared attacks on SHA-2 have failed to materialize. It's nowadays generally accepted that breaking SHA-2 won't be as easy as it seemed ten years ago, and thus all the variants of SHA-2 are still considered secure for the foreseeable future. Thus, NIST decided to select Keccak as SHA-3, and to recommend it as an alternative (not successor) to the SHA-2 hash functions.

There are, however, advantages of SHA3 over its predecesors:

Resistance to length extension attacks.

With SHA-256, given ๐ป(๐‘š) but not ๐‘š, it is easy to find ๐ป(๐‘š‖๐‘š′) for certain suffixes ๐‘š′. Not so with any of the SHA-3 functions.

This means, e.g., that ๐‘š↦๐ป(๐‘˜‖๐‘š) is not a secure message authentication code under key ๐‘˜ when ๐ป is SHA-256, because knowing the authenticator on one message lets you forge the authenticator on another. The length extension property partly moviated the development of HMAC.

In contrast, the key-prefix construction is secure as a MAC when ๐ป is any of the SHA-3 functions—or either of the newer SHA-2 functions SHA-512/224 and SHA-512/256. For SHA-224, which is essentially the 224 bit truncation of SHA-256 (but with a different IV), the adversary has a 2−32 chance of guessing the discarded bits of output in a single trial—small but not negligible.

Performance.

The SHA-2 functions—particularly SHA-512, SHA-512/224, and SHA-512/256—generally have higher performance than the SHA-3 functions. Partly this was out of paranoia and political reasons in the SHA-3 design process.

(In response, one of the SHA-3 finalists was spun out into the much faster BLAKE2, also widely used on the internet today, and the SHA-3 winner Keccak was spun out into the much faster KangarooTwelve.)

Completely different internal design.

SHA-2 uses the Davies–Meyer structure, an instance of the Merkle–Damgรฅrd structure, with a block cipher (sometimes called SHACAL-2) built out of an ARX network, like MD4; SHA-3 uses the sponge structure with the Keccak permutation.

There is no user-visible difference here, but it made a difference for cryptographers' confidence in the designs after many DM/ARX designs based on MD4 were broken in the late '90s and early 2000s.

Conclusion

What all that means is that, if you want a secure and standardized hash function, you can choose either SHA-2 or SHA-3. If you're feeling really paranoid, you may even want to use both, and to design your cryptosystem so that it remains secure even if either one of the hash functions is broken.

However, remember that it could be easier to collide sha3(sha2(X)), than to collide sha2(X) or sha3(X); or it could mean that providing both sha2(X) and sha3(X) could leak more information about X than intented, and make guessing X easier. We'll cover more about hashing best practices in the next articles.

Comments

Popular posts from this blog

CAP Theorem and blockchain

Length extension attack

Contract upgrade anti-patterns