Length extension attack

August 16, 2019

What is length extension?

When a Merkle–Damgård based hash is misused as a message authentication code with construction H(secret ‖ message), and message and the length of secret is known, a length extension attack allows anyone to include extra information at the end of the message and produce a valid hash without knowing the secret. Quick sidebar, before you freak out:

Since HMAC does not use this construction, HMAC hashes are not prone to length extension attacks.

So, a length extension attack is a type of attack where an attacker can use Hash(message1) and the length of message1 to calculate Hash(message1 ‖ message2) for an attacker-controlled message2, without needing to know the content of message1.

Algorithms like MD5, SHA-1 and most of SHA-2 that are based on the Merkle–Damgård construction are susceptible to this kind of attack. Truncated versions of SHA-2, including SHA-384 and SHA-512/256 are not susceptible, nor is the SHA-3 algorithm.

A bit more details

The vulnerable hashing functions work by taking the input message, and using it to transform an internal state. After all of the input has been processed, the hash digest is generated by outputting the internal state of the function. It is possible to reconstruct the internal state from the hash digest, which can then be used to process the new data. In this way, one may extend the message and compute the hash that is a valid signature for the new message.

Usages

In theory when the digest of a message, msg, is known, the digest sha2(msg + padding + newmsg) can be calculated without knowing the original message msg, when padding is of the form \x80\x00...\x00\x??\x??...\x?? and the number of zero bytes is such that length(msg + padding) is a multiple of the block size (64 bytes for SHA256 and 128 bytes for SHA512). The final \x?? bytes are length(msg) as a big-endian 64-bit (SHA256) or 128-bit (SHA512) integer.

In practice, one often knows the message, msg, but the digest is salted, i.e. it is sha2(salt + msg) for some unknown salt. The attack still applies: the digest sha2(salt + msg + padding + newmsg) can be calculated without knowing the salt. A practical example is when a web application generates a signed authentication token of the form user_info.digest(salt + user_info). The user can then append arbitrary content to user_info and generate a valid signature for the new message. If the application parses user_info in an insecure way, the user may authenticate as a different user. For example if user_info is a serialized object of some sort, then properties specified later override previous ones, so username=spongebob,username=admin would authenticate as admin. Pretty neat, right?

Why this happens?

The original hash methods were often based on the Merkle-Damgård construction. With this, we create a hash function using blocks of data. Based on the constuct, Ron Rivest created the MD5 hashing method, and it was widely adopted in the industry. It works by taking a static initialisation vector (IV) and then feeding this into a one-way function (f), along with a block of the message. We feed this output into the next stage, and so on until we get to a message pad at the end:

The one-way function (f) will generally compress the data and produce fewer bits out than are fed in. Unfortunately, the MD construct has many weaknesses, and one of the most serious is the length extension attack. With this an adversary (Eve) can take a hash for an unknown message, and then add additional data to produce a new valid hash. So Bob could take a hash of a password that he and Alice know (“qwerty123”) and the append with a message (“hello”) to produce:

H(Password || Message)

where “||” identifies the appending of one string onto another. Thus when Bob sends a message to Alice, she will prepend the message with the shared password, and generate the same hash. In this way, Bob has proven the message and that he knows the secret password. This is a message authentication code (MAC) and validates that Bob knows a shared secret and the message. But, the construction method is flawed as it is possible for Eve to take a previous hash for a known message, and then append a new method to produce:

H(Password || Original Message || New Message)

In this way, Eve does not know the password but can still generate a valid hash, and add her message onto it. An outline of the code is:

import hashpumpy
import hashlib
import sys

password=b'password'
message= b'message'
addition = b'addition'

if (len(sys.argv)>1):
  password=(sys.argv[1]).encode()
if (len(sys.argv)>2):
  message=(sys.argv[2]).encode()
if (len(sys.argv)>3):
  addition=(sys.argv[3]).encode()


# Compute a previous hash for H(Password || Message)
m = hashlib.sha1()
m.update((password+message))
rtn=m.hexdigest()
print ("Previous hash: ",rtn)


# Compute a  hash for H(Password || Message || Addition)
rtn = hashpumpy.hashpump(rtn, message, addition, len(password))

print ("New hash: ",rtn[0])
print ("New message: ",rtn[1])

m = hashlib.sha1()
m.update(password+rtn[1])
rtn=m.hexdigest()
print ("Computing new hash (password+newdata): ",rtn)

A sample run for a message of “message” and with the addition of the text of “addition” is:

Previous hash:  22583ca8f00efff6296b4b571b9c2e1bcf22a99a
New hash:  dd448d0874b738ca1b85bc00e151fbf16393ce4a
New message:  b'message\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00xaddition'
Computing new hash (password+newdata):  dd448d0874b738ca1b85bc00e151fbf16393ce4a

Conclusion

Truncated versions of SHA-2, including SHA-384 and SHA-512/256 are not susceptible to this attack, nor is the SHA-3 algorithm. So instead of inventing different workarounds like double-hashing, use the resistant algorithms instead.

Deep Research and Development