My Log: Why it is hard to break hash functions

I was going through the skein hash function when one friend asked why it is difficult to break a cryptographic hash function. A comprehensive introduction to hash functions is well beyond the scope of this article, so if you want to refresh your memory, please read it up from elsewhere and comeback.

Any good hash function must have the following features:

Preimage resistance: given a hash h it should be hard to find any message m such that h = hash(m)

Second preimage resistance: given an input m1, it should be hard to find another input, m2 (not equal to m1) such that hash(m1) = hash(m2).

Collision resistance: it should be hard to find two different messages m1 and m2 such that hash(m1) = hash(m2).

When somebody tells you that it is impossible to break a hash function, you ask the question: What does it mean to break a hash function? The short answer is that if you are able to find out a message corresponding to the given message digest, the hash function is said to be broken.

If you are a programmer, you may think:

If hash function is implemented using some algorithm, shouldn’t it be possible to reverse the algorithm and get back the message corresponding to the message digest? If the algorithm runs step-by-step in the CPU, shouldn’t it be possible to trace back through the steps that resulted in the output?

The answer is of course: No.

Why?

The simple answer is that all hash functions can be considered to be one way functions. One-way functions are mathematical functions using which you can compute the output for any input, but extremely difficult to find out the input when any random output value is given. ie., one-way functions cannot be inversed/reversed.

For example, take the integer factorization problem. It is easy to multiply the numbers 101 and 103, but it is very difficult to find the factors of 10403 (If you don’t know the answer in advance, of course). Of course you can find the factors using a computer, but integer factorization becomes harder and harder as the magnitude of the numbers go up. For large enough numbers, even computers find it incredibly difficult to find the prime factors.

Cryptographic hash functions employ some kind of one-way function so that the message digest cannot be easily reversed to form the corresponding message.

Let us examine the Skein hash function and see why it cannot be broken. (Skein is an example of a hash function that uses a block cipher as a basic building block. Many modern hash functions have similar structure.)

The basic structure of the skein function is shown in the following figure:

The basic block in the Skein hash is a function called UBI. As you can see there are several inputs to the hash function. The only variable input is the message. This means that we can pre-compute the output of the first UBI block. So in essence our problem becomes:

You would have assumed by now that to reverse this hash function, we will have to reverse the UBI blocks. Let us take a peek inside the UBI block.

The Skein hash function uses Threefish encryption function in its UBI blocks. The original message is considered as the plain-text input to the first UBI block and it is encrypted using the precomputed value from the config block as the key. This encrypted value is input as the key to the last UBI block.

If we try to reverse the last UBI block, we will understand that what we have to do in essence is to retrieve the key of the Threefish encryption given the plain-text and the cipher text. Now if you know anything about encryption algorithms, you would know that it is impossible to retrieve the key given a {plain text, cipher text} pair for a good enough encryption algorithm. This is a basic property of encryption algorithms.

All this essentially boils down to one insight:

Skein hash function is hard to break because it is hard to find the key for the encryption algorithm used inside the UBI blocks – In case of Skein the block cipher used is ThreeFish. Many popular hash functions use some type of block cipher inside them and that is what gives them their strength.

So now the question becomes:

Why is it hard to find the key of an encryption algorithm?

To find out the answer, we should look inside the specific encryption algorithm used. More on that, later.