Take a simple mathematical operation like addition. Addition takes 2 inputs and produces 1 output (the sum of the two inputs). If you know the 2 inputs, the output is easy to calculate – and there’s only one answer.
321 + 607 = 928
But if you only know the output, how do you know what the two inputs are?
928 = 119 + 809
928 = 680 + 248
928 = 1 + 927
...
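To get a feel for how many candidates there are, here is a minimal sketch that lists them. It assumes the inputs are non-negative integers and that order doesn't matter (the original example doesn't say either way); `pairs_summing_to` is just an illustrative name.

```python
def pairs_summing_to(total):
    """Return every unordered pair (a, b) of non-negative integers with a + b == total."""
    # Assumption: inputs are non-negative integers and (a, b) counts the same as (b, a).
    return [(a, total - a) for a in range(total // 2 + 1)]


pairs = pairs_summing_to(928)
print(len(pairs))   # how many input pairs produce the same output
print(pairs[:3])    # the first few candidates
```

Even with those restrictions there are hundreds of equally valid answers for a single addition, and nothing in the output tells you which one was actually used.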
Now you might think that it doesn’t matter – if the two inputs sum to the correct value, then they must be correct. But that’s not the case.
What happens in a real hash function is that hundreds of one-way operations take place sequentially and the results from earlier operations are used in later operations. So when you try to reverse it (and guess the two inputs in a later stage), the only way to tell if the numbers you are guessing are correct is to work all the way back through the hash algorithm.
If you guess the numbers in the later stages wrong, you’ll end up with an inconsistency in the earlier stages (like 2 + 2 = 53). And you can’t solve it by trial and error, because there are simply too many combinations to guess (more than the number of atoms in the known universe).
In summary, hashing algorithms are specifically designed to perform lots of one-way operations in order to end up with a result that cannot be calculated backwards.
Since this question seems to have attracted some attention, I thought I’d list a few more of the features hashing algorithms use and how they help make a hash non-reversible. (As above, these are basic explanations, and if you really want to understand, Wikipedia is your friend.)
- Bit dependency: A hash algorithm is designed to ensure that each bit of the output is dependent upon every bit in the input. This prevents anyone from splitting the algorithm up and trying to reverse calculate an input from each bit of the output hash separately. In order to solve just one output bit, you have to know the entire input. In other words, when reversing a hash, it’s all or nothing.
- Avalanching: This is related to bit dependency. A change in a single bit of the input (from 0 to 1 or vice-versa) is designed to result in a huge change in the internal state of the algorithm and in the final hash value. Since the output changes so dramatically with each input bit change, this stops people from building up relationships between inputs and outputs (or parts thereof).
- Non-linearity: Hashing algorithms always contain non-linear operations – this prevents people from using linear algebra techniques to “solve” the input from a given output. Note the addition example I use above is a linear operation; building a hash algorithm using just addition operators is a really bad idea! In reality, hashing algorithms use many combinations of linear and non-linear operations.
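If you want to see avalanching for yourself, here is a rough sketch using SHA-256 from Python's standard `hashlib`. The helper names (`sha256_bits`, `flip_bit`, `differing_bits`) and the sample message are my own choices for illustration, not part of any standard API.

```python
import hashlib


def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of data as one 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    flipped = bytearray(data)
    flipped[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(flipped)


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between the two hashes."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")


message = b"hello world"            # arbitrary sample input
changed = flip_bit(message, 0)      # differs from message in a single bit
print(differing_bits(message, changed), "of 256 output bits differ")
```

For a well-behaved hash you should expect roughly half of the output bits to change, and which half looks unrelated to which input bit you flipped.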
All of this adds up to a situation where the easiest way of finding a matching hash is just to guess a different input, hash it and see if it matches.
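That guess-and-check approach looks something like the sketch below. It assumes you already know the input is exactly three lowercase ASCII letters, which is a huge head start that a real attacker usually doesn't get; `guess_preimage` and `sha256_hex` are illustrative names.

```python
import hashlib
from itertools import product
from string import ascii_lowercase


def sha256_hex(text: str) -> str:
    """Hash a string with SHA-256 and return the hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def guess_preimage(target_hex, length):
    """Try every lowercase string of the given length until one hashes to target_hex."""
    # Assumption: the unknown input consists only of lowercase ASCII letters.
    for letters in product(ascii_lowercase, repeat=length):
        candidate = "".join(letters)
        if sha256_hex(candidate) == target_hex:
            return candidate
    return None  # nothing of this length matched


target = sha256_hex("cat")
print(guess_preimage(target, 3))
```

Three letters means only 26^3 = 17,576 candidates, so this finishes almost instantly. Every extra letter multiplies the work by 26, and once the input can be any bytes of any length, guessing stops being practical.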
Lastly, if you really want to know how hard reversing a hash is, there’s no substitute for trying it out yourself. All good hashing algorithms are openly published and you can find plenty of code samples. Take one and try to code a version that reverses each step; you’ll quickly discover why it’s so hard.