All Posts Uncategorised

Hash Functions – Overview & Explanation

Definition: A hash function is a function which takes a value of arbitrary length as input (preimage) and calculates an output (digest) of fixed length.

In this post we explain what hash functions are and how they work.Hash function input->function->output

This output is usually represented as binary or hexadecimal value.

Properties of hash functions

Hash functions have certain properties.

Deterministic: For the same input you always get the same output.

One way: The calculation can be done only in one direction.

Given a hash output H(M1), it is hard (practically impossible) to find the corresponding input M1.

This means that it is impossible to calculate the input backwards from a given output.

Arbitrary input: It doesn’t matter what you use as input for the hash function. It should always yield an output.

Output: Many hash functions yield outputs with fixed lengths. However, there are also hash functions which have varying output lengths.

Efficiency of operation: In many cases hash functions are fast to calculate. But in some cases slow hash functions are desirable, for example in Proof of Capacity or if you want to protect hashed passwords from brute force attacks.

Uniformity: It is desirable that inputs are mapped evenly over the output range with the same probability. It also means that a tiny change of the input results in a major change in the output. However there are some hashfunctions which are specialized in having similar outputs, if the inputs are similar (so called fuzzy hashes or locality sensitive hashing).

Properties of cryptographic hash functions

For cryptographic hashfunctions there are some more properties.

Collision free: It should be infeasible to find two distinct input values M1 and M2 such that their hashes (outputs) are equal (H(I1) = H(I2).

Of course, collisions exist, since we map an infinite input space to a finite output space. But the output space is so huge, that it is practically impossible to find two inputs which get the same output.

Hiding: If a secret value R which is chosen from a wide uniform distribution and a hash output H(R|M) are given, then it is infeasible to find M. R|M means the strings R and M are concatenated.

This means if you have an input value M and concatenate it with another unknown value R and hash the result, the original input value M cannot be backward calculated. This R is sometimes called salt and it protects against brute force attacks.

The hiding property is very similar to the one way property.

Example of the hiding property:

You want to hash you age in years in order to commit to it. But you don’t want anybody to know your age before you want to reveal it. Now, someone could just try all values between 0 and 120 and calculate their hash output and compare this to your commitment. This would be very easy. Now, if you concatenate it with a random string R which is secret, this trial and error (brute force) doesn’t work.

This R is sometimes called salt.

This is sometimes done with passwords.

Puzzle friendly: A hash function H is puzzle friendly if for every possible n-bit output value Y, if R is chosen from a distribution with high min-entropy (randomly), then it is infeasible to find M such that H(R|M) = y, in time significantly less than 2^n.

This means that the output is spread evenly across the output range independently from the input. So, is no better strategy than trying random values of M repeatedly until you find the right one.

In Bitcoin (PoW) it means that you try to find a hash value of a certain length (numbers of leading zeros) and there is no way of finding an input value M which has a higher chance of yielding such an output.

The puzzle friendly property is not to be confused with the hiding property, although it is very similar and differences are subtle.

Examples of cryptographic hash functions

Here you find some examples of cryptographic hash functions

Applications of hash functions

Hash functions are used for many purposes. Here you find a list of common applications.

  • Proof of work
  • Commit-Reveal Schemes
  • Signatures (they take only a string with limited length)
  • Public Private key infrastructure
  • Hashtables
  • Sharding
  • Comparing against content changes of files