Message Digest

A message digest is another kind of cryptographic algorithm, but it does not encrypt or decrypt and does not use a key. From a file of any size it produces a short, fixed-length cryptographic residue (128 to 512 bits for the algorithms discussed here) which “characterizes” that file in some sense.

A given message will always produce exactly the same message digest (for a given algorithm). But make any change whatsoever to the file (even a single bit in War and Peace) and you will get a completely different message digest. This makes a digest very good at detecting changes to files, whether malicious changes by an attacker or accidental errors. Such changes can happen during transmission or while the file is stored, possibly for a long time.
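
As a minimal sketch of this behavior, the following Python uses the standard `hashlib` module to digest a short sample message twice, then digest a copy with a single bit flipped. The sample text and the helper names (`sha256_hex`, `flip_lowest_bit`, `differing_bits`) are illustrative assumptions, not part of any standard.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA2-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_lowest_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the lowest bit of one byte flipped."""
    changed = bytearray(data)
    changed[index] ^= 0x01
    return bytes(changed)


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Arbitrary sample message (an assumption for illustration).
original = b"The quick brown fox jumps over the lazy dog"
altered = flip_lowest_bit(original, 0)  # differs from original by one bit

d1 = sha256_hex(original)
d2 = sha256_hex(original)
d3 = sha256_hex(altered)

print("same message, same digest:", d1 == d2)
print("one bit changed, same digest:", d1 == d3)
print(differing_bits(d1, d3), "of 256 digest bits differ")
```

With a well-designed digest, roughly half of the output bits would be expected to change, even though only one input bit did.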

Common Message Digest Algorithms

Years ago, one of the first widely used message digest algorithms (MD5) produced a 128-bit digest, and it was commonly used to verify that a downloaded file was correct. On its own this check was easy to fool: an attacker who could replace the file could usually also compute the digest of the changed file and post that in place of the original. Practical collision attacks against MD5 were published in 2004, and it has been considered deprecated since then.

SHA-1 (Secure Hash Algorithm 1) replaced it with a 160-bit digest. SHA-1 was considered adequate for a while, but weaknesses were later found and it was deprecated in 2010.

A family of new message digest algorithms was published under the name SHA-2, with varying digest sizes: 224, 256, 384 and 512 bits. These are sometimes referred to as SHA224, SHA256, SHA384 and SHA512. It is more correct to call them SHA2-224, SHA2-256, SHA2-384 and SHA2-512.

Since SHA-2 is heavily based on the design of SHA-1, but with more stages and longer digests, it was feared that the attacks against SHA-1 might someday be effective against SHA-2. So a new algorithm (Keccak) was selected as SHA-3 in 2012 and published as a standard in 2015, with the same 224-, 256-, 384- and 512-bit digest lengths. It uses a different internal design from SHA-2, and is kept “in reserve” in the event that SHA-2 is compromised.
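
To make the digest lengths concrete, here is a small sketch that asks Python's standard `hashlib` module for the output size of each SHA-2 and SHA-3 variant. The algorithm names are the identifiers `hashlib` uses, which differ from the SHA2-256 style naming above; the `ALGORITHMS` list and `digest_bits` helper are illustrative.

```python
import hashlib

# hashlib's identifiers for the SHA-2 and SHA-3 families.
ALGORITHMS = [
    "sha224", "sha256", "sha384", "sha512",
    "sha3_224", "sha3_256", "sha3_384", "sha3_512",
]


def digest_bits(name: str) -> int:
    """Return the digest length, in bits, of the named algorithm."""
    return hashlib.new(name).digest_size * 8


sizes = {name: digest_bits(name) for name in ALGORITHMS}
for name, bits in sizes.items():
    print(f"{name:10s} {bits} bits")

# Same digest length, different design: the two outputs are unrelated.
message = b"abc"
sha2_digest = hashlib.sha256(message).hexdigest()
sha3_digest = hashlib.sha3_256(message).hexdigest()
print("SHA2-256 and SHA3-256 agree:", sha2_digest == sha3_digest)
```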

Today the recommended message digest algorithm is SHA2-256.

Characteristics of a Good Message Digest Algorithm

    • Good cryptographic dispersion, so that tiny changes are amplified – ideally every bit of the message affects the final digest
    • It is created using many one-way transformations (compare to encryption, where every transformation must be one-to-one and onto, that is, reversible). There is no practical way to recover any part of the message given only the digest.
    • It should be extremely difficult or impossible for someone to make changes to a file then make offsetting changes and still produce the same digest (this can easily be done with simpler schemes like a checksum or even CRC-32).
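
The "offsetting changes" weakness can be sketched with a toy additive checksum (the sum of all byte values), which is an assumption chosen for simplicity; it is not CRC-32, where the forgery is also feasible but takes more work to show. The message text and the `additive_checksum` helper are made up for illustration.

```python
import hashlib


def additive_checksum(data: bytes) -> int:
    """Toy checksum: sum of all byte values modulo 2**16."""
    return sum(data) % 65536


original = b"Pay Alice $100 today."

tampered = bytearray(original)
amount_index = original.index(b"1")
tampered[amount_index] += 8   # '1' becomes '9': the malicious change
tampered[-1] -= 8             # '.' becomes '&': the offsetting change
tampered = bytes(tampered)

print(tampered)
print("checksums match:",
      additive_checksum(original) == additive_checksum(tampered))
print("SHA2-256 digests match:",
      hashlib.sha256(original).digest() == hashlib.sha256(tampered).digest())
```

The simple checksum cannot tell the two messages apart, while the message digest can.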

While there are a very large number of even 160-bit digests (2 to the 160th), there are far more possible messages (most of which are total gibberish). While it is possible that two different emails or books could produce the same digest, it is very, very difficult to cause that to happen on purpose.

The total number of books ever published, or even the total number of email messages ever sent, is vanishingly tiny compared to the number of possible 160-bit (let alone 256-, 384- or 512-bit) digests. Two different messages that produce the same digest are called a collision. It is very, very unlikely that any two books or emails ever published or sent would collide, although it is not impossible. A message digest algorithm is considered broken when someone is able to produce a collision on purpose.
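
A rough sense of "very, very unlikely" comes from the standard birthday bound: among n messages with b-bit digests that behave randomly, the chance of any accidental collision is at most about n(n-1)/2^(b+1). The sketch below evaluates that bound; the figure of 10^15 messages is an arbitrary assumption for illustration, not a measured count, and the bound says nothing about deliberate attacks.

```python
from fractions import Fraction


def collision_probability(num_messages: int, digest_bits: int) -> Fraction:
    """Birthday bound n(n-1) / 2**(b+1) on an accidental collision.

    Only meaningful as a probability when the result is far below 1.
    """
    pairs_times_two = num_messages * (num_messages - 1)
    return Fraction(pairs_times_two, 2 ** (digest_bits + 1))


# Assumed number of distinct messages ever digested (illustrative only).
ASSUMED_MESSAGES = 10 ** 15

for bits in (160, 256, 512):
    p = collision_probability(ASSUMED_MESSAGES, bits)
    print(f"{bits}-bit digest: p is about {float(p):.3e}")
```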

Conceptual Representations

You can think of a message digest as a mathematical function or transform:

MD = SHA(message)
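
In code, that function might look like the following sketch, which computes the digest of a file of any size by feeding it to Python's `hashlib` in chunks so the whole file never has to fit in memory. The function name `message_digest`, its parameters, and the default chunk size are assumptions made for this example.

```python
import hashlib


def message_digest(path: str, algorithm: str = "sha256",
                   chunk_size: int = 65536) -> str:
    """Return the hex digest of the file at path.

    The file is read chunk_size bytes at a time; the result is the same
    as digesting the entire content in one call.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

A call such as `message_digest("war_and_peace.txt")` (a hypothetical file name) would return 64 hex characters, that is, 256 bits, regardless of the length of the file.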

Primary Uses

The main use today for message digests is in Digital Signatures.

To digitally sign a message, you produce a message digest of it, then encrypt that digest with an asymmetric algorithm and your own private key.

To verify a signature, you decrypt the signature with the signer’s public key (from their digital certificate) to recover the original digest, then produce a new digest of the message. If those match, the signature is valid. This lets you know two things:

    • Message integrity (the message has not changed in any way since it was signed)
    • Sender authentication (only the owner of the private key corresponding to the certificate used to validate the signature could have created such a signature)

If the signature fails, then one or both of the following is true (there is no way to know which is true, but in either event you should not trust the message):

    • Something has changed in the file since it was signed (could be malicious or from a transmission or storage error)
    • It was signed by some other person than the one whose certificate you are using to validate the signature
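
The sign and verify steps described above can be sketched with "textbook" RSA using only Python's standard library. This is a toy under loud assumptions: the key is built from two publicly known Mersenne primes, it is far too small to be secure, the digest is simply reduced modulo the key, and there is no padding or certificate handling. Real systems use a vetted cryptographic library and a standard signature scheme. All names here (`sign`, `verify`, `digest_as_int`) are illustrative.

```python
import hashlib

# Toy key pair (INSECURE, illustration only): two known Mersenne primes.
P = 2 ** 127 - 1
Q = 2 ** 89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """SHA2-256 digest of message as an integer, reduced to fit the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """'Encrypt' the digest with the private key to form the signature."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """'Decrypt' the signature with the public key and compare digests."""
    recovered_digest = pow(signature, E, N)
    return recovered_digest == digest_as_int(message)


message = b"Transfer 100 shares to Bob"
signature = sign(message)

print("original message verifies:", verify(message, signature))
print("altered message verifies:",
      verify(b"Transfer 900 shares to Bob", signature))
```

A failed check does not say which of the two causes applies: the verifier only learns that the recovered digest and the freshly computed digest differ.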

HMAC – Hash-based Message Authentication Code

There is a variant of the message digest algorithm that does use a key. HMAC is defined in RFC 2104, “HMAC: Keyed-Hashing for Message Authentication”, Feb 1997. It is kind of like a “poor man’s digital signature”, but it suffers from the same key management problem as symmetric key cryptography (securely communicating the key from sender to recipient).

It is possible to create an HMAC from any regular message digest (see the RFC for details).
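
As a brief sketch, Python's standard `hmac` module builds an HMAC on top of any `hashlib` digest; SHA2-256 is used here. The key, the message, and the helper names `make_tag` and `check_tag` are assumptions for illustration. Both sender and recipient must already hold the same secret key.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Return the HMAC-SHA256 tag of message under key, as hex."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def check_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"example shared secret"   # illustrative only
message = b"Meet at noon"
tag = make_tag(shared_key, message)

print("valid tag accepted:", check_tag(shared_key, message, tag))
print("altered message accepted:", check_tag(shared_key, b"Meet at nine", tag))
print("wrong key accepted:", check_tag(b"some other key", message, tag))
```

Unlike a digital signature, the tag cannot show which of the key holders created it, since anyone who can verify it can also produce it.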

From here, continue on to Digital Signature.