A cryptographic hash function is one of the quiet workhorses of modern computing. It takes data of any size and turns it into a short, fixed-length fingerprint, and it does so in a way that is easy to compute forward but practically impossible to reverse. That one-way behavior, together with the difficulty of finding two inputs that share a fingerprint, underpins password storage, file integrity checks, digital signatures, and blockchains. It is also where our own work left a mark: in 2017 we produced the first practical SHA-1 collision, proving that a widely deployed hash function had a fatal flaw. This guide explains what a cryptographic hash function is, the strict properties it must satisfy, how it differs from the ordinary hashes used elsewhere in software, and where it shows up in the systems you rely on.

What a Hash Function Does

At its simplest, a hash function maps an input of arbitrary length to an output of fixed length. Feed in a single character or an entire movie file, and you get back a digest of exactly the same size, typically shown as a string of hexadecimal characters. SHA-256, for instance, always returns 256 bits, written as 64 hex characters, no matter how large or small the input.
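A quick way to see the fixed-size output is Python's standard `hashlib` module. The sketch below hashes inputs of very different sizes and reports each digest's length. The labels and the one-mebibyte buffer are illustrative choices standing in for "a single character" and "an entire movie file":

```python
import hashlib

# Inputs of wildly different sizes (illustrative stand-ins).
inputs = {
    "empty string": b"",
    "one character": b"a",
    "one mebibyte": b"\x00" * (1024 * 1024),
}

for label, data in inputs.items():
    digest = hashlib.sha256(data).hexdigest()
    # Each hex character encodes 4 bits, so 64 hex chars = 256 bits.
    print(f"{label:>14}: {len(digest)} hex chars, {len(digest) * 4} bits")
# every line reports 64 hex chars and 256 bits
```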

The output goes by several names: a digest, a hash, or a fingerprint. The fingerprint analogy is apt. Just as a fingerprint identifies a person compactly without containing the person, a digest identifies data compactly without containing the data. You cannot rebuild the original from the digest, but you can use the digest to check whether two pieces of data are the same or to detect that something has changed.

It is worth separating hashing from encryption early, because the two are often confused. Encryption is reversible: you scramble data with a key and unscramble it later with the right key, because the whole point is to recover the message. Hashing has no key and no inverse. It is a one-way fingerprint you never intend to reverse. The broader cryptography hub covers how hashing and encryption fit together in larger systems.

The Core Properties

Not every function that shrinks data is a cryptographic hash. To be useful for security, a hash function must satisfy a specific set of guarantees. When any one of them fails, the function is considered broken.

Deterministic

The same input always produces the same output, every time, on every machine. Hash the word "cryptography" today, next year, or on a different computer, and the digest is identical. Without this, comparing a freshly computed hash against a stored one would be meaningless, so determinism is the foundation everything else builds on.
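Determinism is easy to check against a published test vector. The sketch below, assuming Python's standard `hashlib`, compares SHA-256 of the three-byte input "abc" with the widely published expected digest for that input; any conforming implementation on any machine must return the same value. `KNOWN_ABC_DIGEST` and `sha256_hex` are names chosen for illustration:

```python
import hashlib

# Widely published SHA-256 test vector for the three-byte input b"abc".
KNOWN_ABC_DIGEST = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as 64 lowercase hex characters."""
    return hashlib.sha256(data).hexdigest()


# Repeated calls agree with each other...
assert sha256_hex(b"abc") == sha256_hex(b"abc")
# ...and with the value every correct implementation must produce.
assert sha256_hex(b"abc") == KNOWN_ABC_DIGEST

print(sha256_hex(b"cryptography"))  # same output on every run and every machine
```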

Fixed-Length Output

The digest is always the same size regardless of the input. Under SHA-256, an empty string and a multi-gigabyte file both produce a 256-bit digest. This makes hashes cheap to store, compare, and transmit, and it means the size of a digest never reveals anything about the size of what produced it.

Preimage Resistance (One-Way)

Given a digest, it must be computationally infeasible to find any input that produces it. You can go forward from input to digest easily, but you cannot work backward from digest to input. This is the property that lets a system store a fingerprint of sensitive data, such as a password, without storing the data itself. Even someone who steals the stored digest cannot reverse it to recover the original.
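Preimage resistance comes down to cost, and a toy experiment makes that concrete. The sketch below (Python, standard library only) truncates SHA-256 to a handful of bits, a deliberately weakened hypothetical hash named `truncated_digest`, and brute-forces an input that hits a given digest. Each added bit roughly doubles the expected number of attempts, so the same search against the full 256-bit output would need on the order of 2^256 tries. The input it finds is generally not the original secret, just some input with the same digest, which is exactly what preimage resistance is supposed to rule out:

```python
import hashlib
import itertools


def truncated_digest(data: bytes, bits: int) -> int:
    """Hypothetical weak hash: the first `bits` bits of SHA-256, as an integer."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def find_preimage(target: int, bits: int) -> tuple[bytes, int]:
    """Brute force: try candidate inputs until one hits `target`.

    Returns (found_input, number_of_attempts).
    """
    for attempts in itertools.count(1):
        candidate = f"guess-{attempts}".encode()
        if truncated_digest(candidate, bits) == target:
            return candidate, attempts


secret = b"my secret input"
for bits in (8, 12, 16):
    target = truncated_digest(secret, bits)
    found, attempts = find_preimage(target, bits)
    print(f"{bits}-bit digest: preimage {found!r} after {attempts} attempts")
```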

Second-Preimage Resistance

Given one specific input, it must be infeasible to find a different input that produces the same digest. The distinction from preimage resistance is subtle but important: here the attacker already has a valid input and digest, and is trying to find a second input that collides with that particular one. This matters when a digest is published for a known file: an attacker should not be able to craft a different file that matches the same published hash.

Collision Resistance

A collision is any two different inputs that produce the same digest. Because inputs are unlimited and outputs are a fixed size, collisions must exist mathematically; there are simply more possible inputs than possible outputs. The security promise is not that collisions do not exist, but that nobody can find one within any practical amount of computing time. Collision resistance is the strongest and hardest of the resistance properties to maintain, because the attacker is free to choose both inputs rather than match a fixed target, and it is precisely the one that fell for SHA-1.
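The birthday paradox is why collision resistance is the hardest property to keep. The worked bound below assumes an idealized n-bit hash whose outputs behave like uniform random values (a modeling assumption, not something any real design is proven to satisfy) and an attacker who treats the function as a black box:

```latex
% Ideal n-bit hash: each digest modeled as a uniform random value among 2^n.
% Probability that k distinct inputs produce at least one collision:
P_{\mathrm{coll}}(k) \approx 1 - \exp\!\left(-\frac{k(k-1)}{2^{\,n+1}}\right)

% Setting P_coll = 1/2 and solving for k:
k_{1/2} \approx \sqrt{2\ln 2}\; 2^{n/2} \approx 1.18 \cdot 2^{n/2}

% Generic (black-box) work, for comparison:
\text{preimage, second preimage: } \sim 2^{n}
\qquad
\text{collision: } \sim 2^{n/2}

% Examples: SHA-1 (n = 160) gives 2^{80}; SHA-256 (n = 256) gives 2^{128}.
```

A collision search therefore needs only about the square root of the work of a preimage search. Cryptanalytic breaks such as the one against SHA-1 matter because they beat even this generic collision cost by exploiting the function's internal structure.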

The Avalanche Effect

Change a single bit of the input and, on average, half the output bits flip. The new digest looks completely unrelated to the old one, with no gradual drift. This property ensures that a digest gives no useful hint about how similar two inputs were, and it is what makes hashing so effective for tamper detection: any change, however tiny, produces a loudly different result.
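The avalanche effect can be measured directly. This sketch (Python, standard library only; `bit_difference` and `flip_bit` are hypothetical helper names) flips each bit of the input "cryptography" in turn and counts how many of SHA-256's 256 output bits change. The average should land close to 128, that is, about half:

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of `data` with bit number `index` inverted."""
    out = bytearray(data)
    out[index // 8] ^= 1 << (index % 8)
    return bytes(out)


message = b"cryptography"
original = hashlib.sha256(message).digest()

# One measurement per input bit: 12 bytes -> 96 single-bit flips.
diffs = [
    bit_difference(original, hashlib.sha256(flip_bit(message, i)).digest())
    for i in range(len(message) * 8)
]
print(f"average bits changed: {sum(diffs) / len(diffs):.1f} of 256")
```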

Cryptographic vs Non-Cryptographic Hashes

Not every hash function is cryptographic, and mixing the two up causes real problems. Software uses plenty of fast, simple hashes for purposes that have nothing to do with security, and those are designed against entirely different goals.

A non-cryptographic hash, such as the kind used inside a hash table or a checksum like CRC32, is built for speed and even distribution. Its job is to spread data across buckets quickly or to catch accidental, random corruption, for example a few bits flipped by a faulty network link. It makes no promise that an attacker cannot deliberately engineer a collision, and in fact such collisions are usually easy to produce on purpose.
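To see how easily a non-cryptographic checksum gives way to deliberate effort, the sketch below uses Python's `zlib.crc32` and a simple birthday search. Because CRC32 has only 2^32 possible outputs, a collision is expected after roughly 2^16 (tens of thousands of) attempts, which typically takes well under a second. The `message-N` format and the function name are arbitrary choices for illustration. CRC's linear structure allows even more direct forgeries, but a plain search is enough to make the point:

```python
import zlib


def find_crc32_collision() -> tuple[bytes, bytes, int]:
    """Birthday search: checksum distinct messages until two share a CRC32 value."""
    seen: dict[int, bytes] = {}  # checksum -> first message that produced it
    counter = 0
    while True:
        message = f"message-{counter}".encode()
        checksum = zlib.crc32(message)
        if checksum in seen:
            return seen[checksum], message, checksum
        seen[checksum] = message
        counter += 1


first, second, checksum = find_crc32_collision()
print(first, second, hex(checksum))  # two different messages, one checksum
```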

A cryptographic hash adds the adversarial requirement. It must resist a motivated attacker who is actively trying to find preimages or collisions, not just survive random noise. That extra burden makes cryptographic hashes more complex and slower than their non-cryptographic cousins. The two are not interchangeable. Using a non-cryptographic hash where security is needed is a serious mistake, because what looks like a fingerprint offers no real protection against someone trying to forge a match.

| Property | Non-cryptographic hash | Cryptographic hash |
|---|---|---|
| Main goal | Speed, even distribution | Security against attackers |
| Resists accidental collisions | Yes | Yes |
| Resists deliberate collisions | No | Yes (when unbroken) |
| One-way (preimage resistant) | Not required | Required |
| Typical uses | Hash tables, error-checking | Passwords, signatures, integrity |

Common Uses

Cryptographic hash functions appear across computing wherever a trustworthy, compact fingerprint is needed. A few uses dominate.

Password Storage

Well-built systems never store your actual password. They store a hash of it, so that a database breach does not hand attackers your credentials directly. When you log in, the system hashes what you typed and compares it to the stored digest. Because hashing is one-way, the stored value cannot be trivially reversed. In practice, password storage adds a unique random salt to each password before hashing, so identical passwords produce different digests, and uses a deliberately slow, purpose-built scheme such as bcrypt, scrypt, or Argon2 rather than a fast general hash, since slowness frustrates large-scale guessing. Our password security guide covers this in depth.
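A minimal salted-password sketch, assuming Python's standard library: it uses `hashlib.scrypt`, one of the slow schemes named above, with a fresh random salt per password and a constant-time comparison on login. The cost settings, salt length, and helper names are illustrative assumptions, not a tuned recommendation; a real deployment would pick parameters for its hardware and might prefer Argon2 through a third-party library:

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost settings (assumed values; tune for your hardware):
# n = CPU/memory cost, r = block size, p = parallelism.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1
SALT_BYTES = 16
KEY_BYTES = 32


def _derive(password: str, salt: bytes) -> bytes:
    """Run scrypt over the password with the given salt."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        maxmem=64 * 1024 * 1024,  # allow the ~16 MiB these settings need
        dklen=KEY_BYTES,
    )


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    salt = os.urandom(SALT_BYTES)  # fresh random salt per password
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), stored)


salt_a, stored_a = hash_password("correct horse battery staple")
salt_b, stored_b = hash_password("correct horse battery staple")
print(stored_a != stored_b)  # True: same password, different salts, different digests
print(verify_password("correct horse battery staple", salt_a, stored_a))  # True
```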

Data Integrity

Software projects publish a hash alongside their downloads. After fetching a file, you hash it yourself and compare; if the digests match, the file arrived intact and unaltered. Thanks to the avalanche effect, a single flipped bit changes the entire digest, so the check is unforgiving. The same principle protects backups, software updates, and any situation where you need to confirm that data has not changed.
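Checking a download against a published digest takes only a few lines. The sketch below assumes Python's `hashlib` and a digest published as hexadecimal; `sha256_of_file`, `matches_published`, and the 64 KiB chunk size are illustrative choices. Reading in chunks keeps memory use flat even for very large files, and the demo at the end flips one bit of a temporary file to show the check failing:

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # read 64 KiB at a time so large files need little memory


def sha256_of_file(path: str) -> str:
    """Hash a file incrementally and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare a file's digest with a published hex string (case-insensitive)."""
    return sha256_of_file(path) == published_hex.strip().lower()


# Demo: a stand-in "download" and a digest playing the role of the published value.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "download.bin")
    with open(path, "wb") as f:
        f.write(b"release 1.0 payload")
    published = hashlib.sha256(b"release 1.0 payload").hexdigest()
    intact = matches_published(path, published)

    with open(path, "r+b") as f:  # flip one bit to simulate corruption
        first = f.read(1)
        f.seek(0)
        f.write(bytes([first[0] ^ 1]))
    tampered = matches_published(path, published)

print(intact, tampered)  # True False
```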

Digital Signatures

Signatures combine hashing with public-key cryptography to prove both who created a message and that it has not been altered. Rather than sign an entire document, which would be slow, the signer hashes it first and signs the compact digest. This is exactly why a broken hash function is so dangerous for signatures: if an attacker can find two documents with the same hash, a signature on the harmless one is also valid on the malicious one. Our digital signatures explainer walks through the full mechanism and this failure mode.
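The failure mode described above can be reproduced in miniature. The sketch below is a toy, not a signature scheme to use: it pairs unpadded textbook RSA over two small Mersenne primes with a deliberately broken hypothetical hash, `weak_hash`, that keeps only 16 bits of SHA-256. Because that digest is so short, brute force quickly finds a second document with the same hash, and the signature made on the benign document verifies on the malicious one as well. Real systems use full-length hashes, proper padding, and much larger keys; the document texts and helper names are invented for illustration:

```python
import hashlib

# Toy textbook RSA key from two known Mersenne primes, 2**61 - 1 and 2**89 - 1.
# Unpadded and far too small for real use: illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (modular inverse, Python 3.8+)


def weak_hash(message: bytes, bits: int = 16) -> int:
    """Hypothetical broken hash: SHA-256 truncated to its first `bits` bits."""
    full = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return full >> (256 - bits)


def sign(message: bytes) -> int:
    # Hash-then-sign: only the compact digest goes through the private key.
    return pow(weak_hash(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    # Undo the signature with the public key and compare to a fresh digest.
    return pow(signature, E, N) == weak_hash(message)


def find_matching_document(target: bytes, template: str) -> bytes:
    """Try numbered variants of `template` until one shares target's weak hash."""
    goal = weak_hash(target)
    counter = 0
    while True:
        candidate = template.format(counter).encode()
        if candidate != target and weak_hash(candidate) == goal:
            return candidate
        counter += 1


benign = b"I agree to pay Mallory $10."
signature = sign(benign)
malicious = find_matching_document(benign, "I agree to pay Mallory $10,000,000. Ref {}")
print(verify(benign, signature), verify(malicious, signature))  # True True
```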

Blockchains

Cryptocurrencies and other blockchains lean heavily on hashing. Each block contains the hash of the previous block, chaining them together so that altering any block would change its hash and break every block after it, making tampering evident. Hashing also powers proof-of-work mining, where computers search for an input that produces a digest below a target value, a puzzle that is hard to solve but trivial to verify. SHA-256 is the hash function behind Bitcoin specifically, which applies it twice in succession when hashing blocks.
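Both uses of hashing in a blockchain, the chain of links and proof of work, fit in a short sketch. The block layout (a JSON object with `prev_hash`, `data`, and `nonce`), the 12-bit difficulty, and the helper names are assumptions for illustration, and it applies SHA-256 once; real blockchains such as Bitcoin use binary block headers and a far higher difficulty. Mining loops over nonces until the digest falls below `TARGET`, while validation needs only one hash per block:

```python
import hashlib
import json

DIFFICULTY_BITS = 12  # toy difficulty (assumed): block hash must fall below TARGET
TARGET = 1 << (256 - DIFFICULTY_BITS)


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block's fields."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def mine(prev_hash: str, data: str) -> dict:
    """Proof of work: increment the nonce until the block hash is below TARGET."""
    nonce = 0
    while True:
        block = {"prev_hash": prev_hash, "data": data, "nonce": nonce}
        if int(block_hash(block), 16) < TARGET:
            return block
        nonce += 1


def chain_is_valid(chain: list) -> bool:
    """Verification is cheap: one hash per block, no searching."""
    for i, block in enumerate(chain):
        if int(block_hash(block), 16) >= TARGET:
            return False  # proof of work no longer holds
        if i > 0 and block["prev_hash"] != block_hash(chain[i - 1]):
            return False  # link to the previous block is broken
    return True


chain = [mine("0" * 64, "genesis")]
for payload in ["Alice pays Bob 5", "Bob pays Carol 2"]:
    chain.append(mine(block_hash(chain[-1]), payload))

valid_before = chain_is_valid(chain)
chain[1]["data"] = "Alice pays Bob 500"  # tamper with a middle block
valid_after = chain_is_valid(chain)
print(valid_before, valid_after)  # True False
```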

When a Hash Function Breaks

A cryptographic hash function is trusted only as long as its core properties hold. When researchers find a way to violate one of them in practice, the function must be retired, even though it technically still produces output.

SHA-1 is the clearest example. Designed in the 1990s, it produced a 160-bit digest and secured much of the early web. Theoretical weaknesses surfaced over time, but the decisive moment came when our team, working with collaborators, produced SHAttered, the first practical SHA-1 collision: two different PDF files that hashed to the same SHA-1 value. That result broke collision resistance in the real world and pushed the industry to abandon SHA-1 for certificates, signatures, and version-control trust anchors. MD5, an older 128-bit function, fell even harder and earlier, with collisions now trivial to generate.

The lesson is that hash functions have lifespans. As analysis improves and computing power grows, designs once considered safe can become breakable. This is why the field maintains current, well-analyzed standards such as SHA-256 (covered in our SHA-256 explainer) and keeps a structurally different backup, SHA-3, in reserve. Using a hash function long after it has been broken is one of the more avoidable mistakes in security.

Frequently Asked Questions

Is hashing the same as encryption?

No. Encryption is reversible with a key, so the original message can be recovered. Hashing is one-way and keyless, producing a fingerprint that cannot be reversed. They are often used together, but they serve different purposes.

Can two different inputs ever have the same hash?

Yes, in theory, because there are infinitely many possible inputs and only a finite set of fixed-length outputs, so collisions must exist. For a secure hash function, the point is that nobody can actually find such a pair within any practical amount of computing time. When someone can, as with SHA-1, the function is considered broken.

Why is SHA-1 broken if it still produces a hash?

It still runs and returns a digest, but researchers found a way to generate two different inputs with the same SHA-1 output. Once collisions are practical, the function can no longer be trusted for signatures or certificates, even though it technically still works. The SHAttered page explains the attack.

Which hash function should I use?

SHA-256 is the safe default for general purposes such as integrity checks and signatures. SHA-3 is a sound alternative built on a different design. For password storage specifically, use a purpose-built, slow, salted scheme such as bcrypt, scrypt, or Argon2 rather than a raw fast hash.