On 23 February 2017, researchers at the Cryptology Group at CWI Amsterdam and Google Research published two different PDF files that share the same SHA-1 hash. The digest is 38762cf7f55934b34d179ae6a4c80cadccbb7f0a for both. The project was called SHAttered, and it was the first time anyone had produced a practical, public collision for the full SHA-1 hash function. You can download the two files from this site and check them yourself: shattered-1.pdf and shattered-2.pdf. They are visibly different documents, yet SHA-1 cannot tell them apart.
That single fact ended a long argument about whether SHA-1 was still safe to trust. This article walks through what a collision is, why SHA-1 fell, what the team actually built, and what changed afterward.
What a hash collision is, and why it matters
A cryptographic hash function takes an input of any size and returns a fixed-length fingerprint. SHA-1 produces 160 bits, usually written as 40 hexadecimal characters. Three properties are supposed to hold: you cannot recover an input from its output (preimage resistance), you cannot find a second input that hashes to the same value as a given input (second-preimage resistance), and you cannot find any two inputs that hash to the same value. That last property is collision resistance.
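To make the "fixed-length fingerprint" idea concrete, here is a minimal sketch using Python's standard `hashlib` module. The helper name `sha1_hex` is only illustrative; the point is that a 3-byte input and a 3-megabyte input both come back as 40 hexadecimal characters.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()

short_digest = sha1_hex(b"abc")
long_digest = sha1_hex(b"abc" * 1_000_000)

# Both digests are 40 hex characters, i.e. 160 bits, whatever the input size.
print(short_digest, len(short_digest) * 4, "bits")
print(long_digest, len(long_digest) * 4, "bits")
```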
Collisions always exist in a mathematical sense. There are infinitely many possible inputs and only a finite number of 160-bit outputs, so some inputs must share a digest. The security claim is not that collisions are absent. It is that finding one should be so expensive that no realistic attacker can do it. SHAttered broke that claim for SHA-1.
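The cost argument can be felt on a small scale. The sketch below is not the SHAttered attack; it is a plain birthday search against SHA-1 deliberately truncated to 24 bits, an assumption made purely so the search finishes in a moment. For an n-bit digest a birthday search needs on the order of 2^(n/2) attempts, so 24 bits falls after a few thousand tries, while the full 160 bits would need about 2^80.

```python
import hashlib
from itertools import count

def truncated_sha1(data: bytes, bits: int) -> int:
    """Return the top `bits` bits of SHA-1(data) as an integer."""
    full = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return full >> (160 - bits)

def find_collision(bits: int):
    """Birthday search: hash distinct inputs until two share a truncated digest."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        tag = truncated_sha1(msg, bits)
        if tag in seen:
            # Two different messages, same truncated digest.
            return seen[tag], msg, i + 1
        seen[tag] = msg

first, second, tries = find_collision(24)
print(first, second, "found after", tries, "hashes")
```

Each extra bit of digest multiplies the search space by two and the birthday cost by roughly the square root of two, which is why a toy 24-bit hash is trivial to collide and a sound 160-bit hash should not be.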
Why does this matter outside a lab? Digital signatures, certificates, and integrity checks rarely sign the document itself. They sign its hash. A signature, a Git commit ID, a certificate fingerprint, a “this file has not changed” guarantee all collapse the file down to a hash and trust that the hash is unique to that file. If two files share a hash, a signature over one is equally valid over the other. The link between “what was approved” and “what you received” quietly breaks.
Why SHA-1 was vulnerable
SHA-1 was designed by the NSA and published by NIST as a federal standard in 1995. It is built on the Merkle-Damgård construction, processing a message in 512-bit blocks and mixing each block into an internal state through 80 rounds of additions, rotations, and bitwise operations.
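The structure described above is compact enough to write out. What follows is an educational sketch of SHA-1 in pure Python, checked against `hashlib`, and not something to use in place of a vetted library. The function names (`pad`, `compress`, `sha1`) are illustrative choices, not names from any standard.

```python
import hashlib
import struct

IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)

def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, and the 64-bit big-endian bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", len(message) * 8)

def compress(state, block: bytes):
    """Mix one 512-bit block into the five-word state through 80 rounds."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):  # expand 16 message words to 80
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        temp = (rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
        a, b, c, d, e = temp, a, rotl(b, 30), c, d
    # Feed-forward: add the round output back into the incoming state.
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))

def sha1(message: bytes) -> str:
    """Merkle-Damgard iteration: chain the state through every padded block."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return "".join(f"{word:08x}" for word in state)

print(sha1(b"abc") == hashlib.sha1(b"abc").hexdigest())
```

The only thing carried from one block to the next is the five-word `state`. That chaining is what lets an attacker who equalizes the state at one point keep it equal from then on.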
The trouble started early. In 2005, Wang et al. showed that collisions could be found in roughly 2^69 operations, well below the 2^80 that a generic birthday attack on a 160-bit hash would need. That was a theoretical result, far beyond the reach of the hardware of the day, but it marked SHA-1 as weakened. Over the following decade, cryptanalysts including Marc Stevens refined differential attacks that exploit how small, carefully chosen differences in the input propagate through the round function. Each refinement narrowed the gap between theory and a real, buildable attack.
The core weakness is structural. SHA-1’s round function does not diffuse differences strongly enough to stop an attacker from constructing two message blocks whose internal disturbances cancel out by the end. Once you can engineer that cancellation, you can force the internal state to converge, and a collision follows.
What the SHAttered team actually did
The attack was an identical-prefix collision. Both PDFs begin with exactly the same bytes. The researchers then computed a pair of carefully crafted near-collision block sequences that, when appended to that shared prefix, drive SHA-1’s internal state to the same value. Because the internal states match at the point where the colliding blocks end, anything appended after that keeps the hashes equal, a direct consequence of the Merkle-Damgård design.
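The "anything appended afterward keeps the hashes equal" property is easy to demonstrate on a miniature hash. The sketch below builds a toy Merkle-Damgård hash with a 16-bit internal state and 8-byte blocks. Both sizes are assumptions chosen so a colliding block pair can be brute-forced instantly; nothing here reproduces the real cryptanalysis. It follows the same recipe: a shared block-aligned prefix, two different colliding blocks, then an arbitrary common suffix.

```python
import hashlib
from itertools import count

BLOCK = 8          # toy block size in bytes (SHA-1 uses 64)
STATE_BYTES = 2    # toy 16-bit state (SHA-1 uses 160 bits)
IV = b"\x00" * STATE_BYTES

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: SHA-1(state || block) cut down to 16 bits."""
    return hashlib.sha1(state + block).digest()[:STATE_BYTES]

def absorb(state: bytes, data: bytes) -> bytes:
    """Chain whole blocks through the compression function."""
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(message: bytes) -> str:
    """Toy Merkle-Damgard hash with length padding, SHA-1 in miniature."""
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    padded += len(message).to_bytes(BLOCK, "big")
    return absorb(IV, padded).hex()

def colliding_blocks(state: bytes):
    """Brute-force two different blocks that send `state` to the same next state."""
    seen = {}
    for i in count():
        block = i.to_bytes(BLOCK, "big")
        nxt = compress(state, block)
        if nxt in seen:
            return seen[nxt], block
        seen[nxt] = block

prefix = b"SHARED PREFIX..."                  # 16 bytes, two whole blocks
block_a, block_b = colliding_blocks(absorb(IV, prefix))
suffix = b"any common suffix at all"

print(toy_hash(prefix + block_a + suffix) == toy_hash(prefix + block_b + suffix))
```

Because both messages have the same length, the length padding is identical too, so the equality survives to the final digest. Changing `suffix` to any other bytes leaves the two toy hashes equal to each other.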
PDF was a deliberate choice. The format is forgiving enough to hide the colliding blocks inside an object that controls which image is displayed, so the same hash maps to two documents that render with different visible content. That is what turns an abstract pair of byte strings into a believable abuse scenario.
The team behind the work included Marc Stevens, Pierre Karpman, Elie Bursztein, Ange Albertini, and Yarik Markov, spanning CWI Amsterdam and Google. The full write-up, with the mathematics and the engineering detail, is hosted here as shattered.pdf.
The scale of the computation
SHAttered was not a clever shortcut that ran on a laptop. It was a genuinely large computation, and the numbers are the point.
| Approach | SHA-1 evaluations | Notes |
|---|---|---|
| Generic birthday attack | about 2^80 | Brute-force baseline for a 160-bit hash |
| SHAttered identical-prefix attack | about 2^63.1 (more than 9 quintillion) | The actual demonstrated work |
| Speedup over brute force | about 100,000x | Why the attack was feasible at all |
Google described the effort as the equivalent of around 6,500 CPU-years for the first phase and 110 GPU-years for the second. Spread across a large fleet, that is months of work, not centuries. The phased structure matters: an expensive search produces a usable near-collision configuration first, then a cheaper second stage finishes the matching pair. More than 9 quintillion hash evaluations sounds astronomical, and it is, yet it sits far below the brute-force wall. That gap is exactly what cryptanalysis is meant to find.
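The relationship between the table's rows is plain arithmetic on exponents, and it can be checked in a few lines. This sketch takes the two exponents quoted in the table as given; the variable names are illustrative.

```python
birthday_log2 = 80.0     # generic birthday bound for a 160-bit hash
shattered_log2 = 63.1    # complexity quoted for the identical-prefix attack

# Dividing powers of two subtracts the exponents: 2^80 / 2^63.1 = 2^16.9.
speedup = 2 ** (birthday_log2 - shattered_log2)
evaluations = 2 ** shattered_log2

print(f"speedup over brute force: about {speedup:,.0f}x")
print(f"SHA-1 evaluations:        about {evaluations:.2e}")
```

A gap of roughly 17 bits in the exponent is what turns an infeasible computation into one a large organization can pay for.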
Why two PDFs with the same hash is dangerous
Picture a signing workflow. A reviewer approves a contract, and a system signs the SHA-1 hash of that PDF. With a collision pair in hand, an attacker can prepare two contracts in advance that share a hash: a benign one to be approved and a malicious one to be substituted later. The signature taken over the approved file validates perfectly against the swapped file, because the signature only ever covered the hash, and the hash is identical.
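The swap can be acted out in miniature. Everything in the sketch below is a stand-in and should be read as such: the "signature" is an HMAC under a hypothetical key rather than a real public-key signature, and the digest is SHA-1 truncated to 24 bits so the attacker's search for a colliding pair takes a moment instead of thousands of CPU-years. The contract texts are invented. What carries over to the real scenario is the shape of the failure: the signer only ever sees the digest.

```python
import hashlib
import hmac
from itertools import count

KEY = b"hypothetical signing key"   # stand-in for a signer's private key

def weak_digest(doc: bytes) -> bytes:
    """Toy digest: SHA-1 truncated to 24 bits so collisions are cheap."""
    return hashlib.sha1(doc).digest()[:3]

def sign(doc: bytes) -> bytes:
    """Sign the digest of the document, never the document itself."""
    return hmac.new(KEY, weak_digest(doc), hashlib.sha256).digest()

def verify(doc: bytes, signature: bytes) -> bool:
    """Accept any document whose digest matches the signed digest."""
    return hmac.compare_digest(sign(doc), signature)

def prepare_pair():
    """Attacker's step: craft a benign and a malicious contract sharing a digest."""
    benign_by_digest = {}
    for i in count():
        benign = f"Pay Alice $100. ref={i}".encode()
        benign_by_digest[weak_digest(benign)] = benign
        malicious = f"Pay Mallory $1,000,000. ref={i}".encode()
        match = benign_by_digest.get(weak_digest(malicious))
        if match is not None:
            return match, malicious

benign, malicious = prepare_pair()
signature = sign(benign)              # the reviewer approves the benign file
print(verify(malicious, signature))   # the swapped file verifies as well
```

Note that the attacker never touches `KEY`. All of the work happens before signing, in `prepare_pair`, which is why control over both documents is the only precondition.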
The same logic threatens any system that uses SHA-1 as an identity:
- Certificates. A certificate authority signing a SHA-1 certificate could be tricked into vouching for a colliding certificate it never intended to issue.
- Software distribution. A SHA-1 checksum that “proves” a download is authentic proves nothing if a colliding payload exists.
- Version control. Git identifies every commit and object by a SHA-1 hash. A collision means two different trees or blobs could claim the same identifier, which puts repository integrity in question. Git’s reliance on SHA-1 drew immediate scrutiny after the announcement.
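For the version-control case, it helps to see what Git actually hashes. Under its SHA-1 object format, a blob's ID is the SHA-1 of a short header (`blob`, a space, the content length in decimal, a NUL byte) followed by the file content. The sketch below reproduces that with `hashlib`; the function name `git_blob_id` is illustrative.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a blob under its SHA-1 object format."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# Matches `git hash-object` on an empty file.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the header is hashed ahead of the content, a pair of files that collide under plain SHA-1 does not automatically collide as Git blobs. The concern was that an attacker could run the same kind of attack with the header included in the shared prefix.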
The attack does not let someone forge a hash for an arbitrary file you already hold. It lets an attacker who controls both documents produce a matching pair from the start. In signing, escrow, notarization, and supply-chain settings, control of both documents is a perfectly ordinary situation, which is what made the result so uncomfortable.
Real-world fallout and the move to SHA-256
The deprecation of SHA-1 had been on paper for years, but SHAttered turned a recommendation into an emergency. A reproducible artifact is far more persuasive than a complexity estimate.
The response was quick and broad. Browser vendors finished removing trust for SHA-1 TLS certificates, and certificate authorities completed their migration to SHA-256. Within a few months, Git was shipping a collision-detecting SHA-1 implementation based on the sha1collisiondetection library, which flags inputs bearing the fingerprints of this class of attack, and the project began its longer effort toward a SHA-256 object format. Protocols, package managers, and signing tools accelerated their own retirements of SHA-1.
The destination for most of that migration was SHA-256, part of the SHA-2 family. SHA-256 has no known practical collision attack, a wider 256-bit output, and a stronger internal design, which is why it became the default for certificates, signatures, and integrity checks. For a fuller picture of how these primitives fit together, see the cryptography pillar.
What SHAttered means today
SHA-1 should not be used where collision resistance matters. That includes signatures, certificates, and any “has this been tampered with” check on data an adversary might influence. For those uses, SHA-256 or stronger is the baseline.
A few nuances are worth keeping straight. SHA-1’s preimage resistance, the difficulty of reversing a hash or matching a hash you did not help create, has not been broken. So a non-security use such as a deduplication key on trusted data is a different risk profile from a signature. The honest guidance is still simple: if security depends on the hash, move on from SHA-1.
The broader lesson outlived the specific break. Cryptographic primitives age. An attack that looks purely theoretical, like the 2005 result, tends to creep toward practicality as analysis sharpens and hardware grows cheaper. SHAttered is the cleanest illustration of that arc: twelve years from “weakened on paper” to “two real files, one hash, downloadable today.” Modern systems that need to prove data has not changed now lean on SHA-256, and approaches such as provably fair verification use SHA-256 commitments so that anyone can independently confirm integrity rather than take it on trust.
If you want to see the break with your own eyes, grab shattered-1.pdf and shattered-2.pdf, run sha1sum on each, and watch two different documents return the same 40-character digest.
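If `sha1sum` is not available, the same check is a few lines of Python. This is a minimal sketch that assumes the two PDFs have already been downloaded into the current directory; `file_digests` is an illustrative helper name. It also computes SHA-256, which is a quick way to confirm that the files really are different.

```python
import hashlib

def file_digests(path: str, chunk_size: int = 1 << 16) -> dict:
    """Stream a file once and return its SHA-1 and SHA-256 hex digests."""
    sha1, sha256 = hashlib.sha1(), hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            sha1.update(chunk)
            sha256.update(chunk)
    return {"sha1": sha1.hexdigest(), "sha256": sha256.hexdigest()}

def report(paths) -> None:
    """Print both digests for each file so they can be compared by eye."""
    for path in paths:
        digests = file_digests(path)
        print(f"{path}\n  sha1:   {digests['sha1']}\n  sha256: {digests['sha256']}")

# Usage, assuming both files are in the working directory:
# report(["shattered-1.pdf", "shattered-2.pdf"])
```

For the genuine pair, both `sha1` lines should read 38762cf7f55934b34d179ae6a4c80cadccbb7f0a, while the two `sha256` lines should differ.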
FAQ
Was SHA-1 completely broken by SHAttered?
Its collision resistance was broken in practice, which is the property that protects signatures and certificates. The attack produces a pair of files with a matching hash. It does not reverse hashes or let an attacker match a file they had no hand in creating, so SHA-1’s preimage resistance remains intact. For anything security-sensitive, that distinction does not save it: move to SHA-256.
Can someone use this to forge a hash for a file I already have?
No. SHAttered is an identical-prefix collision, meaning the attacker constructs both documents together so they share a hash. It cannot take an existing file you control and manufacture a second file matching it. The danger lives in workflows where an attacker supplies the documents, such as signing, notarization, or escrow.
Why did the researchers use PDF files?
PDF is flexible enough to embed the colliding blocks inside an object that selects which content is shown, so a single hash can correspond to two documents that look different on screen. That makes the threat concrete: it models a benign file being approved and a malicious one being swapped in under the same signature. The pair lives at shattered-1.pdf and shattered-2.pdf.
Is SHA-256 affected by the same attack?
No. SHAttered exploits differential weaknesses specific to SHA-1’s round function. SHA-256 uses a different design with a 256-bit digest and has no known practical collision attack, which is why it became the standard replacement across browsers, certificate authorities, and version control.