Latin-1 vs Windows-1252: Two Encodings That Look the Same But Aren't
Latin-1 (ISO-8859-1) and Windows-1252 are so similar that many developers treat them as identical, and web browsers deliberately do. But the two encodings disagree on 32 byte values, and that difference can cause real bugs in systems that handle both.
What Is Latin-1?
ISO-8859-1, commonly called Latin-1, is an 8-bit single-byte encoding that extends ASCII with 96 additional printable characters covering most Western European languages. Byte values 0–127 are identical to ASCII. Values 128–159 hold the C1 control characters (the ISO standard itself leaves this range unassigned; the IANA-registered ISO-8859-1 charset fills it with control codes). Values 160–255 cover accented letters, currency symbols, and special punctuation used in French, German, Spanish, and other Western European languages.
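One consequence of this layout is that a Latin-1 decode never fails: each byte value maps to the Unicode code point with the same number. The following is a minimal Python sketch of that property; the variable names are illustrative only.

```python
# Every byte value 0-255 is valid in Python's "latin-1" codec, and each
# one decodes to the Unicode code point with the same number.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

same_numbers = all(ord(ch) == b for ch, b in zip(decoded, all_bytes))

# 0xE9 is "é" in Latin-1; bytes 0x80-0x9F decode to C1 control characters.
e_acute = b"\xe9".decode("latin-1")
c1_char = b"\x85".decode("latin-1")

print(same_numbers)   # True
print(e_acute)        # é
print(repr(c1_char))  # '\x85'
```

Because nothing is ever rejected, a successful Latin-1 decode says nothing about whether the data was really Latin-1.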
How Windows-1252 Differs
Windows-1252 (also called cp1252) uses the same layout as Latin-1 for byte values 0–127 and 160–255, but repurposes the range 128–159 (0x80–0x9F). Instead of control characters, Windows-1252 assigns 27 printable characters there, including the Euro sign (€, 0x80), curly quotes, dashes, and the trademark symbol. These characters don't exist in Latin-1. The remaining five positions (0x81, 0x8D, 0x8F, 0x90, and 0x9D) are left unassigned.
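To see exactly where the two encodings disagree, you can decode every byte value with both codecs and compare. This is a sketch using Python's built-in "latin-1" and "cp1252" codecs; the helper name differing_bytes is made up for this example.

```python
def differing_bytes():
    """Map each byte that decodes differently to (latin1_char, cp1252_char).

    cp1252_char is None where Python's cp1252 codec has no mapping.
    """
    diffs = {}
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("latin-1")
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252 = None  # unassigned position in Python's cp1252 table
        if latin1 != cp1252:
            diffs[b] = (latin1, cp1252)
    return diffs


diffs = differing_bytes()
unassigned = sorted(b for b, (_, ch) in diffs.items() if ch is None)

print(len(diffs), hex(min(diffs)), hex(max(diffs)))  # 32 0x80 0x9f
print(diffs[0x80][1], diffs[0x93][1], diffs[0x94][1], diffs[0x99][1])  # € “ ” ™
print([hex(b) for b in unassigned])  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

Note that Python raises UnicodeDecodeError for the five unassigned bytes when decoding as cp1252, so a strict cp1252 decode can fail where a Latin-1 decode would not.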
The Browser Quirk
Because real-world pages labeled as Latin-1 often contain Windows-1252 characters in the 128–159 range, the HTML5 specification (through the WHATWG Encoding Standard it relies on) mandates that browsers treat a declared charset=iso-8859-1 as Windows-1252. This means web developers effectively cannot serve true Latin-1 to a browser: whatever the HTTP header or meta tag says, the browser will decode the page as cp1252. This quirk is one reason UTF-8 is strongly preferred: it sidesteps the ambiguity entirely.
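If you need to reproduce the browser behavior outside a browser, the idea can be approximated in Python. The sketch below is an assumption-laden illustration, not a full implementation of the Encoding Standard: the function name browser_like_decode is hypothetical, and the label set is only a partial list of the labels the standard maps to windows-1252. It also accounts for one detail: the web's windows-1252 passes the five unassigned bytes through as C1 control characters, whereas Python's cp1252 codec rejects them.

```python
# Partial list of labels the WHATWG Encoding Standard maps to windows-1252.
# The standard defines more; this subset is for illustration only.
WINDOWS_1252_LABELS = {
    "iso-8859-1", "iso8859-1", "latin1", "l1",
    "ascii", "us-ascii", "windows-1252", "cp1252",
}


def browser_like_decode(data: bytes, declared_label: str) -> str:
    """Decode roughly the way a browser would for a Latin-1 style label."""
    label = declared_label.strip().lower()
    if label not in WINDOWS_1252_LABELS:
        raise ValueError(f"label not handled by this sketch: {declared_label!r}")

    chars = []
    for b in data:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            # Unassigned in Python's cp1252; fall back to the code point
            # with the same number, as the web's windows-1252 index does.
            chars.append(chr(b))
    return "".join(chars)


page = b"\x93Hi\x94 costs \x805"
print(browser_like_decode(page, "ISO-8859-1"))  # “Hi” costs €5
```

Decoding byte by byte is slow for large inputs; it is used here only to keep the fallback rule easy to read.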
Identifying the Difference
If you're debugging text that contains Windows-1252 characters like curly quotes or the Euro sign but your system declares Latin-1, those bytes (0x80–0x9F) will decode to invisible C1 control characters, which typically show up as boxes, question marks, or nothing at all. Use our byte decoder to inspect the raw bytes in your text and identify which encoding was actually used.
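The same check can be scripted. The sketch below scans raw bytes for values in 0x80–0x9F, reports what each would mean in Windows-1252, and repairs a string that was wrongly decoded as Latin-1. The function names and the sample bytes are invented for this example.

```python
def find_c1_bytes(data: bytes):
    """Return (offset, byte, cp1252_char_or_None) for each byte in 0x80-0x9F."""
    hits = []
    for offset, b in enumerate(data):
        if 0x80 <= b <= 0x9F:
            try:
                meaning = bytes([b]).decode("cp1252")
            except UnicodeDecodeError:
                meaning = None  # unassigned in Python's cp1252 codec
            hits.append((offset, b, meaning))
    return hits


def redecode_as_cp1252(text: str) -> str:
    """Undo a mistaken Latin-1 decode: recover the bytes, decode as cp1252."""
    return text.encode("latin-1").decode("cp1252")


raw = b"Price: \x8010 \x93sale\x94"
hits = find_c1_bytes(raw)
wrong = raw.decode("latin-1")      # contains invisible control characters
fixed = redecode_as_cp1252(wrong)

for offset, b, meaning in hits:
    print(offset, hex(b), meaning)
print(fixed)  # Price: €10 “sale”
```

Bytes in this range are a strong hint, not proof, that the text is really Windows-1252. The repair function raises UnicodeEncodeError if the string contains characters above U+00FF, and UnicodeDecodeError if it hits one of the five unassigned bytes, so treat it as a diagnostic aid rather than a blanket fix.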