Windows-1252
Microsoft's extension of Latin-1. Assigns printable characters to the C1 control code range (0x80–0x9F), including the euro sign, smart quotes, and em dash. Extremely common in legacy Windows files and web pages.
Byte Structure
Windows-1252 uses fixed 1-byte encoding per character. Characters not in this encoding cannot be represented and must be replaced or transliterated.
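As a rough illustration of the one-byte-per-character rule and of what "replaced" can mean in practice, here is a minimal Python sketch. The sample strings are illustrative only; `errors="replace"` and `errors="xmlcharrefreplace"` are standard Python error handlers, and other languages offer comparable options.

```python
# Every supported character encodes to exactly one byte.
text = "café €5"
encoded = text.encode("windows-1252")
print(len(text), len(encoded))  # the two lengths are equal

# Unsupported characters raise UnicodeEncodeError under the default strict mode.
try:
    "α".encode("windows-1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Two replacement strategies built into Python's codec machinery.
replaced = "α + β".encode("windows-1252", errors="replace")
as_entities = "α + β".encode("windows-1252", errors="xmlcharrefreplace")
print(replaced)     # b'? + ?'
print(as_entities)  # b'&#945; + &#946;'
```

Which strategy is appropriate depends on the consumer: a question mark loses information, while a numeric character reference only makes sense if the output is HTML or XML.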
When to Use Windows-1252
Windows-1252 is the encoding you'll encounter most often in legacy Windows files and mislabelled web pages. If you're reading a file that claims to be ISO-8859-1 but contains smart quotes, em dashes, or the euro sign, it's almost certainly Windows-1252. Use it only when consuming or producing content that must interoperate with legacy Windows applications.
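The mislabelling case can be sketched in Python as follows. The byte string is an invented sample, not taken from a real file; it shows how the same bytes decode under each label.

```python
# Bytes as they might appear in a file labelled ISO-8859-1 (illustrative sample).
raw = b"\x93Price: \x80 10\x94"

# Strict ISO-8859-1 maps 0x80-0x9F to invisible C1 control characters.
as_latin1 = raw.decode("iso-8859-1")

# Windows-1252 maps the same bytes to printable punctuation.
as_cp1252 = raw.decode("windows-1252")

print(repr(as_latin1))  # control characters shown as \x93, \x80, \x94
print(as_cp1252)        # “Price: € 10”
```

If the Windows-1252 reading produces sensible punctuation where the ISO-8859-1 reading produces control characters, the label is very likely wrong.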
Sample Characters in Windows-1252
The table below shows how a selection of characters are represented in Windows-1252. Bytes are shown in hexadecimal. Characters marked "not supported" cannot be encoded in Windows-1252 and would need to be replaced or transliterated when converting from Unicode.
| Character | Codepoint | Name | Bytes (Hex) | Bytes (Decimal) | Supported |
|---|---|---|---|---|---|
| A | U+0041 | LATIN CAPITAL LETTER A | 41 | 65 | Yes |
| a | U+0061 | LATIN SMALL LETTER A | 61 | 97 | Yes |
| 0 | U+0030 | DIGIT ZERO | 30 | 48 | Yes |
| $ | U+0024 | DOLLAR SIGN | 24 | 36 | Yes |
| £ | U+00A3 | POUND SIGN | A3 | 163 | Yes |
| © | U+00A9 | COPYRIGHT SIGN | A9 | 169 | Yes |
| € | U+20AC | EURO SIGN | 80 | 128 | Yes |
| α | U+03B1 | GREEK SMALL LETTER ALPHA | not supported | — | No |
| А | U+0410 | CYRILLIC CAPITAL LETTER A | not supported | — | No |
| 中 | U+4E2D | CJK UNIFIED IDEOGRAPH-4E2D | not supported | — | No |
| あ | U+3042 | HIRAGANA LETTER A | not supported | — | No |
| ☺ | U+263A | WHITE SMILING FACE | not supported | — | No |
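Rows like these can be reproduced programmatically. The sketch below uses a hypothetical helper, `table_row`, which is not part of any library; it relies only on Python's standard `unicodedata` module and the built-in codec.

```python
import unicodedata

def table_row(ch):
    """Return (codepoint, name, hex bytes, decimal bytes, supported) for one character."""
    codepoint = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch, "UNKNOWN")
    try:
        data = ch.encode("windows-1252")
    except UnicodeEncodeError:
        # The character has no Windows-1252 representation.
        return (codepoint, name, "not supported", None, False)
    hex_bytes = " ".join(f"{b:02X}" for b in data)
    dec_bytes = " ".join(str(b) for b in data)
    return (codepoint, name, hex_bytes, dec_bytes, True)

for ch in "A£€α中":
    print(table_row(ch))
```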
Working with Windows-1252 in Code
Most major languages have built-in support for encoding conversion. The examples below show how to encode a string to Windows-1252 bytes and decode it back to a Unicode string. Always specify the encoding explicitly and never rely on system defaults, which vary by OS and locale.
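# Python: encode a string to windows-1252 bytes
text = "Hello, 世界"
# 世界 is not in windows-1252, so strict encoding raises UnicodeEncodeError;
# errors="replace" substitutes "?" for each unsupported character
encoded = text.encode("windows-1252", errors="replace")
# Decode bytes back to a string (yields "Hello, ??")
decoded = encoded.decode("windows-1252")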
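// PHP: convert UTF-8 to windows-1252
// 世界 has no windows-1252 equivalent, so mbstring substitutes "?" by default
$bytes = mb_convert_encoding(
    "Hello, 世界",
    "windows-1252",
    "UTF-8"
);
// Convert back to UTF-8 (yields "Hello, ??")
$text = mb_convert_encoding(
    $bytes,
    "UTF-8",
    "windows-1252"
);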
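// JavaScript: TextEncoder only produces UTF-8, so there is no built-in
// way to encode to windows-1252; a library or lookup table is needed
const bytes = new Uint8Array([0x93, 0x48, 0x69, 0x94]); // windows-1252 bytes for “Hi”
// Decode windows-1252 bytes to a string
const decoder = new TextDecoder("windows-1252");
const text = decoder.decode(bytes); // "“Hi”"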
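-- PostgreSQL: create a database with Windows-1252 (PostgreSQL calls it WIN1252)
-- The locale must be compatible with the encoding; 'C' works with any encoding
CREATE DATABASE mydb
ENCODING 'WIN1252'
LC_COLLATE 'C'
LC_CTYPE 'C'
TEMPLATE template0;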
-- Check database encoding
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();
Windows-1252 FAQ
How is Windows-1252 different from Latin-1?
Windows-1252 extends Latin-1 by replacing 27 of the 32 C1 control codes in positions 0x80–0x9F with printable characters: smart quotes (“ ” ‘ ’), em dash (—), ellipsis (…), the euro sign (€), and others. The remaining five positions (0x81, 0x8D, 0x8F, 0x90, 0x9D) are unassigned. The WHATWG Encoding Standard, which the HTML specification relies on, treats the label ISO-8859-1 as an alias for Windows-1252, so on the web they are effectively the same encoding.
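The split between assigned and unassigned positions in 0x80–0x9F can be checked with a short Python sketch. It relies on the behaviour of Python's own `cp1252` codec, which treats the unassigned bytes as undefined; other decoders, such as those in browsers, may map them to control characters instead.

```python
assigned = {}
unassigned = []
for value in range(0x80, 0xA0):
    try:
        # Decode each single byte in the former C1 range on its own.
        assigned[value] = bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        unassigned.append(value)

print(len(assigned), len(unassigned))
print([f"0x{v:02X}" for v in unassigned])
```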
Why do I see "�" or similar garbage when reading Windows-1252 files?
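This is mojibake: the result of decoding text with the wrong encoding, here Windows-1252 bytes read as UTF-8. In Windows-1252 a byte in the range 0x80–0xFF stands for a whole character, but in UTF-8 such bytes are valid only as part of a well-formed multi-byte sequence, which Windows-1252 text rarely produces by accident. A UTF-8 decoder therefore raises an error or emits replacement characters (�). The fix is to specify the correct encoding when opening the file: open(file, encoding="windows-1252") in Python, for example.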
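A small Python sketch makes the failure and the fix concrete. The sample text is illustrative, and in-memory bytes stand in for a file's contents. The last line shows the reverse mistake (UTF-8 bytes read as Windows-1252), which is another common source of garbled quotes.

```python
data = "“quoted”".encode("windows-1252")  # b'\x93quoted\x94'

# Strict UTF-8 decoding rejects the lone bytes 0x93 and 0x94.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Lenient decoding substitutes U+FFFD for each invalid byte.
garbled = data.decode("utf-8", errors="replace")

# Decoding with the right encoding restores the text.
fixed = data.decode("windows-1252")

# The reverse mistake: UTF-8 bytes decoded as Windows-1252.
reverse = "“".encode("utf-8").decode("windows-1252")  # 'â€œ'
```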
Is Windows-1252 still in use?
Yes, but its use is declining. Many legacy Windows applications, older Microsoft Office files, and non-UTF-8 western European web pages still use Windows-1252. It remains the default ANSI code page (CP1252) on Windows systems configured for English and most Western European locales, so APIs that rely on the ANSI code page still use it. When processing files without explicit encoding metadata, Windows-1252 is a reasonable fallback for western European content that fails UTF-8 validation.
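The fallback strategy might look like the following Python sketch. The function name `decode_with_fallback` is hypothetical, and the approach assumes the content is western European; it is a heuristic, not a general encoding detector.

```python
def decode_with_fallback(data):
    """Try strict UTF-8 first, then fall back to Windows-1252.

    Returns a (text, encoding_used) pair.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # errors="replace" covers the five bytes Python's codec leaves undefined.
        return data.decode("windows-1252", errors="replace"), "windows-1252"

print(decode_with_fallback("naïve".encode("utf-8")))  # ('naïve', 'utf-8')
print(decode_with_fallback(b"na\xefve"))              # ('naïve', 'windows-1252')
```

Trying UTF-8 first is the safer order because valid UTF-8 is unlikely to occur by chance in Windows-1252 text, whereas almost any byte sequence decodes "successfully" as Windows-1252.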