Windows-1252

Microsoft's extension of Latin-1. Assigns printable characters to the C1 control code range (0x80–0x9F), including the euro sign, smart quotes, and em dash. Extremely common in legacy Windows files and web pages.

windows-1252
Fixed (1 byte)
1985

Byte Structure

Windows-1252 is a fixed-width encoding: every character occupies exactly one byte, so it can represent at most 256 characters. Characters outside that set cannot be represented and must be replaced or transliterated.
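What "replaced" means in practice depends on the error handler you pick. The following is a minimal sketch in Python, assuming the standard codec error handlers; the variable names are illustrative only.

```python
# "世界" has no Windows-1252 mapping, so a strict encode fails.
text = "Hello, 世界"

try:
    text.encode("windows-1252")  # default handler is errors="strict"
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Lossy alternatives: each unsupported character is handled individually.
replaced = text.encode("windows-1252", errors="replace")           # "?" per character
dropped = text.encode("windows-1252", errors="ignore")             # silently removed
as_refs = text.encode("windows-1252", errors="xmlcharrefreplace")  # numeric references

print(replaced)  # b'Hello, ??'
print(dropped)   # b'Hello, '
print(as_refs)   # b'Hello, &#19990;&#30028;'
```

Numeric character references are only a sensible choice when the output is HTML or XML; for plain text, replacement or an explicit transliteration step is usually the safer option.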

When to Use Windows-1252

Windows-1252 is the encoding you'll encounter most often in legacy Windows files and mislabelled web pages. If you're reading a file that claims to be ISO-8859-1 but contains smart quotes, em dashes, or the euro sign, it's almost certainly Windows-1252. Use it only when consuming or producing content that must interoperate with legacy Windows applications.
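One rough way to spot a mislabelled file is to look for bytes in the 0x80–0x9F range, which ISO-8859-1 reserves for control codes that real text almost never contains. The sketch below is a heuristic, not a detector: `looks_like_windows_1252` is a hypothetical helper name, and it assumes the data is already known not to be UTF-8 (UTF-8 continuation bytes fall in the same range).

```python
def looks_like_windows_1252(data: bytes) -> bool:
    """Return True if data contains any byte in 0x80-0x9F.

    In ISO-8859-1 these bytes are C1 control codes; in Windows-1252
    they are printable characters such as smart quotes and the euro sign.
    """
    return any(0x80 <= b <= 0x9F for b in data)


# Text with smart quotes and a euro sign, as a legacy Windows app might save it.
legacy = "\u201cquoted\u201d text costs 5 \u20ac".encode("windows-1252")
# Plain Latin-1 text that uses only the shared 0xA0-0xFF range.
plain = "café".encode("iso-8859-1")

print(looks_like_windows_1252(legacy))  # True
print(looks_like_windows_1252(plain))   # False
```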

Sample Characters in Windows-1252

The table below shows how a selection of characters is represented in Windows-1252. Bytes are shown in hexadecimal. Characters marked "not supported" cannot be encoded in Windows-1252 and would need to be replaced or transliterated when converting from Unicode.

Character  Codepoint  Name                        Bytes (Hex)  Bytes (Decimal)  Supported
A          U+0041     LATIN CAPITAL LETTER A      41           65               Yes
a          U+0061     LATIN SMALL LETTER A        61           97               Yes
0          U+0030     DIGIT ZERO                  30           48               Yes
$          U+0024     DOLLAR SIGN                 24           36               Yes
£          U+00A3     POUND SIGN                  A3           163              Yes
©          U+00A9     COPYRIGHT SIGN              A9           169              Yes
€          U+20AC     EURO SIGN                   80           128              Yes
α          U+03B1     GREEK SMALL LETTER ALPHA    -            -                not supported
А          U+0410     CYRILLIC CAPITAL LETTER A   -            -                not supported
中         U+4E2D     CJK UNIFIED IDEOGRAPH-4E2D  -            -                not supported
あ         U+3042     HIRAGANA LETTER A           -            -                not supported
☺          U+263A     WHITE SMILING FACE          -            -                not supported
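The rows of this table can be reproduced programmatically. The sketch below uses Python's standard `unicodedata` module for character names; `table_row` is a hypothetical helper name, and `None` stands in for "not supported".

```python
import unicodedata


def table_row(ch: str):
    """Return (codepoint, name, hex byte, decimal byte) for one character.

    The two byte fields are None when Windows-1252 cannot encode ch.
    """
    codepoint = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch)
    try:
        raw = ch.encode("windows-1252")
    except UnicodeEncodeError:
        return (codepoint, name, None, None)
    return (codepoint, name, raw.hex().upper(), raw[0])


for ch in "A£€α中":
    print(ch, table_row(ch))
```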

Working with Windows-1252 in Code

Most major languages have built-in support for encoding conversion. The examples below show how to encode a string to Windows-1252 bytes and decode it back to a Unicode string. Always specify the encoding explicitly. Never rely on system defaults, which vary by OS and locale.

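Python:
# Encode a string to windows-1252 bytes.
# "世界" has no Windows-1252 mapping, so a strict encode would raise
# UnicodeEncodeError; errors="replace" substitutes "?" instead.
text = "Hello, 世界"
encoded = text.encode("windows-1252", errors="replace")  # b"Hello, ??"

# Decode bytes back to a string (the replaced characters are not recovered)
decoded = encoded.decode("windows-1252")  # "Hello, ??"
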
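PHP:
// Convert a UTF-8 string to windows-1252.
// Characters with no mapping ("世界") become the mbstring substitute
// character, "?" by default, so this conversion is lossy.
$bytes = mb_convert_encoding(
    "Hello, 世界",
    "windows-1252",
    "UTF-8"
);

// Convert back to UTF-8 (the substituted characters stay as "?")
$text = mb_convert_encoding(
    $bytes,
    "UTF-8",
    "windows-1252"
);
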
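JavaScript:
// TextEncoder only produces UTF-8, so the standard library cannot encode
// to windows-1252; that needs a lookup table or a third-party library.
// Decoding is built in. These bytes are windows-1252 for: “Hi” €
const bytes = new Uint8Array([0x93, 0x48, 0x69, 0x94, 0x20, 0x80]);

// Decode windows-1252 bytes to a string
const decoder = new TextDecoder("windows-1252");
const text = decoder.decode(bytes); // "“Hi” €"
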
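PostgreSQL:
-- Create a database with Windows-1252 (PostgreSQL names it WIN1252).
-- The locale must be compatible with the encoding, so a UTF-8 locale is
-- rejected; the C locale works with any encoding. template0 is needed
-- when the encoding differs from that of the default template database.
CREATE DATABASE mydb
  ENCODING 'WIN1252'
  LC_COLLATE 'C'
  LC_CTYPE 'C'
  TEMPLATE template0;

-- Check database encoding
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();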

Windows-1252 FAQ

How is Windows-1252 different from Latin-1?

Windows-1252 extends Latin-1 by assigning printable characters to 27 of the 32 C1 control code positions in 0x80–0x9F: smart quotes (“ ” ‘ ’), em dash (—), ellipsis (…), the euro sign (€), and others. The remaining five positions (0x81, 0x8D, 0x8F, 0x90, 0x9D) are left undefined. The HTML specification requires browsers to interpret ISO-8859-1 as Windows-1252, so on the web they are effectively the same encoding.
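The difference is easy to see by decoding the same bytes with both codecs. This is a minimal sketch that assumes Python's `windows-1252` codec, which follows Microsoft's mapping and rejects the positions that have no assigned character.

```python
# Bytes 0x93, 0x94 and 0x80 fall in the range where the two encodings differ.
raw = bytes([0x93, 0x48, 0x69, 0x94, 0x20, 0x80, 0x35])

as_cp1252 = raw.decode("windows-1252")  # printable: “Hi” €5
as_latin1 = raw.decode("iso-8859-1")    # invisible C1 controls around "Hi"

print(repr(as_cp1252))  # '“Hi” €5'
print(repr(as_latin1))  # '\x93Hi\x94 \x805'

# Find which bytes in 0x80-0x9F have no character assigned in this codec.
undefined = []
for b in range(0x80, 0xA0):
    try:
        bytes([b]).decode("windows-1252")
    except UnicodeDecodeError:
        undefined.append(b)

print([hex(b) for b in undefined])  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

Outside 0x80–0x9F the two codecs agree byte for byte, so text that never uses that range decodes identically under either label.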

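Why do I see "�" or similar garbage when reading Windows-1252 files?

This is mojibake: the result of decoding Windows-1252 bytes as UTF-8. In UTF-8, bytes 0x80–0xFF are valid only as parts of well-formed multi-byte sequences, which Windows-1252 text rarely forms by accident, so a UTF-8 decoder raises an error or emits replacement characters (�). The reverse mistake, decoding UTF-8 bytes as Windows-1252, turns a curly quote into a sequence such as "â€œ". The fix is to specify the correct encoding when opening the file: open(file, encoding="windows-1252") in Python, for example.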
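The effect can be reproduced in a few lines. The sketch below is illustrative only; it decodes the same Windows-1252 bytes with the wrong and the right codec, then shows the reverse mistake (UTF-8 bytes read as Windows-1252) for comparison.

```python
original = "\u201csmart quotes\u201d"            # “smart quotes”
cp1252_bytes = original.encode("windows-1252")   # b'\x93smart quotes\x94'

# Wrong codec: 0x93 and 0x94 are not valid UTF-8 on their own,
# so each one becomes U+FFFD, the replacement character.
wrong = cp1252_bytes.decode("utf-8", errors="replace")
# Right codec: the text round-trips exactly.
right = cp1252_bytes.decode("windows-1252")

print(wrong)  # �smart quotes�
print(right)  # “smart quotes”

# Reverse mistake: the three UTF-8 bytes of a left curly quote
# (E2 80 9C) each decode to a separate Windows-1252 character.
reverse = "\u201c".encode("utf-8").decode("windows-1252")
print(reverse)  # â€œ
```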

Is Windows-1252 still in use?

Yes, though its use is declining. Many legacy Windows applications, older Microsoft Office files, and non-UTF-8 western European web pages still use Windows-1252. It remains the default ANSI code page (CP1252) on Windows systems configured for English and other western European locales, so legacy non-Unicode Windows APIs still interpret text with it. When processing files without explicit encoding metadata, Windows-1252 is a reasonable fallback for western European content that fails UTF-8 validation.
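That fallback strategy can be sketched as follows. `decode_with_fallback` is a hypothetical helper name, and the sketch assumes the content is western European text; for other scripts a different legacy code page would be the appropriate fallback.

```python
def decode_with_fallback(data: bytes) -> str:
    """Decode as UTF-8 if the bytes validate, otherwise as Windows-1252.

    errors="replace" covers the five byte values that Windows-1252
    leaves undefined, so the fallback never raises.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("windows-1252", errors="replace")


print(decode_with_fallback("café".encode("utf-8")))         # café
print(decode_with_fallback("café".encode("windows-1252")))  # café
print(decode_with_fallback(b"5 \x80"))                      # 5 €
```

Trying UTF-8 first is the safer order: valid UTF-8 is very unlikely to occur by accident in Windows-1252 text, whereas almost any byte sequence decodes "successfully" as Windows-1252.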