Windows-1252

Microsoft's extension of Latin-1 (ISO-8859-1). It assigns printable characters to 27 of the 32 positions in the C1 control code range (0x80–0x9F), including the euro sign, smart quotes, and em dash. Extremely common in legacy Windows files and web pages.
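To list exactly which characters occupy the C1 range, Python's built-in cp1252 codec can be queried byte by byte. This is a minimal sketch; the function name c1_range_table is hypothetical. Python leaves five of these bytes undefined, whereas the WHATWG Encoding Standard used by browsers maps those five to the matching C1 control code points.

```python
from typing import Dict, Optional


def c1_range_table() -> Dict[int, Optional[str]]:
    """Map each byte 0x80-0x9F to its Windows-1252 character, or None if undefined."""
    table: Dict[int, Optional[str]] = {}
    for b in range(0x80, 0xA0):
        try:
            table[b] = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec rejects 0x81, 0x8D, 0x8F, 0x90 and 0x9D
            table[b] = None
    return table


for byte, char in c1_range_table().items():
    label = "undefined" if char is None else f"{char}  U+{ord(char):04X}"
    print(f"0x{byte:02X}  {label}")
```

Of the 32 bytes, 27 decode to printable characters: for example, 0x80 is the euro sign, 0x93 and 0x94 are curly double quotes, and 0x97 is the em dash.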

IANA Name

windows-1252

Width

Fixed (1 byte)

Introduced

1985

Byte Structure

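Windows-1252 encodes every character as exactly one byte, so it can represent at most 256 characters. In practice 251 byte values are assigned; 0x81, 0x8D, 0x8F, 0x90, and 0x9D are left undefined. Characters not in this encoding cannot be represented and must be replaced or transliterated.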
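Because the mapping is one byte per character, everything outside the 251 assigned characters needs a fallback policy. The sketch below shows one possible policy, assuming an approximate character is preferable to a placeholder: it first tries Unicode NFKD decomposition (so "ő" becomes "o" and the ligature "ﬁ" becomes "fi") and only then substitutes "?". The function name encode_cp1252_lossy is hypothetical; Python's built-in errors="replace" handler on its own would produce "?" for every unsupported character.

```python
import unicodedata


def encode_cp1252_lossy(text: str) -> bytes:
    """Encode text as Windows-1252, approximating characters it lacks.

    Unsupported characters are decomposed with NFKD; if the non-combining
    parts are all encodable they are kept, otherwise "?" is emitted.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")  # direct one-byte mapping
            continue
        except UnicodeEncodeError:
            pass
        # Drop combining marks such as U+030B (double acute accent)
        parts = [
            c for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        ]
        try:
            approx = "".join(parts).encode("cp1252")
        except UnicodeEncodeError:
            approx = b""
        out += approx if approx else b"?"
    return bytes(out)


print(encode_cp1252_lossy("Erd\u0151s"))     # b'Erdos'
print(encode_cp1252_lossy("\u03b1\u4e2d"))   # b'??'
```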

When to Use Windows-1252

Windows-1252 is the encoding you'll encounter most often in legacy Windows files and mislabelled web pages. If a file claims to be ISO-8859-1 but contains smart quotes, em dashes, or the euro sign (bytes in the range 0x80–0x9F), it's almost certainly Windows-1252. Use it only when consuming or producing content that must interoperate with legacy Windows applications.
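The ISO-8859-1 versus Windows-1252 check can be automated with a byte-range test. This is a heuristic sketch rather than a real detector: it assumes genuine ISO-8859-1 text never contains C1 control codes, which holds for ordinary documents but not for arbitrary binary data. The function name looks_like_cp1252 is hypothetical.

```python
def looks_like_cp1252(data: bytes) -> bool:
    """Guess whether bytes labelled ISO-8859-1 are really Windows-1252.

    In ISO-8859-1, bytes 0x80-0x9F are C1 control codes that almost never
    appear in real text. In Windows-1252 most of them are printable, e.g.
    0x93/0x94 (curly double quotes), 0x97 (em dash), 0x80 (euro sign).
    """
    return any(0x80 <= b <= 0x9F for b in data)


labelled_latin1 = "\u201cPrice\u201d: 5 \u20ac".encode("cp1252")
if looks_like_cp1252(labelled_latin1):
    # errors="replace" guards against the five undefined bytes
    text = labelled_latin1.decode("cp1252", errors="replace")
else:
    text = labelled_latin1.decode("latin-1")
print(text)  # “Price”: 5 €
```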

Sample Characters in Windows-1252

The table below shows how a selection of characters is represented in Windows-1252. Bytes are shown in hexadecimal and decimal. Characters marked "not supported" cannot be encoded in Windows-1252 and would need to be replaced or transliterated when converting from Unicode.

Character	Codepoint	Name	Bytes (Hex)	Bytes (Decimal)	Supported
A	U+0041	LATIN CAPITAL LETTER A	41	65	Yes
a	U+0061	LATIN SMALL LETTER A	61	97	Yes
0	U+0030	DIGIT ZERO	30	48	Yes
$	U+0024	DOLLAR SIGN	24	36	Yes
£	U+00A3	POUND SIGN	A3	163	Yes
©	U+00A9	COPYRIGHT SIGN	A9	169	Yes
€	U+20AC	EURO SIGN	80	128	Yes
α	U+03B1	GREEK SMALL LETTER ALPHA	not supported		No
А	U+0410	CYRILLIC CAPITAL LETTER A	not supported		No
中	U+4E2D	CJK UNIFIED IDEOGRAPH-4E2D	not supported		No
あ	U+3042	HIRAGANA LETTER A	not supported		No
☺	U+263A	WHITE SMILING FACE	not supported		No
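The table can be regenerated, or extended to other characters, with Python's built-in cp1252 codec and the unicodedata module. A minimal sketch; the helper name describe is hypothetical.

```python
import unicodedata
from typing import Tuple

SAMPLES = ["A", "a", "0", "$", "\u00a3", "\u00a9", "\u20ac",
           "\u03b1", "\u0410", "\u4e2d", "\u3042", "\u263a"]


def describe(ch: str) -> Tuple[str, str, str, str]:
    """Return (codepoint, name, hex bytes, decimal bytes) for ch in Windows-1252."""
    codepoint = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch, "")
    try:
        encoded = ch.encode("cp1252")
    except UnicodeEncodeError:
        return codepoint, name, "not supported", ""
    return codepoint, name, encoded.hex().upper(), " ".join(str(b) for b in encoded)


for ch in SAMPLES:
    print(ch, *describe(ch), sep="\t")
```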

Working with Windows-1252 in Code

Most major languages support encoding conversion in their standard library or runtime, though coverage varies: JavaScript's TextEncoder, for instance, only produces UTF-8. The examples below show how to encode a string to Windows-1252 bytes and decode it back to a Unicode string. Always specify the encoding explicitly; never rely on system defaults, which vary by OS and locale.

Python

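# Encode a string to windows-1252 bytes
text = "Café costs 5 €"
encoded = text.encode("windows-1252")  # b'Caf\xe9 costs 5 \x80'

# Characters outside windows-1252 (such as 世界) raise UnicodeEncodeError;
# errors="replace" substitutes "?" instead
lossy = "Hello, 世界".encode("windows-1252", errors="replace")  # b'Hello, ??'

# Decode bytes back to a string
decoded = encoded.decode("windows-1252")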
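The same rule applies to files: pass encoding= explicitly, because Python's default text encoding depends on the platform and locale. A minimal sketch that writes a file in a temporary directory and reads it back; the file name legacy.txt is arbitrary.

```python
import os
import tempfile

text = "\u201cSmart quotes\u201d cost 5 \u20ac"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "legacy.txt")
    # Write: str -> windows-1252 bytes on disk
    with open(path, "w", encoding="windows-1252") as f:
        f.write(text)
    # Inspect the raw bytes that were written
    with open(path, "rb") as f:
        raw = f.read()
    # Read: windows-1252 bytes on disk -> str
    with open(path, encoding="windows-1252") as f:
        round_trip = f.read()

print(raw)                 # b'\x93Smart quotes\x94 cost 5 \x80'
print(round_trip == text)  # True
```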

PHP

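// Convert a UTF-8 string to windows-1252 bytes. Characters outside
// windows-1252 (such as 世界) are replaced by mbstring's substitute character.
$bytes = mb_convert_encoding(
    "Café costs 5 €",
    "windows-1252",
    "UTF-8"
);

// Convert windows-1252 bytes back to UTF-8
$text = mb_convert_encoding(
    $bytes,
    "UTF-8",
    "windows-1252"
);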

JavaScript

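// TextEncoder always produces UTF-8; the Encoding API has no windows-1252
// encoder, so encoding to windows-1252 needs a library or a lookup table.

// Decoding windows-1252 bytes is supported natively
const bytes = new Uint8Array([0x43, 0x61, 0x66, 0xE9, 0x20, 0x80]);
const decoder = new TextDecoder("windows-1252");
const text = decoder.decode(bytes); // "Café €"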

SQL (PostgreSQL)

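-- Create a Windows-1252 database (PostgreSQL calls it WIN1252).
-- The locale must match the encoding (C is always allowed), and
-- template0 is needed when the encoding differs from template1's.
CREATE DATABASE mydb
  TEMPLATE template0
  ENCODING 'WIN1252'
  LC_COLLATE 'C' LC_CTYPE 'C';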

-- Check database encoding
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();

Compare with Other Encodings

See how Windows-1252 differs from other encodings — which characters each supports and how the byte representations compare.

Windows-1252 vs UTF-8 → Windows-1252 vs Latin-1 (ISO-8859-1) →
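For a quick local comparison with UTF-8, Python can show both byte sequences side by side. A minimal sketch; compare is a hypothetical helper name. ASCII characters encode identically in both; any other character is one byte in Windows-1252 (when supported at all) and two to four bytes in UTF-8.

```python
from typing import Tuple


def compare(ch: str) -> Tuple[str, str]:
    """Return (Windows-1252 hex, UTF-8 hex) for one character."""
    utf8 = ch.encode("utf-8").hex(" ").upper()
    try:
        cp1252 = ch.encode("cp1252").hex(" ").upper()
    except UnicodeEncodeError:
        cp1252 = "not supported"
    return cp1252, utf8


for ch in ["A", "\u00e9", "\u20ac", "\u201c", "\u4e2d"]:
    cp1252, utf8 = compare(ch)
    print(f"U+{ord(ch):04X}  cp1252: {cp1252:<13}  utf-8: {utf8}")
```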

Windows-1252 FAQ

How is Windows-1252 different from Latin-1?

Windows-1252 extends Latin-1 by replacing 27 of the 32 C1 control codes in positions 0x80–0x9F with printable characters: smart quotes (“ ” ‘ ’), em dash (—), ellipsis (…), the euro sign (€), and others. Bytes 0x00–0x7F and 0xA0–0xFF mean the same thing in both encodings. The HTML specification (through the WHATWG Encoding Standard) requires browsers to decode content labelled ISO-8859-1 as Windows-1252, so on the web they are effectively the same encoding.
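The difference is easy to observe by decoding the same bytes both ways. A minimal sketch using Python's built-in latin-1 and cp1252 codecs; the sample bytes are arbitrary.

```python
# Curly-quoted "Hi" followed by an ellipsis, as Windows-1252 bytes
data = bytes([0x93, 0x48, 0x69, 0x94, 0x85])

print(repr(data.decode("latin-1")))  # '\x93Hi\x94\x85' (C1 control codes)
print(repr(data.decode("cp1252")))   # '“Hi”…'

# Outside 0x80-0x9F the two codecs agree on every byte value
same = all(
    bytes([b]).decode("latin-1") == bytes([b]).decode("cp1252")
    for b in [*range(0x00, 0x80), *range(0xA0, 0x100)]
)
print(same)  # True
```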

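Why do I see "â€œ" or similar garbage when reading text files?

This is mojibake. "â€œ" is what the UTF-8 encoding of a left curly quote (bytes E2 80 9C) looks like when it is decoded as Windows-1252: 0xE2 becomes â, 0x80 becomes €, and 0x9C becomes œ. The opposite mistake, decoding Windows-1252 bytes as UTF-8, usually produces replacement characters (�) or decode errors instead, because a lone byte in the range 0x80–0xFF is not a valid UTF-8 sequence. The fix is to decode with the encoding the file was actually written in: in Python, for example, open(file, encoding="windows-1252") for a Windows-1252 file, or encoding="utf-8" for a UTF-8 file.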
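The two directions of this mix-up, and the usual repair for the "â€œ" case, can be reproduced in a few lines of Python. A minimal sketch; the repair assumes the text went through exactly one wrong decode, and it only round-trips when every byte was decodable. For example, the UTF-8 bytes for ” (E2 80 9D) include 0x9D, which Python's cp1252 codec rejects.

```python
quote = "\u201c"  # LEFT DOUBLE QUOTATION MARK

# Mistake 1: UTF-8 bytes decoded as Windows-1252 -> classic mojibake
garbled = quote.encode("utf-8").decode("cp1252")
print(garbled)  # â€œ

# Repair: undo the wrong decode, then decode correctly
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == quote)  # True

# Mistake 2: Windows-1252 bytes decoded as UTF-8 -> replacement character
cp1252_bytes = quote.encode("cp1252")  # b'\x93'
print(cp1252_bytes.decode("utf-8", errors="replace"))  # �
```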

Is Windows-1252 still in use?

Yes, but declining. Many legacy Windows applications, older Microsoft Office files, and non-UTF-8 western European web pages still use Windows-1252. It remains the default ANSI code page (CP1252) on Windows systems set to English and most western European locales, so programs that rely on the ANSI code page still read and write it. When processing files without explicit encoding metadata, Windows-1252 is a reasonable fallback for western European content that fails UTF-8 validation.
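That fallback strategy might look like this in Python. A minimal sketch, assuming the input is either UTF-8 or Windows-1252; decode_with_fallback is a hypothetical name. Pure ASCII input is reported as UTF-8, which is harmless because both encodings agree on it.

```python
from typing import Tuple


def decode_with_fallback(data: bytes) -> Tuple[str, str]:
    """Decode bytes as UTF-8 if they are valid UTF-8, else as Windows-1252.

    Returns (text, encoding_used).
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # errors="replace" turns the five undefined bytes into U+FFFD
    # instead of aborting the whole decode
    return data.decode("cp1252", errors="replace"), "windows-1252"


print(decode_with_fallback("Caf\u00e9".encode("utf-8")))  # ('Café', 'utf-8')
print(decode_with_fallback(b"Caf\xe9"))                    # ('Café', 'windows-1252')
```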
