Character Encodings

A character encoding is a rule for representing characters as bytes. Different encodings cover different sets of characters and use different numbers of bytes per character. Here are all the encodings covered in this reference.

Encoding FAQ

Which encoding should I use?

Use UTF-8 for almost everything. It covers all Unicode characters, is ASCII-compatible, and is the default for HTML5, JSON, XML, and most modern databases and programming languages. The only common reason to choose a different encoding is compatibility with legacy systems.
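The ASCII compatibility mentioned above can be seen directly: ASCII text produces identical bytes under both encodings, while characters outside ASCII use multi-byte sequences. A quick sketch in Python (used here purely for illustration):

```python
# ASCII text encodes to the same bytes in UTF-8 and ASCII.
text = "hello"
assert text.encode("utf-8") == text.encode("ascii")  # both b"hello"

# Characters outside ASCII become multi-byte sequences in UTF-8.
accented = "é"
print(accented.encode("utf-8"))  # b'\xc3\xa9' (two bytes)
```

This is why UTF-8 can be dropped into systems that only expect ASCII without breaking the ASCII subset.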

What's the difference between a character set and an encoding?

A character set (like Unicode) is the abstract list of characters and their assigned numbers (codepoints). An encoding (like UTF-8) is the concrete rule for turning those numbers into bytes. Unicode is a character set; UTF-8, UTF-16, and UTF-32 are different encodings of Unicode.
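The split between codepoint and bytes is easy to demonstrate: one character, one Unicode codepoint, but different byte sequences depending on the encoding. A sketch in Python (illustrative only):

```python
ch = "€"  # EURO SIGN

# The character set (Unicode) assigns the abstract number (codepoint).
print(hex(ord(ch)))            # 0x20ac, i.e. U+20AC

# Each encoding turns that same codepoint into different bytes.
print(ch.encode("utf-8"))      # b'\xe2\x82\xac'          (3 bytes)
print(ch.encode("utf-16-be"))  # b' \xac' = 0x20 0xAC     (2 bytes)
print(ch.encode("utf-32-be"))  # b'\x00\x00 \xac'         (4 bytes)
```

Same codepoint every time; only the byte-level representation changes.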

Are UTF-8 and Unicode the same thing?

No. Unicode is a character standard that assigns a unique number to every character. UTF-8 is one way to encode those numbers as bytes. UTF-8 is by far the most popular encoding of Unicode, which is why they're often conflated, but they're distinct concepts.
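One consequence of this distinction: bytes alone don't identify characters — you also need to know which encoding produced them. Decoding UTF-8 bytes with the wrong encoding yields mojibake, as this Python sketch shows:

```python
data = "café".encode("utf-8")   # b'caf\xc3\xa9'

# Interpreting the same bytes as Latin-1 gives different characters.
print(data.decode("latin-1"))   # 'cafÃ©' — classic mojibake
print(data.decode("utf-8"))     # 'café' — correct round-trip
```

This is why protocols and file formats must declare (or standardize on) an encoding rather than just "Unicode".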