UTF-8: The Encoding That Took Over the Web
UTF-8 is now the dominant character encoding on the internet, used by over 98% of websites. But this dominance wasn't inevitable — it was the result of clever e...
In-depth guides on Unicode, character encoding, and developer tools.
Before Unicode, the computing world was fragmented. Different countries and vendors used different character sets, and software that worked in one locale often...
Mojibake (文字化け) is a Japanese term meaning "character transformation" — the garbled text that appears when a file encoded in one character set is decoded us...
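A minimal sketch of how the garbling described in the mojibake excerpt can arise, assuming text written as UTF-8 and wrongly read as Windows-1252. The sample string is hypothetical and the pairing of encodings is only one common case.

```python
# Hypothetical sample text; any string with non-ASCII characters would do.
original = "café"

# Encode with one character set (UTF-8)...
utf8_bytes = original.encode("utf-8")      # b'caf\xc3\xa9'

# ...then decode with a different one (Windows-1252), producing mojibake.
garbled = utf8_bytes.decode("cp1252")
print(garbled)                             # cafÃ©

# Reversing the wrong step recovers the text, as long as no bytes were lost.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)                            # café
```

The repair only works here because every byte happened to map to a Windows-1252 character. In general the round trip is not guaranteed.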
ASCII — the American Standard Code for Information Interchange — was published in 1963 and defines 128 characters using 7-bit codes (0–127). It's one of the mos...
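To illustrate the 7-bit range (0–127) mentioned in the ASCII excerpt, the sketch below checks whether every character of a string falls inside it. The helper name `fits_in_ascii` is hypothetical and used only for this example.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every code point is in the 7-bit range 0-127."""
    return all(ord(ch) < 128 for ch in text)

# Code points of a plain English word all fall below 128.
codes = [ord(ch) for ch in "Hello"]
print(codes)                      # [72, 101, 108, 108, 111]

print(fits_in_ascii("Hello"))     # True
print(fits_in_ascii("café"))      # False: 'é' is U+00E9 (233)
```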
HTML entities provide a way to include characters in HTML that would otherwise be interpreted as markup or that aren't easily typeable. They come in two forms:...
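As a rough illustration of the HTML entities excerpt, the sketch below uses Python's standard `html` module. It assumes the two forms the excerpt goes on to list are named references (such as `&lt;`) and numeric references (such as `&#60;` or `&#x3C;`), since the excerpt is cut off before naming them.

```python
import html

# Named references decode to the characters they stand for.
named = html.unescape("&lt;p&gt; &amp; &copy;")
print(named)      # <p> & ©

# Numeric references (decimal and hexadecimal) decode the same way.
numeric = html.unescape("&#60; &#x3C; &#169;")
print(numeric)    # < < ©

# Escaping goes the other direction, so text is not read as markup.
escaped = html.escape('<a href="x">Tom & Jerry</a>')
print(escaped)    # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```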
JavaScript strings are encoded internally using UTF-16, a legacy inherited from the days when Unicode was expected to fit within 65,536 characters. Understandin...
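The practical consequence of UTF-16 storage can be sketched outside JavaScript as well. The Python example below lists the UTF-16 code units of a string. The helper name `utf16_code_units` is hypothetical, and the code unit count is assumed to match what a JavaScript string's `length` property reports for the same text.

```python
import struct

def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

# A character inside the first 65,536 code points takes one code unit.
print([hex(u) for u in utf16_code_units("A")])    # ['0x41']

# A character beyond that range takes two: a surrogate pair.
print([hex(u) for u in utf16_code_units("😀")])   # ['0xd83d', '0xde00']

# Python counts code points, so the same emoji has length 1 here.
print(len("😀"), len(utf16_code_units("😀")))     # 1 2
```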
Latin-1 (ISO-8859-1) and Windows-1252 are so similar that many developers treat them as identical — and browsers historically did too. But there are real differ...
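One way to see where the two encodings diverge is to decode the same bytes with each. The sketch below assumes the differences the excerpt refers to are the bytes 0x80–0x9F, which are C1 control codes in Latin-1 and mostly printable characters in Windows-1252. The sample bytes are hypothetical.

```python
# Hypothetical input: a euro sign and curly quotes as Windows-1252 bytes.
raw = b"\x80\x93quoted\x94"

as_cp1252 = raw.decode("cp1252")
print(as_cp1252)          # €“quoted”

# Latin-1 maps the same bytes to invisible C1 control characters.
as_latin1 = raw.decode("latin-1")
print(repr(as_latin1))    # '\x80\x93quoted\x94'

# Outside 0x80-0x9F the two encodings agree byte for byte.
shared = bytes(range(0x00, 0x80)) + bytes(range(0xA0, 0x100))
print(shared.decode("cp1252") == shared.decode("latin-1"))   # True
```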
With over 149,000 assigned characters and space for more than a million, Unicode needs a clear organizational structure. One of the primary organizational units...
A Unicode script is a collection of characters used to write one or more languages. Scripts are a higher-level abstraction than blocks: while a block is simply...
The Byte Order Mark (BOM) is a special Unicode character — U+FEFF — that appears at the very beginning of a text file or stream. Its original purpose was to sig...
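A short sketch of what U+FEFF looks like at the byte level, using constants from Python's standard `codecs` module. The payload `"hello"` is a hypothetical stand-in for file contents.

```python
import codecs

# U+FEFF encodes to different byte sequences depending on the encoding.
print(codecs.BOM_UTF8)        # b'\xef\xbb\xbf'
print(codecs.BOM_UTF16_LE)    # b'\xff\xfe'
print(codecs.BOM_UTF16_BE)    # b'\xfe\xff'

# Hypothetical file contents: a UTF-8 BOM followed by text.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Plain "utf-8" keeps the BOM as a leading U+FEFF character.
with_bom = data.decode("utf-8")
print(repr(with_bom))         # '\ufeffhello'

# "utf-8-sig" strips a leading BOM if one is present.
without_bom = data.decode("utf-8-sig")
print(repr(without_bom))      # 'hello'
```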