Articles — Unicode and Character Encoding Guides

Jun 2026

The Unicode Private Use Area: Custom Characters for Special Needs

Unicode reserves certain ranges of code points as the Private Use Area (PUA) — regions where organizations and individuals can assign their own characters witho...

04 Jun 2026

Combining Characters: Building Complex Glyphs from Simple Parts

Combining characters are Unicode code points that attach to the preceding base character to modify its appearance. Rather than encoding every possible letter +...

28 May 2026

Bidirectional Text: How Unicode Handles Arabic and Hebrew

Most writing systems read left-to-right, but Arabic, Hebrew, Persian, and several other languages are written right-to-left. Text that mixes both directions, such as an...

21 May 2026

How Emoji Are Encoded in Unicode

Emoji may seem simple — colorful pictographs you tap on a phone — but their encoding in Unicode is surprisingly complex. Many emoji are single code points; othe...

14 May 2026

Zero-Width Characters: Invisible but Surprisingly Powerful

Zero-width characters are Unicode code points that take up no horizontal space when rendered. They're invisible in most contexts yet can have significant effect...

07 May 2026

Unicode Normalization: NFC, NFD, NFKC, and NFKD Explained

Unicode allows some characters to be represented in multiple equivalent ways. The letter é, for instance, can be encoded as a single precomposed character (U+00...

30 Apr 2026

Shift-JIS and the History of Japanese Text Encoding

Shift-JIS is a variable-width encoding for Japanese text developed by Microsoft and ASCII Corporation in 1982. It was designed to encode the tens of thousands o...

23 Apr 2026

UTF-32: When Fixed-Width Character Encoding Makes Sense

UTF-32 encodes every Unicode character in exactly 4 bytes — no variable-width complexity, no surrogate pairs, no multi-byte sequences. This simplicity comes at...

08 Apr 2026

UTF-8: The Encoding That Took Over the Web

UTF-8 is now the dominant character encoding on the internet, used by over 98% of websites. But this dominance wasn't inevitable — it was the result of clever e...

01 Apr 2026

What is Unicode? A Developer's Introduction

Before Unicode, the computing world was fragmented. Different countries and vendors used different character sets, and software that worked in one locale often...

25 Mar 2026

Understanding Mojibake: When Text Goes Wrong

Mojibake (文字化け) is a Japanese term meaning "character transformation" — the garbled text that appears when a file encoded in one character set is decoded us...

18 Mar 2026

The ASCII Table: 128 Characters That Built the Digital World

ASCII — the American Standard Code for Information Interchange — was published in 1963 and defines 128 characters using 7-bit codes (0–127). It's one of the mos...