Combining Characters: Building Complex Glyphs from Simple Parts

Combining characters are Unicode code points that attach to the preceding base character to modify its appearance. Rather than encoding every possible letter + diacritic combination as a unique code point, Unicode allows accents, dots, tildes, and other marks to be represented as separate combining characters that stack onto any base letter.

How Combining Characters Work

Most combining characters are nonspacing marks with zero advance width: they don't move the cursor forward. Instead, each one is positioned relative to the immediately preceding base character. The combining acute accent (U+0301) stacks above the preceding letter; the combining cedilla (U+0327) attaches below. Each mark's canonical combining class records roughly where it sits (above, below, attached) and governs how sequences of marks are ordered during normalization, while text renderers rely on the font's positioning data to draw the marks in the right place.
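As a small illustration, Python's standard unicodedata module can report the general category and canonical combining class of the two marks mentioned above. The helper name describe is hypothetical and exists only for this sketch.

```python
import unicodedata

ACUTE = "\u0301"    # COMBINING ACUTE ACCENT
CEDILLA = "\u0327"  # COMBINING CEDILLA


def describe(ch: str) -> tuple[str, str, int]:
    """Return (name, general category, canonical combining class) for one code point.

    Hypothetical helper for illustration only.
    """
    return unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch)


print(describe(ACUTE))    # ('COMBINING ACUTE ACCENT', 'Mn', 230)
print(describe(CEDILLA))  # ('COMBINING CEDILLA', 'Mn', 202)

# A base letter followed by a mark is two code points,
# even though it is drawn as a single glyph.
decomposed = "e" + ACUTE
print(len(decomposed))  # 2
```

Category Mn means "nonspacing mark". Class 230 is used for marks placed above the base, and 202 for marks attached below it.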

Precomposed vs Decomposed Forms

Many common accented letters exist in Unicode as both a precomposed single code point (e.g., é as U+00E9) and a decomposed sequence (e + U+0301). These are canonically equivalent: they represent the same character, even though a plain code-point-by-code-point comparison treats them as different strings. Unicode normalization converts between the forms. NFC composes sequences into precomposed code points where possible; NFD applies full canonical decomposition.
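The equivalence can be checked with a short sketch using unicodedata.normalize from the Python standard library. The function canonically_equal is a hypothetical helper, and comparing NFD forms is just one reasonable way to test canonical equivalence.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

# The raw strings differ because their code points differ.
print(precomposed == decomposed)  # False

# Normalizing to a common form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFD normalization."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(canonically_equal(precomposed, decomposed))  # True
```

Normalizing both sides before comparing, hashing, or storing text avoids treating visually identical strings as different.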

Combining Diacritical Marks Block

The primary block for combining diacritical marks spans U+0300–U+036F. It contains 112 combining characters covering virtually every accent used in European languages and phonetic transcription. Additional combining characters appear in blocks for specific scripts and in the Combining Diacritical Marks Supplement (U+1DC0–U+1DFF). Browse them in our Unicode blocks reference.
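The block boundaries can be explored programmatically. The sketch below assumes the Unicode data bundled with a current Python 3 interpreter and simply filters the range for nonspacing marks; the constant names are chosen for this example.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x0300, 0x036F  # Combining Diacritical Marks

# Collect every code point in the block whose general category is Mn
# (nonspacing mark).
marks = [
    chr(cp)
    for cp in range(BLOCK_START, BLOCK_END + 1)
    if unicodedata.category(chr(cp)) == "Mn"
]

print(len(marks))                   # 112
print(unicodedata.name(marks[0]))   # COMBINING GRAVE ACCENT
print(unicodedata.name(marks[-1]))  # COMBINING LATIN SMALL LETTER X
```

Changing the two constants to 0x1DC0 and 0x1DFF runs the same filter over the Combining Diacritical Marks Supplement.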

Zalgo Text

The "Zalgo" text effect, where characters appear surrounded by a halo of stacked diacritics, exploits the fact that Unicode places no limit on the number of combining characters that can follow a single base character. Strings like h̴̡̢͕̻̗̱e̞͎̫̪̹̮̩̊͋̈́̇ͅl̡̛͔̝͇̘̗͛l̩̲͕̮̲͛o̱̩̟͕̲̣͆͐͌ stack several combining characters on each letter, and more extreme examples use dozens. This is valid Unicode, but it can break text layout and overflow fixed-size buffers in naive text-processing code.
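One possible defense is to cap how many marks may follow each base character before the text reaches layout or storage code. The following is a minimal sketch, not a complete sanitizer: the function limit_marks and its default threshold of 2 are assumptions made for illustration, and a threshold that low may be too strict for scripts that legitimately stack several marks.

```python
import unicodedata


def limit_marks(text: str, max_marks: int = 2) -> str:
    """Keep at most max_marks nonspacing/enclosing marks per base character.

    Hypothetical helper; the default threshold is an arbitrary assumption.
    """
    out = []
    run = 0  # marks seen since the last non-mark character
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # drop the excess mark
        else:
            run = 0  # a new base character resets the count
        out.append(ch)
    return "".join(out)


noisy = "a" + "\u0301" * 5 + "b"
cleaned = limit_marks(noisy)
print(len(noisy), len(cleaned))  # 7 4
```

Spacing marks (category Mc) are left alone here because they are an ordinary part of many scripts rather than decoration.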
