Unicode Normalization: NFC, NFD, NFKC, and NFKD Explained

Unicode allows some characters to be represented in multiple equivalent ways. The letter é, for instance, can be encoded as a single precomposed character (U+00E9) or as the base letter e (U+0065) followed by a combining acute accent (U+0301). These representations look identical but have different byte sequences. Normalization transforms text into a canonical form to eliminate such ambiguities.
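As a minimal sketch of the é example above, the following uses Python's standard unicodedata module. The variable names are illustrative only.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point (U+00E9)
decomposed = "e\u0301"   # e (U+0065) followed by combining acute accent (U+0301)

# The two strings render the same but differ in code points and UTF-8 bytes.
print(precomposed == decomposed)      # False
print(precomposed.encode("utf-8"))    # b'\xc3\xa9'
print(decomposed.encode("utf-8"))     # b'e\xcc\x81'

# Once both are normalized to the same form, they compare equal.
same_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)
print(same_after_nfc)                 # True
```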

The Four Normal Forms

Unicode defines four normalization forms. NFD (Canonical Decomposition) splits precomposed characters into a base character followed by its combining characters. NFC (Canonical Decomposition followed by Canonical Composition) decomposes and then recomposes, which usually yields the most compact canonical form; it is the most common choice for storage and interchange. NFKD and NFKC additionally apply compatibility decompositions, converting characters like the ligature ﬁ (U+FB01) to fi and the superscript ² (U+00B2) to 2.
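To make the differences concrete, here is a small sketch that applies all four forms to one sample string and prints the resulting code points. The sample string and the forms dictionary are assumptions chosen for illustration.

```python
import unicodedata

# Sample: precomposed é, the ﬁ ligature, and superscript two, separated by spaces.
sample = "\u00e9 \ufb01 \u00b2"

# Apply each of the four normalization forms to the same input.
forms = {
    name: unicodedata.normalize(name, sample)
    for name in ("NFC", "NFD", "NFKC", "NFKD")
}

for name, text in forms.items():
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{name}: {code_points}")
```

Only the two compatibility forms replace the ligature and the superscript; NFC and NFD leave them untouched and differ only in how é is represented.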

Why Normalization Matters

Without normalization, comparing canonically equivalent strings code point by code point fails. Searching for café in a database might miss documents containing the decomposed form. File systems differ too: HFS+ on macOS stores filenames in a decomposed form (a variant of NFD), while Windows and most Linux file systems store names exactly as given, which in practice is usually NFC. Cross-platform file operations can produce surprising mismatches when two filenames look identical but have different byte sequences.
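The search problem can be sketched as follows. The helper names nfc and find_matches and the sample documents are hypothetical, and a real database would normalize at index time rather than on every query.

```python
import unicodedata

def nfc(text):
    """Return text normalized to NFC."""
    return unicodedata.normalize("NFC", text)

def find_matches(query, documents):
    """Return indexes of documents that contain query, comparing in NFC."""
    needle = nfc(query)
    return [i for i, doc in enumerate(documents) if needle in nfc(doc)]

documents = [
    "Meet me at the caf\u00e9.",     # precomposed é
    "Meet me at the cafe\u0301.",    # decomposed e + U+0301
    "Meet me at the bar.",
]
query = "caf\u00e9"

# A naive substring search misses the decomposed document.
naive = [i for i, doc in enumerate(documents) if query in doc]
print(naive)                            # [0]

# Normalizing both sides first finds both spellings.
print(find_matches(query, documents))   # [0, 1]
```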

Combining Characters

Normalization is closely related to combining characters — code points that modify the preceding base character. Unicode defines over 800 combining characters covering diacritics, mathematical notation, and phonetic symbols. Browse combining characters in our character browser by filtering for the "Mark" category.
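For readers who want to inspect combining characters programmatically, the sketch below lists the marks in a decomposed string using unicodedata.category and unicodedata.combining. The helper describe_marks is a hypothetical name, not part of any library.

```python
import unicodedata

def describe_marks(text):
    """Return (code point, name, category, combining class) for each mark in text."""
    marks = []
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("M"):   # the "Mark" categories: Mn, Mc, Me
            marks.append((
                f"U+{ord(ch):04X}",
                unicodedata.name(ch),
                category,
                unicodedata.combining(ch),   # canonical combining class
            ))
    return marks

# Decompose é and ñ so their accents become separate code points.
decomposed = unicodedata.normalize("NFD", "\u00e9\u00f1")
for entry in describe_marks(decomposed):
    print(entry)
```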

Practical Advice

For most applications, normalize to NFC on input and compare in NFC. Use NFKC when you want to treat compatibility variants (ligatures, superscripts, full-width forms) as equivalent to their plain counterparts, e.g., for search; because NFKC discards those distinctions, apply it to search keys rather than to the stored text. Avoid storing unnormalized text in databases: use your language's normalization library to normalize on the way in. Each character's decomposition mapping is visible in our character browser.
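One way to follow this advice in Python is sketched below. The function names normalize_on_input and search_key are hypothetical, and unicodedata.is_normalized assumes Python 3.8 or newer.

```python
import unicodedata

def normalize_on_input(text):
    """Return text in NFC, skipping the conversion if it is already normalized."""
    if unicodedata.is_normalized("NFC", text):   # available from Python 3.8
        return text
    return unicodedata.normalize("NFC", text)

def search_key(text):
    """Build a lossy NFKC key for matching; keep the NFC text as the stored value."""
    return unicodedata.normalize("NFKC", text)

stored = normalize_on_input("Cafe\u0301")
print(stored == "Caf\u00e9")                       # True
print(search_key("\ufb01le") == search_key("file"))  # True

# The decomposition mapping of a character can also be read directly.
print(unicodedata.decomposition("\u00e9"))   # 0065 0301
print(unicodedata.decomposition("\ufb01"))   # <compat> 0066 0069
```

Keeping the NFC text as the stored value and deriving the NFKC key separately means the original ligatures and superscripts remain available for display.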
