Unicode Blocks: How the Standard Organizes 150,000+ Characters


With over 149,000 assigned characters and a code space of 1,114,112 code points (U+0000 through U+10FFFF), Unicode needs a clear organizational structure. One of its primary organizational units is the block: a named, contiguous range of code points allocated to a particular script, symbol set, or purpose.

What Is a Unicode Block?

A Unicode block is a named range of code points with a defined start and end. Every block starts at a multiple of 16 and spans a multiple of 16 code points, and blocks do not overlap. Once allocated, a block generally keeps its position as new characters are added. Not every code point inside a block has to be assigned: many blocks contain gaps that are reserved for future use. You can browse all blocks in our Unicode blocks reference.
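The difference between assigned and reserved code points can be checked with Python's standard unicodedata module. This is only a minimal sketch: the helper name count_assigned is made up for illustration, the two block ranges are hard-coded, and the result reflects whichever Unicode version your Python build ships with. It treats a code point as unassigned when its general category is Cn.

```python
import unicodedata


def count_assigned(first: int, last: int) -> int:
    """Count code points in [first, last] whose general category is not Cn.

    Cn is the general category Unicode gives to unassigned code points.
    """
    return sum(
        1
        for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    )


# Hard-coded block ranges (assumption: copied by hand from the block list).
basic_latin = count_assigned(0x0000, 0x007F)
linear_b = count_assigned(0x10000, 0x1007F)

print(f"Basic Latin: {basic_latin} of 128 code points assigned")
print(f"Linear B Syllabary: {linear_b} of 128 code points assigned")
```

Basic Latin should come out fully assigned, while Linear B Syllabary should report fewer than 128 because the block contains reserved gaps such as U+1000C.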

Notable Blocks

The Basic Latin block (U+0000–U+007F) is identical to ASCII and contains the characters used in almost all programming languages. Latin Extended-A (U+0100–U+017F) and Latin Extended-B (U+0180–U+024F) add accented and other extended Latin letters, mainly for European languages, although Extended-B also covers letters used outside Europe. The CJK Unified Ideographs block (U+4E00–U+9FFF) contains the core set of Han ideographs shared by Chinese, Japanese, and Korean, a result of the controversial Han Unification process.

More specialized blocks include Emoticons (U+1F600–U+1F64F), Musical Symbols (U+1D100–U+1D1FF), Egyptian Hieroglyphs (U+13000–U+1342F), and Linear B Syllabary (U+10000–U+1007F), which encodes the script used to write Mycenaean Greek.
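Because blocks are contiguous and never overlap, finding the block for a character is a range lookup. The sketch below is illustrative only: BLOCKS holds just the blocks named in this article, and block_of is a hypothetical helper, not part of any library. A complete lookup would load every range from the Blocks.txt file in the Unicode Character Database.

```python
from bisect import bisect_right
from typing import Optional

# (first, last, name), sorted by first code point.
# Assumption: only the blocks mentioned in this article are listed.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x10000, 0x1007F, "Linear B Syllabary"),
    (0x13000, 0x1342F, "Egyptian Hieroglyphs"),
    (0x1D100, 0x1D1FF, "Musical Symbols"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
STARTS = [first for first, _, _ in BLOCKS]


def block_of(ch: str) -> Optional[str]:
    """Return the block name for a single character, or None if not in BLOCKS."""
    cp = ord(ch)
    # Find the last block whose start is <= cp, then check its end.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        first, last, name = BLOCKS[i]
        if first <= cp <= last:
            return name
    return None


print(block_of("A"))           # Basic Latin
print(block_of("\u4e2d"))      # CJK Unified Ideographs
print(block_of("\U0001F600"))  # Emoticons
print(block_of("\u00e9"))      # None: Latin-1 Supplement is not in this small table
```

Binary search over the sorted start points keeps the lookup fast even with the full list of several hundred blocks.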

Blocks vs Scripts

Unicode blocks and scripts are related but distinct concepts. A block is a range of code points; a script is a writing system, recorded as a separate property on each character. Some scripts span multiple blocks (Latin uses several), and some blocks contain characters from multiple scripts: the Letterlike Symbols block, for example, holds KELVIN SIGN (U+212A, script Latin) and OHM SIGN (U+2126, script Greek) alongside symbols whose script is Common. For filtering characters by writing system, scripts are often more useful than blocks.
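Python's standard library does not expose the Script property, so the sketch below uses a rough stand-in: it treats a character as Latin when its Unicode name starts with "LATIN ". That is a heuristic, not the real property, and looks_latin and latin_counts are made-up helper names. It is still enough to show Latin letters spread over several blocks, and a block that holds more than Latin letters.

```python
import unicodedata

# The first four blocks of the code space, ranges written out by hand.
LATIN_BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Latin Extended-A": (0x0100, 0x017F),
    "Latin Extended-B": (0x0180, 0x024F),
}


def looks_latin(cp: int) -> bool:
    """Heuristic only: True when the character name starts with 'LATIN '."""
    # The "" default avoids ValueError for code points without a name.
    return unicodedata.name(chr(cp), "").startswith("LATIN ")


def latin_counts() -> dict:
    """Map each block name to the number of Latin-looking characters in it."""
    return {
        name: sum(looks_latin(cp) for cp in range(first, last + 1))
        for name, (first, last) in LATIN_BLOCKS.items()
    }


for name, count in latin_counts().items():
    first, last = LATIN_BLOCKS[name]
    size = last - first + 1
    print(f"{name}: {count} of {size} code points look Latin")
```

Every one of these blocks contains Latin letters, yet Basic Latin also holds digits, punctuation, and control codes, so neither concept maps cleanly onto the other.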

Using Blocks for Discovery

Browsing by block is one of the best ways to discover unfamiliar characters. Our blocks page shows each block's range, assigned character count, and links to the characters within it. From there you can jump to any character's detail page to see its encoding in UTF-8, UTF-16, and other formats.
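For a sense of what an encoding view contains, the short sketch below derives the UTF-8 and UTF-16 bytes of a character with Python's built-in str.encode. The describe helper and its field names are assumptions made for illustration, not a description of how the detail pages are built. It needs Python 3.8 or later for the separator argument to bytes.hex.

```python
def describe(ch: str) -> dict:
    """Return the code point and the UTF-8 / UTF-16 (big-endian) bytes of ch."""
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        "utf8": ch.encode("utf-8").hex(" "),
        "utf16_be": ch.encode("utf-16-be").hex(" "),
    }


print(describe("A"))
# {'code_point': 'U+0041', 'utf8': '41', 'utf16_be': '00 41'}
print(describe("\u4e2d"))
# {'code_point': 'U+4E2D', 'utf8': 'e4 b8 ad', 'utf16_be': '4e 2d'}
print(describe("\U0001F600"))
# {'code_point': 'U+1F600', 'utf8': 'f0 9f 98 80', 'utf16_be': 'd8 3d de 00'}
```

The last example sits outside the Basic Multilingual Plane, which is why UTF-16 needs a surrogate pair (four bytes) to represent it.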
