Unicode Scripts: Mapping Writing Systems to Code Points

A Unicode script is a collection of characters used to write one or more languages. Scripts are a higher-level abstraction than blocks: while a block is simply a range of code points, a script groups characters that belong to the same writing tradition regardless of where they fall in the code space.

Major Scripts in Unicode

Unicode defines around 170 scripts, from widely used ones like Latin (used for English, French, German, and hundreds of other languages) and Han (Chinese, Japanese, Korean) to rarely used historic scripts like Cuneiform and Linear B. Browse them all in our scripts reference.

One Script, Many Languages

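The Latin script is a good example of one script serving many languages: English, French, German, and hundreds of others all draw on the same set of letters. Other scripts work the same way. The Arabic script is also used to write Persian and Urdu, the Hebrew script is also used for Yiddish, and Cyrillic and Devanagari each serve several languages. Because the Script property is assigned per character rather than per language, software can use it to help with font selection, spell-checking, and text rendering even before it knows which language the text is in.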

Script vs Block: A Practical Difference

Consider the Latin script: its characters are spread across many blocks, including Basic Latin, Latin-1 Supplement, Latin Extended-A, Latin Extended-B, and Latin Extended Additional. Conversely, the Letterlike Symbols block (U+2100–U+214F) contains characters assigned to multiple scripts: U+2126 OHM SIGN has the Script value Greek, U+212A KELVIN SIGN is Latin, and U+2122 TRADE MARK SIGN is Common. When building a font or performing script detection, the Script property is more reliable than the Block.
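
The difference between the two properties can be sketched in a few lines of Python. This is only an illustration: the two tables below are small hand-picked excerpts written in the layout of the Unicode Character Database files Scripts.txt and Blocks.txt, not the full data, and the helper names parse_ranges and lookup are made up for this example.

```python
import re

# Hand-picked excerpt in the layout of Scripts.txt ("start..end ; Script # comment").
# The real file is far longer; these few lines are for illustration only.
SCRIPTS_EXCERPT = """
0041..005A ; Latin  # LATIN CAPITAL LETTER A..Z
0061..007A ; Latin  # LATIN SMALL LETTER A..Z
0100..017F ; Latin  # Latin Extended-A letters
2122       ; Common # TRADE MARK SIGN
2126       ; Greek  # OHM SIGN
212A..212B ; Latin  # KELVIN SIGN..ANGSTROM SIGN
"""

# Hand-picked excerpt in the layout of Blocks.txt ("start..end; Block Name").
BLOCKS_EXCERPT = """
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
2100..214F; Letterlike Symbols
"""

# One code point or a "start..end" range, then ';', then the value,
# then an optional '#' comment.
LINE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*([^#]+?)\s*(?:#.*)?$")


def parse_ranges(text):
    """Turn UCD-style lines into a list of (start, end, value) tuples."""
    ranges = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2) or match.group(1), 16)
            ranges.append((start, end, match.group(3)))
    return ranges


def lookup(ranges, ch, default="Unknown"):
    """Return the value whose range contains ch, or default if the
    excerpt does not cover it."""
    cp = ord(ch)
    for start, end, value in ranges:
        if start <= cp <= end:
            return value
    return default


SCRIPTS = parse_ranges(SCRIPTS_EXCERPT)
BLOCKS = parse_ranges(BLOCKS_EXCERPT)

for ch in ["A", "\u0101", "\u2122", "\u2126", "\u212A"]:
    print(f"U+{ord(ch):04X}: block={lookup(BLOCKS, ch)}, script={lookup(SCRIPTS, ch)}")
# U+0041: block=Basic Latin, script=Latin
# U+0101: block=Latin Extended-A, script=Latin
# U+2122: block=Letterlike Symbols, script=Common
# U+2126: block=Letterlike Symbols, script=Greek
# U+212A: block=Letterlike Symbols, script=Latin
```

The first two lines of output show one script spanning two blocks; the last three show one block holding three different Script values. A linear scan is fine for a handful of ranges. With the full data files, a sorted list searched with bisect would be the more usual choice.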

Script Detection in Practice

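Each character in Unicode has a Script property. Characters shared across writing systems, such as digits and most punctuation, have the value Common, and combining marks that take on the script of their base character have the value Inherited. You can look up any character's script in our character browser. Script information is used by internationalization libraries, browsers, and operating systems to apply appropriate shaping engines and fonts. For bidirectional text (Arabic, Hebrew), direction is governed by a separate property, Bidi_Class, which the Unicode Bidirectional Algorithm consumes. The script is still a useful hint, because the letters of the Arabic and Hebrew scripts are right-to-left.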
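
One common use of the Script property is splitting text into runs that can each be handed to a single font or shaping engine. The sketch below is a simplified illustration, not how any particular library does it. Python's standard unicodedata module does not expose the Script property, so the code uses a small hand-written table (SAMPLE_RANGES, a name made up here) that covers only part of a few scripts. Anything outside the table is treated like a Common character and joins the run in progress.

```python
from collections import Counter

# Hand-written sample covering only part of a few scripts.
# Real script assignments come from the UCD file Scripts.txt.
SAMPLE_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0391, 0x03A1, "Greek"),
    (0x03A3, 0x03C9, "Greek"),
    (0x0410, 0x044F, "Cyrillic"),
    (0x05D0, 0x05EA, "Hebrew"),
    (0x0620, 0x063F, "Arabic"),
    (0x0641, 0x064A, "Arabic"),
    (0x0900, 0x0950, "Devanagari"),
    (0x4E00, 0x9FA5, "Han"),
]


def script_of(ch):
    """Return the script name from the sample table, or None when the
    character is not covered (spaces, digits, punctuation, and so on)."""
    cp = ord(ch)
    for start, end, name in SAMPLE_RANGES:
        if start <= cp <= end:
            return name
    return None


def script_runs(text):
    """Split text into (script, substring) runs. Characters with no script
    in the table attach to the run in progress."""
    runs = []
    for ch in text:
        script = script_of(ch)
        if runs and (script is None or runs[-1][0] in (None, script)):
            prev_script, prev_text = runs[-1]
            # A run that began with neutral characters adopts the first
            # real script that follows them.
            runs[-1] = (prev_script or script, prev_text + ch)
        else:
            runs.append((script, ch))
    return runs


def dominant_script(text):
    """Return the script with the most characters in text, or None."""
    counts = Counter(s for s in map(script_of, text) if s)
    return counts.most_common(1)[0][0] if counts else None


print(script_runs("Hello \u043c\u0438\u0440!"))
# [('Latin', 'Hello '), ('Cyrillic', 'мир!')]
print(dominant_script("Hello \u043c\u0438\u0440!"))
# Latin
```

Attaching neutral characters to the preceding run keeps spaces and punctuation from fragmenting the text into many tiny runs. Production itemizers apply more careful rules, for example for paired brackets and combining marks.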
