Shift-JIS and the History of Japanese Text Encoding

Shift-JIS is a variable-width encoding for Japanese text developed by Microsoft and ASCII Corporation in 1982. It was designed to encode the thousands of kanji in the JIS X 0208 character set (roughly 6,300 of them, plus kana and symbols) while leaving single-byte ASCII text almost unchanged, which was a significant engineering challenge.

The Encoding Challenge for Japanese

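Japanese uses three writing systems simultaneously: hiragana (~46 syllabic characters), katakana (~46 more), and kanji (thousands of Chinese-derived ideographs). No single-byte encoding can cover this range. Shift-JIS divides the byte values by role: single bytes in the range 0x00–0x7F encode ASCII characters; bytes in the ranges 0x81–0x9F and 0xE0–0xFC are the first (lead) bytes of two-byte sequences, which cover kanji as well as full-width kana and symbols; and single bytes 0xA1–0xDF encode half-width katakana. The second (trail) byte of a two-byte sequence falls in 0x40–0x7E or 0x80–0xFC, so it can overlap with ASCII values. Lead bytes at the top of the range (0xF0–0xFC) are used mainly by vendor extensions.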
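The byte ranges above can be sketched as a small classifier. This is a minimal illustration rather than a full decoder: the function names `classify` and `segment` are hypothetical, trail bytes are not validated, and vendor variants differ in detail.

```python
def classify(byte: int) -> str:
    """Return the role a byte plays when it starts a Shift-JIS character."""
    if byte <= 0x7F:
        return "ascii"
    if 0xA1 <= byte <= 0xDF:
        return "halfwidth-katakana"
    if 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC:
        return "lead"
    return "unused"  # 0x80, 0xA0, 0xFD-0xFF


def segment(raw: bytes) -> list:
    """Split raw Shift-JIS bytes into one bytes object per character.

    Assumption: input is well formed; the trail byte is taken on trust.
    """
    chars = []
    i = 0
    while i < len(raw):
        width = 2 if classify(raw[i]) == "lead" else 1
        chars.append(raw[i:i + width])
        i += width
    return chars


# "A", half-width katakana "ｱ" (0xB1), and "日" (0x93 0xFA)
sample = bytes([0x41, 0xB1, 0x93, 0xFA])
print(segment(sample))
print([c.decode("shift_jis") for c in segment(sample)])
```

Because the role of each byte is decided by its value alone, the loop never needs to remember a "current character set", only whether the previous byte was a lead byte.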

The "Shift" in Shift-JIS

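The name comes from the way the lead bytes of two-byte characters are "shifted" into ranges that single-byte ASCII and half-width katakana do not use, so a decoder can tell from a byte's value whether it starts a two-byte character. Earlier 7-bit JIS encodings (the style later standardized as ISO-2022-JP) used escape sequences to switch between character sets such as ASCII and JIS X 0208; Shift-JIS eliminated the need for mode switching, making it more practical for operating systems and applications.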
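The shift can be made concrete with the commonly cited arithmetic that maps a JIS X 0208 byte pair onto a Shift-JIS byte pair. The sketch below is an illustration under stated assumptions: `jis_to_sjis` is a hypothetical helper name, it covers only standard JIS X 0208 rows (no vendor extensions), and Python's `iso2022_jp` codec stands in for the escape-sequence style of encoding.

```python
def jis_to_sjis(j1: int, j2: int) -> tuple:
    """Map a JIS X 0208 byte pair (each 0x21-0x7E) to a Shift-JIS byte pair."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("JIS X 0208 bytes must be in 0x21-0x7E")
    # Two JIS rows share one lead byte; lead bytes jump over 0xA0-0xDF,
    # the range kept for half-width katakana.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 % 2:
        # Odd row: trail bytes 0x40-0x9E, skipping 0x7F.
        s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:
        # Even row: trail bytes 0x9F-0xFC.
        s2 = j2 + 0x7E
    return s1, s2


text = "日本"
escaped = text.encode("iso2022_jp")  # ESC $ B ... ESC ( B around the JIS bytes
jis = escaped[3:-3]                  # drop the two 3-byte escape sequences
shifted = bytes(
    b
    for i in range(0, len(jis), 2)
    for b in jis_to_sjis(jis[i], jis[i + 1])
)
print(escaped)
print(shifted, shifted == text.encode("shift_jis"))
```

The escaped form needs six extra bytes to enter and leave two-byte mode, while the shifted form carries the same information in the byte values themselves.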

Shift-JIS and the Web

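Shift-JIS was the dominant encoding for Japanese websites through the 2000s. Transitioning to UTF-8 was sometimes difficult, in part because software had accumulated workarounds for a well-known Shift-JIS hazard: the second byte of a two-byte character can fall in the ASCII range. The famous example is the byte 0x5C. As a single byte it is a backslash in ASCII but a yen sign in the JIS X 0201 set that Shift-JIS inherits, which is why path separators are often displayed as ¥ on Japanese systems. The same value also appears as the second byte of common characters such as ソ (0x83 0x5C) and 表 (0x95 0x5C), so code that scanned bytes for backslashes without tracking lead bytes caused path separator and escaping bugs in Japanese software for years.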
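To illustrate the 0x5C hazard, here is a minimal sketch that splits a Windows-style path stored as Shift-JIS bytes, first naively and then with lead-byte tracking. The helper names `is_lead_byte` and `split_path_sjis` are hypothetical, and the sample path is invented for the example; it uses 表, whose Shift-JIS bytes are 0x95 0x5C.

```python
def is_lead_byte(byte: int) -> bool:
    """True if the byte starts a two-byte Shift-JIS character."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def split_path_sjis(raw: bytes) -> list:
    """Split on 0x5C separators, ignoring 0x5C used as a trail byte."""
    parts = []
    start = 0
    i = 0
    while i < len(raw):
        if is_lead_byte(raw[i]):
            i += 2  # skip the lead byte and its trail byte together
            continue
        if raw[i] == 0x5C:
            parts.append(raw[start:i])
            start = i + 1
        i += 1
    parts.append(raw[start:])
    return parts


# C:\表\data.txt as Shift-JIS bytes
raw = b"C:\\" + b"\x95\x5c" + b"\\data.txt"

naive = raw.split(b"\\")      # treats the trail byte of 表 as a separator
aware = split_path_sjis(raw)  # keeps the two-byte character intact

print(naive)
print(aware)
```

The naive split cuts 表 in half and produces a spurious empty component, while the lead-byte-aware version returns the three intended path components.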

Today

Modern Japanese websites overwhelmingly use UTF-8. Shift-JIS remains important for legacy systems, game ROMs, embedded devices, and older Japanese software. You can compare Shift-JIS byte sequences with Unicode representations for any character in our character browser.
