Big5

Traditional Chinese encoding used in Taiwan, Hong Kong, and Macau. Variable-width: single-byte for ASCII, double-byte for Chinese. The name refers to the five major Taiwanese IT companies that developed it.

IANA Name: Big5

Width: Variable (1–2 bytes)

Introduced: 1984

Byte Structure

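Big5 is a variable-width encoding. ASCII characters (0x00–0x7F) take one byte. Every other character takes two bytes: a lead byte in the range 0xA1–0xF9 (0x81–0xFE once vendor extensions are included), followed by a trail byte in 0x40–0x7E or 0xA1–0xFE. Because trail bytes can fall in the ASCII range (for example 0x5C, the backslash), software that scans Big5 text one byte at a time can mistake half of a character for ASCII punctuation. Characters not in this encoding cannot be represented and must be replaced or transliterated.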
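As a rough illustration of the one-or-two-byte structure, the sketch below walks a Big5 byte string and groups its bytes into characters. It assumes well-formed input and treats any byte in 0x81–0xFE as a lead byte (the range covering Big5 and its common extensions); `split_big5` is a hypothetical helper name, not part of any library.

```python
def split_big5(data: bytes) -> list[bytes]:
    """Split Big5 bytes into one chunk per character (illustrative helper)."""
    chunks = []
    i = 0
    while i < len(data):
        # Assumption: a byte in 0x81-0xFE starts a two-byte character;
        # anything else is a single-byte (ASCII) character.
        if 0x81 <= data[i] <= 0xFE and i + 1 < len(data):
            chunks.append(data[i:i + 2])
            i += 2
        else:
            chunks.append(data[i:i + 1])
            i += 1
    return chunks


sample = "Aα中".encode("big5")
chunks = split_big5(sample)
print([chunk.hex(" ") for chunk in chunks])
```

Note that the trail byte of α is 0x5C, the same value as an ASCII backslash, which is why the splitter has to track lead bytes rather than inspect each byte in isolation.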

When to Use Big5

Big5 is the standard encoding for Traditional Chinese, used in Taiwan, Hong Kong, and Macau. You'll need it when processing Traditional Chinese content from legacy sources — older Taiwanese web pages, legacy desktop software, or historical documents. New systems should use UTF-8.
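For legacy files that need to move to UTF-8, a conversion can be as small as the sketch below. It assumes the whole file fits in memory and really is Big5; `convert_big5_file` and the file names are hypothetical, and the temporary directory is only there to make the example self-contained.

```python
import tempfile
from pathlib import Path


def convert_big5_file(src: Path, dst: Path) -> None:
    """Rewrite a Big5-encoded file as UTF-8 (illustrative helper)."""
    # Decoding is strict by default, so invalid Big5 input raises
    # UnicodeDecodeError instead of silently producing garbage.
    text = src.read_bytes().decode("big5")
    dst.write_bytes(text.encode("utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "legacy.txt"
    dst = Path(tmp) / "converted.txt"
    src.write_bytes(b"\xa4\xa4\n")  # Big5 bytes for 中 plus a newline
    convert_big5_file(src, dst)
    converted = dst.read_bytes()
```

Working on bytes rather than text-mode file handles keeps line endings exactly as they were in the source file.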

Sample Characters in Big5

The table below shows how a selection of characters is represented in Big5. Bytes are shown in hexadecimal and decimal. Characters marked "not supported" cannot be encoded in Big5 and would need to be replaced or transliterated when converting from Unicode.

Character	Codepoint	Name	Bytes (Hex)	Bytes (Decimal)	Supported
A	U+0041	LATIN CAPITAL LETTER A	41	65	Yes
a	U+0061	LATIN SMALL LETTER A	61	97	Yes
0	U+0030	DIGIT ZERO	30	48	Yes
$	U+0024	DOLLAR SIGN	24	36	Yes
£	U+00A3	POUND SIGN	A2 47	162 71	Yes
©	U+00A9	COPYRIGHT SIGN	not supported		—
€	U+20AC	EURO SIGN	not supported		—
α	U+03B1	GREEK SMALL LETTER ALPHA	A3 5C	163 92	Yes
А	U+0410	CYRILLIC CAPITAL LETTER A	not supported		—
中	U+4E2D	CJK UNIFIED IDEOGRAPH-4E2D	A4 A4	164 164	Yes
あ	U+3042	HIRAGANA LETTER A	C6 A6	198 166	Yes
☺	U+263A	WHITE SMILING FACE	not supported		—
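What "replaced or transliterated" means in practice depends on the error handler passed to the encoder. The sketch below uses Python's standard `errors` argument with ☺, which the table lists as not supported; the variable names are illustrative only.

```python
text = "中 ☺"

# Strict mode (the default) raises on the first unsupported character.
try:
    text.encode("big5")
except UnicodeEncodeError as exc:
    print(f"cannot encode {exc.object[exc.start]!r}")

# "replace" substitutes "?" for each unsupported character.
replaced = text.encode("big5", errors="replace")

# "backslashreplace" keeps the code point as an ASCII escape sequence.
escaped = text.encode("big5", errors="backslashreplace")

print(replaced, escaped)
```

Strict mode is usually the safer default for data pipelines, since lossy replacement cannot be undone once the bytes are written.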

Working with Big5 in Code

Most major languages ship with support for encoding conversion. The examples below show how to convert text to Big5 bytes and back to a Unicode string. Always specify the encoding explicitly. Never rely on system defaults, which vary by OS and locale.

Python

# Encode a string to big5 bytes
text = "Hello, 世界"
encoded = text.encode("Big5")

# Decode bytes back to a string
decoded = encoded.decode("Big5")

PHP

// Convert to big5
$bytes = mb_convert_encoding(
    "Hello, 世界",
    "Big5",
    "UTF-8"
);

// Convert back to UTF-8
$text = mb_convert_encoding(
    $bytes,
    "UTF-8",
    "Big5"
);

JavaScript

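// TextEncoder only produces UTF-8; the standard API has no Big5 encoder.
// Encoding to Big5 requires a third-party library.
// Big5 bytes for "Hello, 世界"
const bytes = new Uint8Array([
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20,
  0xa5, 0x40, 0xac, 0xc9,
]);

// Decode Big5 bytes to a string
const decoder = new TextDecoder("big5");
const text = decoder.decode(bytes);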

SQL (PostgreSQL)

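-- Big5 is a client-only encoding in PostgreSQL and cannot be used
-- as a database encoding. Store the data as UTF-8 instead.
CREATE DATABASE mydb
  ENCODING 'UTF8'
  TEMPLATE template0;

-- Let the server convert to and from Big5 for this connection
SET client_encoding TO 'BIG5';

-- Check database encoding
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();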

Big5 FAQ

What is the difference between Big5 and Big5-HKSCS?

Big5 is the original Traditional Chinese encoding from Taiwan covering about 13,000 characters. Big5-HKSCS (Hong Kong Supplementary Character Set) adds thousands of characters used in Hong Kong, including Cantonese characters and characters specific to Hong Kong government, legal, and business use. For Hong Kong content, Big5-HKSCS is the correct variant.
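One way to check which variant a piece of text needs is to try both codecs. The sketch below relies on Python's built-in `big5` and `big5hkscs` codecs; `can_encode` is a hypothetical helper, and 啲 is used on the assumption that it is a Cantonese character found only in the HKSCS extension, so no expected output is shown.

```python
def can_encode(ch: str, codec: str) -> bool:
    """Return True if `ch` is representable in `codec` (illustrative helper)."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# 中 is in plain Big5; 啲 is assumed here to require Big5-HKSCS.
for ch in "中啲":
    print(ch, can_encode(ch, "big5"), can_encode(ch, "big5hkscs"))
```

If any character in a document encodes only with `big5hkscs`, labelling the output as plain Big5 would lose it.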

Is Big5 still in use?

Big5 remains in legacy Taiwanese government documents, older Taiwanese and Hong Kong websites, and some Traditional Chinese desktop software. Modern Taiwanese web content has largely migrated to UTF-8. When processing Traditional Chinese legacy data, always detect or specify the encoding. Big5 lead and trail bytes overlap with byte values used by UTF-8 and by other legacy encodings such as GBK, so decoding with the wrong encoding often yields garbled text rather than an error.
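When the encoding of legacy data is not labelled, a common heuristic is to try strict UTF-8 first and fall back to Big5 only if that fails, since Big5 text is rarely valid UTF-8. The sketch below is one minimal version of that idea, not a full detector; `decode_legacy` is a hypothetical helper, and it assumes the only candidates are UTF-8 and Big5.

```python
def decode_legacy(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding_used), trying UTF-8 before Big5 (heuristic)."""
    try:
        # Strict UTF-8 decoding rejects most Big5 byte sequences.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Assumption: anything that is not valid UTF-8 is Big5.
    # This still raises UnicodeDecodeError if the bytes are neither.
    return raw.decode("big5"), "big5"


print(decode_legacy("中".encode("utf-8")))
print(decode_legacy(b"\xa4\xa4"))
```

Pure-ASCII input decodes identically under both encodings, so the reported label matters only once non-ASCII bytes appear.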
