Big5

Traditional Chinese encoding used in Taiwan, Hong Kong, and Macau. Variable-width: single-byte for ASCII, double-byte for Chinese. The name refers to the five major Taiwanese IT companies that developed it.

Name: Big5
Width: Variable (1–2 bytes)
Introduced: 1984

Byte Structure

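Big5 is a variable-width encoding that uses one or two bytes per character. ASCII characters occupy a single byte (0x00–0x7F). Every other character is a two-byte sequence: a lead byte in the range 0x81–0xFE followed by a trail byte in 0x40–0x7E or 0xA1–0xFE. Because trail bytes can fall in the ASCII range, a byte cannot be classified in isolation; text has to be scanned from a known character boundary. Characters outside the Big5 repertoire cannot be represented and must be replaced or transliterated.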
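
The one-or-two-byte layout can be illustrated by walking a byte string and grouping its bytes into characters. The sketch below is a minimal illustration, not a validator: the helper name `split_big5` is hypothetical, and it assumes the commonly documented ranges (ASCII below 0x80, lead bytes 0x81–0xFE) without checking that the trail byte is legal.

```python
from typing import List


def split_big5(data: bytes) -> List[bytes]:
    """Group Big5 bytes into per-character sequences (hypothetical helper)."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # ASCII: one byte per character
            chars.append(data[i:i + 1])
            i += 1
        elif 0x81 <= lead <= 0xFE and i + 1 < len(data):
            # Lead byte: consume it together with the following trail byte
            chars.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid or truncated sequence at offset {i}")
    return chars


sample = "A中".encode("big5")
print([c.hex() for c in split_big5(sample)])  # ['41', 'a4a4']
```

Scanning from the start matters because a trail byte such as 0x41 is indistinguishable from an ASCII "A" when seen on its own.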

When to Use Big5

Big5 is the de facto legacy encoding for Traditional Chinese in Taiwan, Hong Kong, and Macau. You will need it when processing Traditional Chinese content from legacy sources such as older Taiwanese web pages, legacy desktop software, or historical documents. New systems should use UTF-8.
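
A typical legacy task is a one-time conversion of a Big5 file to UTF-8. The following is a minimal sketch of that step using Python's standard `big5` codec. The function name `convert_big5_file_to_utf8` and the file names are hypothetical, and the demo writes its own two-byte sample file so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def convert_big5_file_to_utf8(src: Path, dst: Path) -> None:
    """Read src as Big5 and write the same text to dst as UTF-8."""
    text = src.read_text(encoding="big5")  # raises UnicodeDecodeError on bad bytes
    dst.write_text(text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp) / "legacy.txt"        # hypothetical legacy file
    legacy.write_bytes(b"\xa4\xa4")          # Big5 bytes for U+4E2D
    converted = Path(tmp) / "converted.txt"  # hypothetical output file
    convert_big5_file_to_utf8(legacy, converted)
    result = converted.read_bytes()

print(result)  # b'\xe4\xb8\xad'
```

Decoding strictly (the default) is a deliberate choice here: a file that is not really Big5 fails loudly instead of being converted into garbage.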

Sample Characters in Big5

The table below shows how a selection of characters is represented in Big5. Bytes are shown in hexadecimal and decimal. Characters marked "not supported" cannot be encoded in Big5 and would need to be replaced or transliterated when converting from Unicode.

| Character | Codepoint | Name | Bytes (Hex) | Bytes (Decimal) | Supported |
|---|---|---|---|---|---|
| A | U+0041 | LATIN CAPITAL LETTER A | 41 | 65 | Yes |
| a | U+0061 | LATIN SMALL LETTER A | 61 | 97 | Yes |
| 0 | U+0030 | DIGIT ZERO | 30 | 48 | Yes |
| $ | U+0024 | DOLLAR SIGN | 24 | 36 | Yes |
| £ | U+00A3 | POUND SIGN | A2 47 | 162 71 | Yes |
| © | U+00A9 | COPYRIGHT SIGN | | | not supported |
| € | U+20AC | EURO SIGN | | | not supported |
| α | U+03B1 | GREEK SMALL LETTER ALPHA | A3 5C | 163 92 | Yes |
| А | U+0410 | CYRILLIC CAPITAL LETTER A | | | not supported |
| 中 | U+4E2D | CJK UNIFIED IDEOGRAPH-4E2D | A4 A4 | 164 164 | Yes |
| あ | U+3042 | HIRAGANA LETTER A | C6 A6 | 198 166 | Yes |
| ☺ | U+263A | WHITE SMILING FACE | | | not supported |
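
Rows like these can be checked for any character by attempting the encoding and catching the failure. The sketch below uses Python's built-in `big5` codec, and the helper name `big5_hex` is hypothetical. Big5 exists in several vendor variants, so a given codec may disagree with the table on some rows, particularly for symbols and kana; treat the output as describing that codec only.

```python
from typing import Optional


def big5_hex(ch: str) -> Optional[str]:
    """Return the Big5 bytes of ch as spaced hex, or None if not encodable."""
    try:
        return ch.encode("big5").hex(" ").upper()
    except UnicodeEncodeError:
        return None


for ch in ("A", "中", "©"):
    encoded = big5_hex(ch)
    print(f"U+{ord(ch):04X}", encoded if encoded is not None else "not supported")
# U+0041 41
# U+4E2D A4 A4
# U+00A9 not supported
```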

Working with Big5 in Code

Most major languages can convert between Big5 and Unicode, though not always in both directions with built-in APIs alone. The examples below show how to encode a string to Big5 bytes and decode it back to a Unicode string. Always specify the encoding explicitly; never rely on system defaults, which vary by OS and locale.

Python:

    # Encode a string to Big5 bytes
    text = "Hello, 世界"
    encoded = text.encode("big5")

    # Decode the bytes back to a string
    decoded = encoded.decode("big5")

PHP:

    // Convert a UTF-8 string to Big5
    $bytes = mb_convert_encoding(
        "Hello, 世界",
        "Big5",
        "UTF-8"
    );

    // Convert back to UTF-8
    $text = mb_convert_encoding(
        $bytes,
        "UTF-8",
        "Big5"
    );

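JavaScript:

    // TextEncoder always produces UTF-8, so the built-in API cannot encode to Big5.
    // Encoding needs a third-party library; decoding is built in.
    const bytes = new Uint8Array([0xa4, 0xa4]); // Big5 bytes for "中"

    // Decode Big5 bytes to a string
    const decoder = new TextDecoder("big5");
    const text = decoder.decode(bytes); // "中"
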
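SQL (PostgreSQL):

    -- PostgreSQL accepts BIG5 only as a client encoding, not as a server encoding.
    -- Store the data as UTF-8 and let the server convert for Big5 clients.
    CREATE DATABASE mydb
      ENCODING 'UTF8'
      LC_COLLATE 'en_US.UTF-8'
      TEMPLATE template0;

    -- Declare that this session sends and receives Big5
    SET client_encoding TO 'BIG5';

    -- Check database encoding
    SELECT pg_encoding_to_char(encoding)
    FROM pg_database
    WHERE datname = current_database();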
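
Encoding to Big5 fails for any character outside its repertoire, so real conversions need a policy for those characters. The sketch below shows the standard error handlers of Python's `str.encode` applied to a string containing the copyright sign, which Big5 does not cover. The sample string is an assumption chosen for illustration.

```python
text = "© 中"  # U+00A9 is not in Big5, U+4E2D is

# Default (strict) handling raises on the first unencodable character
try:
    text.encode("big5")
except UnicodeEncodeError as exc:
    failed = exc.object[exc.start:exc.end]  # the offending character

# Substitute a question mark for each unencodable character
replaced = text.encode("big5", errors="replace")

# Keep the information as an HTML/XML numeric character reference
escaped = text.encode("big5", errors="xmlcharrefreplace")

# Silently drop unencodable characters (lossy, rarely what you want)
dropped = text.encode("big5", errors="ignore")

print(replaced)  # b'? \xa4\xa4'
print(escaped)   # b'&#169; \xa4\xa4'
print(dropped)   # b' \xa4\xa4'
```

Which handler fits depends on the destination: numeric references only make sense for HTML or XML output, while `replace` at least leaves a visible marker where data was lost.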

Big5 FAQ

What is the difference between Big5 and Big5-HKSCS?

Big5 is the original Traditional Chinese encoding from Taiwan covering about 13,000 characters. Big5-HKSCS (Hong Kong Supplementary Character Set) adds thousands of characters used in Hong Kong, including Cantonese characters and characters specific to Hong Kong government, legal, and business use. For Hong Kong content, Big5-HKSCS is the correct variant.
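
One way to see the difference in practice is to test whether a character encodes under the plain codec or only under the HKSCS variant. Python's standard library ships both `big5` and `big5hkscs`. The helper names `can_encode` and `needs_hkscs` are hypothetical, and the sample characters are Cantonese characters picked for illustration; the sketch reports what the codecs say rather than asserting which set each character belongs to.

```python
def can_encode(ch: str, codec: str) -> bool:
    """Return True if ch can be encoded with the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def needs_hkscs(ch: str) -> bool:
    """True if ch is encodable in Big5-HKSCS but not in plain Big5."""
    return can_encode(ch, "big5hkscs") and not can_encode(ch, "big5")


for ch in "中喺嘢":
    print(f"U+{ord(ch):04X}", "needs HKSCS" if needs_hkscs(ch) else "no HKSCS needed")
```

When decoding Hong Kong data of unknown vintage, decoding with `big5hkscs` from the start avoids errors on the supplementary characters.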

Is Big5 still in use?

Big5 remains in legacy Taiwanese government documents, older Taiwanese and Hong Kong websites, and some Traditional Chinese desktop software. Modern Taiwanese web content has largely migrated to UTF-8. When processing Traditional Chinese legacy data, always detect or specify the encoding: some byte sequences are valid in both Big5 and UTF-8, so decoding with the wrong one can succeed silently and produce garbled text.
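
When the encoding is not declared, a common heuristic is to try strict UTF-8 first and fall back to Big5 only if that fails, since Big5 text rarely happens to be well-formed UTF-8. The sketch below assumes exactly those two candidates; the function name `decode_legacy` is hypothetical, and the heuristic can still guess wrong on short inputs.

```python
from typing import Tuple


def decode_legacy(raw: bytes) -> Tuple[str, str]:
    """Decode raw as UTF-8 if valid, otherwise as Big5. Returns (text, codec)."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("big5"), "big5"


from_big5 = decode_legacy(b"\xa4\xa4")      # not valid UTF-8, falls back to Big5
from_utf8 = decode_legacy(b"\xe4\xb8\xad")  # valid UTF-8
print(from_big5[1], from_utf8[1])           # big5 utf-8

# The same two bytes are valid in both encodings but mean different things
ambiguous = b"\xc3\xa9"
as_utf8 = ambiguous.decode("utf-8")  # "é"
as_big5 = ambiguous.decode("big5")   # an unrelated Chinese character
```

An explicit declaration (HTTP header, file metadata, database setting) should always win over this kind of guess when one is available.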