Big5
Traditional Chinese character encoding used in Taiwan, Hong Kong, and Macau. It is variable-width: one byte for ASCII characters and two bytes for Chinese characters. The name refers to the five major Taiwanese computer companies that took part in defining it.
Byte Structure
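Big5 is a variable-width encoding with one or two bytes per character. Bytes 0x00–0x7F encode ASCII and stand alone. Every other character is a two-byte pair: a lead byte (0xA1–0xF9 in the original standard; some variants and decoders, such as Microsoft's code page 950, accept 0x81–0xFE) followed by a trail byte in 0x40–0x7E or 0xA1–0xFE. Because trail bytes can fall in the ASCII range, a byte such as 0x5C (the backslash) may be the second half of a Chinese character, so tools that scan Big5 text byte by byte for ASCII delimiters can corrupt it. Characters outside the Big5 repertoire cannot be represented and must be replaced or transliterated.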
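A minimal sketch of splitting Big5 bytes into one chunk per character. The helper names are hypothetical; the code assumes the permissive lead-byte range 0x81–0xFE and the trail ranges 0x40–0x7E and 0xA1–0xFE, and it checks only byte ranges, not whether a pair is actually assigned to a character. The last lines show the 0x5C pitfall using α, which the table below lists as A3 5C.

```python
def is_lead_byte(b: int) -> bool:
    # Assumption: permissive lead range used by common Big5 variants
    return 0x81 <= b <= 0xFE

def is_trail_byte(b: int) -> bool:
    # Trail bytes may overlap printable ASCII (0x40-0x7E)
    return 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE

def split_big5(data: bytes) -> list[bytes]:
    """Hypothetical helper: split Big5 bytes into per-character chunks."""
    chunks = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # ASCII: a single byte is a whole character
            chunks.append(data[i:i + 1])
            i += 1
        elif is_lead_byte(b) and i + 1 < len(data) and is_trail_byte(data[i + 1]):
            # Lead byte + trail byte form one two-byte character
            chunks.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid Big5 byte 0x{b:02X} at offset {i}")
    return chunks

data = "aα".encode("big5")   # 'a' followed by 'α' (A3 5C)
print(split_big5(data))      # [b'a', b'\xa3\\']
print(b"\\" in data)         # True: a naive byte search finds a "backslash"
```

Walking the bytes from the start is what makes this safe: the 0x5C is consumed as a trail byte, whereas a search that starts at an arbitrary offset cannot tell a trail byte from a real backslash.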
When to Use Big5
Big5 was the de facto standard encoding for Traditional Chinese before UTF-8 became dominant. You'll need it when processing Traditional Chinese content from legacy sources: older Taiwanese web pages, legacy desktop software, or archived documents. New systems should use UTF-8.
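A minimal sketch of a one-off migration from Big5 to UTF-8 using Python's built-in `big5` codec. `big5_to_utf8` and `convert_file` are hypothetical names. Working on raw bytes keeps line endings unchanged, and decoding stays strict, so a file with bytes outside the `big5` codec's repertoire (for example, Hong Kong-specific characters that need `big5hkscs`) raises `UnicodeDecodeError` instead of being silently corrupted.

```python
from pathlib import Path

def big5_to_utf8(data: bytes) -> bytes:
    """Re-encode Big5 bytes as UTF-8; raises UnicodeDecodeError on invalid Big5."""
    return data.decode("big5").encode("utf-8")

def convert_file(src: Path, dst: Path) -> None:
    """Hypothetical helper: write a UTF-8 copy of a legacy Big5 file.

    Operates on raw bytes, so line endings are preserved unchanged.
    """
    dst.write_bytes(big5_to_utf8(src.read_bytes()))

print(big5_to_utf8(b"\xa4\xa4"))  # b'\xe4\xb8\xad' ("中" in UTF-8)
```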
Sample Characters in Big5
The table below shows how a selection of characters is represented in Big5, with bytes in hexadecimal and decimal. Characters marked "not supported" cannot be encoded in Big5 and must be replaced or transliterated when converting from Unicode.
| Character | Codepoint | Name | Bytes (Hex) | Bytes (Decimal) | Supported |
|---|---|---|---|---|---|
| A | U+0041 | LATIN CAPITAL LETTER A | 41 | 65 | Yes |
| a | U+0061 | LATIN SMALL LETTER A | 61 | 97 | Yes |
| 0 | U+0030 | DIGIT ZERO | 30 | 48 | Yes |
| $ | U+0024 | DOLLAR SIGN | 24 | 36 | Yes |
| £ | U+00A3 | POUND SIGN | A2 47 | 162 71 | Yes |
| © | U+00A9 | COPYRIGHT SIGN | not supported | — | No |
| € | U+20AC | EURO SIGN | not supported | — | No |
| α | U+03B1 | GREEK SMALL LETTER ALPHA | A3 5C | 163 92 | Yes |
| А | U+0410 | CYRILLIC CAPITAL LETTER A | not supported | — | No |
| 中 | U+4E2D | CJK UNIFIED IDEOGRAPH-4E2D | A4 A4 | 164 164 | Yes |
| あ | U+3042 | HIRAGANA LETTER A | C6 A6 | 198 166 | Yes |
| ☺ | U+263A | WHITE SMILING FACE | not supported | — | No |
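Support for an individual character can be checked with Python's built-in `big5` codec, as in the sketch below. The helper name `big5_hex` is hypothetical, and other codecs or variants (for example `cp950` or `big5hkscs`) may accept a slightly different set of characters. The last line shows one simple replacement policy, `errors="replace"`, which substitutes `?` for anything the codec cannot encode.

```python
from typing import Optional

def big5_hex(ch: str) -> Optional[str]:
    """Hypothetical helper: Big5 bytes of `ch` as spaced hex, or None if unsupported."""
    try:
        return ch.encode("big5").hex(" ").upper()
    except UnicodeEncodeError:
        return None

for ch in ["A", "中", "☺"]:
    print(ch, big5_hex(ch) or "not supported")

# One replacement policy: substitute "?" for unencodable characters
print("Smile ☺".encode("big5", errors="replace"))  # b'Smile ?'
```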
Working with Big5 in Code
Most major languages and databases can convert between Big5 and Unicode, either natively or through a standard extension or library. The examples below show how to encode a string to Big5 bytes and decode it back to a Unicode string. Always specify the encoding explicitly; never rely on system defaults, which vary by OS and locale.
```python
# Encode a string to Big5 bytes
text = "Hello, 世界"
encoded = text.encode("Big5")
# Decode bytes back to a string
decoded = encoded.decode("Big5")
```
```php
// Convert to Big5
$bytes = mb_convert_encoding(
    "Hello, 世界",
    "Big5",
    "UTF-8"
);
// Convert back to UTF-8
$text = mb_convert_encoding(
    $bytes,
    "UTF-8",
    "Big5"
);
```
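```javascript
// TextEncoder always produces UTF-8; browsers and Node.js have no
// built-in Big5 encoder, so encoding to Big5 requires a third-party
// library such as iconv-lite (Node.js).
// Decoding Big5 bytes is supported natively:
const bytes = new Uint8Array([0xa4, 0xa4]); // "中" in Big5
const decoder = new TextDecoder("big5");
const text = decoder.decode(bytes); // "中"
```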
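```sql
-- PostgreSQL supports BIG5 only as a client encoding, not as a
-- server (database) encoding, so store the data as UTF-8:
CREATE DATABASE mydb
  ENCODING 'UTF8'
  LC_COLLATE 'en_US.UTF-8'
  LC_CTYPE 'en_US.UTF-8'
  TEMPLATE template0;
-- Have the server convert to and from Big5 for this session
SET client_encoding TO 'BIG5';
-- Check the database encoding
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();
```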
Big5 FAQ
What is the difference between Big5 and Big5-HKSCS?
Big5 is the original Traditional Chinese encoding from Taiwan covering about 13,000 characters. Big5-HKSCS (Hong Kong Supplementary Character Set) adds thousands of characters used in Hong Kong, including Cantonese characters and characters specific to Hong Kong government, legal, and business use. For Hong Kong content, Big5-HKSCS is the correct variant.
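Python ships separate codecs for the two variants, `big5` and `big5hkscs`. A minimal sketch of choosing between them by the data's origin; the function name and its `hong_kong` flag are hypothetical. Ordinary characters such as 中 use the same bytes in both.

```python
def decode_traditional_chinese(data: bytes, hong_kong: bool = False) -> str:
    """Hypothetical helper: pick the Big5 variant by the data's origin."""
    codec = "big5hkscs" if hong_kong else "big5"
    return data.decode(codec)  # strict: raises UnicodeDecodeError on bad input

print(decode_traditional_chinese(b"\xa4\xa4"))                  # 中
print(decode_traditional_chinese(b"\xa4\xa4", hong_kong=True))  # 中
```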
Is Big5 still in use?
Big5 is still found in legacy Taiwanese government documents, older Taiwanese and Hong Kong websites, and some Traditional Chinese desktop software. Modern Taiwanese web content has largely migrated to UTF-8. When processing legacy Traditional Chinese data, always detect or specify the encoding: both encodings are ASCII-compatible and some short byte sequences are valid in both, so decoding without an error does not prove the guess was correct.
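A minimal sketch of a "try UTF-8, then Big5" heuristic, assuming those are the only two candidates. Big5 text containing Chinese characters usually fails strict UTF-8 validation, so UTF-8 is tried first. The function name is hypothetical, and this is a heuristic rather than a detector: the two bytes C3 A9 decode as "é" in UTF-8 and as a single Chinese character in Big5, so short inputs can be misidentified. An encoding declared by the source (an HTTP header, an HTML `<meta charset>`, or file metadata) should take precedence whenever one exists.

```python
from typing import Tuple

def decode_utf8_or_big5(data: bytes) -> Tuple[str, str]:
    """Hypothetical heuristic: return (text, encoding_name)."""
    try:
        return data.decode("utf-8"), "utf-8"  # strict: raises on invalid UTF-8
    except UnicodeDecodeError:
        return data.decode("big5"), "big5"    # still strict: raises if not Big5 either

print(decode_utf8_or_big5("中".encode("big5"))[1])   # big5
print(decode_utf8_or_big5("中".encode("utf-8"))[1])  # utf-8

# Ambiguity: the same two bytes are valid in both encodings
ambiguous = b"\xc3\xa9"
print(ambiguous.decode("utf-8"))      # é
print(len(ambiguous.decode("big5")))  # 1
```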