GBK

Extended Chinese character encoding for Simplified Chinese and a superset of GB2312. Variable-width: one byte for ASCII, two bytes for Chinese characters and other non-ASCII symbols. As Windows code page 936, it is the dominant legacy encoding for Simplified Chinese on Windows.

IANA Name: GBK

Width: Variable (1–2 bytes)

Introduced: 1993

Byte Structure

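GBK uses variable-width encoding (1–2 bytes per character). Bytes 0x00–0x7F are single-byte ASCII. A byte in 0x81–0xFE is the lead byte of a two-byte character and is followed by a trail byte in 0x40–0xFE (excluding 0x7F). GB2312 characters keep their original codes, with both bytes in 0xA1–0xFE. Characters outside the GBK repertoire cannot be represented and must be replaced or transliterated.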
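Because only bytes 0x81–0xFE can start a two-byte character, a GBK byte string can be split into characters by inspecting lead bytes, without decoding it. The sketch below uses a hypothetical helper, `split_gbk`, written for illustration only: it checks lead- and trail-byte ranges but does not verify that a two-byte code is actually assigned in GBK.

```python
def split_gbk(data: bytes) -> list[bytes]:
    """Split GBK-encoded bytes into per-character byte chunks.

    Hypothetical helper: validates byte ranges only, not whether a
    two-byte code is actually assigned to a character.
    """
    chunks = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:
            # Single byte: ASCII
            chunks.append(data[i:i + 1])
            i += 1
        elif 0x81 <= lead <= 0xFE:
            # Lead byte of a two-byte character; a trail byte must follow
            if i + 1 >= len(data):
                raise ValueError(f"truncated sequence at offset {i}")
            trail = data[i + 1]
            if not (0x40 <= trail <= 0xFE and trail != 0x7F):
                raise ValueError(f"invalid trail byte 0x{trail:02X} at offset {i + 1}")
            chunks.append(data[i:i + 2])
            i += 2
        else:
            # 0x80 and 0xFF are not lead bytes in GBK proper
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
    return chunks


data = "A中あ".encode("gbk")
print([c.hex(" ").upper() for c in split_gbk(data)])  # ['41', 'D6 D0', 'A4 A2']
```

In practice `bytes.decode("gbk")` does this work for you; a splitter like this is mainly useful for inspecting or truncating byte strings without cutting a character in half.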

When to Use GBK

On Windows, GBK (as code page 936) is the default legacy encoding for Simplified Chinese, and it is widespread in older Chinese software. It's required when producing or consuming Simplified Chinese content for older Windows applications, legacy databases, or older Chinese websites. New systems targeting Chinese users should use UTF-8 or GB18030.

Sample Characters in GBK

The table below shows how a selection of characters is represented in GBK. Bytes are shown in hexadecimal and decimal. Characters marked "not supported" cannot be encoded in GBK and must be replaced or transliterated when converting from Unicode. The table follows GBK proper; Windows code page 936 additionally maps the euro sign (€) to the single byte 0x80.

Character	Codepoint	Name	Bytes (Hex)	Bytes (Decimal)	Supported
A	U+0041	LATIN CAPITAL LETTER A	41	65	Yes
a	U+0061	LATIN SMALL LETTER A	61	97	Yes
0	U+0030	DIGIT ZERO	30	48	Yes
$	U+0024	DOLLAR SIGN	24	36	Yes
£	U+00A3	POUND SIGN	not supported		No
©	U+00A9	COPYRIGHT SIGN	not supported		No
€	U+20AC	EURO SIGN	not supported		No
α	U+03B1	GREEK SMALL LETTER ALPHA	A6 C1	166 193	Yes
А	U+0410	CYRILLIC CAPITAL LETTER A	A7 A1	167 161	Yes
中	U+4E2D	CJK UNIFIED IDEOGRAPH-4E2D	D6 D0	214 208	Yes
あ	U+3042	HIRAGANA LETTER A	A4 A2	164 162	Yes
☺	U+263A	WHITE SMILING FACE	not supported		No
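Rows like these can be regenerated with Python's built-in `gbk` codec. The sketch below assumes CPython's codec as the reference; `gbk_hex` is a hypothetical helper name, and other GBK implementations may differ on a few edge cases.

```python
import unicodedata
from typing import Optional


def gbk_hex(ch: str) -> Optional[str]:
    """Return ch's GBK bytes as spaced uppercase hex, or None if unencodable."""
    try:
        return ch.encode("gbk").hex(" ").upper()
    except UnicodeEncodeError:
        return None


# Characters from the table (\u0410 is the Cyrillic capital A)
for ch in "Aa0$£©€α\u0410中あ☺":
    name = unicodedata.name(ch, "")
    hex_bytes = gbk_hex(ch)
    print(f"{ch}\tU+{ord(ch):04X}\t{name}\t{hex_bytes or 'not supported'}")
```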

Working with GBK in Code

Most major languages can convert between Unicode and GBK, either with built-in codecs or through a widely used library. The examples below encode a string to GBK bytes and decode it back to a Unicode string. Always specify the encoding explicitly; never rely on system defaults, which vary by OS and locale.

Python

# Encode a string to GBK bytes
text = "Hello, 世界"
encoded = text.encode("GBK")

# Decode bytes back to a string
decoded = encoded.decode("GBK")
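By default, `str.encode` raises `UnicodeEncodeError` when a character has no GBK mapping. Python's standard `errors` argument selects another strategy. The sketch below uses an emoji, which lies outside GBK's repertoire, to compare three of them.

```python
text = "Hello, 世界 \U0001F600"  # the emoji (U+1F600) has no GBK mapping

# Default errors="strict": unencodable characters raise UnicodeEncodeError
try:
    text.encode("gbk")
except UnicodeEncodeError as exc:
    print(f"strict: cannot encode {exc.object[exc.start:exc.end]!r}")

# errors="replace": each unencodable character becomes b"?"
print(text.encode("gbk", errors="replace").decode("gbk"))  # Hello, 世界 ?

# errors="xmlcharrefreplace": keep it as a numeric character reference
print(text.encode("gbk", errors="xmlcharrefreplace").decode("gbk"))  # Hello, 世界 &#128512;
```

`xmlcharrefreplace` is only appropriate when the output is HTML or XML; for plain text, `replace` or an explicit transliteration step is usually the better choice.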

PHP

// Convert a UTF-8 string to GBK
$bytes = mb_convert_encoding(
    "Hello, 世界",
    "GBK",
    "UTF-8"
);

// Convert back to UTF-8
$text = mb_convert_encoding(
    $bytes,
    "UTF-8",
    "GBK"
);

JavaScript

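// TextEncoder always produces UTF-8; it cannot encode to GBK.
// In Node.js, a library such as iconv-lite can encode to GBK:
//   const gbkBuf = require("iconv-lite").encode("Hello, 世界", "gbk");

// Decode GBK bytes with the built-in TextDecoder ("gbk" label)
const bytes = new Uint8Array([0x48, 0x69, 0x2c, 0x20, 0xd6, 0xd0]); // "Hi, 中"
const decoder = new TextDecoder("gbk");
const text = decoder.decode(bytes); // "Hi, 中"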
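Decoding with the wrong codec often fails silently instead of raising an error. This Python sketch decodes UTF-8 bytes as GBK, the mismatch that occurs when UTF-8 output from `TextEncoder` is passed to a GBK decoder: the ASCII prefix survives, but the Chinese characters turn into unrelated ones (mojibake).

```python
utf8_bytes = "Hello, 世界".encode("utf-8")

# Wrong: interpret UTF-8 bytes as GBK. errors="replace" avoids an
# exception if a byte pair happens to be invalid GBK.
garbled = utf8_bytes.decode("gbk", errors="replace")
print(garbled)  # ASCII part intact, Chinese part garbled

# Right: decode with the codec that produced the bytes
print(utf8_bytes.decode("utf-8"))  # Hello, 世界
```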

SQL (PostgreSQL)

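-- PostgreSQL does not accept GBK as a server (database) encoding;
-- GBK is supported only as a client encoding. Store data as UTF8:
CREATE DATABASE mydb
  ENCODING 'UTF8'
  LC_COLLATE 'en_US.UTF-8'
  LC_CTYPE 'en_US.UTF-8'
  TEMPLATE template0;

-- Convert between UTF8 and GBK for this session
SET client_encoding TO 'GBK';

-- Check the database and client encodings
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();
SHOW client_encoding;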


GBK FAQ

What is the difference between GBK and GB18030?

GBK extends GB2312 and is implemented on Windows as code page 936. GB18030 is a mandatory Chinese national standard that extends GBK with four-byte sequences so that it can encode every Unicode code point, including minority-language scripts and emoji. GB18030 is designed to be backward compatible with GBK: GBK byte sequences decode to the same characters in GB18030, apart from a small number of mapping differences between implementations. New software for China should use UTF-8 or GB18030; GBK is for legacy compatibility only.
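Python ships codecs for both encodings, which makes the compatibility easy to check. The sketch below, assuming CPython's `gbk` and `gb18030` codecs, shows a common character encoding to the same bytes in both, and an emoji that only GB18030 can represent.

```python
# A character from GBK's repertoire: same bytes in both encodings
print("中".encode("gbk"), "中".encode("gb18030"))  # b'\xd6\xd0' b'\xd6\xd0'

emoji = "\U0001F600"  # GRINNING FACE, outside GBK's repertoire

# errors="ignore" drops what GBK cannot encode, leaving nothing here
print(emoji.encode("gbk", errors="ignore"))  # b''

# GB18030 encodes it as a four-byte sequence that round-trips
encoded = emoji.encode("gb18030")
print(len(encoded), encoded.decode("gb18030") == emoji)  # 4 True
```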

Is GBK the same as GB2312?

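No. GB2312 (1981) is the original Simplified Chinese standard, covering 7,445 characters (6,763 hanzi plus 682 other symbols). GBK keeps GB2312's codes and extends the repertoire to 21,886 characters, including all 20,902 ideographs of the original Unicode CJK Unified Ideographs block (U+4E00–U+9FA5), traditional forms among them. To make room, GBK widens the lead-byte range from GB2312's 0xA1–0xF7 to 0x81–0xFE and allows trail bytes starting at 0x40. Both use double-byte encoding for Chinese characters.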
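The superset relationship can be checked with Python's `gb2312` and `gbk` codecs. In this sketch, 中 encodes identically in both, while the traditional character 體 (simplified form 体) is an assumed example of a character that GBK covers but GB2312 does not.

```python
# 中 is in GB2312, and GBK encodes it with the same two bytes
print("中".encode("gb2312"), "中".encode("gbk"))  # b'\xd6\xd0' b'\xd6\xd0'

trad = "體"  # traditional form of 体; not in GB2312, but in GBK
print(trad.encode("gb2312", errors="ignore"))  # b''
print(len(trad.encode("gbk")))  # 2
```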
