UTF-32 BE

Unicode

Fixed-width encoding using 4 bytes per character. Big-endian byte order. Rarely used in practice.

IANA Name: UTF-32BE

Width: Fixed (4 bytes)

BOM: 00 00 FE FF

Introduced: 2003

Byte Structure

UTF-32 is the simplest Unicode encoding: every code point is stored as exactly 4 bytes, holding the code point value directly as a 32-bit integer. In UTF-32 BE the most significant byte comes first, so U+20AC is stored as 00 00 20 AC. This makes random access trivial but wastes memory for ASCII-heavy text.
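
As a minimal sketch of that layout, the Python snippet below packs each code point as a 4-byte big-endian unsigned integer and checks the result against the built-in codec. The helper name utf32be_bytes is made up for illustration and is not part of any library.

```python
import struct


def utf32be_bytes(ch: str) -> bytes:
    """Pack one code point as a 4-byte big-endian unsigned integer."""
    return struct.pack(">I", ord(ch))


for ch in "A€中":
    manual = utf32be_bytes(ch)
    # The built-in codec should produce the same four bytes.
    assert manual == ch.encode("utf-32-be")
    print(f"U+{ord(ch):04X} -> {manual.hex(' ').upper()}")
```

The ">I" format string means "big-endian, unsigned 32-bit", which is the whole of the encoding for a single code point.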

When to Use UTF-32 BE

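UTF-32 is rarely used for storage or transmission because it is memory-inefficient: every code point costs 4 bytes regardless of how simple it is. Its main advantage is simplicity: random access to the Nth code point is O(1) since every code unit is the same size. You'll see it mostly as an in-memory format, for example as the 32-bit wchar_t type on most Unix/Linux systems and in Python's internal string representation (wide builds before Python 3.3, and since 3.3 for strings that contain code points above U+FFFF). These in-memory forms use the platform's native byte order, which is not necessarily big-endian.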

Sample Characters in UTF-32 BE

The table below shows how a selection of characters is represented in UTF-32 BE. Bytes are shown in hexadecimal and in decimal. UTF-32 BE can encode every Unicode character, so every entry in the Supported column is "Yes".

Character	Codepoint	Name	Bytes (Hex)	Bytes (Decimal)	Supported
A	U+0041	LATIN CAPITAL LETTER A	00 00 00 41	0 0 0 65	Yes
a	U+0061	LATIN SMALL LETTER A	00 00 00 61	0 0 0 97	Yes
0	U+0030	DIGIT ZERO	00 00 00 30	0 0 0 48	Yes
$	U+0024	DOLLAR SIGN	00 00 00 24	0 0 0 36	Yes
£	U+00A3	POUND SIGN	00 00 00 A3	0 0 0 163	Yes
©	U+00A9	COPYRIGHT SIGN	00 00 00 A9	0 0 0 169	Yes
€	U+20AC	EURO SIGN	00 00 20 AC	0 0 32 172	Yes
α	U+03B1	GREEK SMALL LETTER ALPHA	00 00 03 B1	0 0 3 177	Yes
А	U+0410	CYRILLIC CAPITAL LETTER A	00 00 04 10	0 0 4 16	Yes
中	U+4E2D	CJK UNIFIED IDEOGRAPH-4E2D	00 00 4E 2D	0 0 78 45	Yes
あ	U+3042	HIRAGANA LETTER A	00 00 30 42	0 0 48 66	Yes
☺	U+263A	WHITE SMILING FACE	00 00 26 3A	0 0 38 58	Yes
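
Rows like the ones above can be reproduced with a few lines of Python. This is only a sketch: table_row is a hypothetical helper, and the character names come from whatever Unicode version ships with the local Python's unicodedata module. The last sample is a character outside the Basic Multilingual Plane, which still takes exactly 4 bytes.

```python
import unicodedata


def table_row(ch: str) -> str:
    """Build one tab-separated row: character, code point, name, hex bytes, decimal bytes."""
    data = ch.encode("utf-32-be")
    hex_bytes = " ".join(f"{b:02X}" for b in data)
    dec_bytes = " ".join(str(b) for b in data)
    name = unicodedata.name(ch, "")  # empty string if the character has no name
    return "\t".join([ch, f"U+{ord(ch):04X}", name, hex_bytes, dec_bytes])


for ch in "A€中😀":
    print(table_row(ch))
```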

Working with UTF-32 BE in Code

Most major languages have built-in support for encoding conversion, although support for UTF-32 specifically is less universal than for UTF-8. The examples below show how to encode a string to UTF-32 BE bytes and decode it back to a Unicode string. Always specify the encoding explicitly; never rely on system defaults, which vary by OS and locale.

Python

# Encode a string to utf-32be bytes
text = "Hello, 世界"
encoded = text.encode("UTF-32BE")

# Decode bytes back to a string
decoded = encoded.decode("UTF-32BE")
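
One detail worth checking is how the byte order mark interacts with the codec name. The sketch below relies on CPython's documented codec behavior: "utf-32-be" neither writes nor strips a BOM, while the generic "utf-32" codec writes one on encode and uses it to pick the byte order on decode. The variable names are illustrative only.

```python
import codecs

explicit = "A".encode("utf-32-be")  # 4 bytes, no BOM
generic = "A".encode("utf-32")      # BOM plus 4 bytes, in native byte order

assert explicit == b"\x00\x00\x00A"
assert len(generic) == 8
assert generic[:4] in (codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)

# Data that starts with the big-endian BOM (00 00 FE FF)
with_bom = codecs.BOM_UTF32_BE + explicit

# The generic codec consumes the BOM and decodes as big-endian.
assert with_bom.decode("utf-32") == "A"

# The explicit codec keeps the BOM as a U+FEFF character.
assert with_bom.decode("utf-32-be") == "\ufeffA"
```

If the input may or may not carry a BOM, decoding with "utf-32-be" and then dropping a leading U+FEFF is one option, but that choice depends on the data source.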

PHP

// Convert to utf-32be
$bytes = mb_convert_encoding(
    "Hello, 世界",
    "UTF-32BE",
    "UTF-8"
);

// Convert back to UTF-8
$text = mb_convert_encoding(
    $bytes,
    "UTF-8",
    "UTF-32BE"
);

JavaScript

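// TextEncoder and TextDecoder do not support UTF-32, so write code points with a DataView
const codePoints = Array.from("Hello, 世界", (ch) => ch.codePointAt(0));
const view = new DataView(new ArrayBuffer(codePoints.length * 4));
codePoints.forEach((cp, i) => view.setUint32(i * 4, cp)); // DataView is big-endian by default
const bytes = new Uint8Array(view.buffer);

// Decode UTF-32 BE bytes back to a string
let text = "";
for (let i = 0; i < view.byteLength; i += 4) {
  text += String.fromCodePoint(view.getUint32(i));
}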

SQL (PostgreSQL)

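-- PostgreSQL has no UTF-32 server encoding; store text as UTF8
-- and convert to UTF-32 BE in the application when needed
CREATE DATABASE mydb
  ENCODING 'UTF8'
  LC_COLLATE 'en_US.UTF-8'
  TEMPLATE template0;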

-- Check database encoding
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();

UTF-32 BE FAQ

Why is UTF-32 rarely used for storage?

UTF-32 uses exactly 4 bytes per code point regardless of the character. An ASCII character like "A" costs 4 bytes instead of 1 in UTF-8, a 4× overhead for English text. Because much real-world text (markup, source code, Latin-script prose) is ASCII-heavy, UTF-32 is very space-inefficient. Even CJK text, which takes 3 bytes per character in UTF-8 and 2 in UTF-16, is larger in UTF-32. It is occasionally used internally in applications where constant-time random access to code points is valuable.
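
The overhead is easy to measure. The following sketch compares encoded sizes for two short sample strings; encoded_sizes is a hypothetical helper name, and the BOM-free codec variants are used so that only the text itself is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes for three Unicode encodings."""
    codecs_to_compare = ("utf-8", "utf-16-be", "utf-32-be")
    return {name: len(text.encode(name)) for name in codecs_to_compare}


print(encoded_sizes("Hello, world"))  # 12 / 24 / 48 bytes
print(encoded_sizes("世界"))           # 6 / 4 / 8 bytes
```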

What is the advantage of UTF-32 over other Unicode encodings?

UTF-32's only practical advantage is simplicity: every code point occupies exactly 4 bytes, so the Nth code point starts at byte offset N × 4 and random access is O(1). In UTF-8 and UTF-16 you must scan from the start because code points have variable width. Note that a code point is not always a user-perceived character, since combining sequences and many emoji span several code points. Some text-processing libraries use UTF-32 as a working format for this reason, even if they store and transmit in UTF-8.
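
A minimal sketch of that constant-time lookup on raw UTF-32 BE bytes, assuming the buffer carries no BOM and its length is a multiple of 4 (code_point_at is an illustrative name, not a library function):

```python
def code_point_at(data: bytes, n: int) -> str:
    """Return the n-th code point (0-based) from BOM-free UTF-32 BE bytes."""
    start = n * 4
    if n < 0 or start + 4 > len(data):
        raise IndexError("code point index out of range")
    # Each code point is one big-endian 32-bit integer.
    return chr(int.from_bytes(data[start:start + 4], "big"))


encoded = "Hello, 世界".encode("utf-32-be")
print(code_point_at(encoded, 7))  # 世
```

The same lookup on UTF-8 bytes would need a scan, because the byte offset of the 8th code point depends on the widths of the seven before it.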
