UTF-32 LE
Fixed-width Unicode encoding that uses 4 bytes per code point, stored in little-endian byte order. Simple to process but memory-inefficient.
Byte Structure
UTF-32 is the simplest Unicode encoding: every character is stored as exactly 4 bytes. The codepoint value is stored directly as a 32-bit integer. This makes random access trivial but wastes memory for ASCII-heavy text.
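As a minimal sketch of that layout (Python standard library only; the helper name `utf32le_bytes` is hypothetical and `bytes.hex(" ")` assumes Python 3.8 or later), packing the code point as a little-endian unsigned 32-bit integer should give the same bytes as the built-in codec:

```python
import struct

def utf32le_bytes(ch: str) -> bytes:
    """Pack one character's code point as a little-endian 32-bit unsigned integer."""
    return struct.pack("<I", ord(ch))

# The hand-packed bytes match Python's built-in "utf-32-le" codec,
# including for a character outside the Basic Multilingual Plane.
for ch in ["A", "€", "\U0001F600"]:
    assert utf32le_bytes(ch) == ch.encode("utf-32-le")

print(utf32le_bytes("€").hex(" "))  # ac 20 00 00
```

The least significant byte comes first, which is why U+20AC appears as `AC 20 00 00` rather than `00 00 20 AC`.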
When to Use UTF-32 LE
UTF-32 is rarely used for storage or transmission because it's memory-inefficient: every character costs 4 bytes regardless of how simple it is. Its main advantage is simplicity: random access to the Nth code point is O(1) since every code unit is the same size. You'll see it as the usual in-memory form of `wchar_t` strings on Linux and most Unix-like systems, where `wchar_t` is 32 bits wide, and in older CPython builds compiled with wide Unicode. Since Python 3.3, CPython instead chooses 1, 2, or 4 bytes per character for each string.
Sample Characters in UTF-32 LE
The table below shows how a selection of characters is represented in UTF-32 LE. Bytes are shown in hexadecimal and decimal. UTF-32 can encode every Unicode code point except the surrogates (U+D800 to U+DFFF), so every character listed here is supported.
| Character | Codepoint | Name | Bytes (Hex) | Bytes (Decimal) | Supported |
|---|---|---|---|---|---|
| A | U+0041 | LATIN CAPITAL LETTER A | 41 00 00 00 | 65 0 0 0 | Yes |
| a | U+0061 | LATIN SMALL LETTER A | 61 00 00 00 | 97 0 0 0 | Yes |
| 0 | U+0030 | DIGIT ZERO | 30 00 00 00 | 48 0 0 0 | Yes |
| $ | U+0024 | DOLLAR SIGN | 24 00 00 00 | 36 0 0 0 | Yes |
| £ | U+00A3 | POUND SIGN | A3 00 00 00 | 163 0 0 0 | Yes |
| © | U+00A9 | COPYRIGHT SIGN | A9 00 00 00 | 169 0 0 0 | Yes |
| € | U+20AC | EURO SIGN | AC 20 00 00 | 172 32 0 0 | Yes |
| α | U+03B1 | GREEK SMALL LETTER ALPHA | B1 03 00 00 | 177 3 0 0 | Yes |
| А | U+0410 | CYRILLIC CAPITAL LETTER A | 10 04 00 00 | 16 4 0 0 | Yes |
| 中 | U+4E2D | CJK UNIFIED IDEOGRAPH-4E2D | 2D 4E 00 00 | 45 78 0 0 | Yes |
| あ | U+3042 | HIRAGANA LETTER A | 42 30 00 00 | 66 48 0 0 | Yes |
| ☺ | U+263A | WHITE SMILING FACE | 3A 26 00 00 | 58 38 0 0 | Yes |
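The rows above can be reproduced with a few lines of Python. This is only a sketch for checking the table, and `table_row` is a hypothetical helper name, not part of any library:

```python
def table_row(ch: str) -> tuple:
    """Return (codepoint, hex bytes, decimal bytes) for one character in UTF-32 LE."""
    data = ch.encode("utf-32-le")
    codepoint = f"U+{ord(ch):04X}"
    hex_bytes = " ".join(f"{b:02X}" for b in data)
    dec_bytes = " ".join(str(b) for b in data)
    return codepoint, hex_bytes, dec_bytes

# Print a few rows in the same column order as the table.
for ch in "A€中":
    print(ch, *table_row(ch), sep=" | ")
```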
Working with UTF-32 LE in Code
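Support for UTF-32 LE varies by platform. Python and PHP convert to and from it with built-in functions, while some environments, such as the browser `TextEncoder`/`TextDecoder` API and PostgreSQL server encodings, do not offer UTF-32 at all. The examples below show how to encode a string to UTF-32 LE bytes and decode it back to a Unicode string. Always specify the encoding explicitly; never rely on system defaults, which vary by OS and locale.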
# Python: encode a string to UTF-32 LE bytes (this codec writes no BOM)
text = "Hello, 世界"
encoded = text.encode("utf-32-le")
# Decode bytes back to a string
decoded = encoded.decode("utf-32-le")
assert decoded == text
// PHP (mbstring extension): convert a UTF-8 string to UTF-32LE
$bytes = mb_convert_encoding(
"Hello, 世界",
"UTF-32LE",
"UTF-8"
);
// Convert back to UTF-8
$text = mb_convert_encoding(
$bytes,
"UTF-8",
"UTF-32LE"
);
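// JavaScript: TextEncoder/TextDecoder do not support UTF-32, so convert manually
const codePoints = Array.from("Hello, 世界", (ch) => ch.codePointAt(0));
const bytes = new Uint8Array(codePoints.length * 4);
const view = new DataView(bytes.buffer);
// Write each code point as a 32-bit integer; true selects little-endian
codePoints.forEach((cp, i) => view.setUint32(i * 4, cp, true));
// Decode bytes back to a string
const decoded = [];
for (let i = 0; i < bytes.length; i += 4) decoded.push(view.getUint32(i, true));
const text = String.fromCodePoint(...decoded);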
-- PostgreSQL has no UTF-32 server encoding; Unicode databases use UTF8
CREATE DATABASE mydb
ENCODING 'UTF8'
LC_COLLATE 'en_US.UTF-8'
LC_CTYPE 'en_US.UTF-8'
TEMPLATE template0;
-- Check database encoding
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();
UTF-32 LE FAQ
Why is UTF-32 rarely used for storage?
UTF-32 uses exactly 4 bytes per code point regardless of complexity. An ASCII character like "A" costs 4 bytes instead of 1 in UTF-8, a 4× overhead for English text. Because so much real-world text, including markup and source code, is ASCII or Latin-script heavy, UTF-32 is very space-inefficient in practice. It is occasionally used internally in applications where constant-time random access to code points is valuable.
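The overhead is easy to measure. The following sketch compares encoded sizes in Python; the sample strings are arbitrary choices for illustration:

```python
ascii_text = "Hello, world"  # 12 ASCII characters
cjk_text = "世界"             # 2 CJK characters

ascii_utf8 = len(ascii_text.encode("utf-8"))
ascii_utf32 = len(ascii_text.encode("utf-32-le"))
cjk_utf8 = len(cjk_text.encode("utf-8"))
cjk_utf32 = len(cjk_text.encode("utf-32-le"))

print(ascii_utf8, ascii_utf32)  # 12 48
print(cjk_utf8, cjk_utf32)      # 6 8
```

The 4× penalty applies to ASCII; for CJK text the gap is smaller, since UTF-8 already needs 3 bytes per character there.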
What is the advantage of UTF-32 over other Unicode encodings?
UTF-32's only practical advantage is simplicity: every code point occupies exactly 4 bytes, so random access to the Nth code point is O(1), because its byte offset is just N multiplied by 4. In UTF-8 and UTF-16 you generally must scan from the start because code points have variable width. Note that a code point is not always a user-perceived character: combining sequences and many emoji span several code points, even in UTF-32. Some text-processing libraries use UTF-32 as a working format for this reason, even if they store and transmit in UTF-8.
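A minimal sketch of that constant-time lookup in Python, assuming the input is well-formed UTF-32 LE without a BOM (the function name `nth_code_point` is hypothetical):

```python
def nth_code_point(data: bytes, n: int) -> str:
    """Return the n-th (0-based) code point from UTF-32 LE bytes without scanning."""
    offset = n * 4  # every code point is exactly 4 bytes wide
    if n < 0 or offset + 4 > len(data):
        raise IndexError("code point index out of range")
    return data[offset:offset + 4].decode("utf-32-le")

data = "Hello, 世界".encode("utf-32-le")
print(nth_code_point(data, 7))  # 世
```

The lookup slices 4 bytes at a computed offset and never inspects the bytes before it, which is what makes the access independent of N.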