Unicode Text Encoder
Type or paste any text to see the byte representation in every major Unicode encoding. Results update live as you type. All computation happens in your browser — nothing is sent to a server.
How to read this table
- Each row represents one Unicode character (codepoint), not one byte.
- Bytes are shown as uppercase hexadecimal, space-separated. For example, E2 82 AC is the euro sign (U+20AC) in UTF-8; the sketch after this list shows how to reproduce it.
- UTF-8 uses 1–4 bytes per character. ASCII characters always use 1 byte.
- UTF-16 uses 2 bytes for most characters and 4 (a surrogate pair) for emoji and other characters above U+FFFF.
- UTF-32 always uses exactly 4 bytes per character.
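As a concrete sketch (assuming only the standard TextEncoder API, which always emits UTF-8), this is roughly how the UTF-8 column can be computed in the browser:

```ts
// Minimal sketch: hex-dump a string's UTF-8 bytes.
// TextEncoder is a standard web API and always produces UTF-8.
const toUtf8Hex = (text: string): string =>
  Array.from(new TextEncoder().encode(text))
    .map((b) => b.toString(16).toUpperCase().padStart(2, "0"))
    .join(" ");

console.log(toUtf8Hex("€")); // "E2 82 AC", the euro sign from the example above
console.log(toUtf8Hex("A")); // "41", ASCII stays a single byte
```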
Frequently Asked Questions
Why does the same character have different bytes in different encodings?
Different encodings use different rules to convert character numbers (codepoints) to bytes. UTF-8 uses a variable-length scheme where ASCII characters take 1 byte and others take 2–4. UTF-16 uses 2 bytes for most characters. UTF-32 always uses exactly 4 bytes. The bytes are different representations of the same underlying codepoint.
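As an illustration of those rules (a sketch, not the tool's actual implementation; encodeAll and hex are hypothetical helpers), the same character can be run through all three in the browser. UTF-8 comes from TextEncoder; UTF-16 LE and UTF-32 LE are built by hand because browsers ship no encoder for them:

```ts
// Format a byte array as uppercase, space-separated hex.
const hex = (bytes: number[]): string =>
  bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

function encodeAll(char: string) {
  const cp = char.codePointAt(0)!; // the shared underlying codepoint

  const utf8 = hex(Array.from(new TextEncoder().encode(char)));

  // UTF-16 LE: JS strings are already sequences of 16-bit code units,
  // so emit each unit's bytes least significant first.
  const units: number[] = [];
  for (let i = 0; i < char.length; i++) units.push(char.charCodeAt(i));
  const utf16le = hex(units.flatMap((u) => [u & 0xff, u >> 8]));

  // UTF-32 LE: the codepoint itself, zero-padded to exactly 4 bytes.
  const utf32le = hex([cp & 0xff, (cp >> 8) & 0xff, (cp >> 16) & 0xff, cp >>> 24]);

  return { utf8, utf16le, utf32le };
}

console.log(encodeAll("€"));
// { utf8: "E2 82 AC", utf16le: "AC 20", utf32le: "AC 20 00 00" }
```

All three outputs are different byte sequences, but each one decodes back to the same codepoint, U+20AC.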
What does LE/BE mean in UTF-16 and UTF-32?
LE (Little-Endian) and BE (Big-Endian) describe the byte order for multi-byte values. Little-endian stores the least significant byte first (used by x86 processors and Windows). Big-endian stores the most significant byte first (used in network protocols). For the letter A (U+0041): UTF-16 LE is 41 00, UTF-16 BE is 00 41.
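To make byte order concrete, here is a small sketch using the standard DataView API, whose final argument selects the endianness of a multi-byte write:

```ts
// Write the UTF-16 code unit for "A" (0x0041) in both byte orders.
const buf = new ArrayBuffer(2);
const view = new DataView(buf);

view.setUint16(0, 0x0041, true);  // little-endian: least significant byte first
console.log(new Uint8Array(buf)); // [0x41, 0x00], i.e. 41 00 (UTF-16 LE)

view.setUint16(0, 0x0041, false); // big-endian: most significant byte first
console.log(new Uint8Array(buf)); // [0x00, 0x41], i.e. 00 41 (UTF-16 BE)
```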
Why do emoji take 4 bytes in every encoding?
Emoji have codepoints above U+FFFF (in Unicode's supplementary planes). UTF-8 requires 4 bytes for these high codepoints due to its variable-length encoding scheme. UTF-32 uses exactly 4 bytes for every character, so supplementary characters don't cost more. UTF-16 uses surrogate pairs (two 2-byte units) for supplementary characters, also totalling 4 bytes.
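A small sketch of the surrogate-pair mechanics, relying only on the fact that JavaScript strings are themselves UTF-16:

```ts
// A supplementary-plane emoji occupies two 16-bit code units in a JS string.
const rocket = "🚀"; // U+1F680, above U+FFFF

console.log(rocket.length);                       // 2: two UTF-16 code units
console.log(rocket.codePointAt(0)!.toString(16)); // "1f680": the real codepoint
console.log(rocket.charCodeAt(0).toString(16));   // "d83d": high surrogate
console.log(rocket.charCodeAt(1).toString(16));   // "de80": low surrogate

console.log(new TextEncoder().encode(rocket).length); // 4: UTF-8 also needs 4 bytes
```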