Unicode Text Encoder

Type or paste any text to see the byte representation in every major Unicode encoding. Results update live as you type. All computation happens in your browser — nothing is sent to a server.

How to read this table

  • Each row represents one Unicode character (codepoint), not one byte.
  • Bytes are shown as uppercase hexadecimal, space-separated. For example, E2 82 AC is the euro sign in UTF-8.
  • UTF-8 uses 1–4 bytes per character. ASCII characters use 1 byte.
  • UTF-16 uses 2 bytes for most characters, 4 for emoji and supplementary characters.
  • UTF-32 always uses exactly 4 bytes per character.
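The per-encoding byte counts above can be sketched in a few lines of JavaScript. This is an illustrative sketch, not the tool's actual source; the helper name `byteLengths` is assumed. It uses the standard `TextEncoder` for UTF-8 and the fixed rules of UTF-16/UTF-32 for the other two columns.

```javascript
// Sketch: how many bytes one character costs in each encoding.
// (Assumed helper name, not the tool's real implementation.)
function byteLengths(ch) {
  const cp = ch.codePointAt(0);
  return {
    utf8: new TextEncoder().encode(ch).length, // variable: 1–4 bytes
    utf16: cp > 0xFFFF ? 4 : 2,                // surrogate pair above U+FFFF
    utf32: 4,                                  // always fixed-width
  };
}

console.log(byteLengths("A"));  // { utf8: 1, utf16: 2, utf32: 4 }
console.log(byteLengths("€"));  // { utf8: 3, utf16: 2, utf32: 4 }
console.log(byteLengths("😀")); // { utf8: 4, utf16: 4, utf32: 4 }
```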

Frequently Asked Questions

Why does the same character have different bytes in different encodings?

Different encodings use different rules to convert character numbers (codepoints) to bytes. UTF-8 uses a variable-length scheme where ASCII characters take 1 byte and others take 2–4. UTF-16 uses 2 bytes for most characters. UTF-32 always uses exactly 4 bytes. The bytes are different representations of the same underlying codepoint.
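To make this concrete, here is a small sketch (assumed helper names, not the tool's source) that encodes the euro sign three ways and prints the resulting bytes as hex:

```javascript
// Sketch: one codepoint (€, U+20AC), three byte representations.
const hex = (bytes) =>
  [...bytes].map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

function utf16be(ch) {
  const bytes = [];
  for (let i = 0; i < ch.length; i++) {
    const u = ch.charCodeAt(i); // one 16-bit code unit
    bytes.push(u >> 8, u & 0xff);
  }
  return bytes;
}

function utf32be(ch) {
  const cp = ch.codePointAt(0);
  return [(cp >>> 24) & 0xff, (cp >>> 16) & 0xff, (cp >>> 8) & 0xff, cp & 0xff];
}

console.log(hex(new TextEncoder().encode("€"))); // E2 82 AC
console.log(hex(utf16be("€")));                  // 20 AC
console.log(hex(utf32be("€")));                  // 00 00 20 AC
```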

What does LE/BE mean in UTF-16 and UTF-32?

LE (Little-Endian) and BE (Big-Endian) describe the byte order for multi-byte values. Little-endian stores the least significant byte first (used by x86 processors and Windows). Big-endian stores the most significant byte first (used in network protocols). For the letter A (U+0041): UTF-16 LE is 41 00, UTF-16 BE is 00 41.
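The byte-order swap is just a matter of which byte of each 16-bit code unit is written first. A minimal sketch (the function name is assumed for illustration):

```javascript
// Sketch: UTF-16 LE vs BE for a single BMP character.
function utf16Bytes(ch, littleEndian) {
  const u = ch.charCodeAt(0); // one 16-bit code unit, e.g. 0x0041 for "A"
  const hi = u >> 8;          // most significant byte
  const lo = u & 0xff;        // least significant byte
  return littleEndian ? [lo, hi] : [hi, lo];
}

const hex = (bytes) =>
  bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(hex(utf16Bytes("A", true)));  // 41 00  (LE: low byte first)
console.log(hex(utf16Bytes("A", false))); // 00 41  (BE: high byte first)
```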

Why do emoji take 4 bytes in UTF-8, UTF-16, and UTF-32?

Emoji have codepoints above U+FFFF (in Unicode's supplementary planes). UTF-8 requires 4 bytes for these high codepoints due to its variable-length encoding scheme. UTF-32 uses exactly 4 bytes for every character, so supplementary characters don't cost more. UTF-16 uses surrogate pairs (two 2-byte units) for supplementary characters, also totalling 4 bytes.
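The surrogate-pair arithmetic can be shown directly. This sketch computes the two 16-bit units for 😀 (U+1F600); the helper name is hypothetical:

```javascript
// Sketch: split a supplementary codepoint into a UTF-16 surrogate pair.
function surrogatePair(cp) {
  const v = cp - 0x10000;                      // remaining 20-bit value
  const high = 0xd800 + (v >> 10);             // top 10 bits -> high surrogate
  const low = 0xdc00 + (v & 0x3ff);            // bottom 10 bits -> low surrogate
  return [high, low];
}

const [hi, lo] = surrogatePair(0x1f600);       // 😀 is U+1F600
console.log(hi.toString(16), lo.toString(16)); // d83d de00
// Two 16-bit units = 4 bytes total, matching UTF-8 and UTF-32 for this character.
```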