EUC-JP
Extended Unix Code for Japanese. Variable-width encoding common in Unix/Linux Japanese environments and older web pages.
Byte Structure
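EUC-JP is a variable-width encoding that uses 1–3 bytes per character. The lead byte determines the length:

- 0x00–0x7F: one byte, ASCII.
- 0xA1–0xFE followed by one byte in 0xA1–0xFE: two bytes, JIS X 0208 (kanji, hiragana, full-width katakana, Greek, Cyrillic, and symbols).
- 0x8E (SS2) followed by one byte in 0xA1–0xDF: two bytes, half-width katakana from JIS X 0201.
- 0x8F (SS3) followed by two bytes in 0xA1–0xFE: three bytes, supplementary kanji and symbols from JIS X 0212.

Every byte of a multibyte character is 0x80 or higher, so ASCII bytes never appear inside one. This lets byte-oriented tools search safely for ASCII delimiters. Characters outside these sets, such as the euro sign, cannot be represented and must be replaced or transliterated.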
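The lead-byte rules can be illustrated with a short Python sketch that splits a byte string into EUC-JP characters by looking only at lead and trail byte ranges. The function name `euc_jp_char_lengths` is hypothetical. The sketch checks structure only and does not verify that a code is actually assigned in JIS X 0208 or JIS X 0212.

```python
from typing import List


def euc_jp_char_lengths(data: bytes) -> List[int]:
    """Return the byte length of each character in an EUC-JP byte string.

    Hypothetical helper: validates byte ranges only, not whether each
    code is assigned in JIS X 0208 / JIS X 0212.
    """
    lengths = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            n, trail_range = 1, None          # ASCII
        elif lead == 0x8E:
            n, trail_range = 2, (0xA1, 0xDF)  # SS2: half-width katakana
        elif lead == 0x8F:
            n, trail_range = 3, (0xA1, 0xFE)  # SS3: JIS X 0212
        elif 0xA1 <= lead <= 0xFE:
            n, trail_range = 2, (0xA1, 0xFE)  # JIS X 0208
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        trail = data[i + 1 : i + n]
        if len(trail) != n - 1:
            raise ValueError(f"truncated character at offset {i}")
        if trail_range and not all(trail_range[0] <= b <= trail_range[1] for b in trail):
            raise ValueError(f"invalid trail byte at offset {i}")
        lengths.append(n)
        i += n
    return lengths


# "A" (41), "あ" (A4 A2), "©" (8F A2 ED)
print(euc_jp_char_lengths(bytes.fromhex("41 A4A2 8FA2ED")))
```

Using bytes from the sample table, "A", "あ", and "©" come out as 1, 2, and 3 bytes respectively. A truncated sequence such as a lone 0xA4 raises `ValueError`.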
When to Use EUC-JP
EUC-JP was the dominant encoding for Japanese on Unix/Linux systems and older web servers. You'll encounter it when processing legacy Japanese Unix content and older web pages. Japanese email and Usenet posts, by contrast, were usually transmitted as ISO-2022-JP. Most modern Japanese applications use UTF-8.
Sample Characters in EUC-JP
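The table below shows how selected characters are represented in EUC-JP, with bytes in hexadecimal and decimal. Characters marked "not supported" cannot be encoded in EUC-JP and must be replaced or transliterated when converting from Unicode. The © row uses the three-byte JIS X 0212 form. Some encoders, including the one defined by the WHATWG Encoding Standard and used by browsers, never produce JIS X 0212 sequences.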
| Character | Codepoint | Name | Bytes (Hex) | Bytes (Decimal) | Supported |
|---|---|---|---|---|---|
| A | U+0041 | LATIN CAPITAL LETTER A | 41 | 65 | Yes |
| a | U+0061 | LATIN SMALL LETTER A | 61 | 97 | Yes |
| 0 | U+0030 | DIGIT ZERO | 30 | 48 | Yes |
| $ | U+0024 | DOLLAR SIGN | 24 | 36 | Yes |
| £ | U+00A3 | POUND SIGN | A1 F2 | 161 242 | Yes |
| © | U+00A9 | COPYRIGHT SIGN | 8F A2 ED | 143 162 237 | Yes |
| € | U+20AC | EURO SIGN | not supported | — | No |
| α | U+03B1 | GREEK SMALL LETTER ALPHA | A6 C1 | 166 193 | Yes |
| А | U+0410 | CYRILLIC CAPITAL LETTER A | A7 A1 | 167 161 | Yes |
| 中 | U+4E2D | CJK UNIFIED IDEOGRAPH-4E2D | C3 E6 | 195 230 | Yes |
| あ | U+3042 | HIRAGANA LETTER A | A4 A2 | 164 162 | Yes |
| ☺ | U+263A | WHITE SMILING FACE | not supported | — | No |
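Rows like these can be reproduced with Python's built-in `euc_jp` codec. This is a sketch, and the helper name `euc_jp_hex` is hypothetical. Results for edge cases such as © depend on the codec including JIS X 0212, which Python's `euc_jp` does.

```python
from typing import Optional


def euc_jp_hex(ch: str) -> Optional[str]:
    """Return the EUC-JP bytes of ch as spaced uppercase hex, or None if unsupported."""
    try:
        return ch.encode("euc_jp").hex(" ").upper()
    except UnicodeEncodeError:
        return None


# Print only ASCII so the output works regardless of the console encoding
for ch in "Aあ中α€☺":
    hex_bytes = euc_jp_hex(ch)
    print(f"U+{ord(ch):04X}", hex_bytes if hex_bytes else "not supported")
```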
Working with EUC-JP in Code
Most major languages can convert to and from EUC-JP, either natively or through a standard extension. Browser JavaScript is an exception: its built-in API can decode EUC-JP but cannot encode to it. The examples below encode a string to EUC-JP bytes and decode it back to a Unicode string. Always specify the encoding explicitly, and never rely on system defaults, which vary by OS and locale.
```python
# Encode a string to EUC-JP bytes
text = "Hello, 世界"
encoded = text.encode("EUC-JP")
# Decode bytes back to a string
decoded = encoded.decode("EUC-JP")
```
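By default, `encode` raises `UnicodeEncodeError` when the text contains a character EUC-JP cannot represent, such as "€". Python's standard error handlers let you choose a replacement strategy instead. The sample string below is only an illustration.

```python
text = "Price: 100€"

# Strict (the default): the unsupported euro sign raises an error
try:
    text.encode("euc_jp")
except UnicodeEncodeError as exc:
    print("cannot encode character at index", exc.start)

# Replace each unsupported character with "?"
replaced = text.encode("euc_jp", errors="replace")

# Keep the character as an HTML/XML numeric character reference
as_ref = text.encode("euc_jp", errors="xmlcharrefreplace")
```

`errors="replace"` loses information silently. `xmlcharrefreplace` preserves it, but it only makes sense when the output will be read as HTML or XML.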
```php
// Convert to EUC-JP
$bytes = mb_convert_encoding(
    "Hello, 世界",
    "EUC-JP",
    "UTF-8"
);
// Convert back to UTF-8
$text = mb_convert_encoding(
    $bytes,
    "UTF-8",
    "EUC-JP"
);
```
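```javascript
// TextEncoder only produces UTF-8, so it cannot encode to EUC-JP.
// Encoding to EUC-JP requires a third-party library such as iconv-lite.

// Decode EUC-JP bytes: "A" = 0x41, "あ" = 0xA4 0xA2
const bytes = new Uint8Array([0x41, 0xA4, 0xA2]);
const decoder = new TextDecoder("euc-jp");
const text = decoder.decode(bytes); // "Aあ"
```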
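```sql
-- Create a database with EUC-JP encoding (PostgreSQL).
-- The locale must match the encoding (or be C/POSIX), and template0
-- is required when encoding or locale differ from template1.
CREATE DATABASE mydb
  ENCODING 'EUC_JP'
  LC_COLLATE 'C'
  LC_CTYPE 'C'
  TEMPLATE template0;
-- Check database encoding
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();
```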
EUC-JP FAQ
When would I use EUC-JP instead of Shift-JIS?
EUC-JP was the standard Japanese encoding on Unix/Linux systems and older Japanese web servers, while Shift-JIS dominated on Windows. If you are processing Unix system logs or web content from legacy Japanese servers (especially pre-2005), expect EUC-JP. Japanese email traditionally used neither. It was sent as ISO-2022-JP, a 7-bit encoding of the same JIS X 0208 character set.
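The three encodings assign different bytes to the same character. Here is a quick comparison using Python's standard codecs, with "あ" (HIRAGANA LETTER A) as the example:

```python
# Encode the same character with three Japanese codecs from the standard library
samples = {codec: "あ".encode(codec) for codec in ("euc_jp", "shift_jis", "iso2022_jp")}

for codec, encoded in samples.items():
    print(f"{codec:<11}{encoded.hex(' ').upper()}")
# euc_jp     A4 A2
# shift_jis  82 A0
# iso2022_jp 1B 24 42 24 22 1B 28 42
```

The ISO-2022-JP output wraps the raw JIS code 24 22 in escape sequences that switch to JIS X 0208 and back to ASCII. The EUC-JP bytes are the same JIS code with the high bit set on each byte. Shift-JIS rearranges the code entirely.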
How do I detect whether a Japanese file is Shift-JIS, EUC-JP, or UTF-8?
Use a detection library such as chardet or charset-normalizer in Python, or uchardet on the command line. UTF-8 can usually be identified reliably, because Japanese text in a legacy encoding rarely forms valid UTF-8 byte sequences. Distinguishing Shift-JIS from EUC-JP is harder because their byte ranges overlap, so the source system is often the best clue. Always verify the result by checking that the decoded text renders correctly.
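When no detection library is available, a rough heuristic using only the standard library is to try strict decodes in a fixed order. This is a hypothetical sketch and not how chardet or uchardet work. It returns the first codec that decodes without errors, which can be wrong, especially for short inputs.

```python
from typing import Optional

# Hypothetical candidate order. EUC-JP is tried before Shift-JIS because
# EUC-JP text often also decodes (as garbage) under Shift-JIS, while
# Shift-JIS text usually contains lead bytes in 0x81-0x9F, most of which
# are invalid as EUC-JP lead bytes.
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")


def guess_japanese_encoding(data: bytes) -> Optional[str]:
    """Return the first candidate codec that decodes data without errors."""
    for codec in CANDIDATES:
        try:
            data.decode(codec)  # strict error handling is the default
        except UnicodeDecodeError:
            continue
        return codec
    return None
```

Pure ASCII input is reported as "utf-8", which is harmless because ASCII bytes mean the same thing in all three encodings.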