EUC-JP

Extended Unix Code for Japanese. Variable-width encoding common in Unix/Linux Japanese environments and older web pages.

IANA Name

EUC-JP

Width

Variable (1–3 bytes)

Introduced

1991

Byte Structure

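EUC-JP is a variable-width encoding that uses 1 to 3 bytes per character. ASCII characters take one byte (0x00–0x7F). JIS X 0208 characters (kanji, kana, Greek, Cyrillic) take two bytes, each in the range 0xA1–0xFE. Half-width katakana take two bytes, starting with the prefix 0x8E, and JIS X 0212 supplementary characters take three bytes, starting with the prefix 0x8F. Characters outside these sets cannot be represented and must be replaced or transliterated.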
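The lead byte of each character is enough to tell how many bytes it occupies. The sketch below assumes the usual EUC-JP layout (ASCII below 0x80, 0x8E for half-width katakana, 0x8F for three-byte JIS X 0212, 0xA1–0xFE for two-byte JIS X 0208). The function names are illustrative, and trail bytes are not validated, so treat it as a teaching aid rather than a decoder.

```python
def euc_jp_char_length(lead: int) -> int:
    """Return the byte length of the EUC-JP character that starts with `lead`."""
    if lead <= 0x7F:
        return 1  # ASCII
    if lead == 0x8E:
        return 2  # prefix for half-width katakana
    if lead == 0x8F:
        return 3  # prefix for JIS X 0212
    if 0xA1 <= lead <= 0xFE:
        return 2  # JIS X 0208
    raise ValueError(f"invalid EUC-JP lead byte: 0x{lead:02X}")


def split_euc_jp(data: bytes) -> list[bytes]:
    """Split EUC-JP bytes into one bytes object per character."""
    chars = []
    i = 0
    while i < len(data):
        n = euc_jp_char_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated character at end of input")
        chars.append(data[i:i + n])
        i += n
    return chars


# Bytes for "A", "あ", and "©" as listed in the sample table
sample = bytes.fromhex("41 a4a2 8fa2ed")
print([c.hex() for c in split_euc_jp(sample)])  # ['41', 'a4a2', '8fa2ed']
```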

When to Use EUC-JP

EUC-JP was the dominant encoding for Japanese on Unix/Linux systems and older web servers. You'll encounter it when processing legacy Japanese Unix content, emails, or Usenet archives. Modern Japanese applications use UTF-8.
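A common task with legacy content is a one-off conversion to UTF-8. The following is a minimal sketch, assuming the whole file fits in memory and really is EUC-JP; the function name and paths are hypothetical.

```python
from pathlib import Path


def convert_euc_jp_to_utf8(src: Path, dst: Path) -> None:
    """Read `src` as EUC-JP and write the same text to `dst` as UTF-8."""
    # Strict decoding (the default) raises UnicodeDecodeError on invalid
    # bytes instead of silently corrupting the legacy data.
    text = src.read_bytes().decode("euc_jp")
    # Working on bytes keeps the original line endings untouched.
    dst.write_bytes(text.encode("utf-8"))
```

For example, `convert_euc_jp_to_utf8(Path("old.txt"), Path("new.txt"))` would rewrite a single file. A decoding error is a strong hint that the file is in another encoding such as Shift-JIS.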

Sample Characters in EUC-JP

The table below shows how a selection of characters are represented in EUC-JP. Bytes are shown in hexadecimal. Characters marked "not supported" cannot be encoded in EUC-JP and would need to be replaced or transliterated when converting from Unicode.

Character	Codepoint	Name	Bytes (Hex)	Bytes (Decimal)	Supported
A	U+0041	LATIN CAPITAL LETTER A	41	65	Yes
a	U+0061	LATIN SMALL LETTER A	61	97	Yes
0	U+0030	DIGIT ZERO	30	48	Yes
$	U+0024	DOLLAR SIGN	24	36	Yes
£	U+00A3	POUND SIGN	A1 F2	161 242	Yes
©	U+00A9	COPYRIGHT SIGN	8F A2 ED	143 162 237	Yes
€	U+20AC	EURO SIGN	not supported		—
α	U+03B1	GREEK SMALL LETTER ALPHA	A6 C1	166 193	Yes
А	U+0410	CYRILLIC CAPITAL LETTER A	A7 A1	167 161	Yes
中	U+4E2D	CJK UNIFIED IDEOGRAPH-4E2D	C3 E6	195 230	Yes
あ	U+3042	HIRAGANA LETTER A	A4 A2	164 162	Yes
☺	U+263A	WHITE SMILING FACE	not supported		—
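The unsupported rows matter in practice when converting from Unicode. The sketch below shows three ways to handle them with Python's `euc_jp` codec: failing loudly, replacing with `?`, or transliterating first. The `FALLBACKS` table is a hypothetical example, not a standard mapping.

```python
text = "Price: €5"

# Strict encoding (the default) raises on characters outside EUC-JP
try:
    text.encode("euc_jp")
except UnicodeEncodeError as exc:
    print(f"cannot encode {exc.object[exc.start:exc.end]!r}")

# errors="replace" substitutes "?" for each unsupported character
replaced = text.encode("euc_jp", errors="replace")
print(replaced)  # b'Price: ?5'

# Hypothetical transliteration table applied before encoding
FALLBACKS = {"€": "EUR"}
transliterated = "".join(FALLBACKS.get(ch, ch) for ch in text).encode("euc_jp")
print(transliterated)  # b'Price: EUR5'
```

Replacement loses information silently, so transliteration or an explicit error is usually the safer choice for data that will be read by people.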

Working with EUC-JP in Code

Most major languages have built-in support for encoding conversion. The examples below show how to encode a string to EUC-JP bytes and decode it back to a Unicode string. Always specify the encoding explicitly rather than relying on system defaults, which vary by OS and locale.

Python

# Encode a string to euc-jp bytes
text = "Hello, 世界"
encoded = text.encode("EUC-JP")

# Decode bytes back to a string
decoded = encoded.decode("EUC-JP")

PHP

// Convert to euc-jp
$bytes = mb_convert_encoding(
    "Hello, 世界",
    "EUC-JP",
    "UTF-8"
);

// Convert back to UTF-8
$text = mb_convert_encoding(
    $bytes,
    "UTF-8",
    "EUC-JP"
);

JavaScript

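// The built-in TextEncoder only produces UTF-8, so it cannot encode to EUC-JP.
// Encoding requires a third-party library; decoding is built in.

// EUC-JP bytes for "Hello, 世界"
const bytes = new Uint8Array([
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20, // "Hello, "
  0xc0, 0xa4, 0xb3, 0xa6,                   // "世界"
]);

// Decode bytes
const decoder = new TextDecoder("euc-jp");
const text = decoder.decode(bytes); // "Hello, 世界"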

SQL (PostgreSQL)

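-- Create a database with EUC-JP (PostgreSQL names it EUC_JP)
-- The locale must be compatible with the encoding; C works with any encoding
CREATE DATABASE mydb
  ENCODING 'EUC_JP'
  LC_COLLATE 'C'
  LC_CTYPE 'C'
  TEMPLATE template0;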

-- Check database encoding
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();


EUC-JP FAQ

When would I use EUC-JP instead of Shift-JIS?

EUC-JP was the standard Japanese encoding on Unix/Linux systems and older Japanese web servers. Japanese email more commonly used the related ISO-2022-JP, a 7-bit encoding built on the same JIS X 0208 character set. If you are processing Unix system logs, email archives stored on Unix hosts, or web content from legacy Japanese servers (especially pre-2005), expect to see EUC-JP. Shift-JIS is more common in Windows contexts.

How do I detect whether a Japanese file is Shift-JIS, EUC-JP, or UTF-8?

Use a library: chardet or charset-normalizer in Python, or uchardet on the command line. UTF-8 is usually detected reliably because its multi-byte sequences follow strict rules. Distinguishing Shift-JIS from EUC-JP is harder because their byte ranges overlap, so the source system context is often the best clue. Always verify the result by checking that the decoded text renders correctly.
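When a detection library is not available, strict trial decoding gives a rough first guess. The sketch below uses only Python's built-in codecs; the function name and the candidate order are assumptions. Because some byte sequences are valid in more than one encoding, the first match is only a hint and not proof.

```python
from typing import Optional

# Tried in order; UTF-8 goes first because its byte patterns are the strictest
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")


def guess_japanese_encoding(data: bytes) -> Optional[str]:
    """Return the first candidate that decodes `data` without error, or None."""
    for name in CANDIDATES:
        try:
            data.decode(name)  # strict error handling by default
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_japanese_encoding("世界".encode("euc_jp")))  # euc_jp
```

Pure ASCII input is reported as UTF-8, which is harmless since ASCII bytes mean the same thing in all three encodings. Short inputs are the most ambiguous, so run the check on as much of the file as is practical.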
