KOI8-R

A Russian character encoding widely used on Unix systems and the early internet. It was designed so that stripping the high bit of each byte leaves a readable Latin transliteration (with letter case inverted). It is still encountered in Russian email and Usenet archives.
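
The high-bit property can be checked directly. The following is a minimal sketch, assuming Python's built-in `koi8_r` codec; the sample word and variable names are illustrative only.

```python
# Encode a Russian word to KOI8-R, then clear bit 7 of every byte.
word = "Привет"
koi8_bytes = word.encode("koi8_r")

# Masking with 0x7F drops the high bit, leaving 7-bit ASCII.
stripped = bytes(b & 0x7F for b in koi8_bytes).decode("ascii")

print(koi8_bytes.hex(" "))  # f0 d2 c9 d7 c5 d4
print(stripped)             # pRIWET
```

The result is a Latin transliteration with the case flipped: lowercase Cyrillic comes out as uppercase Latin, and uppercase Cyrillic as lowercase Latin.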

IANA Name: KOI8-R
Width: Fixed (1 byte)
Introduced: 1993

Byte Structure

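KOI8-R encodes every character as exactly one byte. Bytes 0x00-0x7F are identical to ASCII; bytes 0x80-0xFF hold the Russian Cyrillic letters plus box-drawing characters and a few other symbols. Characters outside this 256-entry repertoire cannot be represented and must be replaced or transliterated.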
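
A short sketch of what the fixed width and the limited repertoire mean in practice, assuming Python's built-in `koi8_r` codec (the sample strings and the `euro_ok` flag are illustrative):

```python
text = "Привет, world"

# One byte per character: the byte count equals the character count.
encoded = text.encode("koi8_r")
assert len(encoded) == len(text)

# A character outside the repertoire raises UnicodeEncodeError by default.
try:
    "€".encode("koi8_r")
    euro_ok = True
except UnicodeEncodeError:
    euro_ok = False

# errors="replace" substitutes "?" instead of raising.
replaced = "10 €".encode("koi8_r", errors="replace")

print(euro_ok)   # False
print(replaced)  # b'10 ?'
```

Whether to replace, transliterate, or reject such characters depends on the application; silent replacement loses information, so raising is often the safer default.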

When to Use KOI8-R

KOI8-R is encountered in legacy Russian-language content: older email, Usenet archives, and pre-Unicode Unix systems. UTF-8 is the right choice for all new systems, but you will need KOI8-R when reading from or writing to these legacy sources.
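
One place the encoding commonly shows up in old email is in RFC 2047 encoded-words in headers such as Subject. The sketch below builds a sample header by hand (an assumption for illustration, not taken from a real message) and decodes it with Python's standard `email.header` module.

```python
import base64
from email.header import decode_header, make_header

# Build a sample encoded-word of the kind found in old Subject lines.
raw = "Привет".encode("koi8_r")
subject = "=?koi8-r?B?" + base64.b64encode(raw).decode("ascii") + "?="

# decode_header yields (bytes, charset) pairs; make_header joins them
# into a Header object whose str() is the decoded Unicode text.
parts = decode_header(subject)
decoded_subject = str(make_header(parts))

print(decoded_subject)  # Привет
```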

Sample Characters in KOI8-R

The table below shows how a selection of characters is represented in KOI8-R. Bytes are shown in hexadecimal and decimal. Characters marked "not supported" cannot be encoded in KOI8-R and would need to be replaced or transliterated when converting from Unicode.

Character	Codepoint	Name	Bytes (Hex)	Bytes (Decimal)	Supported
A	U+0041	LATIN CAPITAL LETTER A	41	65	Yes
a	U+0061	LATIN SMALL LETTER A	61	97	Yes
0	U+0030	DIGIT ZERO	30	48	Yes
$	U+0024	DOLLAR SIGN	24	36	Yes
£	U+00A3	POUND SIGN	not supported		—
©	U+00A9	COPYRIGHT SIGN	BF	191	Yes
€	U+20AC	EURO SIGN	not supported		—
α	U+03B1	GREEK SMALL LETTER ALPHA	not supported		—
А	U+0410	CYRILLIC CAPITAL LETTER A	E1	225	Yes
中	U+4E2D	CJK UNIFIED IDEOGRAPH-4E2D	not supported		—
あ	U+3042	HIRAGANA LETTER A	not supported		—
☺	U+263A	WHITE SMILING FACE	not supported		—
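
Rows like these can be reproduced programmatically. This is a minimal sketch, assuming Python's built-in `koi8_r` codec; `koi8r_row` is a hypothetical helper name, not part of any library.

```python
# Note: the first "A" is Latin (U+0041), the fourth "А" is Cyrillic (U+0410).
samples = ["A", "£", "©", "А", "€", "中"]

def koi8r_row(ch):
    """Return (codepoint, hex byte, decimal byte); bytes are None if unsupported."""
    codepoint = f"U+{ord(ch):04X}"
    try:
        byte = ch.encode("koi8_r")[0]
    except UnicodeEncodeError:
        return (codepoint, None, None)
    return (codepoint, f"{byte:02X}", byte)

for ch in samples:
    print(ch, *koi8r_row(ch))
```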

Working with KOI8-R in Code

Most major languages have built-in support for encoding conversion. The examples below show how to encode a string to KOI8-R bytes and decode it back to a Unicode string. JavaScript is a partial exception: its TextDecoder can decode KOI8-R, but TextEncoder only produces UTF-8. Always specify the encoding explicitly; never rely on system defaults, which vary by OS and locale.

Python

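# Encode a string to KOI8-R bytes.
# The text must contain only characters KOI8-R supports; anything else
# (such as CJK) raises UnicodeEncodeError unless errors="replace" is passed.
text = "Привет, мир"
encoded = text.encode("KOI8-R")

# Decode bytes back to a string
decoded = encoded.decode("KOI8-R")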

PHP

// Convert UTF-8 to KOI8-R
// (characters KOI8-R lacks, such as CJK, become "?" by default)
$bytes = mb_convert_encoding(
    "Привет, мир",
    "KOI8-R",
    "UTF-8"
);

// Convert back to UTF-8
$text = mb_convert_encoding(
    $bytes,
    "UTF-8",
    "KOI8-R"
);

JavaScript

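// TextEncoder only produces UTF-8, so there is no built-in KOI8-R encoder.
// Start from raw KOI8-R bytes instead (these spell "Привет"):
const bytes = new Uint8Array([0xf0, 0xd2, 0xc9, 0xd7, 0xc5, 0xd4]);

// Decode KOI8-R bytes to a string
const decoder = new TextDecoder("koi8-r");
const text = decoder.decode(bytes); // "Привет"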

SQL (PostgreSQL)

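-- Create a database with KOI8-R (PostgreSQL names it KOI8R).
-- The locale must match the encoding; the locale name below is OS-dependent.
CREATE DATABASE mydb
  ENCODING 'KOI8R'
  LC_COLLATE 'ru_RU.KOI8-R'
  LC_CTYPE 'ru_RU.KOI8-R'
  TEMPLATE template0;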

-- Check database encoding
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();

KOI8-R FAQ

What is KOI8-R used for?

KOI8-R is used for Russian-language text, mainly in legacy email, Usenet archives, and older Unix systems. It encodes ASCII plus the Russian Cyrillic alphabet in a fixed one-byte-per-character format. For new systems, UTF-8 is the recommended encoding: it supports all Unicode characters and is the universal standard for the web and modern software.

How do I convert KOI8-R to UTF-8?

In Python: decoded = bytes_data.decode("KOI8-R"), then re-encode as UTF-8 with decoded.encode("utf-8"). In PHP: mb_convert_encoding($string, "UTF-8", "KOI8-R"). Always verify the output after conversion by checking that the text renders correctly.
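
For whole files, the same conversion can be sketched as follows. The file names and the `convert_file` helper are hypothetical, and the example writes its own sample input to a temporary directory so that it is self-contained.

```python
import tempfile
from pathlib import Path

def convert_file(src: Path, dst: Path) -> None:
    """Re-encode a KOI8-R text file as UTF-8, naming both encodings explicitly."""
    text = src.read_text(encoding="koi8_r")
    dst.write_text(text, encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "legacy.txt"
    dst = Path(tmp) / "converted.txt"
    src.write_bytes("Привет, мир".encode("koi8_r"))

    convert_file(src, dst)
    result = dst.read_text(encoding="utf-8")

print(result)  # Привет, мир
```

Decoding with the wrong source encoding typically produces garbled Cyrillic rather than an error, which is why a visual check of the output is still worthwhile.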
