KOI8-R

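Russian character encoding widely used on Unix systems and the early Russian-language internet. Designed so that stripping the high bit of Cyrillic text yields a readable Latin transliteration, though with letter case inverted (uppercase Cyrillic becomes lowercase Latin and vice versa). Still encountered in Russian email and Usenet archives.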

Encoding: KOI8-R
Width: Fixed (1 byte)
Year: 1993

Byte Structure

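KOI8-R uses a fixed 1-byte encoding per character. Bytes 0x00-0x7F are identical to ASCII; bytes 0x80-0xFF hold the Russian Cyrillic alphabet plus box-drawing characters and a few other symbols. Characters outside this 256-character repertoire cannot be represented and must be replaced or transliterated.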
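The high-bit property mentioned in the introduction can be checked directly. The following is a minimal Python sketch, assuming the standard library's "koi8-r" codec; `strip_high_bit` is a hypothetical helper name introduced only for illustration.

```python
def strip_high_bit(data: bytes) -> str:
    """Clear bit 7 of every byte and read the result as ASCII."""
    return bytes(b & 0x7F for b in data).decode("ascii")


koi8 = "Привет".encode("koi8-r")

print(len(koi8))             # 6: one byte per character
print(koi8.hex(" "))         # f0 d2 c9 d7 c5 d4
print(strip_high_bit(koi8))  # pRIWET
```

The result is a rough Latin transliteration with inverted case: the uppercase П comes out as a lowercase p, and the lowercase letters come out uppercase.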

When to Use KOI8-R

KOI8-R is encountered in legacy Russian and Cyrillic-script content: older email, Usenet archives, and pre-Unicode Unix systems. UTF-8 is the correct choice for all new systems, but you'll need this encoding when reading from or writing to legacy sources.

Sample Characters in KOI8-R

The table below shows how a selection of characters are represented in KOI8-R. Bytes are shown in hexadecimal. Characters marked "not supported" cannot be encoded in KOI8-R and would need to be replaced or transliterated when converting from Unicode.

Character | Codepoint | Name | Bytes (Hex) | Bytes (Decimal) | Supported
A | U+0041 | LATIN CAPITAL LETTER A | 41 | 65 | Yes
a | U+0061 | LATIN SMALL LETTER A | 61 | 97 | Yes
0 | U+0030 | DIGIT ZERO | 30 | 48 | Yes
$ | U+0024 | DOLLAR SIGN | 24 | 36 | Yes
£ | U+00A3 | POUND SIGN | - | - | not supported
© | U+00A9 | COPYRIGHT SIGN | BF | 191 | Yes
€ | U+20AC | EURO SIGN | - | - | not supported
α | U+03B1 | GREEK SMALL LETTER ALPHA | - | - | not supported
А | U+0410 | CYRILLIC CAPITAL LETTER A | E1 | 225 | Yes
中 | U+4E2D | CJK UNIFIED IDEOGRAPH-4E2D | - | - | not supported
あ | U+3042 | HIRAGANA LETTER A | - | - | not supported
☺ | U+263A | WHITE SMILING FACE | - | - | not supported
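
Rows like these can be reproduced programmatically. The sketch below assumes Python's built-in "koi8-r" codec and the `unicodedata` module; `koi8r_row` is a hypothetical helper name, and its output format is only an approximation of the table layout.

```python
import unicodedata


def koi8r_row(ch: str) -> str:
    """Describe how a single character is represented in KOI8-R."""
    label = f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
    try:
        raw = ch.encode("koi8-r")
    except UnicodeEncodeError:
        # The character has no byte assigned in KOI8-R.
        return f"{label}: not supported"
    return f"{label}: {raw.hex().upper()} ({raw[0]})"


for ch in "A£©€А中":
    print(koi8r_row(ch))
```

For the copyright sign this prints `U+00A9 COPYRIGHT SIGN: BF (191)`, and for the pound sign it reports `not supported`.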

Working with KOI8-R in Code

Every major language has built-in support for encoding conversion. The examples below show how to encode a string to KOI8-R bytes and decode it back to a Unicode string. Always specify the encoding explicitly — never rely on system defaults, which vary by OS and locale.

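# Python: encode a string to KOI8-R bytes
text = "Привет, мир"
encoded = text.encode("KOI8-R")

# Characters outside KOI8-R (such as 世界) raise UnicodeEncodeError
# unless an error handler is given; errors="replace" substitutes "?"
lossy = "Hello, 世界".encode("KOI8-R", errors="replace")  # b'Hello, ??'

# Decode bytes back to a string
decoded = encoded.decode("KOI8-R")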
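
// PHP: convert a UTF-8 string to KOI8-R
// Characters outside KOI8-R are replaced by the mbstring
// substitute character ("?" by default)
$bytes = mb_convert_encoding(
    "Привет, мир",
    "KOI8-R",
    "UTF-8"
);

// Convert back to UTF-8
$text = mb_convert_encoding(
    $bytes,
    "UTF-8",
    "KOI8-R"
);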
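
// JavaScript: TextEncoder only produces UTF-8, so it cannot encode to KOI8-R.
// Encoding requires a library or a lookup table; decoding is built in.
// These bytes are "Привет" in KOI8-R.
const bytes = new Uint8Array([0xf0, 0xd2, 0xc9, 0xd7, 0xc5, 0xd4]);

// Decode KOI8-R bytes to a string
const decoder = new TextDecoder("KOI8-R");
const text = decoder.decode(bytes); // "Привет"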
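
-- PostgreSQL: create a database with KOI8-R encoding
-- The locale must be compatible with the encoding. 'ru_RU.KOI8-R' is an
-- assumed locale name and must exist on the server's operating system.
CREATE DATABASE mydb
  ENCODING 'KOI8R'
  LC_COLLATE 'ru_RU.KOI8-R'
  LC_CTYPE 'ru_RU.KOI8-R'
  TEMPLATE template0;

-- Check database encoding
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();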
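
The advice to name the encoding explicitly applies to file I/O as well. As a rough Python sketch (the file name `legacy.txt` is hypothetical, and a temporary directory is used so the example cleans up after itself), passing `encoding="koi8-r"` to `open` avoids depending on the locale default:

```python
import os
import tempfile

text = "Привет, мир"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "legacy.txt")  # hypothetical file name

    # Write with an explicit encoding rather than the locale default.
    with open(path, "w", encoding="koi8-r", newline="") as f:
        f.write(text)

    # One byte per character, so the file size equals the character count.
    size = os.path.getsize(path)

    # Read it back, again naming the encoding explicitly.
    with open(path, "r", encoding="koi8-r", newline="") as f:
        restored = f.read()

print(size, restored == text)  # 11 True
```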

KOI8-R FAQ

What is KOI8-R used for?

KOI8-R is a single-byte encoding for Russian text. It is found mainly in legacy material such as older email, Usenet archives, and pre-Unicode Unix systems. For new systems, UTF-8 is the recommended encoding: it supports all Unicode characters and is the universal standard for the web and modern software.

How do I convert KOI8-R to UTF-8?

In Python: decoded = bytes_data.decode("KOI8-R"), then re-encode as UTF-8 with decoded.encode("utf-8"). In PHP: mb_convert_encoding($string, "UTF-8", "KOI8-R"). Always verify the output after conversion by checking that the text renders correctly.
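
Part of that verification can be automated. KOI8-R assigns a character to every byte value, so decoding the wrong input does not raise an error; it silently produces garbage. The Python sketch below pairs the conversion with one possible heuristic, not a definitive test: KOI8-R places box-drawing and block characters in much of the 0x80-0xBF range, so their presence in ordinary prose suggests the input was not KOI8-R (for example, UTF-8 read by mistake). Both function names are hypothetical.

```python
def koi8r_to_utf8(data: bytes) -> bytes:
    """Re-encode KOI8-R bytes as UTF-8."""
    return data.decode("koi8-r").encode("utf-8")


def looks_misdecoded(text: str) -> bool:
    """Heuristic: flag box-drawing or block characters (U+2500-U+259F)."""
    return any("\u2500" <= ch <= "\u259f" for ch in text)


koi8_bytes = "Привет".encode("koi8-r")
utf8_bytes = koi8r_to_utf8(koi8_bytes)
print(utf8_bytes.decode("utf-8"))                       # Привет

# Genuine KOI8-R input decodes to plain Cyrillic letters.
print(looks_misdecoded(koi8_bytes.decode("koi8-r")))    # False

# UTF-8 input wrongly decoded as KOI8-R contains box-drawing characters.
print(looks_misdecoded(utf8_bytes.decode("koi8-r")))    # True
```

The heuristic will give false positives on text that legitimately contains box-drawing characters, such as ASCII-art tables in old Usenet posts, and it does not catch other Cyrillic encodings like Windows-1251, so a visual check is still worthwhile.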