The default variant (an HTML document with no character set specification at all) is suitable only for ISO 8859-1 (Latin1) Web pages and not for Russian ones: for those, the standards require the character set to be specified explicitly.
For those who are too lazy to look it up, here are the directly relevant quotes from the Hypertext Transfer Protocol -- HTTP/1.1 (RFC 2616):
3.4.1 Missing Charset
Some HTTP/1.0 software has interpreted a Content-Type header without
charset parameter incorrectly to mean "recipient should guess."
Senders wishing to defeat this behavior MAY include a charset
parameter even when the charset is ISO-8859-1 and SHOULD do so when
it is known that it will not confuse the recipient.
Unfortunately, some older HTTP/1.0 clients did not deal properly with
an explicit charset parameter. HTTP/1.1 recipients MUST respect the
charset label provided by the sender; and those user agents that have
a provision to "guess" a charset MUST use the charset from the
content-type field if they support that charset, rather than the
recipient's preference, when initially displaying a document. See
section 3.7.1.
3.7.1 Canonicalization and Text Defaults
...
The "charset" parameter is used with some media types to define the
character set (section 3.4) of the data. When no explicit charset
parameter is provided by the sender, media subtypes of the "text"
type are defined to have a default charset value of "ISO-8859-1" when
received via HTTP. Data in character sets other than "ISO-8859-1" or
its subsets MUST be labeled with an appropriate charset value. See
section 3.4.1 for compatibility problems.
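The rule quoted above (an explicit charset always wins; otherwise text/* means ISO-8859-1) can be sketched in Python as follows. This is a minimal illustration, not code from the RFC: the helper name `effective_charset` is hypothetical, and the standard library's `email.message.Message` is used only to parse the Content-Type parameters.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Charset a recipient should use under RFC 2616, sections 3.4.1 and 3.7.1.

    Hypothetical helper: an explicit charset parameter always wins; a "text"
    media type without one defaults to ISO-8859-1; other types get None.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_content_charset()  # lowercased, or None if absent
    if explicit is not None:
        return explicit  # MUST be respected, not replaced by a guess
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"  # the HTTP default for text/* subtypes
    return None


print(effective_charset("text/html; charset=KOI8-R"))  # koi8-r
print(effective_charset("text/html"))                  # iso-8859-1
```

Note that a Russian page served as plain `text/html` ends up with `iso-8859-1`, which is exactly the problem described below.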
There are two methods to specify that your document is written in a
specific character set, such as KOI8-R:
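As a sketch, assuming the two methods are the usual ones (a charset parameter on the Content-Type header sent by the server, and an equivalent META HTTP-EQUIV element in the document's HEAD), the following Python function produces both declarations for a page. The function name `declare_charset` is hypothetical, and the charset label is assumed to also be a name Python's codecs accept.

```python
def declare_charset(body_html: str, charset: str = "koi8-r"):
    """Return (http_header, document_bytes) declaring `charset` both ways.

    Hypothetical helper; the two declarations below are assumed to be the
    two methods: an HTTP header sent by the server and a META element.
    """
    # Method 1 (assumed): charset parameter on the server's Content-Type header.
    http_header = ("Content-Type", f"text/html; charset={charset}")
    # Method 2 (assumed): the same information as a META element in HEAD,
    # useful when the server configuration cannot be changed.
    meta = f'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset={charset}">'
    document = f"<HTML><HEAD>{meta}</HEAD><BODY>{body_html}</BODY></HTML>"
    # The label is only truthful if the bytes really use that encoding.
    return http_header, document.encode(charset)


header, data = declare_charset("Привет")
print(header)  # ('Content-Type', 'text/html; charset=koi8-r')
```

Either declaration alone is enough for a conforming browser; what matters is that the label matches the encoding the bytes were actually written in.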
If you use neither of them, your document is treated as a Latin1 document, i.e. the default character set (ISO 8859-1) is assumed, in accordance with the standards.
WARNING: Many people never bother to follow the standards, so many Russian pages written in KOI8-R, windows-1251 or any other Cyrillic code page carry no charset= parameter at all. As a result, standards-conforming browsers may display such documents using the Latin1 (ISO 8859-1) character set, which renders the Russian text completely unreadable.
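The effect is easy to reproduce: take bytes written in KOI8-R and decode them as ISO 8859-1, as a standards-following browser does when no charset is declared. A minimal Python sketch; the helper name `render` and the sample word are illustrative only.

```python
def render(raw: bytes, declared_charset=None) -> str:
    """Decode page bytes the way a standards-following browser would.

    Without a declared charset, text/* falls back to ISO-8859-1.
    """
    return raw.decode(declared_charset or "iso-8859-1")


raw = "Привет".encode("koi8-r")  # a Russian page saved in KOI8-R

print(render(raw, "koi8-r"))  # correct label: Привет
unlabeled = render(raw)         # no label: ISO-8859-1 is assumed
# Every byte maps to some Latin1 character, so decoding "succeeds",
# but not a single Cyrillic letter survives.
print(any("\u0400" <= ch <= "\u04ff" for ch in unlabeled))  # False
```

No error is raised in the unlabeled case, which is why the browser cannot tell that anything went wrong.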
Don't blame your browser for this; contact the author of the page instead and ask them to fix it using one of the two methods described here. See also My Impressions (in Russian) about the current state of the Russian-language Web.
Here are the most popular Russian encodings and the corresponding registered character set names for use in Web pages:
| Encoding (platform) | Web charset name |
|---|---|
| KOI8 (Unix) | charset=koi8-r |
| CP1251 (Windows) | charset=windows-1251 |
| ISO 8859-5 (SunOS) | charset=iso-8859-5 |
| CP866 (DOS) | charset=cp866 |
| MacCyrillic (Mac) | charset=x-mac-cyrillic (unregistered, but commonly recognized) |
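When such pages are processed in Python, the Web names above do not all match Python's codec names: the standard codecs know the Macintosh encoding as `mac_cyrillic` rather than `x-mac-cyrillic`. A minimal sketch, assuming a hypothetical lookup table `WEB_TO_PYTHON` and helper `decode_page`:

```python
import codecs

# Hypothetical mapping from the Web charset names in the table above to
# Python codec names; x-mac-cyrillic must be translated, the others are
# listed explicitly so the table documents itself.
WEB_TO_PYTHON = {
    "koi8-r": "koi8_r",
    "windows-1251": "cp1251",
    "iso-8859-5": "iso8859_5",
    "cp866": "cp866",
    "x-mac-cyrillic": "mac_cyrillic",
}


def decode_page(raw: bytes, charset: str) -> str:
    """Decode page bytes labeled with one of the Web charset names above."""
    codec = WEB_TO_PYTHON.get(charset.lower(), charset)  # else use label as-is
    return codecs.decode(raw, codec)


# The same word in every encoding from the table round-trips correctly.
for web_name, py_codec in WEB_TO_PYTHON.items():
    raw = "Привет".encode(py_codec)
    assert decode_page(raw, web_name) == "Привет"
```

Charset labels are case-insensitive, so the lookup lowercases the label first.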