Apache charset negotiation via .var files

Choosing the right document charset according to the Accept-Charset HTTP request header field is a pressing problem in countries where several character encodings are in active use, such as Russia or Japan.

Apache's .var files (type maps) are well suited to this problem. Apache has supported charset negotiation via .var files since v1.2b1.

As an example, try this a.var file. If you are behind a proxy, reload it a couple of times to be sure; for simplicity, I have not added any anti-caching directives. It works with the following .htaccess settings on this server:

 AddType "text/html; charset=koi8-r" .html8
 AddType "text/html; charset=windows-1251" .htmlw

If your browser generates a proper Accept-Charset field, this example will automatically select a document in the correct character set. When your browser accepts both KOI8-R and CP1251 equally, the KOI8-R document is the less preferred variant, because its source quality is only 0.1 (qs=0.1) against the default of 1.0 for the windows-1251 document.

 URI: a; vary="type"

 URI: b.html8
 Content-Type: text/html; charset=koi8-r; qs=0.1

 URI: a.htmlw
 Content-Type: text/html; charset=windows-1251
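
To make the role of qs more concrete, here is a rough Python sketch of how a server might combine the browser's Accept-Charset q values with the qs values from the type map above. It is a simplified model for illustration only, not Apache's actual negotiation code: the function names are made up, only the charset dimension is considered, and a request without any Accept-Charset field is treated as accepting nothing, which a real server would handle differently.

```python
def parse_accept_charset(header):
    """Parse an Accept-Charset value such as 'koi8-r, windows-1251;q=0.5'
    into a dict mapping lowercase charset names to q values."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # q defaults to 1.0 when not given
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[name.strip().lower()] = q
    return prefs


def choose_variant(header, variants):
    """Pick the variant with the highest (browser q) * (source qs).

    variants is a list of (uri, charset, qs) tuples.
    Returns the chosen URI, or None if no variant is acceptable."""
    prefs = parse_accept_charset(header)
    best_uri, best_score = None, 0.0
    for uri, charset, qs in variants:
        # Fall back to the '*' wildcard, then to 0 (not acceptable).
        q = prefs.get(charset.lower(), prefs.get("*", 0.0))
        score = q * qs
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri


# The two variants from the type map above; qs defaults to 1.0.
VARIANTS = [
    ("b.html8", "koi8-r", 0.1),
    ("a.htmlw", "windows-1251", 1.0),
]
```

In this model, a browser that lists only koi8-r gets b.html8, while a browser that lists both charsets with equal q gets a.htmlw because of its higher qs.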

It is convenient to store documents in a single character set and convert them on the fly. Sometimes conversion modules can be loaded directly into the HTTP server, but this is very implementation-dependent and may require rebuilding the server, so CGI scripts are the more general solution. In the previous example, the two files in different character sets could be replaced by one CGI script that accepts the character set as an argument and converts the file accordingly. For example, you can use the trans Character Encoding Converter Generator Package to convert between most Russian character sets via Unicode.
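
As an illustration of the CGI approach, here is a minimal Python sketch that converts a single master copy to the requested character set by going through Unicode. It uses Python's built-in codecs rather than the trans package, and everything specific in it is an assumption made for the example: the master copy is stored in KOI8-R, the charset arrives as a query argument named charset, and the file name a.html is hypothetical.

```python
import os
import sys
from urllib.parse import parse_qs

# Assumption: the master copy of every document is stored in KOI8-R.
SOURCE_CHARSET = "koi8-r"

# HTTP charset names mapped to Python codec names.
ALLOWED = {
    "koi8-r": "koi8_r",
    "windows-1251": "cp1251",
    "utf-8": "utf_8",
}


def build_response(query_string, master_bytes):
    """Return a complete CGI response (header plus body) as bytes.

    The requested charset is read from the 'charset' query argument;
    unknown or missing values fall back to the source charset."""
    params = parse_qs(query_string)
    requested = params.get("charset", [SOURCE_CHARSET])[0].lower()
    if requested not in ALLOWED:
        requested = SOURCE_CHARSET
    # Decode to Unicode, then encode into the target character set.
    text = master_bytes.decode(ALLOWED[SOURCE_CHARSET])
    body = text.encode(ALLOWED[requested], errors="replace")
    header = "Content-Type: text/html; charset={}\r\n\r\n".format(requested)
    return header.encode("ascii") + body


def main():
    """Entry point when deployed as a CGI script (hypothetical file name)."""
    with open("a.html", "rb") as f:
        master = f.read()
    query = os.environ.get("QUERY_STRING", "")
    sys.stdout.buffer.write(build_response(query, master))
```

Each variant entry in the .var file could then point at this script with a different charset argument instead of at a separate static file.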

WARNING: This method only works if browsers send correct Accept-... header fields.