<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../../general.ent">
  %general-entities;
]>

<sect1 id="locale-issues" xreflabel="Locale Related Issues">
  <?dbhtml filename="locale-issues.html"?>

  <sect1info>
    <othername>$LastChangedBy$</othername>
    <date>$Date$</date>
  </sect1info>

  <title>Locale Related Issues</title>

  <para>This page contains information about locale-related problems and
  issues. This paragraph is meant to give a generic overview of things that
  can come up when configuring your system for various locales. That
  overview, and the remainder of this paragraph, must still be revised and
  completed.</para>

  <sect2>

    <title>Package Specific Locale Issues</title>

    <para>For package-specific issues, find the package in the list below
    and follow the link to view the available information. If a package is
    not listed here, that does not mean it has no known locale-specific
    issues or problems. It only means that this page has not yet been
    updated with locale-specific information about that package. Please
    refer to the BLFS Wiki page for a particular package for any additional
    locale-specific information.</para>

    <itemizedlist>

      <title>List of Packages with Locale Related Issues</title>

      <listitem>
        <para><xref linkend="locale-mc"/></para>
      </listitem>
      <listitem>
        <para><xref linkend="locale-unzip"/></para>
      </listitem>
      <listitem>
        <para><xref linkend="locale-nano"/></para>
      </listitem>

    </itemizedlist>

    <sect3 id="locale-mc" xreflabel="MC-&mc-version;">

      <title><xref linkend="mc"/></title>

      <para>This package assumes that <quote>characters</quote> and
      <quote>bytes</quote> are the same thing. This is not true in UTF-8
      based locales, where a single character can occupy several bytes.
      Because of this assumption, <application>MC</application> positions
      characters incorrectly on the screen. After the cursor is moved a
      bit, the screen becomes totally unreadable, as illustrated in
      <ulink url="&files-anduin;/mc-bad.png">this screenshot</ulink> (taken
      in a ru_RU.UTF-8 locale). Additionally, input of non-ASCII characters
      in the editor is impossible, even after selecting the
      <quote>Other 8-bit</quote> encoding from the menu.</para>
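
      <para>The following is a minimal Python sketch of the character/byte
      mismatch. It is not a description of how
      <application>MC</application> works internally. The sample string is
      an arbitrary assumption, written with escapes so that the listing
      itself stays ASCII-only:</para>

```python
# Arbitrary sample text (an assumption): the Russian word for "hello".
word = "\u043f\u0440\u0438\u0432\u0435\u0442"

# Number of characters a user sees on the screen.
char_count = len(word)

# Number of bytes the same text occupies in a UTF-8 locale.
utf8_byte_count = len(word.encode("utf-8"))

# Number of bytes in an 8-bit encoding such as KOI8-R.
koi8_byte_count = len(word.encode("koi8-r"))

print(char_count, utf8_byte_count, koi8_byte_count)
```

      <para>In KOI8-R the character and byte counts agree. In UTF-8 each of
      these letters takes two bytes, so a program that counts bytes computes
      twice the real width of this word.</para>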

    </sect3>

    <sect3 id="locale-unzip" xreflabel="UnZip-&unzip-version;">

      <title><xref linkend="unzip"/></title>

      <note>
        <para>Use of <application>UnZip</application> in the
        <application>JDK</application>, <application>Mozilla</application>,
        <application>DocBook</application> or any other BLFS package
        installation is not a problem, as BLFS instructions never use
        <application>UnZip</application> to extract a file with non-ASCII
        characters in the file's name.</para>
      </note>

      <para>The <application>UnZip</application> package assumes that
      filenames stored in ZIP archives created on non-Unix systems are
      encoded in CP850, and that they should be converted to ISO-8859-1 when
      files are written to the filesystem. These assumptions are not always
      valid. In fact, filenames inside the ZIP archive are encoded in the
      DOS codepage in use in the relevant country, and filenames on disk
      should be in the locale encoding. In MS Windows, the OemToChar() C
      function (from <filename>User32.DLL</filename>) does the correct
      conversion (which is indeed the conversion from CP850 to a superset of
      ISO-8859-1 if MS Windows is set up to use the US English language),
      but there is no equivalent in Linux.</para>
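
      <para>As a rough illustration of why these assumptions matter, the
      following Python sketch reproduces the byte conversions in memory. It
      does not call <command>unzip</command>. The sample filename and the
      choice of CP866 and KOI8-R (a Russian setup) are assumptions made for
      the example:</para>

```python
# Hypothetical sample filename, written with escapes to keep the listing ASCII.
name = "\u043f\u0440\u0438\u0432\u0435\u0442.txt"

# Bytes as stored in a ZIP archive created on a system using CP866.
stored = name.encode("cp866")

# What the assumption does: treat the bytes as CP850, write them as ISO-8859-1.
on_disk = stored.decode("cp850").encode("iso-8859-1")

# What a KOI8-R locale then displays for those on-disk bytes.
shown = on_disk.decode("koi8-r")

# What should have been written to disk instead.
expected = name.encode("koi8-r")

print(repr(shown))
print(on_disk == expected)
```

      <para>The last line prints <literal>False</literal>: the bytes written
      to disk differ from the KOI8-R bytes that the locale expects, so the
      name is displayed as unrelated characters.</para>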

      <para>When <command>unzip</command> unpacks a ZIP archive containing
      non-ASCII filenames, the filenames are damaged whenever any of its
      encoding assumptions is incorrect, because the wrong conversion is
      applied. For example, in the ru_RU.KOI8-R locale, filenames need to be
      converted from CP866 to KOI8-R, but <command>unzip</command> converts
      them from CP850 to ISO-8859-1 instead. This produces filenames
      consisting of undecipherable characters instead of words (the closest
      equivalent understandable example for English-only users is rot13).
      There are several ways around this limitation:</para>

      <para>1) To unpack ZIP archives with filenames containing non-ASCII
      characters, use <ulink url="http://www.winzip.com/">WinZip</ulink>
      running under the <ulink url="http://www.winehq.com/">Wine</ulink>
      Windows emulator.</para>

      <para>2) After running <command>unzip</command>, repair the damaged
      filenames using the <command>convmv</command> tool
      (<ulink url="http://j3e.de/linux/convmv/"/>). The following is an
      example for the ru_RU.KOI8-R locale:</para>

      <blockquote>
        <para>Step 1. Undo the conversion done by
        <command>unzip</command>:</para>

<screen><userinput>convmv -f iso-8859-1 -t cp850 -r --nosmart --notest \
    <replaceable>&lt;/path/to/unzipped/files&gt;</replaceable></userinput></screen>

        <para>Step 2. Do the correct conversion instead:</para>

<screen><userinput>convmv -f cp866 -t koi8-r -r --nosmart --notest \
    <replaceable>&lt;/path/to/unzipped/files&gt;</replaceable></userinput></screen>
      </blockquote>
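
      <para>The two <command>convmv</command> steps above amount to the
      following byte transformations, sketched here in Python under the same
      ru_RU.KOI8-R assumptions. The function name
      <function>repair_name</function> and the sample filename are
      hypothetical. The sketch only transforms a name in memory and renames
      nothing on disk:</para>

```python
def repair_name(on_disk):
    """Return the KOI8-R bytes for a filename damaged by unzip.

    on_disk holds the bytes unzip wrote, that is, the result of its
    CP850 to ISO-8859-1 conversion.
    """
    # Step 1: undo the conversion done by unzip (ISO-8859-1 back to CP850).
    stored = on_disk.decode("iso-8859-1").encode("cp850")
    # Step 2: apply the conversion that was actually needed.
    return stored.decode("cp866").encode("koi8-r")


# Hypothetical sample: a Russian filename as stored in a CP866 archive,
# then damaged in the way described above.
name = "\u043f\u0440\u0438\u0432\u0435\u0442.txt"
damaged = name.encode("cp866").decode("cp850").encode("iso-8859-1")

repaired = repair_name(damaged)
print(repaired == name.encode("koi8-r"))
```

      <para>A name containing characters that cannot be encoded in CP850 or
      KOI8-R makes the corresponding <function>encode</function> call raise
      <literal>UnicodeEncodeError</literal>, so such names would need manual
      handling.</para>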

      <para>3) Apply this patch to <command>unzip</command>:
      <ulink url="https://bugzilla.altlinux.ru/attachment.cgi?id=532"/></para>

      <para>The patch lets you specify the assumed filename encoding in the
      ZIP archive using the <option>-O charset_name</option> option and the
      on-disk filename encoding using the <option>-I charset_name</option>
      option. By default, the on-disk filename encoding is the locale
      encoding, and the encoding inside the ZIP archive is guessed from a
      built-in table based on the locale encoding. For US English users,
      this still means that <command>unzip</command> converts from CP850 to
      ISO-8859-1 by default.</para>

      <para>Caveat: this method works only with 8-bit locale encodings, not
      with UTF-8. Attempting to use a patched <command>unzip</command> in
      UTF-8 locales may result in a segmentation fault and is probably a
      security risk.</para>

    </sect3>

    <sect3 id="locale-nano" xreflabel="Nano-&nano-version;">

      <title><xref linkend="nano"/></title>

      <para>The current stable version of <application>Nano</application>
      (&nano-version;) does not support UTF-8 character encodings. A
      development version that addresses this is available and can be
      downloaded from <ulink
      url="http://www.nano-editor.org/dist/v1.3/nano-1.3.11.tar.gz"/>.
      Instructions for installing this version are the same as those found on
      the <xref linkend="nano"/> page.</para>

    </sect3>

  </sect2>

</sect1>