<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
<!ENTITY % general-entities SYSTEM "../../general.ent">
%general-entities;
]>

<sect1 id="locale-issues" xreflabel="Locale Related Issues">
<?dbhtml filename="locale-issues.html"?>

<sect1info>
<othername>$LastChangedBy$</othername>
<date>$Date$</date>
</sect1info>

<title>Locale Related Issues</title>

<para>This page contains information about locale-related problems and
issues. In this paragraph you'll find a generic overview of things that can
come up when configuring your system for various locales. The previous
sentence and the remainder of this paragraph must still be
revised/completed.</para>

<sect2>

<title>Package Specific Locale Issues</title>

<para>For package-specific issues, find the package concerned in the list
below and follow the link to view the available information. If a package
is not listed here, it does not mean there are no known locale-specific
issues or problems with that package. It only means that this page has not
been updated with the locale-specific information regarding that package.
Please refer to the BLFS Wiki page for a particular package for any
additional locale-specific information.</para>

<itemizedlist>

<title>List of Packages with Locale Related Issues</title>

<listitem>
<para><xref linkend="locale-mc"/></para>
</listitem>
<listitem>
<para><xref linkend="locale-unzip"/></para>
</listitem>
<listitem>
<para><xref linkend="locale-nano"/></para>
</listitem>

</itemizedlist>

<sect3 id="locale-mc" xreflabel="MC-&mc-version;">

<title><xref linkend="mc"/></title>

<para>This package assumes that <quote>characters</quote> and
<quote>bytes</quote> are the same thing. This is not true in UTF-8 based
locales, where a single character may occupy several bytes. Because of
this assumption, <application>MC</application> positions characters on
the screen incorrectly. After the cursor has been moved a little, the
screen becomes totally unreadable, as illustrated in
<ulink url="&files-anduin;/mc-bad.png">this
screenshot</ulink> (taken in a ru_RU.UTF-8 locale). Additionally, input
of non-ASCII characters in the editor is impossible, even after selecting
the <quote>Other 8-bit</quote> encoding from the menu.</para>
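
<para>The difference between characters and bytes is easy to observe
outside <application>MC</application>. The following Python sketch is
only an illustration and is not taken from the
<application>MC</application> sources; the sample word is an assumption
chosen for the demonstration. It counts the characters of a six-letter
Russian word and the bytes needed to store it in UTF-8 and in the 8-bit
KOI8-R encoding:</para>

<programlisting># A six-letter Russian word, written with escapes so that this
# listing stays ASCII-only.
word = "\u043f\u0440\u0438\u0432\u0435\u0442"

utf8_bytes = word.encode("utf-8")
koi8_bytes = word.encode("koi8-r")

char_count = len(word)        # number of characters
utf8_count = len(utf8_bytes)  # two bytes per Cyrillic letter in UTF-8
koi8_count = len(koi8_bytes)  # one byte per letter in an 8-bit encoding

print(char_count, utf8_count, koi8_count)  # prints: 6 12 6</programlisting>

<para>A program that advances the cursor by one column per byte would
therefore treat this word as twelve columns wide in a UTF-8 locale,
although it occupies only six.</para>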

</sect3>

<sect3 id="locale-unzip" xreflabel="UnZip-&unzip-version;">

<title><xref linkend="unzip"/></title>

<note>
<para>Use of <application>UnZip</application> in the
<application>JDK</application>, <application>Mozilla</application>,
<application>DocBook</application> or any other BLFS package
installation is not a problem, as BLFS instructions never use
<application>UnZip</application> to extract a file with non-ASCII
characters in the file's name.</para>
</note>

<para>The <application>UnZip</application> package assumes that filenames
stored in ZIP archives created on non-Unix systems are encoded in
CP850, and that they should be converted to ISO-8859-1 when files are
written to the filesystem. These assumptions are not always valid. In
fact, inside the ZIP archive, filenames are encoded in the DOS codepage
that is in use in the relevant country, and the filenames on disk should
be in the locale encoding. In MS Windows, the
<function>OemToChar()</function> C function (from
<filename>User32.DLL</filename>) does the correct conversion (which is
indeed the conversion from CP850 to a superset of ISO-8859-1 if MS
Windows is set up to use the US English language), but there is no
equivalent in Linux.</para>
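
<para>To see why the codepage matters, consider the same stored bytes
read under two different DOS codepages. The sketch below is a Python
illustration only: the byte sequence is an assumed example, and Python's
own codec tables stand in for the DOS codepages.</para>

<programlisting># Five bytes as they might be stored for a filename in a ZIP archive
# (an assumed example).
stored = bytes([0x82, 0xA5, 0xE1, 0xAD, 0xA0])

as_cp866 = stored.decode("cp866")  # read with the Russian DOS codepage
as_cp850 = stored.decode("cp850")  # read with the Western European one

# ascii() keeps the output printable on any terminal.
print(ascii(as_cp866))  # five Cyrillic letters
print(ascii(as_cp850))  # five accented Latin letters and symbols</programlisting>

<para>Nothing in the bytes themselves says which reading is intended,
which is why a fixed CP850 assumption fails for archives created
elsewhere.</para>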

<para>When <command>unzip</command> is used to unpack a ZIP archive
containing non-ASCII filenames, the filenames are damaged because
<command>unzip</command> applies the wrong conversion whenever any of its
encoding assumptions is incorrect. For example, in the ru_RU.KOI8-R
locale, conversion of filenames from CP866 to KOI8-R is required, but
conversion from CP850 to ISO-8859-1 is done, which produces filenames
consisting of undecipherable characters instead of words (the closest
equivalent understandable example for English-only users is rot13). There
are several ways around this limitation:</para>

<para>1) For unpacking ZIP archives with filenames containing non-ASCII
characters, use <ulink url="http://www.winzip.com/">WinZip</ulink> while
running the <ulink url="http://www.winehq.com/">Wine</ulink> Windows
compatibility layer.</para>

<para>2) After running <command>unzip</command>, fix the damage done to
the filenames using the <command>convmv</command> tool
(<ulink url="http://j3e.de/linux/convmv/"/>). The following is an example
for the ru_RU.KOI8-R locale:</para>

<blockquote>
<para>Step 1. Undo the conversion done by
<command>unzip</command>:</para>

<screen><userinput>convmv -f iso-8859-1 -t cp850 -r --nosmart --notest \
    <replaceable>&lt;/path/to/unzipped/files&gt;</replaceable></userinput></screen>

<para>Step 2. Do the correct conversion instead:</para>

<screen><userinput>convmv -f cp866 -t koi8-r -r --nosmart --notest \
    <replaceable>&lt;/path/to/unzipped/files&gt;</replaceable></userinput></screen>
</blockquote>
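
<para>The effect of the two <command>convmv</command> steps can be traced
on a single filename. The following Python sketch is an illustration
under stated assumptions: the filename is made up, and Python's codec
tables stand in for the conversion tables used by
<command>unzip</command> and <command>convmv</command>, which may differ
for some characters.</para>

<programlisting># An assumed sample filename: a Russian word plus ".txt".
name = "\u043f\u0440\u0438\u0432\u0435\u0442.txt"

# Bytes as stored in an archive created on a Russian DOS/Windows system.
in_archive = name.encode("cp866")

# The conversion unzip is described as performing:
# read the bytes as CP850, write them out as ISO-8859-1.
on_disk = in_archive.decode("cp850").encode("iso-8859-1")

# What a KOI8-R locale shows for the damaged name.
garbled_view = on_disk.decode("koi8-r")

# Step 1: undo the conversion (ISO-8859-1 back to CP850).
step1 = on_disk.decode("iso-8859-1").encode("cp850")

# Step 2: do the correct conversion (CP866 to KOI8-R).
step2 = step1.decode("cp866").encode("koi8-r")

fixed_view = step2.decode("koi8-r")
print(ascii(garbled_view))
print(ascii(fixed_view))</programlisting>

<para>Step 1 restores the exact bytes that were stored in the archive,
and only then can step 2 interpret them with the right codepage.</para>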

<para>3) Apply this patch to unzip:
<ulink url="https://bugzilla.altlinux.ru/attachment.cgi?id=532"/></para>

<para>The patch lets you specify the assumed filename encoding in the ZIP
archive using the <option>-O charset_name</option> option and the
on-disk filename encoding using the <option>-I charset_name</option>
option. By default, the on-disk filename encoding is the locale encoding,
and the encoding inside the ZIP archive is guessed from a built-in table
based on the locale encoding. For US English users, this still means that
unzip converts from CP850 to ISO-8859-1 by default.</para>
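
<para>The default selection described above can be sketched as a small
lookup. This Python fragment is a hypothetical model, not code from the
patch: the function name <literal>resolve_encodings</literal> and its
parameters are invented for illustration, and the table holds only the
two pairs mentioned on this page rather than the patch's real built-in
table.</para>

<programlisting># Only the two pairs named in the text; the patch's actual table is
# not reproduced here.
GUESS_TABLE = {
    "ISO-8859-1": "CP850",
    "KOI8-R": "CP866",
}


def resolve_encodings(locale_encoding, archive_opt=None, disk_opt=None):
    """Return (archive_encoding, disk_encoding) for one unzip run.

    archive_opt models the -O option, disk_opt models the -I option.
    """
    # Without -I, filenames on disk use the locale encoding.
    disk = disk_opt or locale_encoding
    # Without -O, the archive encoding is guessed from the locale
    # encoding. None means "not covered by this sketch".
    archive = archive_opt or GUESS_TABLE.get(locale_encoding)
    return archive, disk


print(resolve_encodings("ISO-8859-1"))
print(resolve_encodings("KOI8-R"))
print(resolve_encodings("KOI8-R", archive_opt="CP850"))</programlisting>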

<para>Caveat: this method works only with 8-bit locale encodings, not
with UTF-8. Attempting to use a patched <command>unzip</command> in UTF-8
locales may result in a segmentation fault and is probably a security
risk.</para>

</sect3>

<sect3 id="locale-nano" xreflabel="Nano-&nano-version;">

<title><xref linkend="nano"/></title>

<para>The current stable version of <application>Nano</application>
(&nano-version;) does not support the UTF-8 character encoding. A
development version that addresses this issue is available from <ulink
url="http://www.nano-editor.org/dist/v1.3/nano-1.3.11.tar.gz"/>.
Instructions for installing this version are the same as those found on
the <xref linkend="nano"/> page.</para>

</sect3>

</sect2>

</sect1>