<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
   "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../../general.ent">
  %general-entities;
]>

<sect1 id="locale-issues" xreflabel="Locale Related Issues">
  <?dbhtml filename="locale-issues.html"?>

  <sect1info>
    <date>$Date$</date>
  </sect1info>

  <title>Locale Related Issues</title>

  <para>This page contains information about locale-related problems and
  issues. The following paragraphs give a generic overview of things that
  can come up when configuring your system for various locales. Many (but
  not all) existing locale-related problems can be classified under one of
  the headings below. The severity ratings use the following
  criteria:</para>

  <itemizedlist>
    <listitem>
      <para>Critical: The program doesn't perform its main function.
      The fix would be very intrusive; it's better to search for a
      replacement.</para>
    </listitem>
    <listitem>
      <para>High: Part of the functionality that the program provides
      is not usable. If that functionality is required, it's better to
      search for a replacement.</para>
    </listitem>
    <listitem>
      <para>Low: The program works in all typical use cases, but lacks
      some functionality normally provided by its equivalents.</para>
    </listitem>
  </itemizedlist>

  <para>If there is a known workaround for a specific package, it will
  appear on that package's page. For the most recent information
  about locale-related issues for individual packages, check the
  <ulink url="&blfs-wiki;/BlfsNotes">User Notes</ulink> in the BLFS
  Wiki.</para>

  <sect2 id="locale-not-valid-option"
         xreflabel="Needed Encoding Not a Valid Option">

    <title>The Needed Encoding is Not a Valid Option in the Program</title>

    <para>Severity: Critical</para>

    <para>Some programs require the user to specify the character encoding
    for their input or output data and present only a limited choice of
    encodings. This is the case for the <option>-X</option> option in
    <!-- <xref linkend="a2ps"/> and --><xref linkend="enscript"/>,
    the <option>-input-charset</option> option in unpatched
    <xref linkend="cdrtools"/>, and the character sets offered for display
    in the menu of <xref linkend="Links"/>. If the required encoding is not
    in the list, the program usually becomes completely unusable. For
    non-interactive programs, it may be possible to work around this by
    converting the document to a supported input character set before
    submitting it to the program.</para>

    <para>A solution to this type of problem is to implement the necessary
    support for the missing encoding as a patch to the original program or to
    find a replacement.</para>

  </sect2>

  <sect2 id="locale-assumed-encoding"
         xreflabel="Program Assumes Encoding">

    <title>The Program Assumes the Locale-Based Encoding of External
    Documents</title>

    <para>Severity: High for non-text documents, Low for text
    documents</para>

    <para>Some programs, <xref linkend="nano"/> or
    <xref linkend="joe"/> for example, assume that documents are always
    in the encoding implied by the current locale. While this assumption
    may be valid for user-created documents, it is not safe for
    external ones. When this assumption fails, non-ASCII characters are
    displayed incorrectly, and the document may become unreadable.</para>

    <para>If the external document is entirely text based, it can be
    converted to the current locale encoding using the
    <command>iconv</command> program.</para>
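
    <para>As a minimal sketch of such a conversion, the commands below create
    a small sample file in ISO-8859-1 and convert it to UTF-8. The file names
    and both encodings are assumptions chosen for illustration; substitute
    the actual encoding of the document and the encoding of your locale (as
    reported by <command>locale charmap</command>).</para>

```shell
# Create a small sample file encoded in ISO-8859-1; octal 351 is the
# single byte for "e with acute accent" in that encoding.
printf 'caf\351\n' > sample-latin1.txt

# Convert it to UTF-8 (replace both encodings with the ones you need).
iconv -f ISO-8859-1 -t UTF-8 sample-latin1.txt > sample-utf8.txt

# Show the resulting bytes: the accented letter is now the pair c3 a9.
od -An -tx1 sample-utf8.txt
```

    <para>If the source encoding is guessed wrongly,
    <command>iconv</command> may fail or produce wrong characters, so it is
    worth inspecting the result before discarding the original file.</para>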

    <para>For documents that are not text-based, this is not possible.
    In fact, the assumption made in the program may be completely
    invalid for documents where the Microsoft Windows operating system
    has set de facto standards. An example of this problem is ID3v1 tags
    in MP3 files (see the <ulink url="&blfs-wiki;/ID3v1Coding">BLFS Wiki
    ID3v1Coding page</ulink> for more details). For these cases, the only
    solution is to find a replacement program that doesn't have the issue
    (e.g., one that will allow you to specify the assumed document
    encoding).</para>

    <para>Among BLFS packages, this problem applies to
    <xref linkend="nano"/>, <xref linkend="joe"/>, and all media players
    except <xref linkend="audacious"/>.</para>

    <para>Another problem in this category is when someone cannot read
    the documents you've sent them because their operating system is
    set up to handle character encodings differently. This often happens
    when the other person is using Microsoft Windows, which only
    provides one character encoding for a given country. For example,
    this causes problems with UTF-8 encoded TeX documents created in
    Linux. On Windows, most applications will assume that these documents
    have been created using the default Windows 8-bit encoding.</para>

    <para>In extreme cases, Windows encoding compatibility issues may be
    solved only by running Windows programs under
    <ulink url="http://www.winehq.com/">Wine</ulink>.</para>

  </sect2>

  <sect2 id="locale-wrong-filename-encoding"
         xreflabel="Wrong Filename Encoding">

    <title>The Program Uses or Creates Filenames in the Wrong Encoding</title>

    <para>Severity: Critical</para>

    <para>The POSIX standard mandates that the filename encoding is
    the encoding implied by the current LC_CTYPE locale category. This
    information is well-hidden on the page which specifies the behavior
    of the <application>Tar</application> and
    <application>Cpio</application> programs. Some programs get it wrong by
    default (or simply don't have enough information to get it right). The
    result is that they create filenames which are not subsequently shown
    correctly by <command>ls</command>, or they refuse to accept filenames
    that <command>ls</command> shows properly. For the
    <xref linkend="glib2"/> library, the problem can be corrected by setting
    the <envar>G_FILENAME_ENCODING</envar> environment variable to the
    special "@locale" value. <application>Glib2</application> based programs
    that don't respect that environment variable are buggy.</para>
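
    <para>A minimal sketch of that setting is shown below. Where to place the
    line is an assumption that depends on how your system initializes the
    environment (a shell startup file is one possibility); only the variable
    name and its value come from the text above.</para>

```shell
# Ask GLib-based programs to use the locale's encoding for filenames.
export G_FILENAME_ENCODING="@locale"

# Confirm what child processes will see.
echo "G_FILENAME_ENCODING=$G_FILENAME_ENCODING"
```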

    <para>The <xref linkend="zip"/> and <xref linkend="unzip"/> programs
    have this problem because they hard-code the expected filename encoding.
    <application>UnZip</application> contains a hard-coded conversion table
    between the CP850 (DOS) and ISO-8859-1 (UNIX) encodings and uses this table
    when extracting archives created under DOS or Microsoft Windows. However,
    this assumption only works for those in the US and not for anyone using a
    UTF-8 locale. Non-ASCII characters will be mangled in the extracted
    filenames.</para>

    <!--<para>On the other hand,
    <application>Nautilus CD Burner</application> checks names of
    files added to its window for UTF-8 validity. This is wrong for
    users of non-UTF-8 locales. Also,
    <application>Nautilus CD Burner</application> unconditionally
    calls <command>mkisofs</command> with the
    <parameter>-input-charset UTF-8</parameter> parameter, which is
    only correct in UTF-8 locales.</para>-->

    <para>The general rule for avoiding this class of problems is to
    avoid installing broken programs. If this is impossible, the
    <ulink url="http://j3e.de/linux/convmv/">convmv</ulink>
    command-line tool can be used to fix filenames created by these
    broken programs, or to intentionally mangle existing filenames
    to meet the broken expectations of such programs.</para>
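
    <para>The following sketch shows one way <command>convmv</command> might
    be used to repair names. The directory and file names are hypothetical,
    and the source encoding (ISO-8859-1) and target encoding (UTF-8) are
    assumptions; use the encodings that apply to your files. The example
    skips the renaming step if <command>convmv</command> is not
    installed.</para>

```shell
# Build a scratch directory holding one file whose name is encoded in
# ISO-8859-1 (octal 351 is "e with acute accent" in that encoding).
mkdir -p convmv-demo
touch "convmv-demo/$(printf 'caf\351').txt"

if command -v convmv >/dev/null; then
    # Without --notest, convmv only reports what it would rename.
    convmv -f ISO-8859-1 -t UTF-8 -r convmv-demo
    # With --notest, the renames are actually performed.
    convmv -f ISO-8859-1 -t UTF-8 -r --notest convmv-demo
else
    echo "convmv is not installed; nothing was renamed"
fi
```

    <para>Running the command once without <option>--notest</option> first
    gives a chance to review the planned renames before anything on disk
    changes.</para>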

    <para>In other cases, a similar problem is caused by importing
    filenames from a system using a different locale with a tool that
    is not locale-aware (e.g., <!--<xref linkend="nfs-utils"/> or-->
    <xref linkend="openssh"/>). In order to avoid mangling non-ASCII
    characters when transferring files to a system with a different
    locale, any of the following methods can be used:</para>

    <itemizedlist>
      <listitem>
        <para>Transfer anyway, then fix the damage with
        <command>convmv</command>.</para>
      </listitem>
      <listitem>
        <para>On the sending side, create a tar archive with the
        <parameter>--format=posix</parameter> switch passed to
        <command>tar</command> (this will be the default in a future
        version of <command>tar</command>).</para>
      </listitem>
      <listitem>
        <para>Mail the files as attachments. Mail clients specify the
        encoding of attached filenames.</para>
      </listitem>
      <listitem>
        <para>Write the files to a removable disk formatted with a FAT or
        FAT32 filesystem.</para>
      </listitem>
      <listitem>
        <para>Transfer the files using Samba.</para>
      </listitem>
      <listitem>
        <para>Transfer the files via FTP using an RFC 2640-aware server
        (this currently means only wu-ftpd, which has a bad security
        history) and client (e.g., lftp).</para>
      </listitem>
    </itemizedlist>

    <para>The last four methods work because the filenames are automatically
    converted from the sender's locale to Unicode and stored or sent in this
    form. They are then transparently converted from Unicode to the
    recipient's locale encoding.</para>
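
    <para>For the <command>tar</command> method listed above, a minimal
    sketch of the sending side could look like the following. The directory,
    file, and archive names are hypothetical; only the
    <parameter>--format=posix</parameter> switch comes from the list.</para>

```shell
# Create a small directory tree to archive (hypothetical names).
mkdir -p tar-demo
echo "hello" > tar-demo/example.txt

# Create the archive in POSIX (pax) format on the sending side.
tar --format=posix -cf tar-demo.tar tar-demo

# List the archive contents to confirm what will be transferred.
tar -tf tar-demo.tar
```

    <para>On the receiving side the archive is unpacked as usual with
    <command>tar -xf</command>.</para>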

  </sect2>

  <sect2 id="locale-wrong-multibyte-characters"
         xreflabel="Breaks Multibyte Characters">

    <title>The Program Breaks Multibyte Characters or Doesn't Count
    Character Cells Correctly</title>

    <para>Severity: High or Critical</para>

    <para>Many programs were written in an era when multibyte
    locales were not common. Such programs assume that the C "char" data
    type, which is one byte, can be used to store a single character.
    Further, they assume that any sequence of characters is a valid
    string and that every character occupies a single character cell.
    Such assumptions completely break in UTF-8 locales. The visible
    manifestation is that the program truncates strings prematurely
    (e.g., at 80 bytes instead of 80 characters). Terminal-based
    programs don't place the cursor correctly on the screen, don't react
    to the "Backspace" key by erasing one character, and leave junk
    characters around when updating the screen, usually turning the
    screen into a complete mess.</para>

    <para>Fixing this kind of problem is a tedious task from a
    programmer's point of view, like all other cases of retrofitting new
    concepts into an old, flawed design. In this case, one has to redesign
    all data structures in order to accommodate the fact that a complete
    character may span a variable number of "char"s (or switch to wchar_t
    and convert as needed). Also, for every call to "strlen" and
    similar functions, one has to find out whether a number of bytes, a
    number of characters, or the width of the string was really meant.
    Sometimes it is faster to write a program with the same functionality
    from scratch.</para>
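
    <para>The difference between a byte count and a character count can be
    illustrated with a short shell sketch. The sample word is an assumption
    chosen for illustration, and the character count relies on the UTF-8
    property that continuation bytes always lie in the octal range 200-277.
    The third quantity, the display width, is not computed here.</para>

```shell
# "cafe" with an accented final letter, written as UTF-8 bytes:
# the accented letter takes two bytes (octal 303 251).
word=$(printf 'caf\303\251')

# Number of bytes, which is what strlen() would report.
byte_count=$(printf '%s' "$word" | wc -c | tr -d ' ')

# Number of characters: every UTF-8 character has exactly one byte that
# is not a continuation byte, so delete continuation bytes and count.
char_count=$(printf '%s' "$word" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' ')

echo "bytes=$byte_count characters=$char_count"
```

    <para>For this word the script reports 5 bytes but 4 characters, which is
    exactly the mismatch that makes byte-oriented programs truncate or
    misalign text.</para>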

    <para>Among BLFS packages, this problem applies to
    <xref linkend="xine-ui"/> and all the shells.</para>

  </sect2>

  <sect2 id="locale-wrong-manpage-encoding"
         xreflabel="Incorrect Manual Page Encoding">

    <title>The Package Installs Manual Pages in Incorrect or
    Non-Displayable Encoding</title>

    <para>Severity: Low</para>

    <para>LFS expects that manual pages are in the language-specific (usually
    8-bit) encoding, as specified on the <ulink
    url="&lfs-root;/chapter08/man-db.html">LFS Man DB page</ulink>. However,
    some packages install translated manual pages in UTF-8 encoding (e.g.,
    Shadow, already dealt with), or manual pages in languages not in the table.
    Not all BLFS packages have been audited for conformance with the
    requirements put in LFS (the large majority have been checked, and fixes
    placed in the book for packages known to install non-conforming manual
    pages). If you find a manual page installed by any BLFS package that is
    obviously in the wrong encoding, please remove or convert it as needed, and
    report this to the BLFS team as a bug.</para>

    <para>You can easily check your system for any non-conforming manual pages
    by copying the following short shell script to some accessible location,

<screen><literal>#!/bin/sh
# Begin checkman.sh
# Usage: find /usr/share/man -type f | xargs checkman.sh
for a in "$@"
do
    # echo "Checking $a..."
    # Pure-ASCII manual page (possibly except comments) is OK
    grep -v '.\\"' "$a" | iconv -f US-ASCII -t US-ASCII &gt;/dev/null 2&gt;&amp;1 \
        &amp;&amp; continue
    # Non-UTF-8 manual page is OK
    iconv -f UTF-8 -t UTF-8 "$a" &gt;/dev/null 2&gt;&amp;1 || continue
    # Found a UTF-8 manual page, bad.
    echo "UTF-8 manual page: $a" &gt;&amp;2
done
# End checkman.sh
</literal></screen>

    and then issuing the following command (modify the command below if the
    <command>checkman.sh</command> script is not in your <envar>PATH</envar>
    environment variable):</para>

<screen><userinput>find /usr/share/man -type f | xargs checkman.sh</userinput></screen>

    <para>Note that if you have manual pages installed in any location other
    than <filename class='directory'>/usr/share/man</filename> (e.g.,
    <filename class='directory'>/usr/local/share/man</filename>), you must
    modify the above command to include this additional location.</para>

  </sect2>

</sect1>