<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
   "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../../general.ent">
  %general-entities;
]>

<sect1 id="locale-issues" xreflabel="Locale Related Issues">
  <?dbhtml filename="locale-issues.html"?>

  <title>Locale Related Issues</title>

  <para>This page contains information about locale-related problems and
  issues. The following paragraphs give a generic overview of things that
  can come up when configuring your system for various locales. Many (but
  not all) existing locale-related problems can be classified under one of
  the headings below. The severity ratings below use the following
  criteria:</para>

  <itemizedlist>
    <listitem>
      <para>Critical: The program doesn't perform its main function.
      The fix would be very intrusive; it's better to search for a
      replacement.</para>
    </listitem>
    <listitem>
      <para>High: Part of the functionality that the program provides
      is not usable. If that functionality is required, it's better to
      search for a replacement.</para>
    </listitem>
    <listitem>
      <para>Low: The program works in all typical use cases, but lacks
      some functionality normally provided by its equivalents.</para>
    </listitem>
  </itemizedlist>

  <para>If there is a known workaround for a specific package, it will
  appear on that package's page. For the most recent information
  about locale-related issues for individual packages, check the
  <ulink url="&blfs-wiki;/BlfsNotes">Editor Notes</ulink> in the BLFS
  Wiki.</para>

  <sect2 id="locale-not-valid-option"
         xreflabel="Needed Encoding Not a Valid Option">

    <title>The Needed Encoding is Not a Valid Option in the Program</title>

    <para>Severity: Critical</para>

    <para>Some programs require the user to specify the character encoding
    for their input or output data and present only a limited choice of
    encodings. This is the case for the <option>-X</option> option in
    <!-- <xref linkend="a2ps"/> and --><xref linkend="enscript"/>,
    the <option>-input-charset</option> option in unpatched
    <xref linkend="cdrtools"/>, and the character sets offered for display
    in the menu of <xref linkend="Links"/>. If the required encoding is not
    in the list, the program usually becomes completely unusable. For
    non-interactive programs, it may be possible to work around this by
    converting the document to a supported input character set before
    passing it to the program.</para>
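
    <para>A minimal sketch of that workaround follows, assuming GNU
    <command>iconv</command> (from Glibc) and <command>mktemp</command>
    are available. The helper name <function>run_converted</function>
    and its argument order are hypothetical: it converts the document into
    a temporary copy in an encoding the program does support, runs the
    program on that copy, and removes the copy afterwards.</para>

<screen><literal>
```shell
# run_converted FROM TO FILE PROGRAM [ARGS...]
# Hypothetical helper (not part of any package): convert FILE from
# encoding FROM to encoding TO in a temporary file, then run
# PROGRAM [ARGS...] with the temporary file appended as the last argument.
run_converted() {
    from=$1; to=$2; file=$3
    shift 3
    tmp=$(mktemp) || return 1
    # iconv prints its own error message if a character cannot be converted
    if ! iconv -f "$from" -t "$to" "$file" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    "$@" "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```
</literal></screen>

    <para>For example, <userinput>run_converted UTF-8 ISO-8859-1 notes.txt
    enscript -o notes.ps</userinput> would hand an ISO-8859-1 copy of a UTF-8
    file to <command>enscript</command>, assuming it is set up for
    ISO-8859-1 input. If the document contains characters that the target
    encoding cannot represent, the conversion fails instead of silently
    dropping them.</para>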

    <para>A solution to this type of problem is to implement the necessary
    support for the missing encoding as a patch to the original program or to
    find a replacement.</para>

  </sect2>

  <sect2 id="locale-assumed-encoding"
         xreflabel="Program Assumes Encoding">

    <title>The Program Assumes the Locale-Based Encoding of External
    Documents</title>

    <para>Severity: High for non-text documents, low for text
    documents</para>

    <para>Some programs, <xref linkend="nano"/> or
    <xref linkend="joe"/> for example, assume that documents are always
    in the encoding implied by the current locale. While this assumption
    may be valid for documents created by the user, it is not safe for
    external ones. When this assumption fails, non-ASCII characters are
    displayed incorrectly, and the document may become unreadable.</para>

    <para>If the external document is entirely text-based, it can be
    converted to the current locale encoding using the
    <command>iconv</command> program.</para>
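
    <para>As a sketch, assuming GNU <command>iconv</command> and the
    <command>locale</command> utility from Glibc, the following hypothetical
    helper converts a document from a known source encoding to the encoding
    of the current locale, as reported by <command>locale charmap</command>.
    The source encoding must be known or guessed by the user;
    <command>iconv</command> does not detect it.</para>

<screen><literal>
```shell
# to_locale FROM FILE [TO]
# Hypothetical helper: write FILE, converted from encoding FROM, to
# standard output.  TO defaults to the character set of the current
# locale as reported by "locale charmap" (e.g. UTF-8 or ISO-8859-1).
to_locale() {
    from=$1; file=$2
    to=${3:-$(locale charmap)}
    iconv -f "$from" -t "$to" "$file"
}
```
</literal></screen>

    <para>For example, <userinput>to_locale WINDOWS-1252 letter.txt
    &gt; letter-local.txt</userinput> would convert a text file written on a
    Western European Windows system.</para>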

    <para>For documents that are not text-based, this is not possible.
    In fact, the assumption made in the program may be completely
    invalid for documents where the Microsoft Windows operating system
    has set de facto standards. An example of this problem is ID3v1 tags
    in MP3 files (see the <ulink url="&blfs-wiki;/ID3v1Coding">BLFS Wiki
    ID3v1Coding page</ulink>
    for more details). For these cases, the only solution is to find a
    replacement program that doesn't have the issue (e.g., one that
    will allow you to specify the assumed document encoding).</para>

    <para>Among BLFS packages, this problem applies to
    <xref linkend="nano"/>, <xref linkend="joe"/>, and all media players
    except <xref linkend="audacious"/>.</para>

    <para>Another problem in this category is when someone cannot read
    the documents you've sent them because their operating system is
    set up to handle character encodings differently. This happens
    often when the other person is using Microsoft Windows, which
    provides only one character encoding for a given country. For example,
    this causes problems with UTF-8 encoded TeX documents created in
    Linux. On Windows, most applications will assume that these documents
    have been created using the default Windows 8-bit encoding.
    </para>

    <para>In extreme cases, Windows encoding compatibility issues may be
    solved only by running Windows programs under
    <ulink url="https://www.winehq.com/">Wine</ulink>.</para>

  </sect2>

  <sect2 id="locale-wrong-filename-encoding"
         xreflabel="Wrong Filename Encoding">

    <title>The Program Uses or Creates Filenames in the Wrong Encoding</title>

    <para>Severity: Critical</para>

    <para>The POSIX standard mandates that the filename encoding is
    the encoding implied by the current LC_CTYPE locale category. This
    information is well-hidden on the page which specifies the behavior
    of the <application>Tar</application> and <application>Cpio</application>
    programs. Some programs get it wrong by default (or simply don't
    have enough information to get it right). The result is that they
    create filenames which are not subsequently shown correctly by
    <command>ls</command>, or they refuse to accept filenames that
    <command>ls</command> shows properly. For the <xref linkend="glib2"/>
    library, the problem can be corrected by setting the
    <envar>G_FILENAME_ENCODING</envar> environment variable to the special
    "@locale" value. <application>Glib2</application>-based programs that
    don't respect that environment variable are buggy.</para>
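
    <para>For example, the variable could be set for all login shells in a
    profile script. The file name
    <filename>/etc/profile.d/glib-filenames.sh</filename> used below is only
    an assumption; it presumes that <filename>/etc/profile</filename>
    sources the <filename class='directory'>/etc/profile.d</filename>
    scripts, and any file read by your shell startup files will do.</para>

<screen><literal>
```shell
# Hypothetical /etc/profile.d/glib-filenames.sh
# Tell GLib to treat filenames as being in the encoding of the current
# locale (LC_CTYPE) instead of its default assumption of UTF-8.
G_FILENAME_ENCODING=@locale
export G_FILENAME_ENCODING
```
</literal></screen>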

    <para>The <xref linkend="zip"/> and <xref linkend="unzip"/> programs have
    this problem because they hard-code the expected filename encoding.
    <application>UnZip</application> contains a hard-coded conversion table
    between the CP850 (DOS) and ISO-8859-1 (UNIX) encodings and uses this table
    when extracting archives created under DOS or Microsoft Windows. However,
    this assumption only works for users of an ISO-8859-1 locale (such as
    many users in the US), not for anyone using a UTF-8 locale. Non-ASCII
    characters will be mangled in the extracted filenames.</para>

    <!--<para>On the other hand,
    <application>Nautilus CD Burner</application> checks names of
    files added to its window for UTF-8 validity. This is wrong for
    users of non-UTF-8 locales. Also,
    <application>Nautilus CD Burner</application> unconditionally
    calls <command>mkisofs</command> with the
    <parameter>-input-charset UTF-8</parameter> parameter, which is
    only correct in UTF-8 locales.</para>-->

    <para>The general rule for avoiding this class of problems is to
    avoid installing broken programs. If this is impossible, the
    <ulink url="https://j3e.de/linux/convmv/">convmv</ulink>
    command-line tool can be used to fix filenames created by these
    broken programs, or to intentionally mangle existing filenames
    to meet the broken expectations of such programs.</para>
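
    <para>To illustrate what such a fix does, here is a minimal sketch of a
    non-recursive rename loop built on <command>iconv</command>. The
    function name <function>recode_names</function> is hypothetical and,
    unlike <command>convmv</command>, it has no dry-run mode, so it only
    shows the idea.</para>

<screen><literal>
```shell
# recode_names FROM TO DIR
# Hypothetical helper: rename each entry directly inside DIR (no
# recursion) from filename encoding FROM to filename encoding TO.
# Names that are unchanged, or that are not valid in FROM, are skipped;
# an existing target name is never overwritten.
recode_names() {
    from=$1; to=$2; dir=$3
    for path in "$dir"/*; do
        [ -e "$path" ] || continue      # empty directory: glob unmatched
        old=${path##*/}
        new=$(printf '%s' "$old" | iconv -f "$from" -t "$to") || continue
        [ "$new" = "$old" ] || [ -e "$dir/$new" ] || mv -- "$path" "$dir/$new"
    done
}
```
</literal></screen>

    <para>For example, <userinput>recode_names ISO-8859-1 UTF-8
    extracted</userinput> would re-encode, for a UTF-8 locale, names that
    UnZip's built-in table left in ISO-8859-1 in the directory
    <filename class='directory'>extracted</filename>. The
    <command>convmv</command> equivalent, which also recurses into
    subdirectories, is <userinput>convmv -f iso-8859-1 -t utf-8 -r --notest
    extracted</userinput>; without <option>--notest</option> it only reports
    what it would rename.</para>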

    <para>In other cases, a similar problem is caused by importing
    filenames from a system using a different locale with a tool that
    is not locale-aware (e.g., <!--<xref linkend="nfs-utils"/> or-->
    <xref linkend="openssh"/>). In order to avoid mangling non-ASCII
    characters when transferring files to a system with a different
    locale, any of the following methods can be used:</para>

    <itemizedlist>
      <listitem>
        <para>Transfer anyway, fix the damage with
        <command>convmv</command>.</para>
      </listitem>
      <listitem>
        <para>On the sending side, create a tar archive with the
        <parameter>--format=posix</parameter> switch passed to
        <command>tar</command> (this will be the default in a future
        version of <command>tar</command>).</para>
      </listitem>
      <listitem>
        <para>Mail the files as attachments. Mail clients specify the
        encoding of attached filenames.</para>
      </listitem>
      <listitem>
        <para>Write the files to a removable disk formatted with a FAT or
        FAT32 filesystem.</para>
      </listitem>
      <listitem>
        <para>Transfer the files using Samba.</para>
      </listitem>
      <listitem>
        <para>Transfer the files via FTP using an RFC 2640-aware server
        (this currently means only wu-ftpd, which has a bad security
        history) and client (e.g., lftp).</para>
      </listitem>
    </itemizedlist>
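
    <para>The <command>tar</command> method could look like the following
    sketch, assuming GNU <command>tar</command>; the wrapper name
    <function>pack_posix</function> is hypothetical. On the receiving side,
    the archive is unpacked with a plain
    <userinput>tar -xf files.tar</userinput>.</para>

<screen><literal>
```shell
# pack_posix ARCHIVE PATH...
# Hypothetical wrapper: create ARCHIVE from PATH... in the POSIX (pax)
# interchange format, which GNU tar selects with --format=posix.
pack_posix() {
    archive=$1
    shift
    tar --format=posix -cf "$archive" "$@"
}
```
</literal></screen>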

    <para>The last four methods work because the filenames are automatically
    converted from the sender's locale encoding to Unicode and stored or sent
    in this form. They are then transparently converted from Unicode to the
    recipient's locale encoding.</para>
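
    <para>The two-step conversion described above can be imitated with
    <command>iconv</command>. In this sketch the function name
    <function>via_unicode</function> is hypothetical and the encodings are
    only an example: a name created in a KOI8-R locale is converted to
    UTF-8 (the form used in transit), then to ISO-8859-5 for a recipient
    assumed to use that encoding.</para>

<screen><literal>
```shell
# via_unicode STRING FROM TO
# Hypothetical helper: re-encode STRING from encoding FROM to encoding TO
# by way of UTF-8, as the transfer methods above do with filenames.
via_unicode() {
    printf '%s' "$1" | iconv -f "$2" -t UTF-8 | iconv -f UTF-8 -t "$3"
}
```
</literal></screen>

    <para>For example, the Cyrillic small letter "a" is byte 0xC1 in KOI8-R
    but byte 0xD0 in ISO-8859-5, so
    <userinput>via_unicode "$(printf '\301')" KOI8-R ISO-8859-5 | od -An -tx1</userinput>
    shows the single byte <literal>d0</literal>.</para>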

  </sect2>

  <sect2 id="locale-wrong-multibyte-characters"
         xreflabel="Breaks Multibyte Characters">

    <title>The Program Breaks Multibyte Characters or Doesn't Count
    Character Cells Correctly</title>

    <para>Severity: High or critical</para>

    <para>Many programs were written in an older era where multibyte
    locales were not common. Such programs assume that the C "char" data
    type, which is one byte, can be used to store single characters.
    Further, they assume that any sequence of bytes is a valid
    string and that every character occupies a single character cell.
    Such assumptions completely break in UTF-8 locales. The visible
    manifestation is that the program truncates strings prematurely
    (e.g., at 80 bytes instead of 80 characters). Terminal-based
    programs don't place the cursor correctly on the screen, don't react
    to the "Backspace" key by erasing one character, and leave junk
    characters around when updating the screen, usually turning the
    screen into a complete mess.</para>

    <para>Fixing this kind of problem is a tedious task from a
    programmer's point of view, like all other cases of retrofitting new
    concepts into an old, flawed design. In this case, one has to redesign
    all data structures to accommodate the fact that a complete
    character may span a variable number of "char"s (or switch to wchar_t
    and convert as needed). Also, for every call to "strlen" and
    similar functions, one has to find out whether a number of bytes, a
    number of characters, or the width of the string was really meant.
    Sometimes it is faster to write a program with the same functionality
    from scratch.
    </para>
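
    <para>The difference between bytes and characters is easy to see from
    the shell. The following sketch (both function names are hypothetical)
    counts the bytes and the characters of a UTF-8 string. It relies on the
    fact that in valid UTF-8 every byte in the range 0x80 to 0xBF is a
    continuation byte, so deleting those bytes leaves exactly one byte per
    character, independent of the current locale. Display width, the third
    quantity mentioned above, additionally depends on the wcwidth() data of
    a UTF-8 locale and is not covered here.</para>

<screen><literal>
```shell
# count_bytes STRING: number of bytes in STRING
count_bytes() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# count_chars STRING: number of characters in STRING, assuming it is
# valid UTF-8; continuation bytes (octal 200-277) are deleted byte-wise
# in the C locale, then the remaining bytes are counted.
count_chars() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}
```
</literal></screen>

    <para>For the word "hello" with an accented "e", written as
    <userinput>"$(printf 'h\303\251llo')"</userinput>,
    <function>count_bytes</function> prints 6 while
    <function>count_chars</function> prints 5, so a program that truncates
    at a byte count can cut such a string in the middle of a
    character.</para>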

    <para>Among BLFS packages, this problem applies to
    <xref linkend="xine-ui"/> and all the shells.</para>

  </sect2>

  <sect2 id="locale-wrong-manpage-encoding"
         xreflabel="Incorrect Manual Page Encoding">

    <title>The Package Installs Manual Pages in Incorrect or
    Non-Displayable Encoding</title>

    <para>Severity: Low</para>

    <para>LFS expects that manual pages are in the language-specific (usually
    8-bit) encoding, as specified on the <ulink
    url="&lfs-root;/chapter08/man-db.html">LFS Man DB page</ulink>. However,
    some packages install translated manual pages in UTF-8 encoding (e.g.,
    Shadow, already dealt with), or manual pages in languages not in the table.
    Not all BLFS packages have been audited for conformance with the
    requirements set out in LFS (the large majority have been checked, and
    fixes placed in the book for packages known to install non-conforming
    manual pages). If you find a manual page installed by any BLFS package
    that is obviously in the wrong encoding, please remove or convert it as
    needed, and report this to the BLFS team as a bug.</para>
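
    <para>Converting such a page could look like the following minimal
    sketch. The function name <function>convert_manpage</function> is
    hypothetical, it assumes an uncompressed page, and the target encoding
    must be looked up for the page's language in the table on the LFS
    Man DB page; ISO-8859-1 in the usage example is only an
    illustration.</para>

<screen><literal>
```shell
# convert_manpage FILE ENCODING
# Hypothetical helper: rewrite the UTF-8 manual page FILE in ENCODING.
# On failure (e.g. a character ENCODING cannot represent) FILE is left
# untouched and a non-zero status is returned.
convert_manpage() {
    file=$1; enc=$2
    tmp=$(mktemp) || return 1
    if iconv -f UTF-8 -t "$enc" "$file" > "$tmp"; then
        cat "$tmp" > "$file"    # rewrite in place, keeping owner and mode
        status=0
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```
</literal></screen>

    <para>For example, <userinput>convert_manpage
    /usr/share/man/de/man1/example.1 ISO-8859-1</userinput>, where the
    path is a placeholder for the offending page.</para>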

    <para>You can easily check your system for any non-conforming manual pages
    by saving the following short shell script to some accessible location
    and making it executable (for example, with
    <command>chmod +x checkman.sh</command>),

<screen><literal>#!/bin/sh
# Begin checkman.sh
# Usage: find /usr/share/man -type f | xargs checkman.sh
for a in "$@"
do
    # echo "Checking $a..."
    # Pure-ASCII manual page (possibly except comments) is OK
    grep -v '.\\"' "$a" | iconv -f US-ASCII -t US-ASCII &gt;/dev/null 2&gt;&amp;1 \
        &amp;&amp; continue
    # Non-UTF-8 manual page is OK
    iconv -f UTF-8 -t UTF-8 "$a" &gt;/dev/null 2&gt;&amp;1 || continue
    # Found a UTF-8 manual page, bad.
    echo "UTF-8 manual page: $a" &gt;&amp;2
done
# End checkman.sh
</literal></screen>

    and then issuing the following command (modify the command below if the
    <command>checkman.sh</command> script is not in a directory listed in
    your <envar>PATH</envar> environment variable):</para>

<screen><userinput>find /usr/share/man -type f | xargs checkman.sh</userinput></screen>

    <para>Note that if you have manual pages installed in any location other
    than <filename class='directory'>/usr/share/man</filename> (e.g.,
    <filename class='directory'>/usr/local/share/man</filename>), you must
    modify the above command to include this additional location.</para>

  </sect2>

</sect1>