source: introduction/important/locale-issues.xml@ 9a003fe1

Last change on this file since 9a003fe1 was 9a003fe1, checked in by Manuel Canales Esparcia <manuel@…>, 17 years ago

Fixed remaining "paragraph overflows" warnings from FOP-0.93

git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@6752 af4574ff-66df-0310-9fd7-8a98e5e911e0

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../../general.ent">
  %general-entities;
]>

<sect1 id="locale-issues" xreflabel="Locale Related Issues">
  <?dbhtml filename="locale-issues.html"?>

  <sect1info>
    <othername>$LastChangedBy$</othername>
    <date>$Date$</date>
  </sect1info>

  <title>Locale Related Issues</title>

  <para>This page contains information about locale-related problems and
  issues. The following paragraphs give a generic overview of things that
  can come up when configuring your system for various locales. Many (but
  not all) existing locale-related problems fall under one of the headings
  below. The severity ratings below use the following criteria:</para>

  <itemizedlist>
    <listitem>
      <para>Critical: The program doesn't perform its main function.
      The fix would be very intrusive; it's better to search for a
      replacement.</para>
    </listitem>
    <listitem>
      <para>High: Part of the functionality that the program provides
      is not usable. If that functionality is required, it's better to
      search for a replacement.</para>
    </listitem>
    <listitem>
      <para>Low: The program works in all typical use cases, but lacks
      some functionality normally provided by its equivalents.</para>
    </listitem>
  </itemizedlist>

  <para>If there is a known workaround for a specific package, it will
  appear on that package's page. For the most recent information
  about locale-related issues for individual packages, check the
  <ulink url="&blfs-wiki;/BlfsNotes">User Notes</ulink> in the BLFS
  Wiki.</para>

  <sect2 id="locale-not-valid-option"
         xreflabel="Needed Encoding Not a Valid Option">

    <title>The Needed Encoding is Not a Valid Option in the Program</title>

    <para>Severity: Critical</para>

    <para>Some programs require the user to specify the character encoding
    for their input or output data and present only a limited choice of
    encodings. This is the case for the <option>-X</option> option in
    <xref linkend="a2ps"/> and <xref linkend="enscript"/>,
    the <option>-input-charset</option> option in unpatched
    <xref linkend="cdrtools"/>, and the character sets offered for display
    in the menu of <xref linkend="Links"/>. If the required encoding is not
    in the list, the program usually becomes completely unusable. For
    non-interactive programs, it may be possible to work around this by
    converting the document to a supported input character set before
    submitting it to the program.</para>

    <para>A solution to this type of problem is to implement the necessary
    support for the missing encoding as a patch to the original program
    (as done for <xref linkend="cdrtools"/> in this book), or to find a
    replacement.</para>

  </sect2>

  <sect2 id="locale-assumed-encoding"
         xreflabel="Program Assumes Encoding">

    <title>The Program Assumes the Locale-Based Encoding of External
    Documents</title>

    <para>Severity: High for non-text documents, low for text
    documents</para>

    <para>Some programs, <xref linkend="nano"/> or
    <xref linkend="joe"/> for example, assume that documents are always
    in the encoding implied by the current locale. While this assumption
    may be valid for user-created documents, it is not safe for
    external ones. When this assumption fails, non-ASCII characters are
    displayed incorrectly, and the document may become unreadable.</para>

    <para>If the external document is entirely text-based, it can be
    converted to the current locale encoding using the
    <command>iconv</command> program.</para>
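
    <para>As a minimal sketch of such a conversion, the following script
    creates a small ISO-8859-1 file and converts it to UTF-8. The source
    encoding, the target encoding and the file names are assumptions chosen
    for illustration only; on a real system the target would be whatever
    <command>locale charmap</command> reports, and the source encoding has
    to be known or guessed.</para>

```shell
#!/bin/sh
# Simulate an "external" text file in ISO-8859-1: the word "cafe" with
# an e-acute, which is the single byte 0xE9 (octal 351) in that encoding.
workdir=$(mktemp -d)
printf 'caf\351\n' > "$workdir/external.txt"

# Convert it to UTF-8, standing in here for the current locale encoding.
iconv -f ISO-8859-1 -t UTF-8 "$workdir/external.txt" > "$workdir/converted.txt"

# The single 0xE9 byte has become the two-byte sequence 0xC3 0xA9,
# so the file grows from 5 to 6 bytes.
wc -c "$workdir/external.txt" "$workdir/converted.txt"
```

    <para>The conversion writes to a new file rather than overwriting the
    input, so the original remains available if the guessed source
    encoding turns out to be wrong.</para>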

    <para>For documents that are not text-based, this is not possible.
    In fact, the assumption made in the program may be completely
    invalid for documents where the Microsoft Windows operating system
    has set de facto standards. An example of this problem is ID3v1 tags
    in MP3 files (see the <ulink url="&blfs-wiki;/ID3v1Coding">BLFS Wiki
    ID3v1Coding page</ulink> for more details). For these cases, the only
    solution is to find a replacement program that doesn't have the issue
    (e.g., one that will allow you to specify the assumed document
    encoding).</para>

    <para>Among BLFS packages, this problem applies to
    <xref linkend="nano"/>, <xref linkend="joe"/>, and all media players
    except <xref linkend="audacious"/>.</para>

    <para>Another problem in this category occurs when someone cannot read
    the documents you've sent them because their operating system is
    set up to handle character encodings differently. This often happens
    when the other person is using Microsoft Windows, which provides only
    one character encoding for a given country. For example,
    this causes problems with UTF-8 encoded TeX documents created in
    Linux. On Windows, most applications will assume that these documents
    have been created using the default Windows 8-bit encoding. See the
    <ulink url="&blfs-wiki;/tetex">teTeX</ulink> Wiki page for more
    details.</para>

    <para>In extreme cases, Windows encoding compatibility issues may be
    solved only by running Windows programs under
    <ulink url="http://www.winehq.com/">Wine</ulink>.</para>

  </sect2>

  <sect2 id="locale-wrong-filename-encoding"
         xreflabel="Wrong Filename Encoding">

    <title>The Program Uses or Creates Filenames in the Wrong Encoding</title>

    <para>Severity: Critical</para>

    <para>The POSIX standard mandates that the filename encoding is
    the encoding implied by the current LC_CTYPE locale category. This
    information is well-hidden on the page which specifies the behavior
    of the <application>Tar</application> and
    <application>Cpio</application> programs. Some programs get it wrong
    by default (or simply don't have enough information to get it right).
    The result is that they create filenames which are not subsequently
    shown correctly by <command>ls</command>, or they refuse to accept
    filenames that <command>ls</command> shows properly. For the
    <xref linkend="glib2"/> library, the problem can be corrected by
    setting the <envar>G_FILENAME_ENCODING</envar> environment variable to
    the special "@locale" value. <application>Glib2</application> based
    programs that don't respect that environment variable are buggy.</para>
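
    <para>The variable could be set as in the following fragment. Placing
    it in a shell startup file such as
    <filename>~/.bash_profile</filename> is an assumption made for this
    sketch; any mechanism that exports the variable into the environment of
    the affected programs should work equally well.</para>

```shell
# Ask GLib to interpret and create filenames in the encoding of the
# current locale instead of assuming UTF-8.
export G_FILENAME_ENCODING=@locale
```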

    <para>The <xref linkend="zip"/>, <xref linkend="unzip"/>, and
    <xref linkend="nautilus-cd-burner"/> programs have this problem because
    they hard-code the expected filename encoding.
    <application>UnZip</application> contains a hard-coded conversion
    table between the CP850 (DOS) and ISO-8859-1 (UNIX) encodings and
    uses this table when extracting archives created under DOS or
    Microsoft Windows. However, this assumption only works for those
    in the US and not for anyone using a UTF-8 locale. Non-ASCII
    characters will be mangled in the extracted filenames.</para>

    <para>On the other hand,
    <application>Nautilus CD Burner</application> checks names of
    files added to its window for UTF-8 validity. This is wrong for
    users of non-UTF-8 locales. Also,
    <application>Nautilus CD Burner</application> unconditionally
    calls <command>mkisofs</command> with the
    <parameter>-input-charset UTF-8</parameter> parameter, which is
    only correct in UTF-8 locales.</para>

    <para>The general rule for avoiding this class of problems is to
    avoid installing broken programs. If this is impossible, the
    <ulink url="http://j3e.de/linux/convmv/">convmv</ulink>
    command-line tool can be used to fix filenames created by these
    broken programs, or to intentionally mangle the existing filenames
    to meet the broken expectations of such programs.</para>
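
    <para>A hedged sketch of a <command>convmv</command> run follows. The
    demonstration directory, the file name and the choice of ISO-8859-1 as
    the wrong encoding and UTF-8 as the desired one are all assumptions made
    for illustration. The script only does something useful if
    <command>convmv</command> is installed.</para>

```shell
#!/bin/sh
# Demo directory holding one file whose name is encoded in ISO-8859-1
# ("caf" + byte 0xE9 + ".txt"), as a broken program might create it.
dir=$(mktemp -d)
touch "$dir/$(printf 'caf\351.txt')"

if command -v convmv >/dev/null; then
    # Without --notest, convmv only reports what it would rename.
    convmv -f ISO-8859-1 -t UTF-8 -r "$dir"
    # With --notest, the renames are actually performed.
    convmv -f ISO-8859-1 -t UTF-8 -r --notest "$dir"
else
    echo "convmv is not installed; nothing was renamed"
fi
```

    <para>Running the dry run first and reading its report is a sensible
    precaution, because a wrong guess for the <option>-f</option> encoding
    produces a second layer of mangling.</para>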

    <para>In other cases, a similar problem is caused by importing
    filenames from a system using a different locale with a tool that
    is not locale-aware (e.g., <xref linkend="nfs-utils"/> or
    <xref linkend="openssh"/>). In order to avoid mangling non-ASCII
    characters when transferring files to a system with a different
    locale, any of the following methods can be used:</para>

    <itemizedlist>
      <listitem>
        <para>Transfer anyway, then fix the damage with
        <command>convmv</command>.</para>
      </listitem>
      <listitem>
        <para>On the sending side, create a tar archive with the
        <parameter>--format=posix</parameter> switch passed to
        <command>tar</command> (this will be the default in a future
        version of <command>tar</command>).</para>
      </listitem>
      <listitem>
        <para>Mail the files as attachments. Mail clients specify the
        encoding of attached filenames.</para>
      </listitem>
      <listitem>
        <para>Write the files to a removable disk formatted with a FAT or
        FAT32 filesystem.</para>
      </listitem>
      <listitem>
        <para>Transfer the files using Samba.</para>
      </listitem>
      <listitem>
        <para>Transfer the files via FTP using an RFC 2640-aware server
        (this currently means only wu-ftpd, which has a bad security
        history) and client (e.g., lftp).</para>
      </listitem>
    </itemizedlist>

    <para>The last four methods work because the filenames are automatically
    converted from the sender's locale to Unicode and stored or sent in this
    form. They are then transparently converted from Unicode to the
    recipient's locale encoding.</para>
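
    <para>The <command>tar</command> method from the list above can be
    sketched as follows. The directory and archive names are hypothetical,
    and the sketch assumes a <command>tar</command> implementation that
    accepts <parameter>--format=posix</parameter>, such as GNU
    <application>Tar</application>.</para>

```shell
#!/bin/sh
# Hypothetical sending side: a small directory tree to be transferred.
src=$(mktemp -d)
mkdir "$src/docs"
echo "hello" > "$src/docs/readme.txt"

# --format=posix selects the POSIX.1-2001 (pax) archive format.
tar --format=posix -cf "$src/docs.tar" -C "$src" docs

# The receiving side lists or extracts the archive in the usual way.
tar -tf "$src/docs.tar"
```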

  </sect2>

  <sect2 id="locale-wrong-multibyte-characters"
         xreflabel="Breaks Multibyte Characters">

    <title>The Program Breaks Multibyte Characters or Doesn't Count
    Character Cells Correctly</title>

    <para>Severity: High or critical</para>

    <para>Many programs were written in an older era when multibyte
    locales were not common. Such programs assume that the C "char" data
    type, which is one byte, can be used to store single characters.
    Further, they assume that any sequence of characters is a valid
    string and that every character occupies a single character cell.
    Such assumptions completely break in UTF-8 locales. The visible
    manifestation is that the program truncates strings prematurely
    (e.g., at 80 bytes instead of 80 characters). Terminal-based
    programs don't place the cursor correctly on the screen, don't react
    to the "Backspace" key by erasing one character, and leave junk
    characters around when updating the screen, usually turning the
    screen into a complete mess.</para>

    <para>Fixing this kind of problem is a tedious task from a
    programmer's point of view, like all other cases of retrofitting new
    concepts into an old flawed design. In this case, one has to redesign
    all data structures in order to accommodate the fact that a complete
    character may span a variable number of "char"s (or switch to wchar_t
    and convert as needed). Also, for every call to "strlen" and
    similar functions, one has to find out whether a number of bytes, a
    number of characters, or the width of the string was really meant.
    Sometimes it is faster to write a program with the same functionality
    from scratch.
    </para>
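
    <para>The difference between a byte count and a character count can be
    illustrated with a short script. This is only a sketch: the sample word
    is an arbitrary choice, and counting characters by re-encoding to
    UTF-32LE is a convenience used here to stay independent of the locales
    installed on the system, not a technique taken from any of the packages
    mentioned above. The script does not attempt to measure display
    width.</para>

```shell
#!/bin/sh
# The word "naive" spelled with U+00EF (i with diaeresis), written as
# UTF-8 octal escapes so that this script itself stays pure ASCII.
word=$(printf 'na\303\257ve')

# Number of bytes: what strlen() would report for this string.
bytes=$(printf '%s' "$word" | wc -c)

# Number of characters: re-encode as UTF-32LE (exactly 4 bytes per
# code point, no byte order mark) and divide the byte count by 4.
utf32_bytes=$(printf '%s' "$word" | iconv -f UTF-8 -t UTF-32LE | wc -c)
chars=$((utf32_bytes / 4))

echo "bytes=$((bytes)) chars=$chars"
```

    <para>The script prints <computeroutput>bytes=6 chars=5</computeroutput>.
    A program that truncates this word after five "char"s would cut the
    string one character short.</para>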

    <para>Among BLFS packages, this problem applies to
    <xref linkend="ed"/>, <xref linkend="xine-ui"/> and all shells.</para>

  </sect2>

  <sect2 id="locale-wrong-manpage-encoding"
         xreflabel="Incorrect Manual Page Encoding">

    <title>The Package Installs Manual Pages in Incorrect or
    Non-Displayable Encoding</title>

    <para>Severity: Low</para>

    <para>LFS expects that manual pages are in the language-specific (usually
    8-bit) encoding, as specified on the <ulink
    url="&lfs-root;/chapter06/man-db.html">LFS Man DB page</ulink>. However,
    some packages install translated manual pages in UTF-8 encoding (e.g.,
    Shadow, already dealt with), or manual pages in languages not in the table.
    Not all BLFS packages have been audited for conformance with the
    requirements put in LFS (the large majority have been checked, and fixes
    placed in the book for packages known to install non-conforming manual
    pages). If you find a manual page installed by any of the BLFS packages
    that is obviously in the wrong encoding, please remove or convert it as
    needed, and report this to the BLFS team as a bug.</para>

    <para>You can easily check your system for any non-conforming manual pages
    by copying the following short shell script to some accessible location,

<screen><literal>#!/bin/sh
# Begin checkman.sh
# Usage: find /usr/share/man -type f | xargs checkman.sh
for a in "$@"
do
    # echo "Checking $a..."
    # Pure-ASCII manual page (possibly except comments) is OK
    grep -v '.\\"' "$a" | iconv -f US-ASCII -t US-ASCII >/dev/null 2>&amp;1 \
        &amp;&amp; continue
    # Non-UTF-8 manual page is OK
    iconv -f UTF-8 -t UTF-8 "$a" >/dev/null 2>&amp;1 || continue
    # If we got here, we found a UTF-8 manual page, which is bad.
    echo "UTF-8 manual page: $a" >&amp;2
done
# End checkman.sh
</literal></screen>

    and then issuing the following command (modify the command below if the
    <command>checkman.sh</command> script is not in your <envar>PATH</envar>
    environment variable):</para>

<screen><userinput>find /usr/share/man -type f | xargs checkman.sh</userinput></screen>

    <para>Note that if you have manual pages installed in any location other
    than <filename class='directory'>/usr/share/man</filename> (e.g.,
    <filename class='directory'>/usr/local/share/man</filename>), you must
    modify the above command to include this additional location.</para>

  </sect2>

</sect1>