<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
   "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../../general.ent">
  %general-entities;
]>

<sect1 id="locale-issues" xreflabel="Locale Related Issues">
  <?dbhtml filename="locale-issues.html"?>

  <sect1info>
    <date>$Date$</date>
  </sect1info>

  <title>Locale Related Issues</title>

  <para>This page describes locale related problems and issues. The
  following paragraphs give a generic overview of things that can go wrong
  when configuring your system for various locales. Many (but not all)
  existing locale related problems fall under one of the headings below.
  The severity ratings use the following criteria:</para>

  <itemizedlist>
    <listitem>
      <para>Critical: The program doesn't perform its main function.
      The fix would be very intrusive, so it's better to search for a
      replacement.</para>
    </listitem>
    <listitem>
      <para>High: Part of the functionality that the program provides
      is not usable. If that functionality is required, it's better to
      search for a replacement.</para>
    </listitem>
    <listitem>
      <para>Low: The program works in all typical use cases, but lacks
      some functionality normally provided by its equivalents.</para>
    </listitem>
  </itemizedlist>

  <para>If there is a known workaround for a specific package, it will
  appear on that package's page. For the most recent information
  about locale related issues for individual packages, check the
  <ulink url="&blfs-wiki;/BlfsNotes">User Notes</ulink> in the BLFS
  Wiki.</para>

  <sect2 id="locale-not-valid-option"
         xreflabel="Needed Encoding Not a Valid Option">

    <title>The Needed Encoding is Not a Valid Option in the Program</title>

    <para>Severity: Critical</para>

    <para>Some programs require the user to specify the character encoding
    for their input or output data and present only a limited choice of
    encodings. This is the case for the <option>-X</option> option in
    <!-- <xref linkend="a2ps"/> and --><xref linkend="enscript"/>,
    the <option>-input-charset</option> option in unpatched
    <xref linkend="cdrtools"/>, and the character sets offered for display
    in the menu of <xref linkend="Links"/>. If the required encoding is not
    in the list, the program usually becomes completely unusable. For
    non-interactive programs, it may be possible to work around this by
    converting the document to a supported input character set before
    submitting it to the program.</para>

    <para>A solution to this type of problem is to implement the necessary
    support for the missing encoding as a patch to the original program, or
    to find a replacement.</para>

  </sect2>

  <sect2 id="locale-assumed-encoding"
         xreflabel="Program Assumes Encoding">

    <title>The Program Assumes the Locale-Based Encoding of External
    Documents</title>

    <para>Severity: High for non-text documents, low for text
    documents</para>

    <para>Some programs, <xref linkend="nano"/> or
    <xref linkend="joe"/> for example, assume that documents are always
    in the encoding implied by the current locale. While this assumption
    may be valid for user-created documents, it is not safe for
    external ones. When this assumption fails, non-ASCII characters are
    displayed incorrectly, and the document may become unreadable.</para>

    <para>If the external document is entirely text-based, it can be
    converted to the current locale encoding using the
    <command>iconv</command> program.</para>
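
    <para>As a minimal sketch of such a conversion, the commands below
    create a small ISO-8859-1 sample file and convert it to UTF-8. The file
    names are hypothetical, and both encodings are assumptions: the
    <parameter>-f</parameter> value must match the real encoding of the
    document, and the <parameter>-t</parameter> value should match your
    locale (<command>locale charmap</command> prints it).</para>

```shell
# Hypothetical sample: the word "cafe" with an e-acute, plus a newline,
# encoded in ISO-8859-1 (byte 0xE9, octal 351, is e-acute there).
printf 'caf\351\n' > external.txt

# Convert from the assumed source encoding to the assumed locale encoding.
iconv -f ISO-8859-1 -t UTF-8 external.txt > converted.txt

# Dump the result as hex bytes to see how e-acute is encoded now.
od -An -tx1 converted.txt
```

    <para>If the source encoding is guessed wrongly,
    <command>iconv</command> may still succeed but produce the wrong
    characters, so it is worth checking the result by eye.</para>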

    <para>For documents that are not text-based, this is not possible.
    In fact, the assumption made in the program may be completely
    invalid for documents where the Microsoft Windows operating system
    has set de facto standards. An example of this problem is ID3v1 tags
    in MP3 files (see the <ulink url="&blfs-wiki;/ID3v1Coding">BLFS Wiki
    ID3v1Coding page</ulink> for more details). For these cases, the only
    solution is to find a replacement program that doesn't have the issue
    (e.g., one that allows you to specify the assumed document
    encoding).</para>

    <para>Among BLFS packages, this problem applies to
    <xref linkend="nano"/>, <xref linkend="joe"/>, and all media players
    except <xref linkend="audacious"/>.</para>

    <para>Another problem in this category occurs when someone cannot read
    the documents you've sent them because their operating system is
    set up to handle character encodings differently. This often happens
    when the other person is using Microsoft Windows, which only
    provides one character encoding for a given country. For example,
    this causes problems with UTF-8 encoded TeX documents created in
    Linux. On Windows, most applications will assume that these documents
    were created using the default Windows 8-bit encoding.</para>

    <para>In extreme cases, Windows encoding compatibility issues may be
    solved only by running Windows programs under
    <ulink url="http://www.winehq.com/">Wine</ulink>.</para>

  </sect2>

  <sect2 id="locale-wrong-filename-encoding"
         xreflabel="Wrong Filename Encoding">

    <title>The Program Uses or Creates Filenames in the Wrong Encoding</title>

    <para>Severity: Critical</para>

    <para>The POSIX standard mandates that the filename encoding is
    the encoding implied by the current LC_CTYPE locale category. This
    information is well-hidden on the page which specifies the behavior
    of the <application>Tar</application> and
    <application>Cpio</application> programs. Some programs get it wrong by
    default (or simply don't have enough information to get it right). The
    result is that they create filenames which are not subsequently shown
    correctly by <command>ls</command>, or they refuse to accept filenames
    that <command>ls</command> shows properly. For the
    <xref linkend="glib2"/> library, the problem can be corrected by setting
    the <envar>G_FILENAME_ENCODING</envar> environment variable to the
    special "@locale" value. <application>Glib2</application>-based programs
    that don't respect that environment variable are buggy.</para>
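
    <para>For example, the variable might be exported from a shell startup
    file. This is only a sketch: which file to put it in (for instance a
    hypothetical <filename>/etc/profile.d/glib-filenames.sh</filename>) is
    an assumption, not something prescribed here.</para>

```shell
# Ask GLib to use the encoding of the current locale for filenames.
G_FILENAME_ENCODING=@locale
export G_FILENAME_ENCODING
```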

    <para><xref linkend="zip"/> and <xref linkend="unzip"/> have this
    problem because they hard-code the expected filename encoding.
    <application>UnZip</application> contains a hard-coded conversion table
    between the CP850 (DOS) and ISO-8859-1 (UNIX) encodings and uses this
    table when extracting archives created under DOS or Microsoft Windows.
    However, this assumption only works for those in the US and not for
    anyone using a UTF-8 locale. Non-ASCII characters will be mangled in the
    extracted filenames.</para>
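
    <para>The effect can be illustrated with <command>iconv</command>
    alone. The sketch below assumes an archive member whose name contains
    an e-acute stored as the CP850 byte 0x82. It only imitates the table
    lookup described above and is not the code that
    <application>UnZip</application> actually runs.</para>

```shell
# CP850 byte 0x82 (octal 202) is e-acute; mapping it to ISO-8859-1
# gives a single byte, printed here in hex.
printf '\202' | iconv -f CP850 -t ISO-8859-1 | od -An -tx1

# Check whether that ISO-8859-1 byte (0xE9, octal 351) is valid UTF-8 on
# its own. If it is not, a UTF-8 locale cannot display the name properly.
if printf '\351' | iconv -f UTF-8 -t UTF-8 > /dev/null 2> /dev/null
then
    utf8_status=valid
else
    utf8_status=invalid
fi
echo "0xE9 interpreted as UTF-8: $utf8_status"
```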

    <!--<para>On the other hand,
    <application>Nautilus CD Burner</application> checks names of
    files added to its window for UTF-8 validity. This is wrong for
    users of non-UTF-8 locales. Also,
    <application>Nautilus CD Burner</application> unconditionally
    calls <command>mkisofs</command> with the
    <parameter>-input-charset UTF-8</parameter> parameter, which is
    only correct in UTF-8 locales.</para>-->

    <para>The general rule for avoiding this class of problems is to
    avoid installing broken programs. If this is impossible, the
    <ulink url="http://j3e.de/linux/convmv/">convmv</ulink>
    command-line tool can be used to fix filenames created by these
    broken programs, or to intentionally mangle the existing filenames
    to meet the broken expectations of such programs.</para>

    <para>In other cases, a similar problem is caused by importing
    filenames from a system using a different locale with a tool that
    is not locale-aware (e.g., <!--<xref linkend="nfs-utils"/> or-->
    <xref linkend="openssh"/>). In order to avoid mangling non-ASCII
    characters when transferring files to a system with a different
    locale, any of the following methods can be used:</para>

    <itemizedlist>
      <listitem>
        <para>Transfer anyway, then fix the damage with
        <command>convmv</command>.</para>
      </listitem>
      <listitem>
        <para>On the sending side, create a tar archive with the
        <parameter>--format=posix</parameter> switch passed to
        <command>tar</command> (this will be the default in a future
        version of <command>tar</command>).</para>
      </listitem>
      <listitem>
        <para>Mail the files as attachments. Mail clients specify the
        encoding of attached filenames.</para>
      </listitem>
      <listitem>
        <para>Write the files to a removable disk formatted with a FAT or
        FAT32 filesystem.</para>
      </listitem>
      <listitem>
        <para>Transfer the files using Samba.</para>
      </listitem>
      <listitem>
        <para>Transfer the files via FTP using an RFC 2640-aware server
        (this currently means only wu-ftpd, which has a bad security
        history) and client (e.g., lftp).</para>
      </listitem>
    </itemizedlist>

    <para>The last four methods work because the filenames are automatically
    converted from the sender's locale to Unicode and stored or sent in this
    form. They are then transparently converted from Unicode to the
    recipient's locale encoding.</para>
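
    <para>The <command>tar</command> method from the list above can be
    sketched as follows. The directory and archive names are hypothetical,
    and GNU <command>tar</command> is assumed.</para>

```shell
# Hypothetical sample tree to transfer.
mkdir -p outgoing
printf 'sample\n' > outgoing/notes.txt

# On the sending side: create the archive in the POSIX (pax) format.
tar --format=posix -cf outgoing.tar outgoing

# On the receiving side: list the archive members (use -x to extract).
tar -tf outgoing.tar
```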

  </sect2>

  <sect2 id="locale-wrong-multibyte-characters"
         xreflabel="Breaks Multibyte Characters">

    <title>The Program Breaks Multibyte Characters or Doesn't Count
    Character Cells Correctly</title>

    <para>Severity: High or critical</para>

    <para>Many programs were written in an older era when multibyte
    locales were not common. Such programs assume that the C "char" data
    type, which is one byte, can be used to store single characters.
    Further, they assume that any sequence of characters is a valid
    string and that every character occupies a single character cell.
    Such assumptions completely break in UTF-8 locales. The visible
    manifestation is that the program truncates strings prematurely
    (e.g., at 80 bytes instead of 80 characters). Terminal-based
    programs don't place the cursor correctly on the screen, don't react
    to the "Backspace" key by erasing one character, and leave junk
    characters around when updating the screen, usually turning the
    screen into a complete mess.</para>

    <para>Fixing this kind of problem is a tedious task from a
    programmer's point of view, like all other cases of retrofitting new
    concepts into an old flawed design. In this case, one has to redesign
    all data structures in order to accommodate the fact that a complete
    character may span a variable number of "char"s (or switch to wchar_t
    and convert as needed). Also, for every call to "strlen" and
    similar functions, one has to find out whether a number of bytes, a
    number of characters, or the width of the string was really meant.
    Sometimes it is faster to write a program with the same functionality
    from scratch.</para>
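
    <para>The difference between bytes and characters can be seen from the
    shell. In this sketch the sample string is the four-character word
    "cafe" with an e-acute, written as UTF-8. The
    <literal>C.UTF-8</literal> locale name is an assumption and may need to
    be replaced by a UTF-8 locale that is installed on your system.</para>

```shell
# e-acute is the two bytes 0xC3 0xA9 (octal 303 251) in UTF-8.
bytes=$(printf 'caf\303\251' | wc -c | tr -d ' ')
echo "bytes: $bytes"

# wc -m counts characters according to the current locale, so the result
# depends on a UTF-8 locale being available under the assumed name.
chars=$(printf 'caf\303\251' | LC_ALL=C.UTF-8 wc -m | tr -d ' ')
echo "characters: $chars"
```

    <para>A program that sizes its buffers or screen columns from the byte
    count is making exactly the assumption described above.</para>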

    <para>Among BLFS packages, this problem applies to
    <xref linkend="xine-ui"/> and all the shells.</para>

  </sect2>

  <sect2 id="locale-wrong-manpage-encoding"
         xreflabel="Incorrect Manual Page Encoding">

    <title>The Package Installs Manual Pages in Incorrect or
    Non-Displayable Encoding</title>

    <para>Severity: Low</para>

    <para>LFS expects that manual pages are in the language-specific (usually
    8-bit) encoding, as specified on the <ulink
    url="&lfs-root;/chapter08/man-db.html">LFS Man DB page</ulink>. However,
    some packages install translated manual pages in UTF-8 encoding (e.g.,
    Shadow, already dealt with), or manual pages in languages not in the
    table. Not all BLFS packages have been audited for conformance with the
    requirements set by LFS (the large majority have been checked, and fixes
    placed in the book for packages known to install non-conforming manual
    pages). If you find a manual page installed by any BLFS package that is
    obviously in the wrong encoding, please remove or convert it as needed,
    and report this to the BLFS team as a bug.</para>

    <para>You can easily check your system for any non-conforming manual pages
    by copying the following short shell script to some accessible location,

<screen><literal>#!/bin/sh
# Begin checkman.sh
# Usage: find /usr/share/man -type f | xargs checkman.sh
for a in "$@"
do
    # echo "Checking $a..."
    # Pure-ASCII manual page (possibly except comments) is OK
    grep -v '.\\"' "$a" | iconv -f US-ASCII -t US-ASCII >/dev/null 2>&amp;1 \
        &amp;&amp; continue
    # Non-UTF-8 manual page is OK
    iconv -f UTF-8 -t UTF-8 "$a" >/dev/null 2>&amp;1 || continue
    # Found a UTF-8 manual page, bad.
    echo "UTF-8 manual page: $a" >&amp;2
done
# End checkman.sh
</literal></screen>

    and then issuing the following command (modify the command below if the
    <command>checkman.sh</command> script is not in your <envar>PATH</envar>
    environment variable):</para>

<screen><userinput>find /usr/share/man -type f | xargs checkman.sh</userinput></screen>

    <para>Note that if you have manual pages installed in any location other
    than <filename class='directory'>/usr/share/man</filename> (e.g.,
    <filename class='directory'>/usr/local/share/man</filename>), you must
    modify the above command to include this additional location.</para>
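
    <para>A page reported as UTF-8 could then be converted to the encoding
    expected for its language. The following is a sketch only: the file
    name is hypothetical and ISO-8859-1 is an assumed target, so check the
    table on the LFS Man DB page for the encoding that really applies.</para>

```shell
# Hypothetical UTF-8 manual page containing an e-acute (octal 303 251).
printf '.TH DEMO 1\ncaf\303\251\n' > demo.1

# Convert to the assumed legacy encoding, then replace the original file.
iconv -f UTF-8 -t ISO-8859-1 demo.1 > demo.1.new
mv demo.1.new demo.1

# Dump the bytes to confirm how e-acute is stored after the conversion.
od -An -tx1 demo.1
```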

  </sect2>

</sect1>