source: introduction/important/locale-issues.xml@ 5a95524

Last change on this file since 5a95524 was 3f2db3a6, checked in by Pierre Labastie <pierre.labastie@…>, 18 months ago

Remove sect1info tags

They only contain a date tag that is nowhere used.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../../general.ent">
  %general-entities;
]>

<sect1 id="locale-issues" xreflabel="Locale Related Issues">
  <?dbhtml filename="locale-issues.html"?>


  <title>Locale Related Issues</title>

  <para>This page contains information about locale-related problems and
  issues. The following paragraphs give a general overview of things that
  can come up when configuring your system for various locales. Many (but
  not all) existing locale-related problems can be classified under one of
  the headings below. The severity ratings below use the following
  criteria:</para>

  <itemizedlist>
    <listitem>
      <para>Critical: The program doesn't perform its main function.
      The fix would be very intrusive, so it's better to search for a
      replacement.</para>
    </listitem>
    <listitem>
      <para>High: Part of the functionality that the program provides
      is not usable. If that functionality is required, it's better to
      search for a replacement.</para>
    </listitem>
    <listitem>
      <para>Low: The program works in all typical use cases, but lacks
      some functionality normally provided by its equivalents.</para>
    </listitem>
  </itemizedlist>

  <para>If there is a known workaround for a specific package, it will
  appear on that package's page. For the most recent information
  about locale-related issues for individual packages, check the
  <ulink url="&blfs-wiki;/BlfsNotes">User Notes</ulink> in the BLFS
  Wiki.</para>

  <sect2 id="locale-not-valid-option"
         xreflabel="Needed Encoding Not a Valid Option">

    <title>The Needed Encoding is Not a Valid Option in the Program</title>

    <para>Severity: Critical</para>

    <para>Some programs require the user to specify the character encoding
    for their input or output data and present only a limited choice of
    encodings. This is the case for the <option>-X</option> option in
<!-- <xref linkend="a2ps"/> and --><xref linkend="enscript"/>,
    the <option>-input-charset</option> option in unpatched
    <xref linkend="cdrtools"/>, and the character sets offered for display
    in the menu of <xref linkend="Links"/>. If the required encoding is not
    in the list, the program usually becomes completely unusable. For
    non-interactive programs, it may be possible to work around this by
    converting the document to a supported input character set before
    passing it to the program.</para>
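
    <para>The following is a minimal sketch of such a workaround. This
    shell function (its name is hypothetical) converts a UTF-8 text file
    to ISO-8859-1 with <command>iconv</command>. It assumes that the
    document only contains characters that exist in ISO-8859-1 and that
    the program accepts that encoding. Adapt the target encoding to one
    the program actually offers, and check the program's manual page for
    the exact encoding name it expects:</para>

<screen><userinput># Hypothetical helper: write a UTF-8 text file to standard output
# re-encoded as ISO-8859-1 (Latin-1), for programs that cannot
# read UTF-8 input.
# $1: the UTF-8 encoded input file
utf8_to_latin1() {
    # iconv exits with a non-zero status when a character has no
    # ISO-8859-1 equivalent, so such failures are not silently ignored
    iconv -f UTF-8 -t ISO-8859-1 "$1"
}
# Example (assumes enscript knows ISO-8859-1 as "88591"):
# utf8_to_latin1 notes.txt | enscript -X 88591 -o notes.ps</userinput></screen>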

    <para>A solution to this type of problem is to implement the necessary
    support for the missing encoding as a patch to the original program or to
    find a replacement.</para>

  </sect2>

  <sect2 id="locale-assumed-encoding"
         xreflabel="Program Assumes Encoding">

    <title>The Program Assumes the Locale-Based Encoding of External
    Documents</title>

    <para>Severity: High for non-text documents, low for text
    documents</para>

    <para>Some programs, <xref linkend="nano"/> or
    <xref linkend="joe"/> for example, assume that documents are always
    in the encoding implied by the current locale. While this assumption
    may be valid for user-created documents, it is not safe for
    external ones. When this assumption fails, non-ASCII characters are
    displayed incorrectly, and the document may become unreadable.</para>

    <para>If the external document is entirely text-based, it can be
    converted to the current locale encoding using the
    <command>iconv</command> program.</para>
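
    <para>As a sketch, assuming the document's actual encoding is known,
    the conversion can be wrapped in a small shell function (the name is
    hypothetical). The function asks <command>locale charmap</command> for
    the encoding implied by the current <envar>LC_CTYPE</envar> setting,
    so the same function works in any locale:</para>

<screen><userinput># Hypothetical helper: print a text document converted from its
# actual encoding to the encoding of the current locale.
# $1: encoding the document is really in (e.g. ISO-8859-2)
# $2: the document to convert
to_locale() {
    # "locale charmap" prints the encoding implied by LC_CTYPE, e.g. UTF-8
    target=$(locale charmap) || return 1
    # iconv fails if the input contains a byte sequence that is invalid
    # in $1, or a character that cannot be represented in the target
    iconv -f "$1" -t "$target" "$2"
}
# Example: to_locale ISO-8859-2 letter.txt > letter-converted.txt</userinput></screen>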

    <para>For documents that are not text-based, this is not possible.
    In fact, the assumption made in the program may be completely
    invalid for documents where the Microsoft Windows operating system
    has set de facto standards. An example of this problem is ID3v1 tags
    in MP3 files (see the <ulink url="&blfs-wiki;/ID3v1Coding">BLFS Wiki
    ID3v1Coding page</ulink>
    for more details). For these cases, the only solution is to find a
    replacement program that doesn't have the issue (e.g., one that
    allows you to specify the assumed document encoding).</para>

    <para>Among BLFS packages, this problem applies to
    <xref linkend="nano"/>, <xref linkend="joe"/>, and all media players
    except <xref linkend="audacious"/>.</para>

    <para>Another problem in this category is when someone cannot read
    the documents you've sent them because their operating system is
    set up to handle character encodings differently. This happens
    often when the other person is using Microsoft Windows, which only
    provides one character encoding for a given country. For example,
    this causes problems with UTF-8 encoded TeX documents created in
    Linux. On Windows, most applications will assume that these documents
    have been created using the default Windows 8-bit encoding.
    </para>

    <para>In extreme cases, Windows encoding compatibility issues may be
    solved only by running Windows programs under
    <ulink url="https://www.winehq.com/">Wine</ulink>.</para>

  </sect2>

  <sect2 id="locale-wrong-filename-encoding"
         xreflabel="Wrong Filename Encoding">

    <title>The Program Uses or Creates Filenames in the Wrong Encoding</title>

    <para>Severity: Critical</para>

    <para>The POSIX standard mandates that the filename encoding is
    the encoding implied by the current LC_CTYPE locale category. This
    information is well hidden on the page that specifies the behavior
    of the <application>Tar</application> and <application>Cpio</application>
    programs. Some programs get it wrong by default (or simply don't
    have enough information to get it right). As a result, they
    create filenames which are not subsequently shown correctly by
    <command>ls</command>, or they refuse to accept filenames that
    <command>ls</command> shows properly. For the <xref linkend="glib2"/>
    library, the problem can be corrected by setting the
    <envar>G_FILENAME_ENCODING</envar> environment variable to the special
    "@locale" value. <application>Glib2</application> based programs that
    don't respect that environment variable are buggy.</para>
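
    <para>For example, the variable could be set for all login shells
    with a snippet like the one below. This assumes the
    <filename class="directory">/etc/profile.d</filename> layout created
    by the BLFS Bash startup files. The file name is only an
    example:</para>

<screen><literal># /etc/profile.d/glib-filenames.sh (example file name)
# Tell GLib that filenames are in the encoding of the current LC_CTYPE
# locale, as POSIX expects, instead of assuming UTF-8 (GLib's default).
export G_FILENAME_ENCODING=@locale</literal></screen>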

    <para>The <xref linkend="zip"/> and <xref linkend="unzip"/> packages
    have this problem because they hard-code the expected filename encoding.
    <application>UnZip</application> contains a hard-coded conversion table
    between the CP850 (DOS) and ISO-8859-1 (UNIX) encodings and uses this table
    when extracting archives created under DOS or Microsoft Windows. However,
    this assumption only works for those in the US and not for anyone using a
    UTF-8 locale. Non-ASCII characters will be mangled in the extracted
    filenames.</para>

    <!--<para>On the other hand,
    <application>Nautilus CD Burner</application> checks names of
    files added to its window for UTF-8 validity. This is wrong for
    users of non-UTF-8 locales. Also,
    <application>Nautilus CD Burner</application> unconditionally
    calls <command>mkisofs</command> with the
    <parameter>-input-charset UTF-8</parameter> parameter, which is
    only correct in UTF-8 locales.</para>-->

    <para>The general rule for avoiding this class of problems is to
    avoid installing broken programs. If this is impossible, the
    <ulink url="https://j3e.de/linux/convmv/">convmv</ulink>
    command-line tool can be used to fix filenames created by these
    broken programs, or to intentionally mangle existing filenames
    to meet the broken expectations of such programs.</para>
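
    <para><command>convmv</command> is the right tool for real work,
    because it processes whole directory trees and, by default, only shows
    what it would do. The hypothetical shell function below only
    illustrates what such a fix involves. It renames a single file by
    converting its name, not its contents, with <command>iconv</command>,
    for instance from ISO-8859-1 (which the conversion table in
    <application>UnZip</application> produces) to UTF-8:</para>

<screen><userinput># Hypothetical helper: rename one file by converting its name (not
# its contents) from one encoding to another.
# $1: encoding the name is currently in (e.g. ISO-8859-1)
# $2: encoding the name should be in (e.g. UTF-8)
# $3: path of the file to rename
recode_name() {
    dir=$(dirname -- "$3")
    old=$(basename -- "$3")
    new=$(printf '%s' "$old" | iconv -f "$1" -t "$2") || return 1
    if [ "$old" = "$new" ]; then
        return 0    # e.g. a pure-ASCII name needs no change
    fi
    if [ -e "$dir/$new" ]; then
        return 1    # never overwrite an existing file
    fi
    mv -- "$3" "$dir/$new"
}</userinput></screen>

    <para>Unlike <command>convmv</command>, this sketch does not check
    whether a name is already in the target encoding. Applying it twice to
    the same file would therefore convert the name twice.</para>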

    <para>In other cases, a similar problem is caused by importing
    filenames from a system using a different locale with a tool that
    is not locale-aware (e.g., <!--<xref linkend="nfs-utils"/> or-->
    <xref linkend="openssh"/>). In order to avoid mangling non-ASCII
    characters when transferring files to a system with a different
    locale, any of the following methods can be used:</para>

    <itemizedlist>
      <listitem>
        <para>Transfer anyway, then fix the damage with
        <command>convmv</command>.</para>
      </listitem>
      <listitem>
        <para>On the sending side, create a tar archive with the
        <parameter>--format=posix</parameter> switch passed to
        <command>tar</command> (this will be the default in a future
        version of <command>tar</command>).</para>
      </listitem>
      <listitem>
        <para>Mail the files as attachments. Mail clients specify the
        encoding of attached filenames.</para>
      </listitem>
      <listitem>
        <para>Write the files to a removable disk formatted with a FAT or
        FAT32 filesystem.</para>
      </listitem>
      <listitem>
        <para>Transfer the files using Samba.</para>
      </listitem>
      <listitem>
        <para>Transfer the files via FTP using an RFC 2640-aware server
        (this currently means only wu-ftpd, which has a bad security
        history) and client (e.g., lftp).</para>
      </listitem>
    </itemizedlist>
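
    <para>Below is a minimal sketch of the <command>tar</command> method,
    wrapped in a hypothetical shell function. The receiving side needs no
    special option, because <command>tar -xf</command> recognizes the
    archive format automatically:</para>

<screen><userinput># Hypothetical helper: archive a directory in the POSIX.1-2001 (pax)
# format selected by tar's --format=posix switch.
# $1: archive file to create, $2: directory to archive
make_posix_tar() {
    tar --format=posix -cf "$1" "$2"
}
# Example: make_posix_tar documents.tar documents</userinput></screen>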

    <para>The last four methods work because the filenames are automatically
    converted from the sender's locale to Unicode and stored or sent in this
    form. They are then transparently converted from Unicode to the
    recipient's locale encoding.</para>

  </sect2>

  <sect2 id="locale-wrong-multibyte-characters"
         xreflabel="Breaks Multibyte Characters">

    <title>The Program Breaks Multibyte Characters or Doesn't Count
    Character Cells Correctly</title>

    <para>Severity: High or critical</para>

    <para>Many programs were written in an older era when multibyte
    locales were not common. Such programs assume that the C "char" data
    type, which is one byte, can be used to store single characters.
    Further, they assume that any sequence of bytes is a valid
    string and that every character occupies a single character cell.
    Such assumptions completely break in UTF-8 locales. The visible
    manifestation is that the program truncates strings prematurely
    (e.g., at 80 bytes instead of 80 characters). Terminal-based
    programs don't place the cursor correctly on the screen, don't react
    to the "Backspace" key by erasing one character, and leave junk
    characters around when updating the screen, usually turning the
    screen into a complete mess.</para>

    <para>Fixing this kind of problem is a tedious task from a
    programmer's point of view, like all other cases of retrofitting new
    concepts into an old, flawed design. In this case, one has to redesign
    all data structures in order to accommodate the fact that a complete
    character may span a variable number of "char"s (or switch to wchar_t
    and convert as needed). Also, for every call to "strlen" and similar
    functions, one has to find out whether a number of bytes, a number of
    characters, or the width of the string was really meant. Sometimes it
    is faster to write a program with the same functionality from scratch.
    </para>
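
    <para>The difference between these counts is easy to observe from the
    shell. The hypothetical function below reports two numbers for a UTF-8
    string. The first is the number of bytes, which is what
    <function>strlen</function> returns. The second is the number of
    characters, obtained independently of the locale by converting the
    string to UTF-32, where every character takes exactly four bytes. The
    display width is a third, separate quantity (for example, many CJK
    characters occupy two cells) and is not computed here:</para>

<screen><userinput># Hypothetical helper: compare byte and character counts of a
# UTF-8 string.
# $1: the string to measure
utf8_counts() {
    # wc -c counts bytes, which is what strlen() would return
    bytes=$(( $(printf '%s' "$1" | wc -c) ))
    # UTF-32LE (no byte order mark) uses 4 bytes per character;
    # iconv reports an error for input that is not valid UTF-8
    chars=$(( $(printf '%s' "$1" | iconv -f UTF-8 -t UTF-32LE | wc -c) / 4 ))
    echo "bytes=$bytes chars=$chars"
}
# Example: utf8_counts "$(printf 'caf\303\251')" prints bytes=5 chars=4</userinput></screen>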

    <para>Among BLFS packages, this problem applies to
    <xref linkend="xine-ui"/> and all the shells.</para>

  </sect2>

  <sect2 id="locale-wrong-manpage-encoding"
         xreflabel="Incorrect Manual Page Encoding">

    <title>The Package Installs Manual Pages in Incorrect or
    Non-Displayable Encoding</title>

    <para>Severity: Low</para>

    <para>LFS expects that manual pages are in the language-specific (usually
    8-bit) encoding, as specified on the <ulink
    url="&lfs-root;/chapter08/man-db.html">LFS Man DB page</ulink>. However,
    some packages install translated manual pages in UTF-8 encoding (e.g.,
    Shadow, already dealt with), or manual pages in languages not in the table.
    Not all BLFS packages have been audited for conformance with the
    requirements set out in LFS (the large majority have been checked, and
    fixes placed in the book for packages known to install non-conforming
    manual pages). If you find a manual page installed by any BLFS package
    that is obviously in the wrong encoding, please remove or convert it as
    needed, and report this to the BLFS team as a bug.</para>
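
    <para>Converting such a page can be sketched with a small shell
    function (the name is hypothetical). Take the correct target encoding
    from the table on the LFS Man DB page for the page's language.
    ISO-8859-1 and the page path in the example comment are only
    assumptions:</para>

<screen><userinput># Hypothetical helper: convert one UTF-8 manual page in place to
# an 8-bit encoding.
# $1: target encoding (e.g. ISO-8859-1), $2: path of the manual page
convert_manpage() {
    tmp=$(mktemp) || return 1
    # iconv fails if a character has no equivalent in the target
    # encoding; the original page is then left untouched
    if iconv -f UTF-8 -t "$1" "$2" > "$tmp"; then
        # Overwrite the contents through cat so the page keeps its
        # owner and permissions
        cat "$tmp" > "$2"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
# Example (hypothetical page): convert_manpage ISO-8859-1 /usr/share/man/de/man1/foo.1</userinput></screen>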

    <para>You can easily check your system for any non-conforming manual
    pages by saving the following short shell script in some accessible
    location and making it executable,

<screen><literal>#!/bin/sh
# Begin checkman.sh
# Usage: find /usr/share/man -type f -print0 | xargs -0 checkman.sh
for a in "$@"
do
  # echo "Checking $a..."
  # Pure-ASCII manual page (possibly except comments) is OK
  grep -v '.\\"' "$a" | iconv -f US-ASCII -t US-ASCII >/dev/null 2>&amp;1 \
    &amp;&amp; continue
  # Non-UTF-8 manual page is OK
  iconv -f UTF-8 -t UTF-8 "$a" >/dev/null 2>&amp;1 || continue
  # Found a UTF-8 manual page, bad.
  echo "UTF-8 manual page: $a" >&amp;2
done
# End checkman.sh
</literal></screen>

    and then issuing the following command (modify the command below if
    the directory containing the <command>checkman.sh</command> script is
    not listed in your <envar>PATH</envar> environment variable):</para>

<screen><userinput>find /usr/share/man -type f -print0 | xargs -0 checkman.sh</userinput></screen>

    <para>Note that if you have manual pages installed in any location other
    than <filename class='directory'>/usr/share/man</filename> (e.g.,
    <filename class='directory'>/usr/local/share/man</filename>), you must
    modify the above command to include this additional location.</para>

  </sect2>

</sect1>