[9c90b1b] | 1 | <?xml version="1.0" encoding="ISO-8859-1"?>
|
---|
[6732c094] | 2 | <!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
|
---|
| 3 | "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
|
---|
[9c90b1b] | 4 | <!ENTITY % general-entities SYSTEM "../../general.ent">
|
---|
| 5 | %general-entities;
|
---|
| 6 | ]>
|
---|
| 7 |
|
---|
| 8 | <sect1 id="locale-issues" xreflabel="Locale Related Issues">
|
---|
| 9 | <?dbhtml filename="locale-issues.html"?>
|
---|
| 10 |
|
---|
| 11 | <sect1info>
|
---|
[3cd5c69] | 12 | <othername>$LastChangedBy$</othername>
|
---|
| 13 | <date>$Date$</date>
|
---|
[9c90b1b] | 14 | </sect1info>
|
---|
| 15 |
|
---|
| 16 | <title>Locale Related Issues</title>
|
---|
| 17 |
|
---|
| 18 | <para>This page contains information about locale related problems and
|
---|
[86eaa277] | 19 | issues. In the following paragraphs you'll find a generic overview of
|
---|
| 20 | things that can come up when configuring your system for various locales.
|
---|
[e65a39d] | 21 | Many (but not all) existing locale related problems can be classified
|
---|
[86eaa277] | 22 | under one of the headings below. The severity ratings below use
|
---|
| 23 | the following criteria:</para>
|
---|
| 24 |
|
---|
| 25 | <itemizedlist>
|
---|
| 26 | <listitem>
|
---|
| 27 | <para>Critical: The program doesn't perform its main function.
|
---|
| 28 | The fix would be very intrusive; it's better to search for a
|
---|
| 29 | replacement.</para>
|
---|
| 30 | </listitem>
|
---|
| 31 | <listitem>
|
---|
| 32 | <para>High: Part of the functionality that the program provides
|
---|
| 33 | is not usable. If that functionality is required, it's better to
|
---|
| 34 | search for a replacement.</para>
|
---|
| 35 | </listitem>
|
---|
| 36 | <listitem>
|
---|
| 37 | <para>Low: The program works in all typical use cases, but lacks
|
---|
| 38 | some functionality normally provided by its equivalents.</para>
|
---|
| 39 | </listitem>
|
---|
| 40 | </itemizedlist>
|
---|
| 41 |
|
---|
| 42 | <para>If there is a known workaround for a specific package, it will
|
---|
[e65a39d] | 43 | appear on that package's page. For the most recent information
|
---|
| 44 | about locale related issues for individual packages, check the
|
---|
| 45 | <ulink url="&blfs-wiki;/BlfsNotes">User Notes</ulink> in the BLFS
|
---|
| 46 | Wiki.</para>
|
---|
[86eaa277] | 47 |
|
---|
| 48 | <sect2 id="locale-not-valid-option"
|
---|
| 49 | xreflabel="Needed Encoding Not a Valid Option">
|
---|
| 50 |
|
---|
| 51 | <title>The Needed Encoding is Not a Valid Option in the Program</title>
|
---|
| 52 |
|
---|
| 53 | <para>Severity: Critical</para>
|
---|
| 54 |
|
---|
| 55 | <para>Some programs require the user to specify the character encoding
|
---|
| 56 | for their input or output data and present only a limited choice of
|
---|
| 57 | encodings. This is the case for the <option>-X</option> option in
|
---|
| 58 | <xref linkend="a2ps"/> and <xref linkend="enscript"/>,
|
---|
| 59 | the <option>-input-charset</option> option in unpatched
|
---|
| 60 | <xref linkend="cdrtools"/>, and the character sets offered for display
|
---|
[648e8bc] | 61 | in the menu of <xref linkend="Links"/>. If the required encoding is not
|
---|
[86eaa277] | 62 | in the list, the program usually becomes completely unusable. For
|
---|
| 63 | non-interactive programs, it may be possible to work around this by
|
---|
| 64 | converting the document to a supported input character set before
|
---|
| 65 | submitting to the program.</para>
|
---|
| 66 |
|
---|
| 67 | <para>A solution to this type of problem is to implement the necessary
|
---|
| 68 | support for the missing encoding as a patch to the original program
|
---|
| 69 | (as done for <xref linkend="cdrtools"/> in this book), or to find a
|
---|
| 70 | replacement.</para>
|
---|
[9c90b1b] | 71 |
|
---|
[86eaa277] | 72 | </sect2>
|
---|
[9c90b1b] | 73 |
|
---|
[86eaa277] | 74 | <sect2 id="locale-assumed-encoding"
|
---|
| 75 | xreflabel="Program Assumes Encoding">
|
---|
| 76 |
|
---|
| 77 | <title>The Program Assumes the Locale-Based Encoding of External
|
---|
| 78 | Documents</title>
|
---|
| 79 |
|
---|
| 80 | <para>Severity: High for non-text documents, low for text
|
---|
| 81 | documents</para>
|
---|
| 82 |
|
---|
| 83 | <para>Some programs, <xref linkend="nano"/> or
|
---|
| 84 | <xref linkend="joe"/> for example, assume that documents are always
|
---|
| 85 | in the encoding implied by the current locale. While this assumption
|
---|
| 86 | may be valid for user-created documents, it is not safe for
|
---|
| 87 | external ones. When this assumption fails, non-ASCII characters are
|
---|
| 88 | displayed incorrectly, and the document may become unreadable.</para>
|
---|
| 89 |
|
---|
| 90 | <para>If the external document is entirely text based, it can be
|
---|
| 91 | converted to the current locale encoding using the
|
---|
| 92 | <command>iconv</command> program.</para>
|
---|
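<para>As a minimal sketch of such a conversion, assume a hypothetical external
file that is known to be in ISO-8859-1 and a current locale that uses UTF-8.
The file names below are illustrative only, and the source encoding must be
known in advance because <command>iconv</command> does not detect it.</para>

```shell
# Create a small sample file in ISO-8859-1 (hypothetical name):
# byte 0xE9 (octal 351) is "e with acute accent" in that encoding.
printf 'caf\351\n' > external-latin1.txt

# Convert from the known source encoding to UTF-8.
# To target the encoding of the current locale instead, the output of
# "locale charmap" could be passed to -t.
iconv -f ISO-8859-1 -t UTF-8 external-latin1.txt > external-utf8.txt
```

<para>The converted file is one byte longer than the original, because the
accented character takes two bytes in UTF-8.</para>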
| 93 |
|
---|
| 94 | <para>For documents that are not text-based, this is not possible.
|
---|
| 95 | In fact, the assumption made in the program may be completely
|
---|
| 96 | invalid for documents where the Microsoft Windows operating system
|
---|
| 97 | has set de facto standards. An example of this problem is ID3v1 tags
|
---|
[29f80ebc] | 98 | in MP3 files (see the <ulink url="&blfs-wiki;/ID3v1Coding">BLFS Wiki
|
---|
[648e8bc] | 99 | ID3v1Coding page</ulink>
|
---|
[86eaa277] | 100 | for more details). For these cases, the only solution is to find a
|
---|
| 101 | replacement program that doesn't have the issue (e.g., one that
|
---|
| 102 | will allow you to specify the assumed document encoding).</para>
|
---|
| 103 |
|
---|
| 104 | <para>Among BLFS packages, this problem applies to
|
---|
| 105 | <xref linkend="nano"/>, <xref linkend="joe"/>, and all media players
|
---|
| 106 | except <xref linkend="audacious"/>.</para>
|
---|
| 107 |
|
---|
| 108 | <para>Another problem in this category is when someone cannot read
|
---|
| 109 | the documents you've sent them because their operating system is
|
---|
| 110 | set up to handle character encodings differently. This can happen
|
---|
| 111 | often when the other person is using Microsoft Windows, which only
|
---|
| 112 | provides one character encoding for a given country. For example,
|
---|
| 113 | this causes problems with UTF-8 encoded TeX documents created in
|
---|
| 114 | Linux. On Windows, most applications will assume that these documents
|
---|
| 115 | have been created using the default Windows 8-bit encoding. See the
|
---|
| 116 | <ulink url="&blfs-wiki;/tetex">teTeX</ulink> Wiki page for more
|
---|
| 117 | details.</para>
|
---|
| 118 |
|
---|
[864b24de] | 119 | <para>In extreme cases, Windows encoding compatibility issues may be
|
---|
[86eaa277] | 120 | solved only by running Windows programs under
|
---|
| 121 | <ulink url="http://www.winehq.com/">Wine</ulink>.</para>
|
---|
[9c90b1b] | 122 |
|
---|
[86eaa277] | 123 | </sect2>
|
---|
[9c90b1b] | 124 |
|
---|
[86eaa277] | 125 | <sect2 id="locale-wrong-filename-encoding"
|
---|
| 126 | xreflabel="Wrong Filename Encoding">
|
---|
| 127 |
|
---|
| 128 | <title>The Program Uses or Creates Filenames in the Wrong Encoding</title>
|
---|
| 129 |
|
---|
| 130 | <para>Severity: Critical</para>
|
---|
| 131 |
|
---|
| 132 | <para>The POSIX standard mandates that the filename encoding is
|
---|
| 133 | the encoding implied by the current LC_CTYPE locale category. This
|
---|
| 134 | information is well-hidden on the page which specifies the behavior
|
---|
| 135 | of the <application>Tar</application> and <application>Cpio</application>
|
---|
[864b24de] | 136 | programs. Some programs get it wrong by default (or simply don't
|
---|
[86eaa277] | 137 | have enough information to get it right). The result is that they
|
---|
| 138 | create filenames which are not subsequently shown correctly by
|
---|
| 139 | <command>ls</command>, or they refuse to accept filenames that
|
---|
| 140 | <command>ls</command> shows properly. For the <xref linkend="glib2"/>
|
---|
| 141 | library, the problem can be corrected by setting the
|
---|
| 142 | <envar>G_FILENAME_ENCODING</envar> environment variable to the special
|
---|
| 143 | "@locale" value. <application>Glib2</application> based programs that
|
---|
| 144 | don't respect that environment variable are buggy.</para>
|
---|
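<para>A minimal sketch of that setting follows. Where the line is placed is an
assumption: a shell startup file such as
<filename>~/.bash_profile</filename> is one possible location, so that every
program started from the login session inherits the variable.</para>

```shell
# Tell GLib-based programs to treat filenames as being in the
# encoding of the current locale.
export G_FILENAME_ENCODING=@locale
```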
| 145 |
|
---|
| 146 | <para><xref linkend="zip"/>, <xref linkend="unzip"/>, and
|
---|
| 147 | <xref linkend="nautilus-cd-burner"/> have this problem because
|
---|
| 148 | they hard-code the expected filename encoding.
|
---|
| 149 | <application>UnZip</application> contains a hard-coded conversion
|
---|
| 150 | table between the CP850 (DOS) and ISO-8859-1 (UNIX) encodings and
|
---|
| 151 | uses this table when extracting archives created under DOS or
|
---|
| 152 | Microsoft Windows. However, this assumption only works for those
|
---|
| 153 | in the US and not for anyone using a UTF-8 locale. Non-ASCII
|
---|
| 154 | characters will be mangled in the extracted filenames.</para>
|
---|
| 155 |
|
---|
| 156 | <para>On the other hand,
|
---|
| 157 | <application>Nautilus CD Burner</application> checks names of
|
---|
| 158 | files added to its window for UTF-8 validity. This is wrong for
|
---|
| 159 | users of non-UTF-8 locales. Also,
|
---|
| 160 | <application>Nautilus CD Burner</application> unconditionally
|
---|
| 161 | calls <command>mkisofs</command> with the
|
---|
| 162 | <parameter>-input-charset UTF-8</parameter> parameter, which is
|
---|
| 163 | only correct in UTF-8 locales.</para>
|
---|
| 164 |
|
---|
[864b24de] | 165 | <para>The general rule for avoiding this class of problems is to
|
---|
[86eaa277] | 166 | avoid installing broken programs. If this is impossible, the
|
---|
| 167 | <ulink url="http://j3e.de/linux/convmv/">convmv</ulink>
|
---|
| 168 | command-line tool can be used to fix filenames created by these
|
---|
| 169 | broken programs, or intentionally mangle the existing filenames
|
---|
| 170 | to meet the broken expectations of such programs.</para>
|
---|
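<para>The following is a hedged sketch of a <command>convmv</command> run. It
assumes that the filenames were created in ISO-8859-1 and should become UTF-8;
both encodings and the target directory are placeholders to adjust. By default
<command>convmv</command> only reports what it would rename, and nothing is
changed until <parameter>--notest</parameter> is added.</para>

```shell
# Directory to examine; defaults to the current directory (placeholder).
target_dir="${1:-.}"

if command -v convmv >/dev/null 2>&1; then
    # Dry run: show which names would be converted, recursively.
    convmv -f ISO-8859-1 -t UTF-8 -r "$target_dir"
    # After reviewing the output, repeat with --notest to rename:
    # convmv -f ISO-8859-1 -t UTF-8 -r --notest "$target_dir"
else
    echo "convmv is not installed" >&2
fi
```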
| 171 |
|
---|
| 172 | <para>In other cases, a similar problem is caused by importing
|
---|
| 173 | filenames from a system using a different locale with a tool that
|
---|
| 174 | is not locale-aware (e.g., <xref linkend="nfs-utils"/> or
|
---|
[864b24de] | 175 | <xref linkend="openssh"/>). In order to avoid mangling non-ASCII
|
---|
[86eaa277] | 176 | characters when transferring files to a system with a different
|
---|
| 177 | locale, any of the following methods can be used:</para>
|
---|
[9c90b1b] | 178 |
|
---|
[86eaa277] | 179 | <itemizedlist>
|
---|
[3cd5c69] | 180 | <listitem>
|
---|
[86eaa277] | 181 | <para>Transfer anyway, then fix the damage with
|
---|
| 182 | <command>convmv</command>.</para>
|
---|
[3cd5c69] | 183 | </listitem>
|
---|
[9c90b1b] | 184 | <listitem>
|
---|
[864b24de] | 185 | <para>On the sending side, create a tar archive with the
|
---|
[86eaa277] | 186 | <parameter>--format=posix</parameter> switch passed to
|
---|
[864b24de] | 187 | <command>tar</command> (this will be the default in a future
|
---|
[86eaa277] | 188 | version of <command>tar</command>).</para>
|
---|
[9c90b1b] | 189 | </listitem>
|
---|
[f6b83352] | 190 | <listitem>
|
---|
[86eaa277] | 191 | <para>Mail the files as attachments. Mail clients specify the
|
---|
| 192 | encoding of attached filenames.</para>
|
---|
| 193 | </listitem>
|
---|
| 194 | <listitem>
|
---|
| 195 | <para>Write the files to a removable disk formatted with a FAT or
|
---|
| 196 | FAT32 filesystem.</para>
|
---|
| 197 | </listitem>
|
---|
| 198 | <listitem>
|
---|
| 199 | <para>Transfer the files using Samba.</para>
|
---|
| 200 | </listitem>
|
---|
| 201 | <listitem>
|
---|
| 202 | <para>Transfer the files via FTP using an RFC2640-aware server
|
---|
| 203 | (this currently means only wu-ftpd, which has a bad security history)
|
---|
| 204 | and client (e.g., lftp).</para>
|
---|
[f6b83352] | 205 | </listitem>
|
---|
[9c90b1b] | 206 | </itemizedlist>
|
---|
| 207 |
|
---|
[86eaa277] | 208 | <para>The last four methods work because the filenames are automatically
|
---|
| 209 | converted from the sender's locale to Unicode and stored or sent in this
|
---|
| 210 | form. They are then transparently converted from Unicode to the
|
---|
| 211 | recipient's locale encoding.</para>
|
---|
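<para>The <command>tar</command> method from the list above can be sketched as
follows. The directory and archive names are hypothetical. The pax format
selected by <parameter>--format=posix</parameter> can record filenames in UTF-8
in extended headers, which is what lets the receiving side map them to its own
locale.</para>

```shell
# Hypothetical directory holding the files to transfer.
mkdir -p outgoing
: > outgoing/example.txt

# On the sending side: create the archive in POSIX (pax) format.
tar --format=posix -cf outgoing.tar outgoing

# On the receiving side: list the contents before extracting.
tar -tf outgoing.tar
```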
| 212 |
|
---|
| 213 | </sect2>
|
---|
| 214 |
|
---|
| 215 | <sect2 id="locale-wrong-multibyte-characters"
|
---|
[a4b9cd7] | 216 | xreflabel="Breaks Multibyte Characters">
|
---|
[86eaa277] | 217 |
|
---|
| 218 | <title>The Program Breaks Multibyte Characters or Doesn't Count
|
---|
| 219 | Character Cells Correctly</title>
|
---|
| 220 |
|
---|
| 221 | <para>Severity: High or critical</para>
|
---|
| 222 |
|
---|
| 223 | <para>Many programs were written in an older era when multibyte
|
---|
| 224 | locales were not common. Such programs assume that the C "char" data
|
---|
| 225 | type, which is one byte, can be used to store single characters.
|
---|
| 226 | Further, they assume that any sequence of characters is a valid
|
---|
| 227 | string and that every character occupies a single character cell.
|
---|
| 228 | Such assumptions completely break in UTF-8 locales. The visible
|
---|
| 229 | manifestation is that the program truncates strings prematurely
|
---|
| 230 | (e.g., at 80 bytes instead of 80 characters). Terminal-based
|
---|
| 231 | programs don't place the cursor correctly on the screen, don't react
|
---|
| 232 | to the "Backspace" key by erasing one character, and leave junk
|
---|
| 233 | characters around when updating the screen, usually turning the
|
---|
| 234 | screen into a complete mess.</para>
|
---|
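<para>The difference between bytes and characters can be observed from the
shell. This is only an illustration with an arbitrary sample word; the
character count printed by <command>wc -m</command> depends on the locale in
which it is run.</para>

```shell
# "naive" with a diaeresis on the i; in UTF-8 that letter is the
# two-byte sequence 0xC3 0xAF (octal 303 257).
byte_count=$(printf 'na\303\257ve' | wc -c)
char_count=$(printf 'na\303\257ve' | wc -m)

echo "bytes: $byte_count"        # always 6
echo "characters: $char_count"   # 5 in a UTF-8 locale, 6 in the C locale
```

<para>A program that truncates or measures this string by bytes treats it as
six units long, although a UTF-8 terminal displays only five character
cells.</para>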
| 235 |
|
---|
[864b24de] | 236 | <para>Fixing this kind of problem is a tedious task from a
|
---|
| 237 | programmer's point of view, like all other cases of retrofitting new
|
---|
| 238 | concepts into an old, flawed design. In this case, one has to redesign
|
---|
| 239 | all data structures to accommodate the fact that a complete
|
---|
| 240 | character may span a variable number of "char"s (or switch to wchar_t
|
---|
| 241 | and convert as needed). Also, for every call to "strlen" and
|
---|
| 242 | similar functions, one has to find out whether a number of bytes, a number of
|
---|
| 243 | characters, or the width of the string was really meant. Sometimes it
|
---|
[86eaa277] | 244 | is faster to write a program with the same functionality from scratch.
|
---|
| 245 | </para>
|
---|
| 246 |
|
---|
[5aeb97df] | 247 | <para>Among BLFS packages, this problem applies to
|
---|
[1fc6df6] | 248 | <xref linkend="xine-ui"/> and all the shells.</para>
|
---|
[f6b83352] | 249 |
|
---|
[9c90b1b] | 250 | </sect2>
|
---|
| 251 |
|
---|
[c6c037c] | 252 | <sect2 id="locale-wrong-manpage-encoding"
|
---|
| 253 | xreflabel="Incorrect Manual Page Encoding">
|
---|
| 254 |
|
---|
| 255 | <title>The Package Installs Manual Pages in Incorrect or
|
---|
| 256 | Non-Displayable Encoding</title>
|
---|
| 257 |
|
---|
| 258 | <para>Severity: Low</para>
|
---|
| 259 |
|
---|
| 260 | <para>LFS expects that manual pages are in the language-specific (usually
|
---|
[648e8bc] | 261 | 8-bit) encoding, as specified on the <ulink
|
---|
| 262 | url="&lfs-root;/chapter06/man-db.html">LFS Man DB page</ulink>. However,
|
---|
| 263 | some packages install translated manual pages in UTF-8 encoding (e.g.,
|
---|
| 264 | Shadow, already dealt with), or manual pages in languages not in the table.
|
---|
| 265 | Not all BLFS packages have been audited for conformance with the
|
---|
| 266 | requirements put in LFS (the large majority have been checked, and fixes
|
---|
| 267 | placed in the book for packages known to install non-conforming manual
|
---|
| 268 | pages). If you find a manual page installed by any of the BLFS packages that is
|
---|
| 269 | obviously in the wrong encoding, please remove or convert it as needed, and
|
---|
[29f80ebc] | 270 | report this to the BLFS team as a bug.</para>
|
---|
[a45a7bc] | 271 |
|
---|
| 272 | <para>You can easily check your system for any non-conforming manual pages
|
---|
| 273 | by copying the following short shell script to some accessible location,
|
---|
| 274 |
|
---|
| 275 | <screen><literal>#!/bin/sh
|
---|
| 276 | # Begin checkman.sh
|
---|
| 277 | # Usage: find /usr/share/man -type f | xargs checkman.sh
|
---|
| 278 | for a in "$@"
|
---|
| 279 | do
|
---|
| 280 | # echo "Checking $a..."
|
---|
| 281 | # Pure-ASCII manual page (possibly except comments) is OK
|
---|
[9a003fe1] | 282 | grep -v '.\\"' "$a" | iconv -f US-ASCII -t US-ASCII >/dev/null 2>&1 \
|
---|
| 283 | && continue
|
---|
[a45a7bc] | 284 | # Non-UTF-8 manual page is OK
|
---|
| 285 | iconv -f UTF-8 -t UTF-8 "$a" >/dev/null 2>&1 || continue
|
---|
| 286 | # If we got here, we found UTF-8 manual page, bad.
|
---|
| 287 | echo "UTF-8 manual page: $a" >&2
|
---|
| 288 | done
|
---|
| 289 | # End checkman.sh
|
---|
| 290 | </literal></screen>
|
---|
| 291 |
|
---|
| 292 | and then issuing the following command (modify the command below if the
|
---|
| 293 | <command>checkman.sh</command> script is not in your <envar>PATH</envar>
|
---|
| 294 | environment variable):</para>
|
---|
| 295 |
|
---|
| 296 | <screen><userinput>find /usr/share/man -type f | xargs checkman.sh</userinput></screen>
|
---|
| 297 |
|
---|
| 298 | <para>Note that if you have manual pages installed in any location other
|
---|
| 299 | than <filename class='directory'>/usr/share/man</filename> (e.g.,
|
---|
| 300 | <filename class='directory'>/usr/local/share/man</filename>), you must
|
---|
| 301 | modify the above command to include this additional location.</para>
|
---|
[c6c037c] | 302 |
|
---|
| 303 | </sect2>
|
---|
| 304 |
|
---|
[9c90b1b] | 305 | </sect1>
|
---|