[9c90b1b] | 1 | <?xml version="1.0" encoding="ISO-8859-1"?>
|
---|
[6732c094] | 2 | <!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
|
---|
| 3 | "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
|
---|
[9c90b1b] | 4 | <!ENTITY % general-entities SYSTEM "../../general.ent">
|
---|
| 5 | %general-entities;
|
---|
| 6 | ]>
|
---|
| 7 |
|
---|
| 8 | <sect1 id="locale-issues" xreflabel="Locale Related Issues">
|
---|
| 9 | <?dbhtml filename="locale-issues.html"?>
|
---|
| 10 |
|
---|
| 11 |
|
---|
| 12 | <title>Locale Related Issues</title>
|
---|
| 13 |
|
---|
| 14 | <para>This page contains information about locale-related problems and
|
---|
[86eaa277] | 15 | issues. In the following paragraphs you'll find a generic overview of
|
---|
| 16 | things that can come up when configuring your system for various locales.
|
---|
[e65a39d] | 17 | Many (but not all) existing locale-related problems fall under
|
---|
[86eaa277] | 18 | one of the headings below. The severity ratings below use
|
---|
| 19 | the following criteria:</para>
|
---|
| 20 |
|
---|
| 21 | <itemizedlist>
|
---|
| 22 | <listitem>
|
---|
| 23 | <para>Critical: The program doesn't perform its main function.
|
---|
| 24 | The fix would be very intrusive; it's better to search for a
|
---|
| 25 | replacement.</para>
|
---|
| 26 | </listitem>
|
---|
| 27 | <listitem>
|
---|
| 28 | <para>High: Part of the functionality that the program provides
|
---|
| 29 | is not usable. If that functionality is required, it's better to
|
---|
| 30 | search for a replacement.</para>
|
---|
| 31 | </listitem>
|
---|
| 32 | <listitem>
|
---|
| 33 | <para>Low: The program works in all typical use cases, but lacks
|
---|
| 34 | some functionality normally provided by its equivalents.</para>
|
---|
| 35 | </listitem>
|
---|
| 36 | </itemizedlist>
|
---|
| 37 |
|
---|
| 38 | <para>If there is a known workaround for a specific package, it will
|
---|
[e65a39d] | 39 | appear on that package's page. For the most recent information
|
---|
| 40 | about locale-related issues for individual packages, check the
|
---|
[42ddc30] | 41 | <ulink url="&blfs-wiki;/BlfsNotes">Editor Notes</ulink> in the BLFS
|
---|
[e65a39d] | 42 | Wiki.</para>
|
---|
[86eaa277] | 43 |
|
---|
| 44 | <sect2 id="locale-not-valid-option"
|
---|
| 45 | xreflabel="Needed Encoding Not a Valid Option">
|
---|
| 46 |
|
---|
| 47 | <title>The Needed Encoding is Not a Valid Option in the Program</title>
|
---|
| 48 |
|
---|
| 49 | <para>Severity: Critical</para>
|
---|
| 50 |
|
---|
| 51 | <para>Some programs require the user to specify the character encoding
|
---|
| 52 | for their input or output data and present only a limited choice of
|
---|
| 53 | encodings. This is the case for the <option>-X</option> option in
|
---|
[51b61f4] | 54 | <!-- <xref linkend="a2ps"/> and --><xref linkend="enscript"/>,
|
---|
[86eaa277] | 55 | the <option>-input-charset</option> option in unpatched
|
---|
[00e20af7] | 56 | <xref linkend="cdrtools"/>, and the character sets offered for display
|
---|
[648e8bc] | 57 | in the menu of <xref linkend="Links"/>. If the required encoding is not
|
---|
[86eaa277] | 58 | in the list, the program usually becomes completely unusable. For
|
---|
| 59 | non-interactive programs, it may be possible to work around this by
|
---|
| 60 | converting the document to a supported input character set before
|
---|
| 61 | submitting it to the program.</para>
|
---|
| 62 |
|
---|
| 63 | <para>A solution to this type of problem is to implement the necessary
|
---|
[fb192216] | 64 | support for the missing encoding as a patch to the original program or to
|
---|
| 65 | find a replacement.</para>
|
---|
[9c90b1b] | 66 |
|
---|
[86eaa277] | 67 | </sect2>
|
---|
[9c90b1b] | 68 |
|
---|
[86eaa277] | 69 | <sect2 id="locale-assumed-encoding"
|
---|
| 70 | xreflabel="Program Assumes Encoding">
|
---|
| 71 |
|
---|
| 72 | <title>The Program Assumes the Locale-Based Encoding of External
|
---|
| 73 | Documents</title>
|
---|
| 74 |
|
---|
| 75 | <para>Severity: High for non-text documents, low for text
|
---|
| 76 | documents</para>
|
---|
| 77 |
|
---|
| 78 | <para>Some programs, <xref linkend="nano"/> or
|
---|
| 79 | <xref linkend="joe"/> for example, assume that documents are always
|
---|
| 80 | in the encoding implied by the current locale. While this assumption
|
---|
| 81 | may be valid for user-created documents, it is not safe for
|
---|
| 82 | external ones. When this assumption fails, non-ASCII characters are
|
---|
| 83 | displayed incorrectly, and the document may become unreadable.</para>
|
---|
| 84 |
|
---|
| 85 | <para>If the external document is entirely text based, it can be
|
---|
| 86 | converted to the current locale encoding using the
|
---|
| 87 | <command>iconv</command> program.</para>
|
---|
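<para>As a minimal sketch of such a conversion, the hypothetical helper
<command>to_locale_encoding</command> below wraps <command>iconv</command>.
The function name, its argument order, and the use of
<command>locale charmap</command> to discover the target encoding are
assumptions made for illustration; the source encoding must be known in
advance, because <command>iconv</command> does not detect it.</para>

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): convert a text file from a
# known source encoding to the encoding of the current locale.
# Usage: to_locale_encoding FROM_ENCODING SOURCE_FILE DEST_FILE [TO_ENCODING]
to_locale_encoding() {
    from=$1
    src=$2
    dst=$3
    # Default target: the character map of the current locale.
    to=${4:-$(locale charmap)}
    iconv -f "$from" -t "$to" "$src" > "$dst"
}
```

<para>For example, <command>to_locale_encoding ISO-8859-1 notes.txt
notes.converted.txt</command> would write a copy of the hypothetical file
<filename>notes.txt</filename> in the current locale encoding and leave the
original untouched.</para>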
| 88 |
|
---|
| 89 | <para>For documents that are not text-based, this is not possible.
|
---|
| 90 | In fact, the assumption made in the program may be completely
|
---|
| 91 | invalid for documents where the Microsoft Windows operating system
|
---|
| 92 | has set de facto standards. An example of this problem is ID3v1 tags
|
---|
[29f80ebc] | 93 | in MP3 files (see the <ulink url="&blfs-wiki;/ID3v1Coding">BLFS Wiki
|
---|
[648e8bc] | 94 | ID3v1Coding page</ulink>
|
---|
[86eaa277] | 95 | for more details). For these cases, the only solution is to find a
|
---|
| 96 | replacement program that doesn't have the issue (e.g., one that
|
---|
| 97 | will allow you to specify the assumed document encoding).</para>
|
---|
| 98 |
|
---|
| 99 | <para>Among BLFS packages, this problem applies to
|
---|
| 100 | <xref linkend="nano"/>, <xref linkend="joe"/>, and all media players
|
---|
| 101 | except <xref linkend="audacious"/>.</para>
|
---|
| 102 |
|
---|
| 103 | <para>Another problem in this category is when someone cannot read
|
---|
| 104 | the documents you've sent them because their operating system is
|
---|
| 105 | set up to handle character encodings differently. This often
|
---|
| 106 | happens when the other person is using Microsoft Windows, which only
|
---|
| 107 | provides one character encoding for a given country. For example,
|
---|
| 108 | this causes problems with UTF-8 encoded TeX documents created in
|
---|
| 109 | Linux. On Windows, most applications will assume that these documents
|
---|
[0d7900a] | 110 | have been created using the default Windows 8-bit encoding.
|
---|
[8aeb474] | 111 | </para>
|
---|
[86eaa277] | 112 |
|
---|
[864b24de] | 113 | <para>In extreme cases, Windows encoding compatibility issues may be
|
---|
[86eaa277] | 114 | solved only by running Windows programs under
|
---|
[824ddcc] | 115 | <ulink url="https://www.winehq.com/">Wine</ulink>.</para>
|
---|
[9c90b1b] | 116 |
|
---|
[86eaa277] | 117 | </sect2>
|
---|
[9c90b1b] | 118 |
|
---|
[86eaa277] | 119 | <sect2 id="locale-wrong-filename-encoding"
|
---|
| 120 | xreflabel="Wrong Filename Encoding">
|
---|
| 121 |
|
---|
| 122 | <title>The Program Uses or Creates Filenames in the Wrong Encoding</title>
|
---|
| 123 |
|
---|
| 124 | <para>Severity: Critical</para>
|
---|
| 125 |
|
---|
| 126 | <para>The POSIX standard mandates that the filename encoding is
|
---|
| 127 | the encoding implied by the current LC_CTYPE locale category. This
|
---|
| 128 | information is well-hidden on the page which specifies the behavior
|
---|
| 129 | of the <application>Tar</application> and <application>Cpio</application>
|
---|
[864b24de] | 130 | programs. Some programs get it wrong by default (or simply don't
|
---|
[86eaa277] | 131 | have enough information to get it right). The result is that they
|
---|
| 132 | create filenames which are not subsequently shown correctly by
|
---|
| 133 | <command>ls</command>, or they refuse to accept filenames that
|
---|
| 134 | <command>ls</command> shows properly. For the <xref linkend="glib2"/>
|
---|
| 135 | library, the problem can be corrected by setting the
|
---|
| 136 | <envar>G_FILENAME_ENCODING</envar> environment variable to the special
|
---|
| 137 | "@locale" value. <application>Glib2</application> based programs that
|
---|
| 138 | don't respect that environment variable are buggy.</para>
|
---|
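<para>A minimal sketch of that setting follows. Where to put the line is an
assumption (any shell startup file read before the affected programs start
would do); only the variable name and the special value come from the text
above.</para>

```shell
# Assumed location: a shell startup file such as ~/.bash_profile.
# Ask GLib-based programs to use the current locale's encoding for filenames.
export G_FILENAME_ENCODING=@locale
```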
| 139 |
|
---|
[df8df0a5] | 140 | <para>The <xref linkend="zip"/> and <xref linkend="unzip"/> programs have this
|
---|
| 141 | problem because they hard-code the expected filename encoding.
|
---|
| 142 | <application>UnZip</application> contains a hard-coded conversion table
|
---|
| 143 | between the CP850 (DOS) and ISO-8859-1 (UNIX) encodings and uses this table
|
---|
| 144 | when extracting archives created under DOS or Microsoft Windows. However,
|
---|
| 145 | this assumption only works for those in the US and not for anyone using a
|
---|
| 146 | UTF-8 locale. Non-ASCII characters will be mangled in the extracted
|
---|
| 147 | filenames.</para>
|
---|
| 148 |
|
---|
| 149 | <!--<para>On the other hand,
|
---|
[86eaa277] | 150 | <application>Nautilus CD Burner</application> checks names of
|
---|
| 151 | files added to its window for UTF-8 validity. This is wrong for
|
---|
| 152 | users of non-UTF-8 locales. Also,
|
---|
| 153 | <application>Nautilus CD Burner</application> unconditionally
|
---|
| 154 | calls <command>mkisofs</command> with the
|
---|
| 155 | <parameter>-input-charset UTF-8</parameter> parameter, which is
|
---|
[df8df0a5] | 156 | only correct in UTF-8 locales.</para>-->
|
---|
[86eaa277] | 157 |
|
---|
[864b24de] | 158 | <para>The general rule for avoiding this class of problems is to
|
---|
[86eaa277] | 159 | avoid installing broken programs. If this is impossible, the
|
---|
[824ddcc] | 160 | <ulink url="https://j3e.de/linux/convmv/">convmv</ulink>
|
---|
[86eaa277] | 161 | command-line tool can be used to fix filenames created by these
|
---|
| 162 | broken programs, or intentionally mangle the existing filenames
|
---|
| 163 | to meet the broken expectations of such programs.</para>
|
---|
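<para>The following sketch shows one way <command>convmv</command> might be
driven. The wrapper name <command>fix_filenames</command> and its arguments
are hypothetical, and the encodings have to be supplied by the user.
<command>convmv</command> only prints what it would do unless
<parameter>--notest</parameter> is given, so the wrapper previews by default
and renames only when explicitly asked to.</para>

```shell
#!/bin/sh
# Hypothetical wrapper around convmv (assumed to be installed).
# Usage: fix_filenames FROM_ENCODING TO_ENCODING DIRECTORY [apply]
fix_filenames() {
    from=$1
    to=$2
    dir=$3
    mode=${4:-preview}
    if [ "$mode" = "apply" ]; then
        # --notest makes convmv actually rename the files.
        convmv -f "$from" -t "$to" -r --notest "$dir"
    else
        # Without --notest, convmv only reports the planned renames.
        convmv -f "$from" -t "$to" -r "$dir"
    fi
}
```

<para>Running the preview first and reading its output before repeating the
call with <parameter>apply</parameter> is the cautious order, since a wrong
source encoding would mangle the names further.</para>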
| 164 |
|
---|
| 165 | <para>In other cases, a similar problem is caused by importing
|
---|
| 166 | filenames from a system using a different locale with a tool that
|
---|
[9e9cd2a2] | 167 | is not locale-aware (e.g., <!--<xref linkend="nfs-utils"/> or-->
|
---|
[864b24de] | 168 | <xref linkend="openssh"/>). In order to avoid mangling non-ASCII
|
---|
[86eaa277] | 169 | characters when transferring files to a system with a different
|
---|
| 170 | locale, any of the following methods can be used:</para>
|
---|
[9c90b1b] | 171 |
|
---|
[86eaa277] | 172 | <itemizedlist>
|
---|
[3cd5c69] | 173 | <listitem>
|
---|
[86eaa277] | 174 | <para>Transfer anyway, then fix the damage with
|
---|
| 175 | <command>convmv</command>.</para>
|
---|
[3cd5c69] | 176 | </listitem>
|
---|
[9c90b1b] | 177 | <listitem>
|
---|
[864b24de] | 178 | <para>On the sending side, create a tar archive with the
|
---|
[86eaa277] | 179 | <parameter>--format=posix</parameter> switch passed to
|
---|
[864b24de] | 180 | <command>tar</command> (this will be the default in a future
|
---|
[86eaa277] | 181 | version of <command>tar</command>).</para>
|
---|
[9c90b1b] | 182 | </listitem>
|
---|
[f6b83352] | 183 | <listitem>
|
---|
[86eaa277] | 184 | <para>Mail the files as attachments. Mail clients specify the
|
---|
| 185 | encoding of attached filenames.</para>
|
---|
| 186 | </listitem>
|
---|
| 187 | <listitem>
|
---|
| 188 | <para>Write the files to a removable disk formatted with a FAT or
|
---|
| 189 | FAT32 filesystem.</para>
|
---|
| 190 | </listitem>
|
---|
| 191 | <listitem>
|
---|
| 192 | <para>Transfer the files using Samba.</para>
|
---|
| 193 | </listitem>
|
---|
| 194 | <listitem>
|
---|
| 195 | <para>Transfer the files via FTP using an RFC 2640-aware server
|
---|
| 196 | (this currently means only wu-ftpd, which has a bad security history)
|
---|
| 197 | and client (e.g., lftp).</para>
|
---|
[f6b83352] | 198 | </listitem>
|
---|
[9c90b1b] | 199 | </itemizedlist>
|
---|
| 200 |
|
---|
[86eaa277] | 201 | <para>The last four methods work because the filenames are automatically
|
---|
| 202 | converted from the sender's locale to Unicode and stored or sent in this
|
---|
| 203 | form. They are then transparently converted from Unicode to the
|
---|
| 204 | recipient's locale encoding.</para>
|
---|
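<para>The <command>tar</command> method from the list above can be sketched
as follows, assuming GNU <command>tar</command>. The helper name
<command>pack_posix</command> and its arguments are hypothetical; only the
<parameter>--format=posix</parameter> switch comes from the text.</para>

```shell
#!/bin/sh
# Hypothetical helper: pack a directory into a POSIX (pax) format archive
# on the sending side. Assumes GNU tar.
# Usage: pack_posix DIRECTORY ARCHIVE
pack_posix() {
    src_dir=$1
    archive=$2
    # -C switches to the parent directory so that the archive stores
    # relative member names starting with the directory's own name.
    tar --format=posix -cf "$archive" \
        -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}
```

<para>The resulting archive is then copied to the receiving system and
unpacked there with <command>tar -xf</command>.</para>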
| 205 |
|
---|
| 206 | </sect2>
|
---|
| 207 |
|
---|
| 208 | <sect2 id="locale-wrong-multibyte-characters"
|
---|
[a4b9cd7] | 209 | xreflabel="Breaks Multibyte Characters">
|
---|
[86eaa277] | 210 |
|
---|
| 211 | <title>The Program Breaks Multibyte Characters or Doesn't Count
|
---|
| 212 | Character Cells Correctly</title>
|
---|
| 213 |
|
---|
| 214 | <para>Severity: High or critical</para>
|
---|
| 215 |
|
---|
| 216 | <para>Many programs were written in an older era when multibyte
|
---|
| 217 | locales were not common. Such programs assume that the C "char" data
|
---|
| 218 | type, which is one byte, can be used to store single characters.
|
---|
| 219 | Further, they assume that any sequence of characters is a valid
|
---|
| 220 | string and that every character occupies a single character cell.
|
---|
| 221 | Such assumptions completely break in UTF-8 locales. The visible
|
---|
| 222 | manifestation is that the program truncates strings prematurely
|
---|
| 223 | (e.g., at 80 bytes instead of 80 characters). Terminal-based
|
---|
| 224 | programs don't place the cursor correctly on the screen, don't react
|
---|
| 225 | to the "Backspace" key by erasing one character, and leave junk
|
---|
| 226 | characters around when updating the screen, usually turning the
|
---|
| 227 | screen into a complete mess.</para>
|
---|
| 228 |
|
---|
[864b24de] | 229 | <para>Fixing this kind of problem is a tedious task from a
|
---|
| 230 | programmer's point of view, like all other cases of retrofitting new
|
---|
| 231 | concepts into an old, flawed design. In this case, one has to redesign
|
---|
| 232 | all data structures in order to accommodate the fact that a complete
|
---|
| 233 | character may span a variable number of "char"s (or switch to wchar_t
|
---|
| 234 | and convert as needed). Also, for every call to "strlen" and
|
---|
| 235 | similar functions, one must find out whether a number of bytes, a number of
|
---|
| 236 | characters, or the width of the string was really meant. Sometimes it
|
---|
[86eaa277] | 237 | is faster to write a program with the same functionality from scratch.
|
---|
| 238 | </para>
|
---|
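<para>The difference between bytes and characters can be observed from the
shell. The sketch below is illustrative only: the sample word is arbitrary,
and the character count assumes that a UTF-8 locale named
<literal>C.UTF-8</literal> is installed. If it is not, <command>wc -m</command>
falls back to counting bytes.</para>

```shell
#!/bin/sh
# The letters "caf" followed by e-acute, which UTF-8 encodes as the two
# bytes \303\251.
word=$(printf 'caf\303\251')

# Number of bytes: what strlen() would report for this string.
bytes=$(printf '%s' "$word" | LC_ALL=C wc -c | tr -d ' ')

# Number of characters, counted in a UTF-8 locale (assumed to exist).
chars=$(printf '%s' "$word" | LC_ALL=C.UTF-8 wc -m | tr -d ' ')

echo "bytes=$bytes chars=$chars"
```

<para>The byte count is 5 for this four-letter word, so a program that
truncates at a fixed number of bytes may cut the last character in
half.</para>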
| 239 |
|
---|
[5aeb97df] | 240 | <para>Among BLFS packages, this problem applies to
|
---|
[1fc6df6] | 241 | <xref linkend="xine-ui"/> and all the shells.</para>
|
---|
[f6b83352] | 242 |
|
---|
[9c90b1b] | 243 | </sect2>
|
---|
| 244 |
|
---|
[c6c037c] | 245 | <sect2 id="locale-wrong-manpage-encoding"
|
---|
| 246 | xreflabel="Incorrect Manual Page Encoding">
|
---|
| 247 |
|
---|
| 248 | <title>The Package Installs Manual Pages in Incorrect or
|
---|
| 249 | Non-Displayable Encoding</title>
|
---|
| 250 |
|
---|
| 251 | <para>Severity: Low</para>
|
---|
| 252 |
|
---|
| 253 | <para>LFS expects that manual pages are in the language-specific (usually
|
---|
[648e8bc] | 254 | 8-bit) encoding, as specified on the <ulink
|
---|
[f0dc9578] | 255 | url="&lfs-root;/chapter08/man-db.html">LFS Man DB page</ulink>. However,
|
---|
[648e8bc] | 256 | some packages install translated manual pages in UTF-8 encoding (e.g.,
|
---|
| 257 | Shadow, already dealt with), or manual pages in languages not in the table.
|
---|
| 258 | Not all BLFS packages have been audited for conformance with the
|
---|
| 259 | requirements put in LFS (the large majority have been checked, and fixes
|
---|
| 260 | placed in the book for packages known to install non-conforming manual
|
---|
| 261 | pages). If you find a manual page installed by any of the BLFS packages that is
|
---|
| 262 | obviously in the wrong encoding, please remove or convert it as needed, and
|
---|
[29f80ebc] | 263 | report this to the BLFS team as a bug.</para>
|
---|
[a45a7bc] | 264 |
|
---|
| 265 | <para>You can easily check your system for any non-conforming manual pages
|
---|
| 266 | by copying the following short shell script to some accessible location,
|
---|
| 267 |
|
---|
| 268 | <screen><literal>#!/bin/sh
|
---|
| 269 | # Begin checkman.sh
|
---|
| 270 | # Usage: find /usr/share/man -type f | xargs checkman.sh
|
---|
| 271 | for a in "$@"
|
---|
| 272 | do
|
---|
| 273 | # echo "Checking $a..."
|
---|
| 274 | # Pure-ASCII manual page (possibly except comments) is OK
|
---|
[9a003fe1] | 275 | grep -v '.\\"' "$a" | iconv -f US-ASCII -t US-ASCII >/dev/null 2>&1 \
|
---|
| 276 | && continue
|
---|
[a45a7bc] | 277 | # Non-UTF-8 manual page is OK
|
---|
| 278 | iconv -f UTF-8 -t UTF-8 "$a" >/dev/null 2>&1 || continue
|
---|
[ec64d28] | 279 | # Found a UTF-8 manual page, bad.
|
---|
[a45a7bc] | 280 | echo "UTF-8 manual page: $a" >&2
|
---|
| 281 | done
|
---|
| 282 | # End checkman.sh
|
---|
| 283 | </literal></screen>
|
---|
| 284 |
|
---|
| 285 | and then issuing the following command (modify the command below if the
|
---|
| 286 | <command>checkman.sh</command> script is not in your <envar>PATH</envar>
|
---|
| 287 | environment variable):</para>
|
---|
| 288 |
|
---|
| 289 | <screen><userinput>find /usr/share/man -type f | xargs checkman.sh</userinput></screen>
|
---|
| 290 |
|
---|
| 291 | <para>Note that if you have manual pages installed in any location other
|
---|
| 292 | than <filename class='directory'>/usr/share/man</filename> (e.g.,
|
---|
| 293 | <filename class='directory'>/usr/local/share/man</filename>), you must
|
---|
| 294 | modify the above command to include this additional location.</para>
|
---|
[c6c037c] | 295 |
|
---|
| 296 | </sect2>
|
---|
| 297 |
|
---|
[9c90b1b] | 298 | </sect1>
|
---|