<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../../general.ent">
  %general-entities;
]>

<sect1 id="locale-issues" xreflabel="Locale Related Issues">
  <?dbhtml filename="locale-issues.html"?>

  <title>Locale Related Issues</title>

  <para>This page describes locale-related problems and issues. The
  following sections give a general overview of things that can come up
  when configuring your system for various locales. Many, but not all,
  locale-related problems fall under one of the headings below. The
  severity ratings use the following criteria:</para>

  <itemizedlist>
    <listitem>
      <para>Critical: The program doesn't perform its main function.
      The fix would be very intrusive, so it is better to search for a
      replacement.</para>
    </listitem>
    <listitem>
      <para>High: Part of the functionality that the program provides
      is not usable. If that functionality is required, it is better to
      search for a replacement.</para>
    </listitem>
    <listitem>
      <para>Low: The program works in all typical use cases, but lacks
      some functionality normally provided by its equivalents.</para>
    </listitem>
  </itemizedlist>

  <para>If there is a known workaround for a specific package, it will
  appear on that package's page.</para>

  <sect2 id="locale-not-valid-option"
         xreflabel="Needed Encoding Not a Valid Option">

    <title>The Needed Encoding is Not a Valid Option in the Program</title>

    <para>Severity: Critical</para>

    <para>Some programs require the user to specify the character encoding
    for their input or output data and present only a limited choice of
    encodings. This is the case for the <option>-X</option> option in
    <!-- <xref linkend="a2ps"/> and --><xref linkend="enscript"/>,
    the <option>-input-charset</option> option in unpatched
    <xref linkend="cdrtools"/>, and the character sets offered for display
    in the menu of <xref linkend="Links"/>. If the required encoding is not
    in the list, the program usually becomes completely unusable. For
    non-interactive programs, it may be possible to work around this by
    converting the document to a supported input character set before
    passing it to the program.</para>

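    <para>The conversion workaround described above can be sketched as
    follows. This is a minimal example, assuming a UTF-8 document and a
    program that accepts only ISO-8859-1 (Latin-1) input; the helper name
    <function>to_charset</function> and the file names are hypothetical.
    Because <command>iconv</command> exits with an error when a character
    has no equivalent in the target encoding, the sketch also shows how to
    detect documents for which this workaround cannot work.</para>

```shell
#!/bin/sh
# Hypothetical helper: convert FILE ($1) from UTF-8 to CHARSET ($2) and
# write the result to standard output. iconv exits non-zero if FILE
# contains a character that CHARSET cannot represent.
to_charset() {
    iconv -f UTF-8 -t "$2" "$1"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# "café" in UTF-8: é is the two-byte sequence C3 A9 (octal 303 251).
printf 'caf\303\251\n' > "$tmp/ok.txt"
# "5 €" in UTF-8: the euro sign (E2 82 AC) is not part of ISO-8859-1.
printf '5 \342\202\254\n' > "$tmp/euro.txt"

if to_charset "$tmp/ok.txt" ISO-8859-1 > "$tmp/ok.latin1"; then
    echo "ok.txt: converted, can be passed to the program"
fi
if ! to_charset "$tmp/euro.txt" ISO-8859-1 > "$tmp/euro.latin1" 2>/dev/null; then
    echo "euro.txt: not representable in ISO-8859-1"
fi
```

    <para>The converted file would then be given to the program together
    with the matching encoding option, for example
    <option>-X latin1</option> for <xref linkend="enscript"/>.</para>
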
    <para>A solution to this type of problem is to implement the necessary
    support for the missing encoding as a patch to the original program or to
    find a replacement.</para>

  </sect2>

  <sect2 id="locale-assumed-encoding"
         xreflabel="Program Assumes Encoding">

    <title>The Program Assumes the Locale-Based Encoding of External
    Documents</title>

    <para>Severity: High for non-text documents, low for text
    documents</para>

    <para>Some programs, <xref linkend="nano"/> or
    <xref linkend="joe"/> for example, assume that documents are always
    in the encoding implied by the current locale. While this assumption
    may be valid for user-created documents, it is not safe for
    external ones. When this assumption fails, non-ASCII characters are
    displayed incorrectly, and the document may become unreadable.</para>

    <para>If the external document is entirely text-based, it can be
    converted to the current locale encoding using the
    <command>iconv</command> program.</para>

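    <para>A minimal sketch of such a conversion follows. It assumes that
    the original encoding of the document is known (ISO-8859-1 in the
    example). By default, the target is the encoding that
    <command>locale charmap</command> reports for the current locale. The
    helper name <function>to_locale</function> is hypothetical, and the
    optional third argument overrides the target so that the example
    behaves the same in any locale.</para>

```shell
#!/bin/sh
# Hypothetical helper: convert FILE from encoding FROM to the encoding
# of the current locale (or to TO, if given) and print the result.
to_locale() {
    file=$1
    from=$2
    to=${3:-$(locale charmap)}   # e.g. "UTF-8" in a UTF-8 locale
    iconv -f "$from" -t "$to" "$file"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# "café" as written on a Latin-1 system: é is the single byte E9.
printf 'caf\351\n' > "$tmp/latin1.txt"

# Convert explicitly to UTF-8; drop the last argument to convert to
# the encoding of the current locale instead.
to_locale "$tmp/latin1.txt" ISO-8859-1 UTF-8 > "$tmp/utf8.txt"
```
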
    <para>For documents that are not text-based, this is not possible.
    In fact, the assumption made in the program may be completely
    invalid for documents where the Microsoft Windows operating system
    has set de facto standards. An example of this problem is ID3v1 tags
    in MP3 files. For these cases, the only solution is to find a
    replacement program that doesn't have the issue (e.g., one that
    lets you specify the assumed document encoding).</para>

    <para>Among BLFS packages, this problem applies to
    <xref linkend="nano"/>, <xref linkend="joe"/>, and all media players
    except <xref linkend="audacious"/>.</para>

    <para>Another problem in this category arises when someone cannot read
    the documents you've sent them because their operating system is
    set up to handle character encodings differently. This happens
    often when the other person is using Microsoft Windows, which
    provides only one character encoding for a given country. For example,
    this causes problems with UTF-8 encoded TeX documents created in
    Linux. On Windows, most applications will assume that these documents
    have been created using the default Windows 8-bit encoding.</para>

    <para>In extreme cases, Windows encoding compatibility issues may be
    solved only by running Windows programs under
    <ulink url="https://www.winehq.org/">Wine</ulink>.</para>

  </sect2>

  <sect2 id="locale-wrong-filename-encoding"
         xreflabel="Wrong Filename Encoding">

    <title>The Program Uses or Creates Filenames in the Wrong Encoding</title>

    <para>Severity: Critical</para>

    <para>The POSIX standard mandates that the filename encoding is
    the encoding implied by the current LC_CTYPE locale category. This
    information is well hidden on the page that specifies the behavior
    of the <application>Tar</application> and <application>Cpio</application>
    programs. Some programs get it wrong by default (or simply don't
    have enough information to get it right). As a result, they
    create filenames that are not subsequently shown correctly by
    <command>ls</command>, or they refuse to accept filenames that
    <command>ls</command> shows properly. For the <xref linkend="glib2"/>
    library, the problem can be corrected by setting the
    <envar>G_FILENAME_ENCODING</envar> environment variable to the special
    "@locale" value. <application>Glib2</application>-based programs that
    don't respect that environment variable are buggy.</para>

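    <para>One way to set this variable for all login shells is sketched
    below. It assumes that, as in the BLFS Bash startup files, scripts in
    <filename class='directory'>/etc/profile.d</filename> are sourced at
    login. The file name is hypothetical. When the variable is not set,
    GLib assumes that filenames are in UTF-8.</para>

```shell
# Hypothetical /etc/profile.d/glib-filename-encoding.sh: make GLib use
# the encoding of the current locale (its LC_CTYPE category) for
# filenames instead of GLib's default of UTF-8.
G_FILENAME_ENCODING=@locale
export G_FILENAME_ENCODING
```

    <para>After logging in again, <command>locale charmap</command> shows
    the encoding that GLib-based programs will then use for
    filenames.</para>
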
    <para>The <xref linkend="zip"/> and <xref linkend="unzip"/> packages
    have this problem because they hard-code the expected filename encoding.
    <application>UnZip</application> contains a hard-coded conversion table
    between the CP850 (DOS) and ISO-8859-1 (UNIX) encodings and uses this table
    when extracting archives created under DOS or Microsoft Windows. However,
    this assumption works only for users in the US, not for anyone using a
    UTF-8 locale. Non-ASCII characters will be mangled in the extracted
    filenames.</para>

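    <para>The effect of such a fixed table can be reproduced with
    <command>iconv</command>. The sketch below mimics UnZip's CP850 to
    ISO-8859-1 conversion rather than calling UnZip itself. It assumes a
    filename stored in an archive as CP850 bytes, where é is the byte
    0x82. The result is correct for an ISO-8859-1 locale but is not valid
    UTF-8.</para>

```shell
#!/bin/sh
# "café.txt" as a DOS/Windows archiver using code page 850 might store
# it: é is the single byte 0x82 (octal 202).
name=$(printf 'caf\202.txt')

# Result of a fixed CP850 -> ISO-8859-1 table: é becomes the byte 0xE9,
# which is only correct in an ISO-8859-1 locale.
latin1=$(printf '%s' "$name" | iconv -f CP850 -t ISO-8859-1)

# What a UTF-8 locale needs instead: é as the two bytes 0xC3 0xA9.
utf8=$(printf '%s' "$name" | iconv -f CP850 -t UTF-8)

# The Latin-1 name is not valid UTF-8, so in a UTF-8 locale the
# extracted filename is displayed mangled.
if printf '%s' "$latin1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>/dev/null; then
    echo "Latin-1 name is valid UTF-8"
else
    echo "Latin-1 name is not valid UTF-8: mangled in a UTF-8 locale"
fi
```
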
    <!--<para>On the other hand,
    <application>Nautilus CD Burner</application> checks names of
    files added to its window for UTF-8 validity. This is wrong for
    users of non-UTF-8 locales. Also,
    <application>Nautilus CD Burner</application> unconditionally
    calls <command>mkisofs</command> with the
    <parameter>-input-charset UTF-8</parameter> parameter, which is
    only correct in UTF-8 locales.</para>-->

    <para>The general rule for avoiding this class of problems is to
    avoid installing broken programs. If this is impossible, the
    <ulink url="https://j3e.de/linux/convmv/">convmv</ulink>
    command-line tool can be used to fix filenames created by these
    broken programs, or intentionally mangle the existing filenames
    to meet the broken expectations of such programs.</para>

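    <para>For real use, <command>convmv</command> is the right tool.
    Purely to illustrate what such a fix does, the sketch below renames a
    single file whose name is in ISO-8859-1 so that the name is in UTF-8.
    The helper name <function>fix_name</function> is hypothetical. Unlike
    <command>convmv</command>, it works on one file only and has no
    dry-run mode.</para>

```shell
#!/bin/sh
# Hypothetical helper: rename DIR/OLD, whose name is encoded in FROM,
# so that the name is encoded in TO. Only the name is converted, not
# the file contents.
fix_name() {
    dir=$1 old=$2 from=$3 to=$4
    new=$(printf '%s' "$old" | iconv -f "$from" -t "$to") || return 1
    # -n: do not overwrite an existing file (GNU coreutils option).
    mv -n -- "$dir/$old" "$dir/$new"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# "café.txt" with an ISO-8859-1 name (é is the byte 0xE9), as left
# behind by a transfer tool that is not locale-aware.
touch "$tmp/$(printf 'caf\351.txt')"

fix_name "$tmp" "$(printf 'caf\351.txt')" ISO-8859-1 UTF-8
ls "$tmp"
```

    <para>With <command>convmv</command> itself, a recursive equivalent
    would be <userinput>convmv -f iso-8859-1 -t utf-8 -r
    <replaceable>directory</replaceable></userinput>. This only prints the
    planned renames until the <parameter>--notest</parameter> option is
    added.</para>
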
    <para>In other cases, a similar problem is caused by importing
    filenames from a system using a different locale with a tool that
    is not locale-aware (e.g., <!--<xref linkend="nfs-utils"/> or-->
    <xref linkend="openssh"/>). To avoid mangling non-ASCII
    characters when transferring files to a system with a different
    locale, use any of the following methods:</para>

    <itemizedlist>
      <listitem>
        <para>Transfer the files anyway, then fix the damage with
        <command>convmv</command>.</para>
      </listitem>
      <listitem>
        <para>On the sending side, create a tar archive with the
        <parameter>--format=posix</parameter> switch passed to
        <command>tar</command> (this will be the default in a future
        version of <command>tar</command>).</para>
      </listitem>
      <listitem>
        <para>Mail the files as attachments. Mail clients specify the
        encoding of attached filenames.</para>
      </listitem>
      <listitem>
        <para>Write the files to a removable disk formatted with a FAT or
        FAT32 filesystem.</para>
      </listitem>
      <listitem>
        <para>Transfer the files using Samba.</para>
      </listitem>
      <listitem>
        <para>Transfer the files via FTP using an RFC 2640-aware server
        (this currently means only wu-ftpd, which has a bad security
        history) and client (e.g., lftp).</para>
      </listitem>
    </itemizedlist>

    <para>The last five methods work because the filenames are automatically
    converted from the sender's locale to Unicode and stored or sent in this
    form. They are then transparently converted from Unicode to the
    recipient's locale encoding.</para>

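    <para>The tar method might look like the sketch below, which creates a
    POSIX (pax) archive on the sending side and extracts it on the
    receiving side. The directory and archive names are hypothetical, and
    GNU <command>tar</command> is assumed. Both steps are shown on one
    machine only to keep the example self-contained. In practice, each side
    runs <command>tar</command> in its own usual locale.</para>

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/outbox"
echo "hello" > "$tmp/outbox/notes.txt"

# Sending side: write the archive in the POSIX (pax) format, whose
# extended headers record filenames in UTF-8.
tar --format=posix -cf "$tmp/outbox.tar" -C "$tmp" outbox

# Receiving side: list the archive, then extract it elsewhere.
tar -tf "$tmp/outbox.tar"
mkdir "$tmp/inbox"
tar -xf "$tmp/outbox.tar" -C "$tmp/inbox"
```
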
  </sect2>

  <sect2 id="locale-wrong-multibyte-characters"
         xreflabel="Breaks Multibyte Characters">

    <title>The Program Breaks Multibyte Characters or Doesn't Count
    Character Cells Correctly</title>

    <para>Severity: High or critical</para>

    <para>Many programs were written in an older era when multibyte
    locales were not common. Such programs assume that the C "char" data
    type, which is one byte, can be used to store a single character.
    Further, they assume that any sequence of bytes is a valid
    string and that every character occupies a single character cell.
    Such assumptions completely break in UTF-8 locales. The visible
    manifestation is that the program truncates strings prematurely
    (e.g., at 80 bytes instead of 80 characters). Terminal-based
    programs don't place the cursor correctly on the screen, don't react
    to the "Backspace" key by erasing one character, and leave junk
    characters around when updating the screen, usually turning the
    screen into a complete mess.</para>

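    <para>The byte-versus-character truncation described above can be
    reproduced with standard tools. The sketch cuts a UTF-8 string after a
    fixed number of bytes with <command>head -c</command>, as a
    byte-oriented program would, and checks the result with
    <command>iconv</command>. A limit of 4 bytes stands in for the 80-byte
    limit, and <function>is_utf8</function> is a hypothetical helper
    name.</para>

```shell
#!/bin/sh
# "café" in UTF-8 is 5 bytes: c, a, f, then C3 A9 for é.
s=$(printf 'caf\303\251')

# Hypothetical helper: succeed only if standard input is valid UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>/dev/null
}

# Keeping only the first 4 bytes splits é in the middle.
if printf '%s' "$s" | head -c 4 | is_utf8; then
    echo "4 bytes: valid"
else
    echo "4 bytes: broken multibyte character"
fi

# Keeping 5 bytes keeps the whole character.
if printf '%s' "$s" | head -c 5 | is_utf8; then
    echo "5 bytes: valid"
else
    echo "5 bytes: broken multibyte character"
fi
```
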
    <para>Fixing this kind of problem is a tedious task from a
    programmer's point of view, like all other cases of retrofitting new
    concepts into an old, flawed design. In this case, one has to redesign
    all data structures to accommodate the fact that a complete
    character may span a variable number of "char"s (or switch to wchar_t
    and convert as needed). Also, for every call to "strlen" and
    similar functions, one has to find out whether a number of bytes, a
    number of characters, or the display width of the string was really
    meant. Sometimes it is faster to write a program with the same
    functionality from scratch.</para>

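    <para>The difference between a number of bytes and a number of
    characters can be illustrated for UTF-8. In UTF-8, every character
    begins with a byte that is not a continuation byte (0x80 to 0xBF), so
    deleting the continuation bytes and counting the remaining bytes gives
    the number of characters. The sketch assumes valid UTF-8 input, and
    the helper names are hypothetical. It does not cover display width
    (for example, most CJK ideographs occupy two character cells).</para>

```shell
#!/bin/sh
# Hypothetical helpers reading standard input. LC_ALL=C makes the tools
# treat the input as raw bytes regardless of the user's locale.

# Number of bytes: what strlen() would return.
count_bytes() {
    LC_ALL=C wc -c | tr -d ' '
}

# Number of characters, assuming valid UTF-8: drop continuation bytes
# (0x80-0xBF, octal 200-277) and count the lead bytes that remain.
count_utf8_chars() {
    LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' '
}

s=$(printf 'caf\303\251')   # "café"
printf 'bytes: %s\n' "$(printf '%s' "$s" | count_bytes)"
printf 'characters: %s\n' "$(printf '%s' "$s" | count_utf8_chars)"
```
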
    <para>Among BLFS packages, this problem applies to
    <xref linkend="xine-ui"/> and all the shells.</para>

  </sect2>

  <sect2 id="locale-wrong-manpage-encoding"
         xreflabel="Incorrect Manual Page Encoding">

    <title>The Package Installs Manual Pages in Incorrect or
    Non-Displayable Encoding</title>

    <para>Severity: Low</para>

    <para>LFS expects that manual pages are in the language-specific (usually
    8-bit) encoding, as specified on the <ulink
    url="&lfs-root;/chapter08/man-db.html">LFS Man DB page</ulink>. However,
    some packages install translated manual pages in UTF-8 encoding (e.g.,
    Shadow, already dealt with), or manual pages in languages not in the table.
    Not all BLFS packages have been audited for conformance with the
    requirements set in LFS (the large majority have been checked, and fixes
    have been placed in the book for packages known to install non-conforming
    manual pages). If you find a manual page installed by any BLFS package
    that is obviously in the wrong encoding, please remove or convert it as
    needed, and report this to the BLFS team as a bug.</para>

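    <para>Converting a page takes a single <command>iconv</command> call.
    The sketch below wraps it so that the page is replaced only if the
    conversion succeeds. It assumes an uncompressed page that is currently
    in UTF-8. The target encoding must be taken from the table on the LFS
    Man DB page for the page's language, and the helper name
    <function>convert_manpage</function> is hypothetical.</para>

```shell
#!/bin/sh
# Hypothetical helper: convert the uncompressed manual page FILE from
# UTF-8 to ENCODING in place, keeping the original if iconv fails
# (for example, because a character has no equivalent in ENCODING).
convert_manpage() {
    file=$1
    enc=$2
    tmp_out=$(mktemp "$file.XXXXXX") || return 1
    if iconv -f UTF-8 -t "$enc" "$file" > "$tmp_out"; then
        # Keep the original permissions, then replace the file.
        chmod --reference="$file" "$tmp_out"
        mv -f -- "$tmp_out" "$file"
    else
        rm -f -- "$tmp_out"
        return 1
    fi
}

# Demonstration on a scratch file rather than an installed page.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '.TH TEST 1\nCaf\303\251\n' > "$tmp/test.1"
convert_manpage "$tmp/test.1" ISO-8859-1
```
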
    <para>You can easily check your system for any non-conforming manual pages
    by copying the following short shell script to some accessible location,
    making it executable,

<screen><literal>#!/bin/sh
# Begin checkman.sh
# Usage: find /usr/share/man -type f | xargs checkman.sh
for a in "$@"
do
    # echo "Checking $a..."
    # Pure-ASCII manual page (possibly except comments) is OK
    grep -v '.\\"' "$a" | iconv -f US-ASCII -t US-ASCII &gt;/dev/null 2&gt;&amp;1 \
        &amp;&amp; continue
    # Non-UTF-8 manual page is OK
    iconv -f UTF-8 -t UTF-8 "$a" &gt;/dev/null 2&gt;&amp;1 || continue
    # Found a UTF-8 manual page, bad.
    echo "UTF-8 manual page: $a" &gt;&amp;2
done
# End checkman.sh
</literal></screen>

    and then issuing the following command (adjust the command if the
    <command>checkman.sh</command> script is not in a directory listed in
    your <envar>PATH</envar> environment variable):</para>

<screen><userinput>find /usr/share/man -type f | xargs checkman.sh</userinput></screen>

    <para>Note that if you have manual pages installed in any location other
    than <filename class='directory'>/usr/share/man</filename> (e.g.,
    <filename class='directory'>/usr/local/share/man</filename>), you must
    modify the above command to include this additional location.</para>

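    <para>Since <command>find</command> accepts several starting
    directories, both locations can be given in one command. The sketch
    below wraps this in a hypothetical <function>list_manpages</function>
    function that skips directories that do not exist. The same command
    therefore works on systems with and without
    <filename class='directory'>/usr/local/share/man</filename>.</para>

```shell
#!/bin/sh
# Hypothetical helper: print every regular file below the given manual
# page directories, skipping directories that do not exist.
list_manpages() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            find "$dir" -type f
        fi
    done
}

# Usage on a real system (checkman.sh must be in PATH):
#   list_manpages /usr/share/man /usr/local/share/man | xargs checkman.sh
```
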
  </sect2>

</sect1>