source: introduction/important/locale-issues.xml@ 6c42d4e

Last change on this file since 6c42d4e was f6b83352, checked in by Dan Nicholson <dnicholson@…>, 18 years ago

Added info about UTF-8 support in Nano

git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@5860 af4574ff-66df-0310-9fd7-8a98e5e911e0

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../../general.ent">
  %general-entities;
]>

<sect1 id="locale-issues" xreflabel="Locale Related Issues">
  <?dbhtml filename="locale-issues.html"?>

  <sect1info>
    <othername>$LastChangedBy$</othername>
    <date>$Date$</date>
  </sect1info>

  <title>Locale Related Issues</title>

  <para>This page contains information about locale-related problems and
  issues. This paragraph is meant to give a generic overview of things that
  can come up when configuring your system for various locales; that
  overview has not been written yet, so the paragraph still needs to be
  revised and completed.</para>

  <sect2>

    <title>Package Specific Locale Issues</title>

    <para>For package-specific issues, find the package in the list below
    and follow the link to view the available information. If a package is
    not listed here, that does not mean it has no known locale-specific
    issues or problems. It only means that this page has not been updated
    with the locale-specific information for that package. Refer to the BLFS
    Wiki page for a particular package for any additional locale-specific
    information.</para>

    <itemizedlist>

      <title>List of Packages with Locale Related Issues</title>

      <listitem>
        <para><xref linkend="locale-mc"/></para>
      </listitem>
      <listitem>
        <para><xref linkend="locale-unzip"/></para>
      </listitem>
      <listitem>
        <para><xref linkend="locale-nano"/></para>
      </listitem>

    </itemizedlist>

    <sect3 id="locale-mc" xreflabel="MC-&mc-version;">

      <title><xref linkend="mc"/></title>

      <para>This package assumes that <quote>characters</quote> and
      <quote>bytes</quote> are the same thing, which is not true in
      UTF-8-based locales. Because of this assumption,
      <application>MC</application> positions characters incorrectly on the
      screen. After the cursor is moved a bit, the screen becomes totally
      unreadable, as illustrated in
      <ulink url="&files-anduin;/mc-bad.png">this screenshot</ulink> (taken
      in a ru_RU.UTF-8 locale). Additionally, non-ASCII characters cannot be
      entered in the editor, even after selecting <quote>Other 8-bit</quote>
      encoding from the menu.</para>
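
      <para>The gap between characters and bytes can be seen with a short
      Python sketch. It is an illustration only and is not taken from the
      <application>MC</application> sources; the sample word is an
      assumption chosen for the example and is written with escapes so the
      listing stays ASCII-only.</para>

```python
# Illustrative only: the sample word is an assumption, not taken from MC.
# The escapes spell the Russian word "privet" (six Cyrillic letters).
text = "\u043f\u0440\u0438\u0432\u0435\u0442"

# The byte sequence a program receives for this word in a UTF-8 locale.
encoded = text.encode("utf-8")

char_count = len(text)     # characters, which is what the user sees
byte_count = len(encoded)  # bytes, which is what byte-oriented code counts

print(char_count, byte_count)
```

      <para>Code that advances the cursor by one column per byte would
      place the text after this word twice as far to the right as it
      should.</para>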

    </sect3>

    <sect3 id="locale-unzip" xreflabel="UnZip-&unzip-version;">

      <title><xref linkend="unzip"/></title>

      <note>
        <para>Use of <application>UnZip</application> in the
        <application>JDK</application>, <application>Mozilla</application>,
        <application>DocBook</application> or any other BLFS package
        installation is not a problem, as BLFS instructions never use
        <application>UnZip</application> to extract a file with non-ASCII
        characters in the file's name.</para>
      </note>

      <para>The <application>UnZip</application> package assumes that
      filenames stored in ZIP archives created on non-Unix systems are
      encoded in CP850, and that they should be converted to ISO-8859-1 when
      the files are written to the filesystem. These assumptions are not
      always valid. In fact, filenames inside a ZIP archive are encoded in
      the DOS codepage in use in the relevant country, and filenames on disk
      should be in the locale encoding. In MS Windows, the OemToChar() C
      function (from <filename>User32.DLL</filename>) does the correct
      conversion (which is indeed the conversion from CP850 to a superset of
      ISO-8859-1 if MS Windows is set up to use the US English language),
      but there is no equivalent in Linux.</para>
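
      <para>A minimal Python sketch of why the codepage assumption matters:
      the same stored bytes yield different text depending on the codepage
      used to read them. The filename is hypothetical, and Python's codecs
      merely stand in for the conversion tables; they are not
      <application>UnZip</application>'s own code.</para>

```python
# Hypothetical filename: the Russian word "privet" plus ".txt",
# written with escapes so the listing stays ASCII-only.
name = "\u043f\u0440\u0438\u0432\u0435\u0442.txt"

# Bytes as an archiver on a Russian DOS or Windows system would store them.
stored = name.encode("cp866")

# The same bytes read under the codepage UnZip assumes (CP850) ...
as_cp850 = stored.decode("cp850")
# ... and under the codepage that was actually used (CP866).
as_cp866 = stored.decode("cp866")

print(as_cp850)  # accented Latin characters, not the original word
print(as_cp866)  # the original filename
```

      <para>Only the ASCII part of the name (here the
      <filename>.txt</filename> suffix) survives the wrong assumption
      intact.</para>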

      <para>When <command>unzip</command> unpacks a ZIP archive containing
      non-ASCII filenames, the filenames are damaged whenever either of its
      encoding assumptions is incorrect, because it then applies the wrong
      conversion. For example, in the ru_RU.KOI8-R locale, filenames need to
      be converted from CP866 to KOI8-R, but they are converted from CP850
      to ISO-8859-1 instead. This produces filenames consisting of
      undecipherable characters instead of words (the closest equivalent
      understandable example for English-only users is rot13). There are
      several ways around this limitation:</para>

      <para>1) For unpacking ZIP archives with filenames containing non-ASCII
      characters, use <ulink url="http://www.winzip.com/">WinZip</ulink> while
      running the <ulink url="http://www.winehq.com/">Wine</ulink> Windows
      emulator.</para>

      <para>2) After running <command>unzip</command>, fix the damage made to
      the filenames using the <command>convmv</command> tool
      (<ulink url="http://j3e.de/linux/convmv/"/>). The following is an example
      for the ru_RU.KOI8-R locale:</para>

      <blockquote>
        <para>Step 1. Undo the conversion done by
        <command>unzip</command>:</para>

<screen><userinput>convmv -f iso-8859-1 -t cp850 -r --nosmart --notest \
  <replaceable>&lt;/path/to/unzipped/files&gt;</replaceable></userinput></screen>

        <para>Step 2. Do the correct conversion instead:</para>

<screen><userinput>convmv -f cp866 -t koi8-r -r --nosmart --notest \
  <replaceable>&lt;/path/to/unzipped/files&gt;</replaceable></userinput></screen>
      </blockquote>
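
      <para>The two steps can be sketched in Python on a single filename
      held in memory. This is an illustration under stated assumptions, not
      a replacement for <command>convmv</command>: the function name and the
      sample filename are hypothetical, and Python's strict codecs reject
      characters that have no counterpart in the target encoding, which may
      differ from how <command>unzip</command>'s own table treats
      them.</para>

```python
def repair_koi8r_name(damaged):
    """Turn a filename damaged by unzip into KOI8-R bytes (hypothetical helper)."""
    # Step 1: undo the CP850 to ISO-8859-1 conversion done by unzip,
    # which recovers the bytes exactly as they were stored in the archive.
    stored = damaged.decode("iso-8859-1").encode("cp850")
    # Step 2: do the correct conversion instead, CP866 to KOI8-R.
    return stored.decode("cp866").encode("koi8_r")


# Hypothetical filename: the Russian word "privet" plus ".txt",
# written with escapes so the listing stays ASCII-only.
name = "\u043f\u0440\u0438\u0432\u0435\u0442.txt"

# Simulate the damage: CP866 bytes wrongly converted from CP850 to ISO-8859-1.
damaged = name.encode("cp866").decode("cp850").encode("iso-8859-1")

fixed = repair_koi8r_name(damaged)
print(fixed.decode("koi8_r"))
```

      <para>The intermediate value after step 1 is identical to the bytes
      stored in the archive, which is why step 2 can then treat it as
      CP866.</para>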

      <para>3) Apply this patch to unzip:
      <ulink url="https://bugzilla.altlinux.ru/attachment.cgi?id=532"/></para>

      <para>The patch lets you specify the assumed filename encoding inside
      the ZIP archive with the <option>-O charset_name</option> option and
      the on-disk filename encoding with the
      <option>-I charset_name</option> option. By default, the on-disk
      filename encoding is the locale encoding, and the encoding inside the
      ZIP archive is guessed from a built-in table based on the locale
      encoding. For US English users, this still means that
      <command>unzip</command> converts from CP850 to ISO-8859-1 by
      default.</para>

      <para>Caveat: this method works only with 8-bit locale encodings, not
      with UTF-8. Attempting to use a patched <command>unzip</command> in UTF-8
      locales may result in a segmentation fault and is probably a security
      risk.</para>

    </sect3>

    <sect3 id="locale-nano" xreflabel="Nano-&nano-version;">

      <title><xref linkend="nano"/></title>

      <para>The current stable version of <application>Nano</application>
      (&nano-version;) does not support UTF-8 character encodings. A
      development version is available which addresses these issues. This
      version can be downloaded at <ulink
      url="http://www.nano-editor.org/dist/v1.3/nano-1.3.11.tar.gz"/>.
      Instructions for installing this version are the same as those found on
      the <xref linkend="nano"/> page.</para>

    </sect3>

  </sect2>

</sect1>