source: introduction/important/locale-issues.xml@ 9c90b1b

Last change on this file since 9c90b1b was 9c90b1b, checked in by Randy McMurchy <randy@…>, 18 years ago

Added new section 'Locale Related Issues' to Chapter 2, 'Important Information', thanks to Alexander Patrakov for contributing the text for this page

git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@5498 af4574ff-66df-0310-9fd7-8a98e5e911e0

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
   "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../../general.ent">
  %general-entities;
]>

<sect1 id="locale-issues" xreflabel="Locale Related Issues">
  <?dbhtml filename="locale-issues.html"?>

  <sect1info>
    <othername>$LastChangedBy:$</othername>
    <date>$Date:$</date>
  </sect1info>

  <title>Locale Related Issues</title>

  <para>This page contains information about locale-related problems. This
  paragraph is intended to give a generic overview of the issues that can
  come up when configuring your system for various locales. The previous
  sentence and the remainder of this paragraph must still be revised and
  completed.</para>

  <sect2>

    <title>Package Specific Locale Issues</title>

    <para>For package-specific issues, find the package in the list below
    and follow the link to view the available information. If a package is
    not listed here, there are no known locale-specific issues with that
    package.</para>

    <itemizedlist>

      <title>List of Packages with Locale Related Issues</title>

      <listitem>
        <para><xref linkend="locale-unzip"/></para>
      </listitem>

    </itemizedlist>

    <sect3 id="locale-unzip" xreflabel="UnZip-&unzip-version;">

      <title><xref linkend="unzip"/></title>

      <note>
        <para>Use of <application>UnZip</application> in the
        <application>JDK</application>, <application>Mozilla</application>,
        <application>DocBook</application> or any other BLFS installation
        instructions is not a problem, as these instructions never use
        <application>UnZip</application> to extract a file with non-ASCII
        characters in its name.</para>
      </note>

      <para>The <application>UnZip</application> package assumes that
      filenames stored in ZIP archives created on non-Unix systems are
      encoded in CP850, and that they should be converted to ISO-8859-1 when
      files are written to the filesystem. These assumptions are not always
      valid. In fact, filenames inside a ZIP archive are encoded in the DOS
      codepage in use in the relevant country, and filenames on disk should
      be in the locale encoding. In MS Windows, the OemToChar() C function
      (from <filename>User32.DLL</filename>) does the correct conversion
      (which is indeed the conversion from CP850 to a superset of ISO-8859-1
      if MS Windows is set up to use the US English language), but there is
      no equivalent in Linux.</para>

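      <!--
      A minimal Python sketch of the mismatch described above, assuming an
      archive whose names were stored in the Russian DOS codepage (CP866)
      and a user working in a KOI8-R locale. The sample name and both
      function names are hypothetical and chosen only for illustration. The
      sketch imitates the conversions at the byte level and does not
      reproduce the UnZip source code.

```python
# Hypothetical sample name: the Russian word "privet" plus ".txt", written
# with escapes so that this listing stays ASCII-only.
NAME = "\u043f\u0440\u0438\u0432\u0435\u0442.txt"


def assumed_conversion(stored: bytes) -> bytes:
    """Treat the stored bytes as CP850 and write them out as ISO-8859-1.

    Raises UnicodeEncodeError for CP850 characters that ISO-8859-1 lacks;
    this sketch makes no claim about how UnZip handles that case.
    """
    return stored.decode("cp850").encode("iso-8859-1")


def correct_conversion(stored: bytes, archive_cp: str, locale_cp: str) -> bytes:
    """Convert from the archive's real codepage to the locale encoding."""
    return stored.decode(archive_cp).encode(locale_cp)


stored = NAME.encode("cp866")          # assumed bytes inside the archive
wrong = assumed_conversion(stored)     # what ends up on disk
right = correct_conversion(stored, "cp866", "koi8-r")  # what should end up there

print(wrong != right)
print(right.decode("koi8-r") == NAME)
```

      The two results differ, which is why the name written to disk is
      unreadable when displayed in the user's locale.
      -->
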
      <para>When <command>unzip</command> unpacks a ZIP archive containing
      non-ASCII filenames, the filenames are damaged because
      <command>unzip</command> uses an improper conversion when any of its
      assumptions are incorrect. For example, in the ru_RU.KOI8-R locale,
      filenames need to be converted from CP866 to KOI8-R, but they are
      converted from CP850 to ISO-8859-1 instead, which produces filenames
      consisting of undecipherable characters instead of words (the closest
      equivalent for English-only users is ROT13). There are several ways
      around this limitation:</para>

      <para>1) To unpack ZIP archives with filenames containing non-ASCII
      characters, use <ulink url="http://www.winzip.com/">WinZip</ulink>
      running under the <ulink url="http://www.winehq.com/">Wine</ulink>
      Windows compatibility layer.</para>

      <para>2) After running <command>unzip</command>, repair the damaged
      filenames using the <command>convmv</command> tool
      (<ulink url="http://j3e.de/linux/convmv/"/>). The following is an
      example for the ru_RU.KOI8-R locale:</para>

      <blockquote>
        <para>Step 1. Undo the conversion done by
        <command>unzip</command>:</para>

<screen><userinput>convmv -f iso-8859-1 -t cp850 -r --nosmart --notest \
    <replaceable>[/path/to/unzipped/files]</replaceable></userinput></screen>

        <para>Step 2. Do the correct conversion instead:</para>

<screen><userinput>convmv -f cp866 -t koi8-r -r --nosmart --notest \
    <replaceable>[/path/to/unzipped/files]</replaceable></userinput></screen>
      </blockquote>

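      <!--
      The two convmv steps above can be pictured as the following byte-level
      operations. This is only a Python sketch under the same assumptions as
      the example (names stored as CP866, locale encoding KOI8-R). The
      function names and the sample name are hypothetical, and the sketch
      works on a single name in memory rather than renaming files.

```python
def undo_unzip_conversion(damaged: bytes) -> bytes:
    """Step 1: ISO-8859-1 back to CP850, recovering the bytes from the archive."""
    return damaged.decode("iso-8859-1").encode("cp850")


def convert_to_locale(stored: bytes, archive_cp: str = "cp866",
                      locale_cp: str = "koi8-r") -> bytes:
    """Step 2: convert from the real archive codepage to the locale encoding."""
    return stored.decode(archive_cp).encode(locale_cp)


# Hypothetical damaged name, built the way the text describes the damage:
# CP866 bytes misread as CP850 and written out as ISO-8859-1.
original = "\u043f\u0440\u0438\u0432\u0435\u0442.txt"
damaged = original.encode("cp866").decode("cp850").encode("iso-8859-1")

fixed = convert_to_locale(undo_unzip_conversion(damaged))
print(fixed.decode("koi8-r") == original)
```

      Step 1 is lossless here because every character in the damaged name
      came from CP850 in the first place, so it maps back to the byte that
      was stored in the archive.
      -->
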
      <para>3) Apply this patch to unzip:
      <ulink url="https://bugzilla.altlinux.ru/attachment.cgi?id=532"/></para>

      <para>The patch lets you specify the assumed filename encoding in the
      ZIP archive with the <option>-O charset_name</option> option and the
      on-disk filename encoding with the <option>-I charset_name</option>
      option. By default, the on-disk filename encoding is the locale
      encoding, and the encoding inside the ZIP archive is guessed from a
      built-in table based on the locale encoding. For US English users,
      this still means that <command>unzip</command> converts from CP850 to
      ISO-8859-1 by default.</para>

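      <!--
      The defaults described above can be sketched in Python as follows.
      The table is illustrative only: it holds just the two encoding pairs
      named on this page and is not the table built into the patch. The
      function and parameter names are hypothetical.

```python
# Illustrative table only: it holds just the two pairs named in the text
# and is not the table built into the patch.
GUESSED_ARCHIVE_ENCODING = {
    "iso-8859-1": "cp850",
    "koi8-r": "cp866",
}


def resolve_encodings(locale_encoding, archive_option=None, disk_option=None):
    """Return (archive_encoding, disk_encoding) following the stated defaults.

    archive_option stands for the value given to -O and disk_option for the
    value given to -I; None means the option was not given.
    """
    # Without -I, the on-disk encoding is the locale encoding.
    disk = disk_option or locale_encoding
    if archive_option:
        return archive_option, disk
    # Without -O, guess the archive encoding from the locale encoding.
    return GUESSED_ARCHIVE_ENCODING[locale_encoding.lower()], disk


print(resolve_encodings("ISO-8859-1"))
print(resolve_encodings("KOI8-R", archive_option="cp437"))
```

      A locale encoding missing from this toy table raises KeyError. How the
      patch itself treats encodings that are absent from its table is not
      covered here.
      -->
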
      <para>Caveat: this method works only with 8-bit locale encodings, not
      with UTF-8. Attempting to use a patched <command>unzip</command> in
      UTF-8 locales may result in a segmentation fault and is probably a
      security risk.</para>

    </sect3>

  </sect2>

</sect1>