Timestamp:
10/28/2006 07:13:18 AM (18 years ago)
Author:
Dan Nicholson <dnicholson@…>
Branches:
10.0, 10.1, 11.0, 11.1, 11.2, 11.3, 12.0, 12.1, 6.2, 6.2.0, 6.2.0-rc1, 6.2.0-rc2, 6.3, 6.3-rc1, 6.3-rc2, 6.3-rc3, 7.10, 7.4, 7.5, 7.6, 7.6-blfs, 7.6-systemd, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 9.0, 9.1, basic, bdubbs/svn, elogind, gnome, kde5-13430, kde5-14269, kde5-14686, kea, ken/TL2024, ken/inkscape-core-mods, ken/tuningfonts, krejzi/svn, lazarus, lxqt, nosym, perl-modules, plabs/newcss, plabs/python-mods, python3.11, qt5new, rahul/power-profiles-daemon, renodr/vulkan-addition, systemd-11177, systemd-13485, trunk, upgradedb, xry111/intltool, xry111/llvm18, xry111/soup3, xry111/test-20220226, xry111/xf86-video-removal
Children:
0952b7d8
Parents:
3aeb033
Message:

Implemented Alexander Patrakov's Locale Related Issues changes

git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@6364 af4574ff-66df-0310-9fd7-8a98e5e911e0

File:
1 edited

  • general/sysutils/unzip.xml

    r3aeb033 r86eaa277  
    39  39    <caution>
    40  40      <para>The <application>UnZip</application> package has some locale
    41       related issues. For a full explanation of the issues and some possible
    42       solutions, see the <xref linkend="locale-unzip"/> section of the
    43       <xref linkend="locale-issues"/>.</para>
     41      related issues. See the discussion below in the
     42      <xref linkend="unzip-locale-issues"/> section. A more general
     43      discussion of these problems can be found on the
     44      <xref linkend="locale-issues"/> page.</para>
    44  45    </caution>
    45  46
     
    68  69    <para condition="html" role="usernotes">User Notes:
    69  70    <ulink url="&blfs-wiki;/unzip"/></para>
     71
     72  </sect2>
     73
     74  <sect2 id="unzip-locale-issues">
     75    <title>UnZip Locale Issues</title>
     76
     77    <note>
     78      <para>Use of <application>UnZip</application> in the
     79      <application>JDK</application>, <application>Mozilla</application>,
     80      <application>DocBook</application> or any other BLFS package
     81      installation is not a problem, as BLFS instructions never use
     82      <application>UnZip</application> to extract a file with non-ASCII
     83      characters in the file's name.</para>
     84    </note>
     85
     86    <para>The <application>UnZip</application> package assumes that filenames
     87    stored in the ZIP archives created on non-Unix systems are encoded in
     88    CP850, and that they should be converted to ISO-8859-1 when writing files
     89    onto the filesystem. Such assumptions are not always valid. In fact,
     90    inside the ZIP archive, filenames are encoded in the DOS codepage that is
     91    in use in the relevant country, and the filenames on disk should be in
     92    the locale encoding. In MS Windows, the OemToChar() C function (from
     93    <filename>User32.DLL</filename>) does the correct conversion (which is
     94    indeed the conversion from CP850 to a superset of ISO-8859-1 if MS
     95    Windows is set up to use the US English language), but there is no
     96    equivalent in Linux.</para>
     97
     98    <para>When using <command>unzip</command> to unpack a ZIP archive
     99    containing non-ASCII filenames, the filenames are damaged because
     100    <command>unzip</command> applies the wrong conversion when either of its
     101    encoding assumptions is incorrect. For example, in the ru_RU.KOI8-R
     102    locale, conversion of filenames from CP866 to KOI8-R is required, but
     103    conversion from CP850 to ISO-8859-1 is done, which produces filenames
     104    consisting of undecipherable characters instead of words (for
     105    English-only readers, the effect is comparable to rot13). There
     106    are several ways around this limitation:</para>
     107
     108    <para>1) For unpacking ZIP archives with filenames containing non-ASCII
     109    characters, use <ulink url="http://www.winzip.com/">WinZip</ulink> while running the <ulink url="http://www.winehq.com/">Wine</ulink> Windows
     110    emulator.</para>
     111
     112    <para>2) After running <command>unzip</command>, fix the damage made to
     113    the filenames using the <command>convmv</command> tool
     114    (<ulink url="http://j3e.de/linux/convmv/"/>). The following is an example
     115    for the ru_RU.KOI8-R locale:</para>
     116
     117    <blockquote>
     118      <para>Step 1. Undo the conversion done by
     119      <command>unzip</command>:</para>
     120
     121<screen><userinput>convmv -f iso-8859-1 -t cp850 -r --nosmart --notest \
     122    <replaceable>&lt;/path/to/unzipped/files&gt;</replaceable></userinput></screen>
     123
     124      <para>Step 2. Do the correct conversion instead:</para>
     125
     126<screen><userinput>convmv -f cp866 -t koi8-r -r --nosmart --notest \
     127    <replaceable>&lt;/path/to/unzipped/files&gt;</replaceable></userinput></screen>
     128    </blockquote>
     129
     130    <para>3) Apply this patch to unzip:
     131    <ulink url="https://bugzilla.altlinux.ru/attachment.cgi?id=532"/></para>
     132
     133    <para>It lets you specify the assumed filename encoding in the ZIP
     134    archive using the <option>-O charset_name</option> option and the
     135    on-disk filename encoding using the <option>-I charset_name</option>
     136    option. By default, the on-disk filename encoding is the locale encoding,
     137    and the encoding inside the ZIP archive is guessed from a built-in
     138    table based on the locale encoding. For US English users, this still
     139    means that <command>unzip</command> converts from CP850 to ISO-8859-1 by default.</para>
     140
     141    <para>Caveat: this method works only with 8-bit locale encodings, not
     142    with UTF-8. Attempting to use a patched <command>unzip</command> in UTF-8
     143    locales may result in a segmentation fault and is probably a security
     144    risk.</para>
    70  145
    71  146  </sect2>
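As a rough illustration of the ru_RU.KOI8-R example in this changeset, the Python sketch below models what an unpatched unzip does to a CP866 filename and how the two convmv steps undo it. It is a model, not unzip's code. It assumes unzip's CP850-to-ISO-8859-1 table behaves like decoding CP850 and re-encoding as ISO-8859-1. That holds for the characters used here, but not for every CP850 character: byte 0x9F (uppercase Я in CP866) is ƒ in CP850, which has no ISO-8859-1 equivalent, so this model simply fails on it. The filename "Привет" and the function names are illustrative, and the functions work on byte strings instead of renaming files.

```python
"""Model of unpatched unzip's filename conversion and the convmv repair.

Assumption: unzip's CP850 -> ISO-8859-1 table is approximated with
Python's codecs; real unzip uses its own built-in lookup table.
"""


def unzip_default(raw: bytes) -> bytes:
    """Unpatched unzip: assume the name is CP850, write it as ISO-8859-1."""
    return raw.decode("cp850").encode("iso-8859-1")


def undo_unzip(on_disk: bytes) -> bytes:
    """convmv step 1 (-f iso-8859-1 -t cp850): recover the archive's bytes."""
    return on_disk.decode("iso-8859-1").encode("cp850")


def redo_correctly(raw: bytes) -> bytes:
    """convmv step 2 (-f cp866 -t koi8-r): the conversion a Russian
    DOS-codepage name actually needs in the ru_RU.KOI8-R locale."""
    return raw.decode("cp866").encode("koi8-r")


if __name__ == "__main__":
    # Hypothetical filename as stored by a Russian DOS/Windows archiver.
    in_zip = "Привет".encode("cp866")

    damaged = unzip_default(in_zip)
    # Shown in a KOI8-R locale, the damaged bytes no longer spell the word.
    print("damaged: ", damaged.decode("koi8-r"))

    repaired = redo_correctly(undo_unzip(damaged))
    print("repaired:", repaired.decode("koi8-r"))
```

Running it prints the damaged name as it would appear in a KOI8-R locale, followed by the repaired name. Step 1 works only because CP850 maps each of these bytes to a distinct ISO-8859-1 character, so the damage is reversible.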
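The patch described in option 3 chooses its defaults from a built-in table keyed on the locale encoding. That table is not reproduced in this changeset, so the sketch below is hypothetical. ARCHIVE_CHARSET_FOR_LOCALE holds only the two pairings the text mentions, the CP850 fallback for unknown locales is an assumption, and resolve_charsets() and convert_name() are made-up names whose archive_opt and disk_opt arguments stand in for -O and -I. Unlike the patch, which may crash in UTF-8 locales, the sketch refuses UTF-8 outright.

```python
"""Hypothetical model of the patched unzip's -O / -I defaults.

The lookup table is NOT the patch's real table: it holds only the two
pairings this section mentions.
"""
from typing import Optional, Tuple

# Hypothetical: locale (on-disk) encoding -> DOS codepage assumed in the ZIP.
ARCHIVE_CHARSET_FOR_LOCALE = {
    "iso-8859-1": "cp850",  # US English case described in the text
    "koi8-r": "cp866",      # ru_RU.KOI8-R case described in the text
}

# Assumption: unknown locales fall back to unzip's traditional CP850 guess.
FALLBACK_ARCHIVE_CHARSET = "cp850"


def resolve_charsets(locale_charset: str,
                     archive_opt: Optional[str] = None,
                     disk_opt: Optional[str] = None) -> Tuple[str, str]:
    """Return (archive_charset, disk_charset).

    archive_opt and disk_opt stand in for explicit -O and -I values.
    """
    locale_charset = locale_charset.lower()
    if locale_charset in ("utf-8", "utf8"):
        # The method only handles 8-bit locale encodings; refuse UTF-8
        # here instead of risking the crash the text warns about.
        raise ValueError("UTF-8 locales are not supported by this method")
    disk = disk_opt or locale_charset
    archive = archive_opt or ARCHIVE_CHARSET_FOR_LOCALE.get(
        locale_charset, FALLBACK_ARCHIVE_CHARSET)
    return archive, disk


def convert_name(raw: bytes, archive: str, disk: str) -> bytes:
    """Re-encode one stored filename from the archive charset to the disk one."""
    return raw.decode(archive).encode(disk)


if __name__ == "__main__":
    stored = "Привет".encode("cp866")  # hypothetical name from a Russian archive
    archive, disk = resolve_charsets("KOI8-R")
    print(archive, disk)  # prints: cp866 koi8-r
    print(convert_name(stored, archive, disk).decode(disk))
```

Passing an explicit archive_opt overrides the table lookup, mirroring how -O overrides the guessed archive encoding.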