source: postlfs/config/compressdoc.xml@ 48e6b2a

Last change on this file since 48e6b2a was 48e6b2a, checked in by Dan Nicholson <dnicholson@…>, 16 years ago

compressdoc: Be more robust with directories

git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@7391 af4574ff-66df-0310-9fd7-8a98e5e911e0

1<?xml version="1.0" encoding="ISO-8859-1"?>
2<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
3 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
4 <!ENTITY % general-entities SYSTEM "../../general.ent">
5 %general-entities;
6]>
7
8<sect1 id="compressdoc" xreflabel="Compressing man and info pages">
9 <?dbhtml filename="compressdoc.html"?>
10
11 <sect1info>
12 <othername>$LastChangedBy$</othername>
13 <date>$Date$</date>
14 </sect1info>
15
16 <title>Compressing Man and Info Pages</title>
17
18 <indexterm zone="compressdoc">
19 <primary sortas="b-compressdoc">compressdoc</primary>
20 </indexterm>
21
22 <para>Man and info reader programs can transparently process files compressed
23 with <command>gzip</command> or <command>bzip2</command>, a feature you can
24 use to free some disk space while keeping your documentation
25 available. However, things are not that simple; man directories tend to
26 contain links&mdash;hard and symbolic&mdash;which defeat simple ideas like
27 recursively calling <command>gzip</command> on them. A better way to go is
28 to use the script below. If you would prefer to download the file instead of
29 creating it by typing or cut-and-pasting, you can find it at
30 <ulink url="&files-anduin;/compressdoc"/> (the file should be installed in
31 the <filename class="directory">/usr/sbin</filename> directory).</para>
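
 <para>To see why a plain recursive <command>gzip</command> is not enough,
 the following throwaway experiment may help. It is only a sketch: the file
 names and the temporary directory are hypothetical, and the exact messages
 printed depend on your <command>gzip</command> version.</para>

<screen><userinput>cd "$(mktemp -d)"
echo "dummy page" &gt; page.1
ln -s page.1 alias.1      # symbolic link to the page
ln page.1 twin.1          # hard link to the same inode

# gzip normally refuses a file that has other hard links (unless -f
# is given), so nothing gets compressed here
gzip page.1 || echo "page.1 left unchanged"

# Removing the hard link lets gzip proceed, but the symbolic link is
# not updated and now points at a name that no longer exists
rm twin.1
gzip page.1
ls -l
[ -e alias.1 ] || echo "alias.1 is now dangling"</userinput></screen>

 <para>The script handles both cases: it removes the hard links, compresses
 the file once, and then recreates the links under the new suffix, and it
 re-points symbolic links at the compressed name.</para>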
32
33<screen role="root"><?dbfo keep-together="auto"?><userinput>cat &gt; /usr/sbin/compressdoc &lt;&lt; "EOF"
34<literal>#!/bin/bash
35# VERSION: 20080421.1121
36#
37# Compress (with bzip2 or gzip) all man pages in a hierarchy and
38# update symlinks - By Marc Heerdink &lt;marc @ koelkast.net&gt;
39#
40# Modified to be able to gzip or bzip2 files as an option and to deal
41# with all symlinks properly by Mark Hymers &lt;markh @ &lfs-domainname;&gt;
42#
43# Modified 20030930 by Yann E. Morin &lt;yann.morin.1998 @ anciens.enib.fr&gt;
44# to accept compression/decompression, to correctly handle hard-links,
45# to allow for changing hard-links into soft- ones, to specify the
46# compression level, to parse the man.conf for all occurrences of MANPATH,
47# to allow for a backup, to allow to keep the newest version of a page.
48#
49# Modified 20040330 by Tushar Teredesai to replace $0 by the name of the
50# script.
51# (Note: It is assumed that the script is in the user's PATH)
52#
53# Modified 20050112 by Randy McMurchy to shorten line lengths and
54# correct grammar errors.
55#
56# Modified 20060128 by Alexander E. Patrakov for compatibility with Man-DB.
57#
58# Modified 20060311 by Archaic to use Man-DB manpath utility which is a
59# replacement for man --path from Man.
60#
61# Modified 20080421 by Dan Nicholson to properly execute the correct
62# compressdoc when working recursively. This means the same compressdoc
63# will be used whether a full path was given or it was resolved from PATH.
64#
65# Modified 20080421 by Dan Nicholson to be more robust with directories
66# that don't exist or don't have sufficient permissions.
67#
68# TODO:
69# - choose a default compress method to be based on the available
70# tool : gzip or bzip2;
71# - offer an option to automagically choose the best compression
72# method on a per page basis (eg. check which of
73# gzip/bzip2/whatever is the most effective, page per page);
74# - when a MANPATH env var exists, use this instead of /etc/man_db.conf
75# (useful for users to (de)compress their man pages);
76# - offer an option to restore a previous backup;
77# - add other compression engines (compress, zip, etc?). Needed?
78
79# Funny enough, this function prints some help.
80function help ()
81{
82 if [ -n "$1" ]; then
83 echo "Unknown option : $1"
84 fi
85 ( echo "Usage: $MY_NAME &lt;comp_method&gt; [options] [dirs]" &amp;&amp; \
86 cat &lt;&lt; EOT
87Where comp_method is one of :
88 --gzip, --gz, -g
89 --bzip2, --bz2, -b
90 Compress using gzip or bzip2.
91
92 --decompress, -d
93 Decompress the man pages.
94
95 --backup Specify a .tar backup shall be done for all directories.
96 In case a backup already exists, it is saved as .tar.old
97 prior to making the new backup. If a .tar.old backup
98 exists, it is removed prior to saving the backup.
99 In backup mode, no other action is performed.
100
101And where options are :
102 -1 to -9, --fast, --best
103 The compression level, as accepted by gzip and bzip2.
104 When not specified, uses the default compression level
105 for the given method (-6 for gzip, and -9 for bzip2).
106 Not used when in backup or decompress modes.
107
108 --force, -F Force (re-)compression, even if the previous one was
109 the same method. Useful when changing the compression
110 ratio. By default, a page will not be re-compressed if
111 it ends with the same suffix as the method adds
112 (.bz2 for bzip2, .gz for gzip).
113
114 --soft, -S Change hard-links into soft-links. Use with _caution_
115 as the first encountered file will be used as a
116 reference. Not used when in backup mode.
117
118 --hard, -H Change soft-links into hard-links. Not used when in
119 backup mode.
120
121 --conf=dir, --conf dir
122 Specify the location of man_db.conf. Defaults to /etc.
123
124 --verbose, -v Verbose mode, print the name of the directory being
125 processed. Double the flag to turn it even more verbose,
126 and to print the name of the file being processed.
127
128 --fake, -f Fakes it. Print the actual parameters compressdoc will use.
129
130 dirs A list of space-separated _absolute_ pathnames to the
131 man directories. When empty, and only then, use manpath
132 to parse ${MAN_CONF}/man_db.conf for all valid occurrences
133 of MANDATORY_MANPATH.
134
135Note about compression:
136 There has been a discussion on blfs-support about compression ratios of
137 both gzip and bzip2 on man pages, taking into account the hosting fs,
138 the architecture, etc. Overall, the conclusion was that gzip
139 was much more efficient on 'small' files, and bzip2 on 'big' files,
140 small and big being very dependent on the content of the files.
141
142 See the original post from Mickael A. Peters, titled
143 "Bootable Utility CD", dated 20030409.1816(+0200), and subsequent posts:
144 http://&lfs-domainname;/pipermail/blfs-support/2003-April/038817.html
145
146 On my system (x86, ext3), man pages were 35564KB before compression.
147 gzip -9 compressed them down to 20372KB (57.28%), bzip2 -9 got down to
148 19812KB (55.71%). That is a 1.57% gain in space. YMMV.
149
150 What was not taken into consideration was the decompression speed. But
151 does it make sense to? You gain fast access with uncompressed man
152 pages, or you gain space at the expense of a slight overhead in time.
153 Well, my P4-2.5GHz does not even let me notice this... :-)
154
155EOT
156) | less
157}
158
159# This function checks that the man page is unique amongst bzip2'd,
160# gzip'd and uncompressed versions.
161# $1 the directory in which the file resides
162# $2 the file name for the man page
163# Returns 0 (true) if the file is the latest and must be taken care of,
164# and 1 (false) if the file is not the latest (and has therefore been
165# deleted).
166function check_unique ()
167{
168 # NB. When there are hard-links to this file, these are
169 # _not_ deleted. In fact, if there are hard-links, they
170 # all have the same date/time, thus making them ready
171 # for deletion later on.
172
173 # Build the list of all man pages with the same name
174 DIR=$1
175 BASENAME=`basename "${2}" .bz2`
176 BASENAME=`basename "${BASENAME}" .gz`
177 GZ_FILE="$BASENAME".gz
178 BZ_FILE="$BASENAME".bz2
179
180 # Look for, and keep, the most recent one
181 LATEST=`(cd "$DIR"; ls -1rt "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}" \
182 2&gt;/dev/null | tail -n 1)`
183 for i in "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}"; do
184 [ "$LATEST" != "$i" ] &amp;&amp; rm -f "$DIR"/"$i"
185 done
186
187 # In case the specified file was the latest, return 0
188 [ "$LATEST" = "$2" ] &amp;&amp; return 0
189 # If the file was not the latest, return 1
190 return 1
191}
192
193# Name of the script
194MY_NAME=`basename "$0"`
195
196# OK, parse the command-line for arguments, and initialize to some
197# sensible state, that is: don't change links state, parse
198# /etc/man_db.conf, be most silent, search man_db.conf in /etc, and don't
199# force (re-)compression.
200COMP_METHOD=
201COMP_SUF=
202COMP_LVL=
203FORCE_OPT=
204LN_OPT=
205MAN_DIR=
206VERBOSE_LVL=0
207BACKUP=no
208FAKE=no
209MAN_CONF=/etc
210while [ -n "$1" ]; do
211 case $1 in
212 --gzip|--gz|-g)
213 COMP_SUF=.gz
214 COMP_METHOD=$1
215 shift
216 ;;
217 --bzip2|--bz2|-b)
218 COMP_SUF=.bz2
219 COMP_METHOD=$1
220 shift
221 ;;
222 --decompress|-d)
223 COMP_SUF=
224 COMP_LVL=
225 COMP_METHOD=$1
226 shift
227 ;;
228 -[1-9]|--fast|--best)
229 COMP_LVL=$1
230 shift
231 ;;
232 --force|-F)
233 FORCE_OPT=-F
234 shift
235 ;;
236 --soft|-S)
237 LN_OPT=-S
238 shift
239 ;;
240 --hard|-H)
241 LN_OPT=-H
242 shift
243 ;;
244 --conf=*)
245 MAN_CONF=`echo "$1" | cut -d '=' -f2-`
246 shift
247 ;;
248 --conf)
249 MAN_CONF="$2"
250 shift 2
251 ;;
252 --verbose|-v)
253 let VERBOSE_LVL++
254 shift
255 ;;
256 --backup)
257 BACKUP=yes
258 shift
259 ;;
260 --fake|-f)
261 FAKE=yes
262 shift
263 ;;
264 --help|-h)
265 help
266 exit 0
267 ;;
268 /*)
269 MAN_DIR="${MAN_DIR} ${1}"
270 shift
271 ;;
272 -*)
273 help "$1"
274 exit 1
275 ;;
276 *)
277 echo "\"$1\" is not an absolute path name"
278 exit 1
279 ;;
280 esac
281done
282
283# Redirections
284case $VERBOSE_LVL in
285 0)
286 # 0, be silent
287 DEST_FD0=/dev/null
288 DEST_FD1=/dev/null
289 VERBOSE_OPT=
290 ;;
291 1)
292 # 1, be a bit verbose
293 DEST_FD0=/dev/stdout
294 DEST_FD1=/dev/null
295 VERBOSE_OPT=-v
296 ;;
297 *)
298 # 2 and above, be most verbose
299 DEST_FD0=/dev/stdout
300 DEST_FD1=/dev/stdout
301 VERBOSE_OPT="-v -v"
302 ;;
303esac
304
305# Note: on my machine, 'man --path' gives /usr/share/man twice, once
306# with a trailing '/', once without.
307if [ -z "$MAN_DIR" ]; then
308 MAN_DIR=`manpath -C "$MAN_CONF"/man_db.conf \
309 | sed 's/:/\\n/g' \
310 | while read foo; do dirname "$foo"/.; done \
311 | sort -u \
312 | while read bar; do echo -n "$bar "; done`
313fi
314
315# If no MANDATORY_MANPATH in ${MAN_CONF}/man_db.conf, abort as well
316if [ -z "$MAN_DIR" ]; then
317 echo "No directory specified, and no directory found with \`manpath'"
318 exit 1
319fi
320
321# Check that the specified directories actually exist and are readable
322for DIR in $MAN_DIR; do
323 if [ ! -d "$DIR" -o ! -r "$DIR" ]; then
324 echo "Directory '$DIR' does not exist or is not readable"
325 exit 1
326 fi
327done
328
329# Fake?
330if [ "$FAKE" != "no" ]; then
331 echo "Actual parameters used:"
332 echo -n "Compression.......: "
333 case $COMP_METHOD in
334 --bzip2|--bz2|-b) echo -n "bzip2";;
335 --gzip|--gz|-g) echo -n "gzip";;
336 --decompress|-d) echo -n "decompressing";;
337 *) echo -n "unknown";;
338 esac
339 echo " ($COMP_METHOD)"
340 echo "Compression level.: $COMP_LVL"
341 echo "Compression suffix: $COMP_SUF"
342 echo -n "Force compression.: "
343 [ "foo$FORCE_OPT" = "foo-F" ] &amp;&amp; echo "yes" || echo "no"
344 echo "man_db.conf is....: ${MAN_CONF}/man_db.conf"
345 echo -n "Hard-links........: "
346 [ "foo$LN_OPT" = "foo-S" ] &amp;&amp;
347 echo "convert to soft-links" || echo "leave as is"
348 echo -n "Soft-links........: "
349 [ "foo$LN_OPT" = "foo-H" ] &amp;&amp;
350 echo "convert to hard-links" || echo "leave as is"
351 echo "Backup............: $BACKUP"
352 echo "Faking (yes!).....: $FAKE"
353 echo "Directories.......: $MAN_DIR"
354 echo "Verbosity level...: $VERBOSE_LVL"
355 exit 0
356fi
357
358# If no method was specified, print help
359if [ -z "${COMP_METHOD}" -a "${BACKUP}" = "no" ]; then
360 help
361 exit 1
362fi
363
364# In backup mode, do the backup solely
365if [ "$BACKUP" = "yes" ]; then
366 for DIR in $MAN_DIR; do
367 cd "${DIR}/.."
368 if [ ! -w "`pwd`" ]; then
369 echo "Directory '`pwd`' is not writable"
370 exit 1
371 fi
372 DIR_NAME=`basename "${DIR}"`
373 echo "Backing up $DIR..." &gt; $DEST_FD0
374 [ -f "${DIR_NAME}.tar.old" ] &amp;&amp; rm -f "${DIR_NAME}.tar.old"
375 [ -f "${DIR_NAME}.tar" ] &amp;&amp;
376 mv "${DIR_NAME}.tar" "${DIR_NAME}.tar.old"
377 tar -cvf "${DIR_NAME}.tar" "${DIR_NAME}" &gt; $DEST_FD1
378 done
379 exit 0
380fi
381
382# I know MAN_DIR has only absolute path names
383# I need to take into account the localized man, so I'm going recursive
384for DIR in $MAN_DIR; do
385 MEM_DIR=`pwd`
386 if [ ! -w "$DIR" ]; then
387 echo "Directory '$DIR' is not writable"
388 exit 1
389 fi
390 cd "$DIR"
391 for FILE in *; do
392 # Fixes the case where the directory is empty
393 if [ "foo$FILE" = "foo*" ]; then continue; fi
394
395 # Fixes the case when hard-links see their compression scheme change
396 # (from not compressed to compressed, or from bz2 to gz, or from gz
397 # to bz2)
398 # Also fixes the case when multiple versions of the page are present,
399 # which are either compressed or not.
400 if [ ! -L "$FILE" -a ! -e "$FILE" ]; then continue; fi
401
402 # Do not compress whatis files
403 if [ "$FILE" = "whatis" ]; then continue; fi
404
405 if [ -d "$FILE" ]; then
406 # We are going recursive to that directory
407 echo "-&gt; Entering ${DIR}/${FILE}..." &gt; $DEST_FD0
408 # I need not pass --conf, as I specify the directory to work on
409 # But I need exit in case of error. We must change back to the
410 # original directory so $0 is resolved correctly.
411 (cd "$MEM_DIR" &amp;&amp; eval "$0" ${COMP_METHOD} ${COMP_LVL} ${LN_OPT} \
412 ${VERBOSE_OPT} ${FORCE_OPT} "${DIR}/${FILE}") || exit $?
413 echo "&lt;- Leaving ${DIR}/${FILE}." &gt; $DEST_FD1
414
415 else # !dir
416 if ! check_unique "$DIR" "$FILE"; then continue; fi
417
418 # Check if the file is already compressed with the specified method
419 BASE_FILE=`basename "$FILE" .gz`
420 BASE_FILE=`basename "$BASE_FILE" .bz2`
421 if [ "${FILE}" = "${BASE_FILE}${COMP_SUF}" \
422 -a "foo${FORCE_OPT}" = "foo" ]; then continue; fi
423
424 # If we have a symlink
425 if [ -h "$FILE" ]; then
426 case "$FILE" in
427 *.bz2)
428 EXT=bz2 ;;
429 *.gz)
430 EXT=gz ;;
431 *)
432 EXT=none ;;
433 esac
434
435 if [ ! "$EXT" = "none" ]; then
436 LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 \
437 | tr -d " " | sed s/\.$EXT$//`
438 NEWNAME=`echo "$FILE" | sed s/\.$EXT$//`
439 mv "$FILE" "$NEWNAME"
440 FILE="$NEWNAME"
441 else
442 LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " "`
443 fi
444
445 if [ "$LN_OPT" = "-H" ]; then
446 # Change this soft-link into a hard- one
447 rm -f "$FILE" &amp;&amp; ln "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
448 chmod --reference "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
449 else
450 # Keep this soft-link a soft- one.
451 rm -f "$FILE" &amp;&amp; ln -s "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
452 fi
453 echo "Relinked $FILE" &gt; $DEST_FD1
454
455 # else if we have a plain file
456 elif [ -f "$FILE" ]; then
457 # Take care of hard-links: build the list of files hard-linked
458 # to the one we are {de,}compressing.
459 # NB. This is not optimal, as the file will eventually be
460 # compressed as many times as it has hard-links. But for now,
461 # that's the safe way.
462 inode=`ls -li "$FILE" | awk '{print $1}'`
463 HLINKS=`find . \! -name "$FILE" -inum $inode`
464
465 if [ -n "$HLINKS" ]; then
466 # We have hard-links! Remove them now.
467 for i in $HLINKS; do rm -f "$i"; done
468 fi
469
470 # Now take care of the file that has no hard-link
471 # We do decompress first to re-compress with the selected
472 # compression ratio later on...
473 case "$FILE" in
474 *.bz2)
475 bunzip2 "$FILE"
476 FILE=`basename "$FILE" .bz2`
477 ;;
478 *.gz)
479 gunzip "$FILE"
480 FILE=`basename "$FILE" .gz`
481 ;;
482 esac
483
484 # Compress the file with the given compression ratio, if needed
485 case $COMP_SUF in
486 *bz2)
487 bzip2 ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
488 echo "Compressed $FILE" &gt; $DEST_FD1
489 ;;
490 *gz)
491 gzip ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
492 echo "Compressed $FILE" &gt; $DEST_FD1
493 ;;
494 *)
495 echo "Uncompressed $FILE" &gt; $DEST_FD1
496 ;;
497 esac
498
499 # If the file had hard-links, recreate those (either hard or soft)
500 if [ -n "$HLINKS" ]; then
501 for i in $HLINKS; do
502 NEWFILE=`echo "$i" | sed s/\.gz$// | sed s/\.bz2$//`
503 if [ "$LN_OPT" = "-S" ]; then
504 # Make this hard-link a soft- one
505 ln -s "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
506 else
507 # Keep the hard-link a hard- one
508 ln "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
509 fi
510 # Really works only for hard-links. Harmless for soft-links
511 chmod 644 "${NEWFILE}$COMP_SUF"
512 done
513 fi
514
515 else
516 # There is a problem when we get neither a symlink nor a plain
517 # file. Obviously, we shall never ever come here... :-(
518 echo -n "Whaooo... \"${DIR}/${FILE}\" is neither a symlink "
519 echo "nor a plain file. Please check:"
520 ls -l "${DIR}/${FILE}"
521 exit 1
522 fi
523 fi
524 done # for FILE
525done # for DIR</literal>
526
527EOF</userinput></screen>
528
529 <para>As <systemitem class="username">root</systemitem>, make
530 <command>compressdoc</command> executable for all users:</para>
531
532<screen role="root"><userinput>chmod -v 755 /usr/sbin/compressdoc</userinput></screen>
533
534 <para>Now, as <systemitem class="username">root</systemitem>, you can issue
535 the command <command>compressdoc --bz2</command> to compress all your system man
536 pages. You can also run <command>compressdoc --help</command> to get
537 comprehensive help about what the script is able to do.</para>
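
 <para>A cautious first run might look like the following sketch, which uses
 only options described in the script's help text. With no directories given,
 the script asks <command>manpath</command> for the list. The
 <option>--fake</option> call only prints the parameters that would be used
 and exits. The <option>--backup</option> call writes a tar archive next to
 each man directory (for example
 <filename>/usr/share/man.tar</filename> for
 <filename class="directory">/usr/share/man</filename>, assuming that
 directory is in your manpath) and performs no other action, so the
 compression itself is a separate third step.</para>

<screen role="root"><userinput>compressdoc --fake --bz2
compressdoc --backup
compressdoc --bz2 -v</userinput></screen>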
538
539 <para>Don't forget that a few programs, like the <application>X Window
540 System</application> and <application>XEmacs</application>, also
541 install their documentation in non-standard places (such as
542 <filename class="directory">/usr/X11R6/man</filename>). Be sure
543 to add these locations to the file <filename>/etc/man_db.conf</filename>, as
544 <envar>MANDATORY_MANPATH</envar> <replaceable>&lt;/path&gt;</replaceable>
545 lines.</para>
546
547 <para>Example:</para>
548
549<screen><literal> ...
550 MANDATORY_MANPATH /usr/share/man
551 MANDATORY_MANPATH /usr/X11R6/man
552 MANDATORY_MANPATH /usr/local/man
553 MANDATORY_MANPATH /opt/qt/doc/man
554 ...</literal></screen>
555
556 <para>Generally, package installation systems do not compress man/info pages,
557 which means you will need to run the script again after installing new
558 packages if you want to keep your documentation as small as possible. Also,
559 note that running the script after upgrading a package is safe; when you have
560 several versions of a page (for example, one compressed and one uncompressed),
561 the most recent one is kept and the others are deleted.</para>
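
 <para>To judge whether a re-run is worthwhile, something like the following
 could be used. This is a sketch that assumes your pages live under
 <filename class="directory">/usr/share/man</filename> and that you chose
 <command>bzip2</command>; adjust the path and suffix to your setup. The
 <command>find</command> call lists files that do not yet carry the
 <filename class="extension">.bz2</filename> suffix, and passing the directory
 explicitly limits <command>compressdoc</command> to that hierarchy.</para>

<screen role="root"><userinput>find /usr/share/man -type f ! -name '*.bz2' ! -name whatis
compressdoc --bz2 /usr/share/man</userinput></screen>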
562
563</sect1>