source: postlfs/config/compressdoc.xml@ 62b2fb0

Last change on this file since 62b2fb0 was 62b2fb0, checked in by Dan Nicholson <dnicholson@…>, 16 years ago

compressdoc: Automatic compression by file size from Lars Bamberger

git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@7392 af4574ff-66df-0310-9fd7-8a98e5e911e0

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
   "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../../general.ent">
  %general-entities;
]>

<sect1 id="compressdoc" xreflabel="Compressing man and info pages">
  <?dbhtml filename="compressdoc.html"?>

  <sect1info>
    <othername>$LastChangedBy$</othername>
    <date>$Date$</date>
  </sect1info>

  <title>Compressing Man and Info Pages</title>

  <indexterm zone="compressdoc">
    <primary sortas="b-compressdoc">compressdoc</primary>
  </indexterm>

  <para>Man and info reader programs can transparently process files compressed
  with <command>gzip</command> or <command>bzip2</command>, a feature you can
  use to free some disk space while keeping your documentation
  available. However, compressing the pages is not as simple as recursively
  calling <command>gzip</command> on the man directories, because they tend to
  contain links, both hard and symbolic, that a plain recursive run does not
  handle correctly. A better approach is to use the script below. If you would
  prefer to download the file instead of creating it by typing or
  cut-and-pasting, you can find it at
  <ulink url="&files-anduin;/compressdoc"/> (the file should be installed in
  the <filename class="directory">/usr/sbin</filename> directory).</para>
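
  <para>As a rough illustration of the symbolic link problem, the following
  sketch builds a throwaway directory holding one page and one link to it,
  then compresses the page directly. The directory and file names are made up
  for the example and are not used by the script below.</para>

```shell
# Hypothetical scratch directory; nothing here touches the real man pages.
demo=$(mktemp -d)
echo ".TH PAGE 1" > "$demo/page.1"
ln -s page.1 "$demo/alias.1"       # alias.1 is a symbolic link to page.1

gzip "$demo/page.1"                # page.1 is replaced by page.1.gz

# The link still names page.1, which no longer exists.
link_state=unknown
if [ -L "$demo/alias.1" ]; then
  if [ -e "$demo/alias.1" ]; then
    link_state=ok
  else
    link_state=dangling
  fi
fi
echo "alias.1 is $link_state"
```

  <para>The link is left dangling because nothing renamed it or its target
  reference to match the compressed page. Keeping links and pages in step is
  the bookkeeping that <command>compressdoc</command> takes care of.</para>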

<screen role="root"><?dbfo keep-together="auto"?><userinput>cat &gt; /usr/sbin/compressdoc &lt;&lt; "EOF"
<literal>#!/bin/bash
# VERSION: 20080421.1320
#
# Compress (with bzip2 or gzip) all man pages in a hierarchy and
# update symlinks - By Marc Heerdink &lt;marc @ koelkast.net&gt;
#
# Modified to be able to gzip or bzip2 files as an option and to deal
# with all symlinks properly by Mark Hymers &lt;markh @ &lfs-domainname;&gt;
#
# Modified 20030930 by Yann E. Morin &lt;yann.morin.1998 @ anciens.enib.fr&gt;
# to accept compression/decompression, to correctly handle hard-links,
# to allow for changing hard-links into soft- ones, to specify the
# compression level, to parse the man.conf for all occurrences of MANPATH,
# to allow for a backup, to allow to keep the newest version of a page.
#
# Modified 20040330 by Tushar Teredesai to replace $0 by the name of the
# script.
#   (Note: It is assumed that the script is in the user's PATH)
#
# Modified 20050112 by Randy McMurchy to shorten line lengths and
# correct grammar errors.
#
# Modified 20060128 by Alexander E. Patrakov for compatibility with Man-DB.
#
# Modified 20060311 by Archaic to use Man-DB manpath utility which is a
# replacement for man --path from Man.
#
# Modified 20080421 by Dan Nicholson to properly execute the correct
# compressdoc when working recursively. This means the same compressdoc
# will be used whether a full path was given or it was resolved from PATH.
#
# Modified 20080421 by Dan Nicholson to be more robust with directories
# that don't exist or don't have sufficient permissions.
#
# Modified 20080421 by Lars Bamberger to (sort of) automatically choose
# a compression method based on the size of the manpage. A couple bug
# fixes were added by Dan Nicholson.
#
# TODO:
#     - choose a default compress method to be based on the available
#       tool : gzip or bzip2;
#     - when a MANPATH env var exists, use this instead of /etc/man_db.conf
#       (useful for users to (de)compress their man pages);
#     - offer an option to restore a previous backup;
#     - add other compression engines (compress, zip, etc?). Needed?

# Funnily enough, this function prints some help.
function help ()
{
  if [ -n "$1" ]; then
    echo "Unknown option : $1"
  fi
  ( echo "Usage: $MY_NAME &lt;comp_method&gt; [options] [dirs]" &amp;&amp; \
  cat &lt;&lt; EOT
Where comp_method is one of :
  --gzip, --gz, -g
  --bzip2, --bz2, -b
                Compress using gzip or bzip2.
  --automatic
                Compress using either gzip or bzip2, depending on the
                size of the file to be compressed. Files of 5 kB or
                more are bzipped, files of 1 kB or more are gzipped and
                files smaller than 1 kB are not compressed.

  --decompress, -d
                Decompress the man pages.

  --backup      Specify a .tar backup shall be done for all directories.
                In case a backup already exists, it is saved as .tar.old
                prior to making the new backup. If a .tar.old backup
                exists, it is removed prior to saving the backup.
                In backup mode, no other action is performed.

And where options are :
  -1 to -9, --fast, --best
                The compression level, as accepted by gzip and bzip2.
                When not specified, uses the default compression level
                for the given method (-6 for gzip, and -9 for bzip2).
                Not used when in backup or decompress modes.

  --force, -F   Force (re-)compression, even if the previous one was
                the same method. Useful when changing the compression
                ratio. By default, a page will not be re-compressed if
                it ends with the same suffix as the method adds
                (.bz2 for bzip2, .gz for gzip).

  --soft, -S    Change hard-links into soft-links. Use with _caution_
                as the first encountered file will be used as a
                reference. Not used when in backup mode.

  --hard, -H    Change soft-links into hard-links. Not used when in
                backup mode.

  --conf=dir, --conf dir
                Specify the location of man_db.conf. Defaults to /etc.

  --verbose, -v Verbose mode, print the name of the directory being
                processed. Double the flag to turn it even more verbose,
                and to print the name of the file being processed.

  --fake, -f    Fakes it. Print the actual parameters compressdoc will use.

  dirs          A list of space-separated _absolute_ pathnames to the
                man directories. When empty, and only then, use manpath
                to parse ${MAN_CONF}/man_db.conf for all valid occurrences
                of MANDATORY_MANPATH.

Note about compression:
  There has been a discussion on blfs-support about compression ratios of
  both gzip and bzip2 on man pages, taking into account the hosting fs,
  the architecture, etc... Overall, the conclusion was that gzip
  was much more efficient on 'small' files, and bzip2 on 'big' files,
  small and big being very dependent on the content of the files.

  See the original post from Mickael A. Peters, titled
  "Bootable Utility CD", dated 20030409.1816(+0200), and subsequent posts:
  http://&lfs-domainname;/pipermail/blfs-support/2003-April/038817.html

  On my system (x86, ext3), man pages were 35564KB before compression.
  gzip -9 compressed them down to 20372KB (57.28%), bzip2 -9 got down to
  19812KB (55.71%). That is a 1.57% gain in space. YMMV.

  What was not taken into consideration was the decompression speed. But
  does it make sense to? You gain fast access with uncompressed man
  pages, or you gain space at the expense of a slight overhead in time.
  Well, my P4-2.5GHz does not even let me notice this... :-)

EOT
) | less
}

# This function checks that the man page is unique amongst bzip2'd,
# gzip'd and uncompressed versions.
#  $1 the directory in which the file resides
#  $2 the file name for the man page
# Returns 0 (true) if the file is the latest and must be taken care of,
# and 1 (false) if the file is not the latest (and has therefore been
# deleted).
function check_unique ()
{
  # NB. When there are hard-links to this file, these are
  # _not_ deleted. In fact, if there are hard-links, they
  # all have the same date/time, thus making them ready
  # for deletion later on.

  # Build the list of all man pages with the same name
  DIR=$1
  BASENAME=`basename "${2}" .bz2`
  BASENAME=`basename "${BASENAME}" .gz`
  GZ_FILE="$BASENAME".gz
  BZ_FILE="$BASENAME".bz2

  # Look for, and keep, the most recent one
  LATEST=`(cd "$DIR"; ls -1rt "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}" \
         2&gt;/dev/null | tail -n 1)`
  for i in "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}"; do
    [ "$LATEST" != "$i" ] &amp;&amp; rm -f "$DIR"/"$i"
  done

  # In case the specified file was the latest, return 0
  [ "$LATEST" = "$2" ] &amp;&amp; return 0
  # If the file was not the latest, return 1
  return 1
}

# Name of the script
MY_NAME=`basename "$0"`

# OK, parse the command-line for arguments, and initialize to some
# sensible state, that is: don't change links state, parse
# /etc/man_db.conf, be most silent, search man_db.conf in /etc, and don't
# force (re-)compression.
COMP_METHOD=
COMP_SUF=
COMP_LVL=
FORCE_OPT=
LN_OPT=
MAN_DIR=
VERBOSE_LVL=0
BACKUP=no
FAKE=no
MAN_CONF=/etc
while [ -n "$1" ]; do
  case $1 in
    --gzip|--gz|-g)
      COMP_SUF=.gz
      COMP_METHOD=$1
      shift
      ;;
    --bzip2|--bz2|-b)
      COMP_SUF=.bz2
      COMP_METHOD=$1
      shift
      ;;
    --automatic)
      COMP_SUF=TBD
      COMP_METHOD=$1
      shift
      ;;
    --decompress|-d)
      COMP_SUF=
      COMP_LVL=
      COMP_METHOD=$1
      shift
      ;;
    -[1-9]|--fast|--best)
      COMP_LVL=$1
      shift
      ;;
    --force|-F)
      FORCE_OPT=-F
      shift
      ;;
    --soft|-S)
      LN_OPT=-S
      shift
      ;;
    --hard|-H)
      LN_OPT=-H
      shift
      ;;
    --conf=*)
      MAN_CONF=`echo "$1" | cut -d '=' -f2-`
      shift
      ;;
    --conf)
      MAN_CONF="$2"
      shift 2
      ;;
    --verbose|-v)
      let VERBOSE_LVL++
      shift
      ;;
    --backup)
      BACKUP=yes
      shift
      ;;
    --fake|-f)
      FAKE=yes
      shift
      ;;
    --help|-h)
      help
      exit 0
      ;;
    /*)
      MAN_DIR="${MAN_DIR} ${1}"
      shift
      ;;
    -*)
      help "$1"
      exit 1
      ;;
    *)
      echo "\"$1\" is not an absolute path name"
      exit 1
      ;;
  esac
done

# Redirections
case $VERBOSE_LVL in
  0)
    # 0, be silent
    DEST_FD0=/dev/null
    DEST_FD1=/dev/null
    VERBOSE_OPT=
    ;;
  1)
    # 1, be a bit verbose
    DEST_FD0=/dev/stdout
    DEST_FD1=/dev/null
    VERBOSE_OPT=-v
    ;;
  *)
    # 2 and above, be most verbose
    DEST_FD0=/dev/stdout
    DEST_FD1=/dev/stdout
    VERBOSE_OPT="-v -v"
    ;;
esac

# Note: on my machine, 'man --path' gives /usr/share/man twice, once
# with a trailing '/', once without.
if [ -z "$MAN_DIR" ]; then
  MAN_DIR=`manpath -C "$MAN_CONF"/man_db.conf \
            | sed 's/:/\\n/g' \
            | while read foo; do dirname "$foo"/.; done \
            | sort -u \
            | while read bar; do echo -n "$bar "; done`
fi

# If no MANDATORY_MANPATH in ${MAN_CONF}/man_db.conf, abort as well
if [ -z "$MAN_DIR" ]; then
  echo "No directory specified, and no directory found with \`manpath'"
  exit 1
fi

# Check that the specified directories actually exist and are readable
for DIR in $MAN_DIR; do
  if [ ! -d "$DIR" -o ! -r "$DIR" ]; then
    echo "Directory '$DIR' does not exist or is not readable"
    exit 1
  fi
done

# Fake?
if [ "$FAKE" != "no" ]; then
  echo "Actual parameters used:"
  echo -n "Compression.......: "
  case $COMP_METHOD in
    --bzip2|--bz2|-b) echo -n "bzip2";;
    --gzip|--gz|-g) echo -n "gzip";;
    --automatic) echo -n "compressing";;
    --decompress|-d) echo -n "decompressing";;
    *) echo -n "unknown";;
  esac
  echo " ($COMP_METHOD)"
  echo "Compression level.: $COMP_LVL"
  echo "Compression suffix: $COMP_SUF"
  echo -n "Force compression.: "
  [ "foo$FORCE_OPT" = "foo-F" ] &amp;&amp; echo "yes" || echo "no"
  echo "man_db.conf is....: ${MAN_CONF}/man_db.conf"
  echo -n "Hard-links........: "
  [ "foo$LN_OPT" = "foo-S" ] &amp;&amp;
  echo "convert to soft-links" || echo "leave as is"
  echo -n "Soft-links........: "
  [ "foo$LN_OPT" = "foo-H" ] &amp;&amp;
  echo "convert to hard-links" || echo "leave as is"
  echo "Backup............: $BACKUP"
  echo "Faking (yes!).....: $FAKE"
  echo "Directories.......: $MAN_DIR"
  echo "Verbosity level...: $VERBOSE_LVL"
  exit 0
fi

# If no method was specified, print help
if [ -z "${COMP_METHOD}" -a "${BACKUP}" = "no" ]; then
  help
  exit 1
fi

# In backup mode, do the backup solely
if [ "$BACKUP" = "yes" ]; then
  for DIR in $MAN_DIR; do
    cd "${DIR}/.."
    if [ ! -w "`pwd`" ]; then
      echo "Directory '`pwd`' is not writable"
      exit 1
    fi
    DIR_NAME=`basename "${DIR}"`
    echo "Backing up $DIR..." &gt; $DEST_FD0
    [ -f "${DIR_NAME}.tar.old" ] &amp;&amp; rm -f "${DIR_NAME}.tar.old"
    [ -f "${DIR_NAME}.tar" ] &amp;&amp;
    mv "${DIR_NAME}.tar" "${DIR_NAME}.tar.old"
    tar -cvf "${DIR_NAME}.tar" "${DIR_NAME}" &gt; $DEST_FD1
  done
  exit 0
fi

# I know MAN_DIR has only absolute path names
# I need to take into account the localized man, so I'm going recursive
for DIR in $MAN_DIR; do
  MEM_DIR=`pwd`
  if [ ! -w "$DIR" ]; then
    echo "Directory '$DIR' is not writable"
    exit 1
  fi
  cd "$DIR"
  for FILE in *; do
    # Fixes the case where the directory is empty
    if [ "foo$FILE" = "foo*" ]; then continue; fi

    # Fixes the case when hard-links see their compression scheme change
    # (from not compressed to compressed, or from bz2 to gz, or from gz
    # to bz2)
    # Also fixes the case when multiple versions of the page are present,
    # which are either compressed or not.
    if [ ! -L "$FILE" -a ! -e "$FILE" ]; then continue; fi

    # Do not compress whatis files
    if [ "$FILE" = "whatis" ]; then continue; fi

    if [ -d "$FILE" ]; then
      # We are going recursive to that directory
      echo "-&gt; Entering ${DIR}/${FILE}..." &gt; $DEST_FD0
      # I need not pass --conf, as I specify the directory to work on
      # But I need exit in case of error. We must change back to the
      # original directory so $0 is resolved correctly.
      (cd "$MEM_DIR" &amp;&amp; eval "$0" ${COMP_METHOD} ${COMP_LVL} ${LN_OPT} \
        ${VERBOSE_OPT} ${FORCE_OPT} "${DIR}/${FILE}") || exit $?
      echo "&lt;- Leaving ${DIR}/${FILE}." &gt; $DEST_FD1

    else # !dir
      if ! check_unique "$DIR" "$FILE"; then continue; fi

      # With automatic compression, get the uncompressed file size of
      # the file (dereferencing symlinks), and choose an appropriate
      # compression method.
      if [ "$COMP_METHOD" = "--automatic" ]; then
        declare -i SIZE
        case "$FILE" in
          *.bz2)
            SIZE=$(bzcat "$FILE" | wc -c) ;;
          *.gz)
            SIZE=$(zcat "$FILE" | wc -c) ;;
          *)
            SIZE=$(wc -c &lt; "$FILE") ;;
        esac
        if (( $SIZE &gt;= (5 * 2**10) )); then
          COMP_SUF=.bz2
        elif (( $SIZE &gt;= (1 * 2**10) )); then
          COMP_SUF=.gz
        else
          COMP_SUF=
        fi
      fi

      # Check if the file is already compressed with the specified method
      BASE_FILE=`basename "$FILE" .gz`
      BASE_FILE=`basename "$BASE_FILE" .bz2`
      if [ "${FILE}" = "${BASE_FILE}${COMP_SUF}" \
         -a "foo${FORCE_OPT}" = "foo" ]; then continue; fi

      # If we have a symlink
      if [ -h "$FILE" ]; then
        case "$FILE" in
          *.bz2)
            EXT=bz2 ;;
          *.gz)
            EXT=gz ;;
          *)
            EXT=none ;;
        esac

        if [ ! "$EXT" = "none" ]; then
          LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 \
               | tr -d " " | sed s/\.$EXT$//`
          NEWNAME=`echo "$FILE" | sed s/\.$EXT$//`
          mv "$FILE" "$NEWNAME"
          FILE="$NEWNAME"
        else
          LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " "`
        fi

        if [ "$LN_OPT" = "-H" ]; then
          # Change this soft-link into a hard- one
          rm -f "$FILE" &amp;&amp; ln "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
          chmod --reference "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
        else
          # Keep this soft-link a soft- one.
          rm -f "$FILE" &amp;&amp; ln -s "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
        fi
        echo "Relinked $FILE" &gt; $DEST_FD1

      # else if we have a plain file
      elif [ -f "$FILE" ]; then
        # Take care of hard-links: build the list of files hard-linked
        # to the one we are {de,}compressing.
        # NB. This is not optimum as the file will eventually be
        # compressed as many times as it has hard-links. But for now,
        # that's the safe way.
        inode=`ls -li "$FILE" | awk '{print $1}'`
        HLINKS=`find . \! -name "$FILE" -inum $inode`

        if [ -n "$HLINKS" ]; then
          # We have hard-links! Remove them now.
          for i in $HLINKS; do rm -f "$i"; done
        fi

        # Now take care of the file that has no hard-link
        # We do decompress first to re-compress with the selected
        # compression ratio later on...
        case "$FILE" in
          *.bz2)
            bunzip2 "$FILE"
            FILE=`basename "$FILE" .bz2`
          ;;
          *.gz)
            gunzip "$FILE"
            FILE=`basename "$FILE" .gz`
          ;;
        esac

        # Compress the file with the given compression ratio, if needed
        case $COMP_SUF in
          *bz2)
            bzip2 ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
            echo "Compressed $FILE" &gt; $DEST_FD1
          ;;
          *gz)
            gzip ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
            echo "Compressed $FILE" &gt; $DEST_FD1
          ;;
          *)
            echo "Uncompressed $FILE" &gt; $DEST_FD1
          ;;
        esac

        # If the file had hard-links, recreate those (either hard or soft)
        if [ -n "$HLINKS" ]; then
          for i in $HLINKS; do
            NEWFILE=`echo "$i" | sed s/\.gz$// | sed s/\.bz2$//`
            if [ "$LN_OPT" = "-S" ]; then
              # Make this hard-link a soft- one
              ln -s "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
            else
              # Keep the hard-link a hard- one
              ln "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
            fi
            # Really works only for hard-links. Harmless for soft-links
            chmod 644 "${NEWFILE}$COMP_SUF"
          done
        fi

      else
        # There is a problem when we get neither a symlink nor a plain
        # file. Obviously, we shall never ever come here... :-(
        echo -n "Whaooo... \"${DIR}/${FILE}\" is neither a symlink "
        echo "nor a plain file. Please check:"
        ls -l "${DIR}/${FILE}"
        exit 1
      fi
    fi
  done # for FILE
done # for DIR</literal>

EOF</userinput></screen>

  <para>As <systemitem class="username">root</systemitem>, make
  <command>compressdoc</command> executable for all users:</para>

<screen><userinput>chmod -v 755 /usr/sbin/compressdoc</userinput></screen>

  <para>Now, as <systemitem class="username">root</systemitem>, you can issue
  the command <command>compressdoc --bz2</command> to compress all your system man
  pages. You can also run <command>compressdoc --help</command> to get
  comprehensive help about what the script is able to do.</para>
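
  <para>Besides <command>compressdoc --bz2</command>, the script accepts
  <command>compressdoc --automatic</command>, which picks a method per page
  from its uncompressed size. The sketch below restates that size rule as a
  small standalone function. The name <command>choose_suffix</command> is
  made up for this illustration and does not exist in the script; the
  thresholds are the ones the script tests (5 * 2**10 and 1 * 2**10
  bytes).</para>

```shell
# choose_suffix SIZE: hypothetical helper, not part of compressdoc itself.
# Prints the suffix the automatic mode would select for a page whose
# uncompressed size is SIZE bytes.
choose_suffix() {
  size=$1
  if [ "$size" -ge 5120 ]; then      # 5 kB or more: bzip2
    echo ".bz2"
  elif [ "$size" -ge 1024 ]; then    # 1 kB or more: gzip
    echo ".gz"
  else                               # under 1 kB: leave uncompressed
    echo ""
  fi
}

for bytes in 800 1024 4000 5120; do
  echo "$bytes bytes: '$(choose_suffix "$bytes")'"
done
```

  <para>To preview what a real run would use without touching any page, the
  script's own <command>--fake</command> option prints the parameters and
  exits.</para>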

  <para>Don't forget that a few programs, like the <application>X Window
  System</application> and <application>XEmacs</application>, also
  install their documentation in non-standard places (such as
  <filename class="directory">/usr/X11R6/man</filename>). Be sure
  to add these locations to the file <filename>/etc/man_db.conf</filename>, as
  <envar>MANDATORY_MANPATH</envar> <replaceable>&lt;/path&gt;</replaceable>
  lines.</para>

  <para>Example:</para>

<screen><literal>    ...
    MANDATORY_MANPATH /usr/share/man
    MANDATORY_MANPATH /usr/X11R6/man
    MANDATORY_MANPATH /usr/local/man
    MANDATORY_MANPATH /opt/qt/doc/man
    ...</literal></screen>
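
  <para>To check which directories a configuration file declares, something
  like the following sketch may help. It is only an assumption about a
  convenient way to inspect the file: <command>compressdoc</command> itself
  asks <command>manpath</command> for the list rather than parsing the file
  this way. The sample file is a temporary stand-in for
  <filename>/etc/man_db.conf</filename>.</para>

```shell
# Hypothetical sample file standing in for /etc/man_db.conf.
conf=$(mktemp)
printf '%s\n' \
  '# comment line' \
  'MANDATORY_MANPATH /usr/share/man' \
  'MANDATORY_MANPATH /usr/X11R6/man' \
  'MANDB_MAP /usr/share/man /var/cache/man' > "$conf"

# Print the second field of every MANDATORY_MANPATH line.
declared=$(awk '$1 == "MANDATORY_MANPATH" { print $2 }' "$conf")
echo "$declared"
rm -f "$conf"
```

  <para>Comment lines and other directives are skipped because only lines
  whose first field is exactly <envar>MANDATORY_MANPATH</envar> are
  printed.</para>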

  <para>Generally, package installation systems do not compress man/info pages,
  which means you will need to run the script again after installing or
  upgrading a package if you want to keep the size of your documentation as
  small as possible. Running the script after upgrading a package is safe;
  when you have several versions of a page (for example, one compressed and
  one uncompressed), the most recent one is kept and the others are
  deleted.</para>
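
  <para>The rule of keeping only the most recent copy can be sketched as
  follows. This is a simplified illustration modelled on the script's
  <command>check_unique</command> function, not a replacement for it; the
  helper name <command>keep_newest</command> and the scratch directory are
  invented for the example.</para>

```shell
# keep_newest DIR PAGE: hypothetical helper for illustration only.
# Among PAGE, PAGE.gz and PAGE.bz2 in DIR, keep the most recently
# modified file, delete the others, and print the name that was kept.
keep_newest() {
  dir=$1
  page=$2
  # ls -rt lists oldest first, so the last line is the newest file.
  newest=$(cd "$dir" || exit 1
           ls -1rt "$page" "$page.gz" "$page.bz2" 2>/dev/null | tail -n 1)
  for candidate in "$page" "$page.gz" "$page.bz2"; do
    if [ "$candidate" != "$newest" ]; then
      rm -f "$dir/$candidate"
    fi
  done
  echo "$newest"
}

demo=$(mktemp -d)
touch -t 202001010000 "$demo/foo.1.gz"   # older, compressed copy
touch -t 202101010000 "$demo/foo.1"      # newer copy, as left by an upgrade
kept=$(keep_newest "$demo" foo.1)
echo "kept: $kept"
```

  <para>In this run the uncompressed <filename>foo.1</filename> is newer, so
  the stale <filename>foo.1.gz</filename> is removed and the fresh page is
  then available for compression.</para>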

</sect1>