source: archive/compressdoc.xml@ a67e66f

Last change on this file since a67e66f was a67e66f, checked in by Igor Živković <igor@…>, 8 years ago

archived compressdoc

git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@12435 af4574ff-66df-0310-9fd7-8a98e5e911e0

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
   "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../../general.ent">
  %general-entities;
]>

<sect1 id="compressdoc" xreflabel="Compressing man and info pages">
  <?dbhtml filename="compressdoc.html"?>

  <sect1info>
    <othername>$LastChangedBy$</othername>
    <date>$Date$</date>
  </sect1info>

  <title>Compressing Man and Info Pages</title>

  <indexterm zone="compressdoc">
    <primary sortas="b-compressdoc">compressdoc</primary>
  </indexterm>

  <para>Man and info reader programs can transparently process files compressed
  with <command>gzip</command> or <command>bzip2</command>, a feature you can
  use to free some disk space while keeping your documentation
  available. Compressing the pages is not as simple as it sounds, however: man
  directories tend to contain links (both hard and symbolic), which defeat
  naive approaches such as recursively calling <command>gzip</command> on
  them. A better approach is to use the script below. If you would prefer to
  download the file instead of creating it by typing or copying and pasting,
  you can find it at <ulink url="&files-anduin;/compressdoc"/> (the file
  should be installed in the
  <filename class="directory">/usr/sbin</filename> directory).</para>

<screen role="root"><?dbfo keep-together="auto"?><userinput>cat &gt; /usr/sbin/compressdoc &lt;&lt; "EOF"
<literal>#!/bin/bash
# VERSION: 20080421.1623
#
# Compress (with bzip2 or gzip) all man pages in a hierarchy and
# update symlinks - By Marc Heerdink &lt;marc @ koelkast.net&gt;
#
# Modified to be able to gzip or bzip2 files as an option and to deal
# with all symlinks properly by Mark Hymers &lt;markh @ &lfs-domainname;&gt;
#
# Modified 20030930 by Yann E. Morin &lt;yann.morin.1998 @ anciens.enib.fr&gt;
# to accept compression/decompression, to correctly handle hard-links,
# to allow for changing hard-links into soft- ones, to specify the
# compression level, to parse the man.conf for all occurrences of MANPATH,
# to allow for a backup, to allow to keep the newest version of a page.
#
# Modified 20040330 by Tushar Teredesai to replace $0 by the name of the
# script.
#   (Note: It is assumed that the script is in the user's PATH)
#
# Modified 20050112 by Randy McMurchy to shorten line lengths and
# correct grammar errors.
#
# Modified 20060128 by Alexander E. Patrakov for compatibility with Man-DB.
#
# Modified 20060311 by Archaic to use Man-DB manpath utility which is a
# replacement for man --path from Man.
#
# Modified 20080421 by Dan Nicholson to properly execute the correct
# compressdoc when working recursively. This means the same compressdoc
# will be used whether a full path was given or it was resolved from PATH.
#
# Modified 20080421 by Dan Nicholson to be more robust with directories
# that don't exist or don't have sufficient permissions.
#
# Modified 20080421 by Lars Bamberger to (sort of) automatically choose
# a compression method based on the size of the manpage. A couple bug
# fixes were added by Dan Nicholson.
#
# Modified 20080421 by Dan Nicholson to suppress warnings from manpath
# since these are emitted when $MANPATH is set. Removed the TODO for
# using the $MANPATH variable since manpath(1) handles this already.
#
# TODO:
#     - choose a default compress method to be based on the available
#       tool : gzip or bzip2;
#     - offer an option to restore a previous backup;
#     - add other compression engines (compress, zip, etc?). Needed?

# Funnily enough, this function prints some help.
function help ()
{
  if [ -n "$1" ]; then
    echo "Unknown option : $1"
  fi
  ( echo "Usage: $MY_NAME &lt;comp_method&gt; [options] [dirs]" &amp;&amp; \
  cat &lt;&lt; EOT
Where comp_method is one of :
  --gzip, --gz, -g
  --bzip2, --bz2, -b
                Compress using gzip or bzip2.
  --automatic
                Compress using either gzip or bzip2, depending on the
                size of the file to be compressed. Files larger than 5
                kB are bzipped, files larger than 1 kB are gzipped and
                files smaller than 1 kB are not compressed.

  --decompress, -d
                Decompress the man pages.

  --backup      Specify a .tar backup shall be done for all directories.
                In case a backup already exists, it is saved as .tar.old
                prior to making the new backup. If a .tar.old backup
                exists, it is removed prior to saving the backup.
                In backup mode, no other action is performed.

And where options are :
  -1 to -9, --fast, --best
                The compression level, as accepted by gzip and bzip2.
                When not specified, uses the default compression level
                for the given method (-6 for gzip, and -9 for bzip2).
                Not used when in backup or decompress modes.

  --force, -F   Force (re-)compression, even if the previous one was
                the same method. Useful when changing the compression
                ratio. By default, a page will not be re-compressed if
                it ends with the same suffix as the method adds
                (.bz2 for bzip2, .gz for gzip).

  --soft, -S    Change hard-links into soft-links. Use with _caution_
                as the first encountered file will be used as a
                reference. Not used when in backup mode.

  --hard, -H    Change soft-links into hard-links. Not used when in
                backup mode.

  --conf=dir, --conf dir
                Specify the location of man_db.conf. Defaults to /etc.

  --verbose, -v Verbose mode, print the name of the directory being
                processed. Double the flag to turn it even more verbose,
                and to print the name of the file being processed.

  --fake, -f    Fakes it. Print the actual parameters compressdoc will use.

  dirs          A list of space-separated _absolute_ pathnames to the
                man directories. When empty, and only then, use manpath
                to parse ${MAN_CONF}/man_db.conf for all valid occurrences
                of MANDATORY_MANPATH.

Note about compression:
  There has been a discussion on blfs-support about compression ratios of
  both gzip and bzip2 on man pages, taking into account the hosting fs,
  the architecture, etc... Overall, the conclusion was that gzip
  was much more efficient on 'small' files, and bzip2 on 'big' files,
  small and big being very dependent on the content of the files.

  See the original post from Mickael A. Peters, titled
  "Bootable Utility CD", dated 20030409.1816(+0200), and subsequent posts:
  http://&lfs-domainname;/pipermail/blfs-support/2003-April/038817.html

  On my system (x86, ext3), man pages were 35564KB before compression.
  gzip -9 compressed them down to 20372KB (57.28%), bzip2 -9 got down to
  19812KB (55.71%). That is a 1.57% gain in space. YMMV.

  What was not taken into consideration was the decompression speed. But
  does it make sense to? You gain fast access with uncompressed man
  pages, or you gain space at the expense of a slight overhead in time.
  Well, my P4-2.5GHz does not even let me notice this... :-)

EOT
) | less
}

# This function checks that the man page is unique amongst bzip2'd,
# gzip'd and uncompressed versions.
#  $1 the directory in which the file resides
#  $2 the file name for the man page
# Returns 0 (true) if the file is the latest and must be taken care of,
# and 1 (false) if the file is not the latest (and has therefore been
# deleted).
function check_unique ()
{
  # NB. When there are hard-links to this file, these are
  # _not_ deleted. In fact, if there are hard-links, they
  # all have the same date/time, thus making them ready
  # for deletion later on.

  # Build the list of all man pages with the same name
  DIR=$1
  BASENAME=`basename "${2}" .bz2`
  BASENAME=`basename "${BASENAME}" .gz`
  GZ_FILE="$BASENAME".gz
  BZ_FILE="$BASENAME".bz2

  # Look for, and keep, the most recent one
  LATEST=`(cd "$DIR"; ls -1rt "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}" \
         2&gt;/dev/null | tail -n 1)`
  for i in "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}"; do
    [ "$LATEST" != "$i" ] &amp;&amp; rm -f "$DIR"/"$i"
  done

  # In case the specified file was the latest, return 0
  [ "$LATEST" = "$2" ] &amp;&amp; return 0
  # If the file was not the latest, return 1
  return 1
}

# Name of the script
MY_NAME=`basename "$0"`

# OK, parse the command-line for arguments, and initialize to some
# sensible state, that is: don't change links state, parse
# /etc/man_db.conf, be most silent, search man_db.conf in /etc, and don't
# force (re-)compression.
COMP_METHOD=
COMP_SUF=
COMP_LVL=
FORCE_OPT=
LN_OPT=
MAN_DIR=
VERBOSE_LVL=0
BACKUP=no
FAKE=no
MAN_CONF=/etc
while [ -n "$1" ]; do
  case $1 in
    --gzip|--gz|-g)
      COMP_SUF=.gz
      COMP_METHOD=$1
      shift
      ;;
    --bzip2|--bz2|-b)
      COMP_SUF=.bz2
      COMP_METHOD=$1
      shift
      ;;
    --automatic)
      COMP_SUF=TBD
      COMP_METHOD=$1
      shift
      ;;
    --decompress|-d)
      COMP_SUF=
      COMP_LVL=
      COMP_METHOD=$1
      shift
      ;;
    -[1-9]|--fast|--best)
      COMP_LVL=$1
      shift
      ;;
    --force|-F)
      FORCE_OPT=-F
      shift
      ;;
    --soft|-S)
      LN_OPT=-S
      shift
      ;;
    --hard|-H)
      LN_OPT=-H
      shift
      ;;
    --conf=*)
      MAN_CONF=`echo "$1" | cut -d '=' -f2-`
      shift
      ;;
    --conf)
      MAN_CONF="$2"
      shift 2
      ;;
    --verbose|-v)
      let VERBOSE_LVL++
      shift
      ;;
    --backup)
      BACKUP=yes
      shift
      ;;
    --fake|-f)
      FAKE=yes
      shift
      ;;
    --help|-h)
      help
      exit 0
      ;;
    /*)
      MAN_DIR="${MAN_DIR} ${1}"
      shift
      ;;
    -*)
      help "$1"
      exit 1
      ;;
    *)
      echo "\"$1\" is not an absolute path name"
      exit 1
      ;;
  esac
done

# Redirections
case $VERBOSE_LVL in
  0)
    # 0, be silent
    DEST_FD0=/dev/null
    DEST_FD1=/dev/null
    VERBOSE_OPT=
    ;;
  1)
    # 1, be a bit verbose
    DEST_FD0=/dev/stdout
    DEST_FD1=/dev/null
    VERBOSE_OPT=-v
    ;;
  *)
    # 2 and above, be most verbose
    DEST_FD0=/dev/stdout
    DEST_FD1=/dev/stdout
    VERBOSE_OPT="-v -v"
    ;;
esac

# Note: on my machine, 'man --path' gives /usr/share/man twice, once
# with a trailing '/', once without.
if [ -z "$MAN_DIR" ]; then
  MAN_DIR=`manpath -q -C "$MAN_CONF"/man_db.conf \
            | sed 's/:/\\n/g' \
            | while read foo; do dirname "$foo"/.; done \
            | sort -u \
            | while read bar; do echo -n "$bar "; done`
fi

# If no MANDATORY_MANPATH in ${MAN_CONF}/man_db.conf, abort as well
if [ -z "$MAN_DIR" ]; then
  echo "No directory specified, and no directory found with \`manpath'"
  exit 1
fi

# Check that the specified directories actually exist and are readable
for DIR in $MAN_DIR; do
  if [ ! -d "$DIR" -o ! -r "$DIR" ]; then
    echo "Directory '$DIR' does not exist or is not readable"
    exit 1
  fi
done

# Fake?
if [ "$FAKE" != "no" ]; then
  echo "Actual parameters used:"
  echo -n "Compression.......: "
  case $COMP_METHOD in
    --bzip2|--bz2|-b) echo -n "bzip2";;
    --gzip|--gz|-g) echo -n "gzip";;
    --automatic) echo -n "compressing";;
    --decompress|-d) echo -n "decompressing";;
    *) echo -n "unknown";;
  esac
  echo " ($COMP_METHOD)"
  echo "Compression level.: $COMP_LVL"
  echo "Compression suffix: $COMP_SUF"
  echo -n "Force compression.: "
  [ "foo$FORCE_OPT" = "foo-F" ] &amp;&amp; echo "yes" || echo "no"
  echo "man_db.conf is....: ${MAN_CONF}/man_db.conf"
  echo -n "Hard-links........: "
  [ "foo$LN_OPT" = "foo-S" ] &amp;&amp;
    echo "convert to soft-links" || echo "leave as is"
  echo -n "Soft-links........: "
  [ "foo$LN_OPT" = "foo-H" ] &amp;&amp;
    echo "convert to hard-links" || echo "leave as is"
  echo "Backup............: $BACKUP"
  echo "Faking (yes!).....: $FAKE"
  echo "Directories.......: $MAN_DIR"
  echo "Verbosity level...: $VERBOSE_LVL"
  exit 0
fi

# If no method was specified, print help
if [ -z "${COMP_METHOD}" -a "${BACKUP}" = "no" ]; then
  help
  exit 1
fi

# In backup mode, do the backup solely
if [ "$BACKUP" = "yes" ]; then
  for DIR in $MAN_DIR; do
    cd "${DIR}/.."
    if [ ! -w "`pwd`" ]; then
      echo "Directory '`pwd`' is not writable"
      exit 1
    fi
    DIR_NAME=`basename "${DIR}"`
    echo "Backing up $DIR..." &gt; $DEST_FD0
    [ -f "${DIR_NAME}.tar.old" ] &amp;&amp; rm -f "${DIR_NAME}.tar.old"
    [ -f "${DIR_NAME}.tar" ] &amp;&amp;
      mv "${DIR_NAME}.tar" "${DIR_NAME}.tar.old"
    tar -cvf "${DIR_NAME}.tar" "${DIR_NAME}" &gt; $DEST_FD1
  done
  exit 0
fi

# I know MAN_DIR has only absolute path names
# I need to take into account the localized man, so I'm going recursive
for DIR in $MAN_DIR; do
  MEM_DIR=`pwd`
  if [ ! -w "$DIR" ]; then
    echo "Directory '$DIR' is not writable"
    exit 1
  fi
  cd "$DIR"
  for FILE in *; do
    # Fixes the case where the directory is empty
    if [ "foo$FILE" = "foo*" ]; then continue; fi

    # Fixes the case when hard-links see their compression scheme change
    # (from not compressed to compressed, or from bz2 to gz, or from gz
    # to bz2)
    # Also fixes the case when multiple versions of the page are present,
    # which are either compressed or not.
    if [ ! -L "$FILE" -a ! -e "$FILE" ]; then continue; fi

    # Do not compress whatis files
    if [ "$FILE" = "whatis" ]; then continue; fi

    if [ -d "$FILE" ]; then
      # We are going recursive to that directory
      echo "-&gt; Entering ${DIR}/${FILE}..." &gt; $DEST_FD0
      # I need not pass --conf, as I specify the directory to work on
      # But I need exit in case of error. We must change back to the
      # original directory so $0 is resolved correctly.
      (cd "$MEM_DIR" &amp;&amp; eval "$0" ${COMP_METHOD} ${COMP_LVL} ${LN_OPT} \
        ${VERBOSE_OPT} ${FORCE_OPT} "${DIR}/${FILE}") || exit $?
      echo "&lt;- Leaving ${DIR}/${FILE}." &gt; $DEST_FD1

    else # !dir
      if ! check_unique "$DIR" "$FILE"; then continue; fi

      # With automatic compression, get the uncompressed file size of
      # the file (dereferencing symlinks), and choose an appropriate
      # compression method.
      if [ "$COMP_METHOD" = "--automatic" ]; then
        declare -i SIZE
        case "$FILE" in
          *.bz2)
            SIZE=$(bzcat "$FILE" | wc -c) ;;
          *.gz)
            SIZE=$(zcat "$FILE" | wc -c) ;;
          *)
            SIZE=$(wc -c &lt; "$FILE") ;;
        esac
        if (( $SIZE &gt;= (5 * 2**10) )); then
          COMP_SUF=.bz2
        elif (( $SIZE &gt;= (1 * 2**10) )); then
          COMP_SUF=.gz
        else
          COMP_SUF=
        fi
      fi

      # Check if the file is already compressed with the specified method
      BASE_FILE=`basename "$FILE" .gz`
      BASE_FILE=`basename "$BASE_FILE" .bz2`
      if [ "${FILE}" = "${BASE_FILE}${COMP_SUF}" \
         -a "foo${FORCE_OPT}" = "foo" ]; then continue; fi

      # If we have a symlink
      if [ -h "$FILE" ]; then
        case "$FILE" in
          *.bz2)
            EXT=bz2 ;;
          *.gz)
            EXT=gz ;;
          *)
            EXT=none ;;
        esac

        if [ ! "$EXT" = "none" ]; then
          LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 \
               | tr -d " " | sed s/\.$EXT$//`
          NEWNAME=`echo "$FILE" | sed s/\.$EXT$//`
          mv "$FILE" "$NEWNAME"
          FILE="$NEWNAME"
        else
          LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " "`
        fi

        if [ "$LN_OPT" = "-H" ]; then
          # Change this soft-link into a hard- one
          rm -f "$FILE" &amp;&amp; ln "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
          chmod --reference "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
        else
          # Keep this soft-link a soft- one.
          rm -f "$FILE" &amp;&amp; ln -s "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
        fi
        echo "Relinked $FILE" &gt; $DEST_FD1

      # else if we have a plain file
      elif [ -f "$FILE" ]; then
        # Take care of hard-links: build the list of files hard-linked
        # to the one we are {de,}compressing.
        # NB. This is not optimal, as the file will eventually be
        # compressed as many times as it has hard-links. But for now,
        # that's the safe way.
        inode=`ls -li "$FILE" | awk '{print $1}'`
        HLINKS=`find . \! -name "$FILE" -inum $inode`

        if [ -n "$HLINKS" ]; then
          # We have hard-links! Remove them now.
          for i in $HLINKS; do rm -f "$i"; done
        fi

        # Now take care of the file that has no hard-link
        # We do decompress first to re-compress with the selected
        # compression ratio later on...
        case "$FILE" in
          *.bz2)
            bunzip2 "$FILE"
            FILE=`basename "$FILE" .bz2`
            ;;
          *.gz)
            gunzip "$FILE"
            FILE=`basename "$FILE" .gz`
            ;;
        esac

        # Compress the file with the given compression ratio, if needed
        case $COMP_SUF in
          *bz2)
            bzip2 ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
            echo "Compressed $FILE" &gt; $DEST_FD1
            ;;
          *gz)
            gzip ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
            echo "Compressed $FILE" &gt; $DEST_FD1
            ;;
          *)
            echo "Uncompressed $FILE" &gt; $DEST_FD1
            ;;
        esac

        # If the file had hard-links, recreate those (either hard or soft)
        if [ -n "$HLINKS" ]; then
          for i in $HLINKS; do
            NEWFILE=`echo "$i" | sed s/\.gz$// | sed s/\.bz2$//`
            if [ "$LN_OPT" = "-S" ]; then
              # Make this hard-link a soft- one
              ln -s "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
            else
              # Keep the hard-link a hard- one
              ln "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
            fi
            # Really work only for hard-links. Harmless for soft-links
            chmod 644 "${NEWFILE}$COMP_SUF"
          done
        fi

      else
        # There is a problem when we get neither a symlink nor a plain
        # file. Obviously, we shall never ever come here... :-(
        echo -n "Whaooo... \"${DIR}/${FILE}\" is neither a symlink "
        echo "nor a plain file. Please check:"
        ls -l "${DIR}/${FILE}"
        exit 1
      fi
    fi
  done # for FILE
done # for DIR</literal>

EOF</userinput></screen>

  <note>
    <para>
      Pasting a very large block of text directly into a terminal may result
      in a corrupted file. Pasting it into an editor first may avoid this
      issue.
    </para>
  </note>

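  <para>One way to catch a badly corrupted paste is to let
  <command>bash</command> parse the file without running it. The sketch below
  is an illustration only and is not part of the original instructions;
  <command>check_script_syntax</command> is a hypothetical helper name, and a
  clean parse does not prove that every line was copied correctly.</para>

<screen><![CDATA[
```shell
#!/bin/bash
# check_script_syntax is a hypothetical helper, not part of compressdoc.
# bash -n reads and parses the script but executes none of its commands.
check_script_syntax() {
  local script=${1:-/usr/sbin/compressdoc}
  if bash -n "$script"; then
    echo "syntax OK: $script"
  else
    echo "syntax errors found in: $script"
    return 1
  fi
}

# Example call, assuming the script was saved to the location used above:
#   check_script_syntax /usr/sbin/compressdoc
```
]]></screen>
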
  <para>As <systemitem class="username">root</systemitem>, make
  <command>compressdoc</command> executable for all users:</para>

<screen><userinput>chmod -v 755 /usr/sbin/compressdoc</userinput></screen>

  <para>Now, as <systemitem class="username">root</systemitem>, you can issue
  the command <command>compressdoc --bz2</command> to compress all your system
  man pages. You can also run <command>compressdoc --help</command> to get
  comprehensive help about what the script is able to do.</para>

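  <para>The options described in the script's help text can be combined into
  a more cautious first run. The following sketch assumes the script is
  installed as described above and is run as
  <systemitem class="username">root</systemitem>;
  <command>compress_man_safely</command> is a hypothetical wrapper name and
  not part of <command>compressdoc</command>.</para>

<screen><![CDATA[
```shell
#!/bin/bash
# compress_man_safely is a hypothetical wrapper, not part of compressdoc.
# It backs up one man hierarchy, prints the settings compressdoc would use,
# and only then compresses the pages with bzip2.
compress_man_safely() {
  local dir=$1
  # --backup writes a .tar archive of the directory into its parent
  # directory and performs no other action.
  compressdoc --backup "$dir" || return 1
  # --fake prints the parameters that would be used, then exits.
  compressdoc --fake --bz2 "$dir" || return 1
  # The real run: compress the pages below the directory with bzip2.
  compressdoc --bz2 "$dir"
}

# Example call (as root); compressdoc only accepts absolute path names:
#   compress_man_safely /usr/share/man
```
]]></screen>

  <para>Running the backup as a separate step matters because, in backup
  mode, the script does nothing but create the archive.</para>
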
  <para>Don't forget that a few programs, like the <application>X Window
  System</application> and <application>XEmacs</application>, also
  install their documentation in non-standard places (such as
  <filename class="directory">/usr/X11R6/man</filename>). Be sure
  to add these locations to the file <filename>/etc/man_db.conf</filename> as
  <envar>MANDATORY_MANPATH</envar> <replaceable>&lt;/path&gt;</replaceable>
  lines.</para>

  <para>Example:</para>

<screen><literal>    ...
    MANDATORY_MANPATH /usr/share/man
    MANDATORY_MANPATH /usr/X11R6/man
    MANDATORY_MANPATH /usr/local/man
    MANDATORY_MANPATH /opt/qt/doc/man
    ...</literal></screen>

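  <para>Because the script stops with an error when a directory in its list
  does not exist or is not readable, it can help to review the configured
  entries before running it. The sketch below simply reads the
  <envar>MANDATORY_MANPATH</envar> lines from a configuration file; both
  function names are hypothetical, and it does not reproduce everything
  <command>manpath</command> itself takes into account.</para>

<screen><![CDATA[
```shell
#!/bin/bash
# Both helpers are hypothetical and not part of Man-DB or compressdoc.

# Print the directory named on each MANDATORY_MANPATH line of the file.
list_mandatory_manpaths() {
  local conf=${1:-/etc/man_db.conf}
  # Field 1 is the keyword and field 2 the directory; comment lines start
  # with '#', so their first field never equals the keyword.
  awk '$1 == "MANDATORY_MANPATH" { print $2 }' "$conf"
}

# Report, for each configured directory, whether it exists on this system.
check_mandatory_manpaths() {
  local conf=${1:-/etc/man_db.conf} dir
  list_mandatory_manpaths "$conf" | while read -r dir; do
    if [ -d "$dir" ]; then
      echo "ok       $dir"
    else
      echo "missing  $dir"
    fi
  done
}

# Example call:
#   check_mandatory_manpaths /etc/man_db.conf
```
]]></screen>
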
  <para>Generally, package installation systems do not compress man/info pages,
  which means you will need to run the script again after installing or
  upgrading packages if you want to keep the size of your documentation as
  small as possible. Running the script after upgrading a package is safe:
  when several versions of a page exist (for example, one compressed and one
  uncompressed), the most recent one is kept and the others are
  deleted.</para>

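  <para>The rule that the most recent version wins comes from the script's
  <command>check_unique</command> function, which orders the uncompressed,
  <filename>.gz</filename> and <filename>.bz2</filename> variants of a page by
  modification time. A minimal sketch of that selection follows;
  <command>newest_variant</command> is a hypothetical helper name, and unlike
  <command>check_unique</command> it only reports the winner and deletes
  nothing.</para>

<screen><![CDATA[
```shell
#!/bin/bash
# newest_variant is a hypothetical helper mirroring the selection step of
# check_unique in compressdoc; it never removes any file.
newest_variant() {
  local dir=$1 base=$2
  # ls -1rt lists the oldest file first, so the last line is the newest.
  # Variants that do not exist only cause messages on stderr, which are
  # discarded.
  (cd "$dir" && ls -1rt "$base" "$base.gz" "$base.bz2" 2>/dev/null | tail -n 1)
}

# Demonstration in a throwaway directory.
demo=$(mktemp -d)
touch -t 202001010000 "$demo/ls.1.gz"   # older, compressed page
touch -t 202101010000 "$demo/ls.1"      # newer page from a package upgrade
newest_variant "$demo" ls.1             # prints: ls.1
rm -rf "$demo"
```
]]></screen>
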
</sect1>