source: archive/compressdoc.xml@ 4a570af

Last change on this file since 4a570af was 4a570af, checked in by Xi Ruoyao <xry111@…>, 5 months ago

secure linuxfromscratch.org url

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
   "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../../general.ent">
  %general-entities;
]>

<sect1 id="compressdoc" xreflabel="Compressing man and info pages">
  <?dbhtml filename="compressdoc.html"?>

  <sect1info>
    <date>$Date$</date>
  </sect1info>

  <title>Compressing Man and Info Pages</title>

  <indexterm zone="compressdoc">
    <primary sortas="b-compressdoc">compressdoc</primary>
  </indexterm>

  <para>Man and info reader programs can transparently process files
  compressed with <command>gzip</command> or <command>bzip2</command>, a
  feature you can use to free some disk space while keeping your
  documentation available. However, man directories tend to contain both
  hard and symbolic links, which defeat simple approaches such as
  recursively calling <command>gzip</command> on them. The script below
  handles these links properly. If you would prefer to download the file
  instead of typing or copying and pasting it, you can find it at
  <ulink url="&files-anduin;/compressdoc"/> (the file should be installed in
  the <filename class="directory">/usr/sbin</filename> directory).</para>
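
  <para>As a minimal sketch of the problem, assuming a scratch directory and
  made-up file names (<filename>page.1</filename> and
  <filename>alias.1</filename> are illustrative only), the following shows how
  compressing every regular file in a tree leaves a symbolic link dangling.
  GNU <command>gzip</command> also refuses, by default, to compress a file
  that has other hard links; that case is not shown here.</para>

```shell
#!/bin/bash
# Hypothetical demo in a scratch directory; the file names are made up.
demo=$(mktemp -d)
cd "$demo" || exit 1

printf '.TH PAGE 1\n' > page.1      # stands in for a real man page
ln -s page.1 alias.1                # symbolic link pointing at it

# Naive approach: compress every regular file found in the tree.
find . -type f -exec gzip {} +

# page.1 has been renamed to page.1.gz, but alias.1 still points at
# the old name, so the link no longer resolves.
if [ -L alias.1 ]; then
  if [ ! -e alias.1 ]; then
    echo "alias.1 is now a dangling symlink"
  fi
fi
```

  <para>Repairing such links by hand does not scale to a full man hierarchy,
  which is why the relinking is left to a script.</para>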
31
32<screen role="root"><?dbfo keep-together="auto"?><userinput>cat &gt; /usr/sbin/compressdoc &lt;&lt; "EOF"
33<literal>#!/bin/bash
34# VERSION: 20080421.1623
35#
36# Compress (with bzip2 or gzip) all man pages in a hierarchy and
37# update symlinks - By Marc Heerdink &lt;marc @ koelkast.net&gt;
38#
39# Modified to be able to gzip or bzip2 files as an option and to deal
40# with all symlinks properly by Mark Hymers &lt;markh @ &lfs-domainname;&gt;
41#
42# Modified 20030930 by Yann E. Morin &lt;yann.morin.1998 @ anciens.enib.fr&gt;
43# to accept compression/decompression, to correctly handle hard-links,
44# to allow for changing hard-links into soft- ones, to specify the
45# compression level, to parse the man.conf for all occurrences of MANPATH,
46# to allow for a backup, to allow to keep the newest version of a page.
47#
48# Modified 20040330 by Tushar Teredesai to replace $0 by the name of the
49# script.
50# (Note: It is assumed that the script is in the user's PATH)
51#
52# Modified 20050112 by Randy McMurchy to shorten line lengths and
53# correct grammar errors.
54#
55# Modified 20060128 by Alexander E. Patrakov for compatibility with Man-DB.
56#
57# Modified 20060311 by Archaic to use Man-DB manpath utility which is a
58# replacement for man --path from Man.
59#
60# Modified 20080421 by Dan Nicholson to properly execute the correct
61# compressdoc when working recursively. This means the same compressdoc
62# will be used whether a full path was given or it was resolved from PATH.
63#
64# Modified 20080421 by Dan Nicholson to be more robust with directories
65# that don't exist or don't have sufficient permissions.
66#
67# Modified 20080421 by Lars Bamberger to (sort of) automatically choose
68# a compression method based on the size of the manpage. A couple bug
69# fixes were added by Dan Nicholson.
70#
71# Modified 20080421 by Dan Nicholson to suppress warnings from manpath
72# since these are emitted when $MANPATH is set. Removed the TODO for
73# using the $MANPATH variable since manpath(1) handles this already.
74#
75# TODO:
76# - choose a default compress method to be based on the available
77# tool : gzip or bzip2;
78# - offer an option to restore a previous backup;
79# - add other compression engines (compress, zip, etc?). Needed?
80
81# Funny enough, this function prints some help.
82function help ()
83{
84 if [ -n "$1" ]; then
85 echo "Unknown option : $1"
86 fi
87 ( echo "Usage: $MY_NAME &lt;comp_method&gt; [options] [dirs]" &amp;&amp; \
88 cat &lt;&lt; EOT
89Where comp_method is one of :
90 --gzip, --gz, -g
91 --bzip2, --bz2, -b
92 Compress using gzip or bzip2.
93 --automatic
                Compress using either gzip or bzip2, depending on the
                size of the file to be compressed. Files of 5 kB or
                more are bzipped, files of 1 kB up to 5 kB are gzipped,
                and files smaller than 1 kB are not compressed.
98
99 --decompress, -d
100 Decompress the man pages.
101
102 --backup Specify a .tar backup shall be done for all directories.
103 In case a backup already exists, it is saved as .tar.old
104 prior to making the new backup. If a .tar.old backup
105 exists, it is removed prior to saving the backup.
106 In backup mode, no other action is performed.
107
108And where options are :
109 -1 to -9, --fast, --best
110 The compression level, as accepted by gzip and bzip2.
111 When not specified, uses the default compression level
112 for the given method (-6 for gzip, and -9 for bzip2).
113 Not used when in backup or decompress modes.
114
115 --force, -F Force (re-)compression, even if the previous one was
116 the same method. Useful when changing the compression
117 ratio. By default, a page will not be re-compressed if
118 it ends with the same suffix as the method adds
119 (.bz2 for bzip2, .gz for gzip).
120
121 --soft, -S Change hard-links into soft-links. Use with _caution_
122 as the first encountered file will be used as a
123 reference. Not used when in backup mode.
124
125 --hard, -H Change soft-links into hard-links. Not used when in
126 backup mode.
127
128 --conf=dir, --conf dir
129 Specify the location of man_db.conf. Defaults to /etc.
130
131 --verbose, -v Verbose mode, print the name of the directory being
132 processed. Double the flag to turn it even more verbose,
133 and to print the name of the file being processed.
134
135 --fake, -f Fakes it. Print the actual parameters compressdoc will use.
136
137 dirs A list of space-separated _absolute_ pathnames to the
138 man directories. When empty, and only then, use manpath
139 to parse ${MAN_CONF}/man_db.conf for all valid occurrences
140 of MANDATORY_MANPATH.
141
142Note about compression:
143 There has been a discussion on blfs-support about compression ratios of
144 both gzip and bzip2 on man pages, taking into account the hosting fs,
  the architecture, etc. Overall, the conclusion was that gzip
146 was much more efficient on 'small' files, and bzip2 on 'big' files,
147 small and big being very dependent on the content of the files.
148
149 See the original post from Mickael A. Peters, titled
150 "Bootable Utility CD", dated 20030409.1816(+0200), and subsequent posts:
151 https://&lfs-domainname;/pipermail/blfs-support/2003-April/038817.html
152
153 On my system (x86, ext3), man pages were 35564KB before compression.
154 gzip -9 compressed them down to 20372KB (57.28%), bzip2 -9 got down to
155 19812KB (55.71%). That is a 1.57% gain in space. YMMV.
156
157 What was not taken into consideration was the decompression speed. But
158 does it make sense to? You gain fast access with uncompressed man
159 pages, or you gain space at the expense of a slight overhead in time.
160 Well, my P4-2.5GHz does not even let me notice this... :-)
161
162EOT
163) | less
164}
165
166# This function checks that the man page is unique amongst bzip2'd,
167# gzip'd and uncompressed versions.
168# $1 the directory in which the file resides
169# $2 the file name for the man page
170# Returns 0 (true) if the file is the latest and must be taken care of,
171# and 1 (false) if the file is not the latest (and has therefore been
172# deleted).
173function check_unique ()
174{
175 # NB. When there are hard-links to this file, these are
176 # _not_ deleted. In fact, if there are hard-links, they
177 # all have the same date/time, thus making them ready
178 # for deletion later on.
179
180 # Build the list of all man pages with the same name
181 DIR=$1
182 BASENAME=`basename "${2}" .bz2`
183 BASENAME=`basename "${BASENAME}" .gz`
184 GZ_FILE="$BASENAME".gz
185 BZ_FILE="$BASENAME".bz2
186
187 # Look for, and keep, the most recent one
188 LATEST=`(cd "$DIR"; ls -1rt "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}" \
189 2&gt;/dev/null | tail -n 1)`
190 for i in "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}"; do
191 [ "$LATEST" != "$i" ] &amp;&amp; rm -f "$DIR"/"$i"
192 done
193
194 # In case the specified file was the latest, return 0
195 [ "$LATEST" = "$2" ] &amp;&amp; return 0
196 # If the file was not the latest, return 1
197 return 1
198}
199
200# Name of the script
MY_NAME=`basename "$0"`
202
203# OK, parse the command-line for arguments, and initialize to some
204# sensible state, that is: don't change links state, parse
205# /etc/man_db.conf, be most silent, search man_db.conf in /etc, and don't
206# force (re-)compression.
207COMP_METHOD=
208COMP_SUF=
209COMP_LVL=
210FORCE_OPT=
211LN_OPT=
212MAN_DIR=
213VERBOSE_LVL=0
214BACKUP=no
215FAKE=no
216MAN_CONF=/etc
217while [ -n "$1" ]; do
218 case $1 in
219 --gzip|--gz|-g)
220 COMP_SUF=.gz
221 COMP_METHOD=$1
222 shift
223 ;;
224 --bzip2|--bz2|-b)
225 COMP_SUF=.bz2
226 COMP_METHOD=$1
227 shift
228 ;;
229 --automatic)
230 COMP_SUF=TBD
231 COMP_METHOD=$1
232 shift
233 ;;
234 --decompress|-d)
235 COMP_SUF=
236 COMP_LVL=
237 COMP_METHOD=$1
238 shift
239 ;;
240 -[1-9]|--fast|--best)
241 COMP_LVL=$1
242 shift
243 ;;
244 --force|-F)
245 FORCE_OPT=-F
246 shift
247 ;;
248 --soft|-S)
249 LN_OPT=-S
250 shift
251 ;;
252 --hard|-H)
253 LN_OPT=-H
254 shift
255 ;;
256 --conf=*)
      MAN_CONF=`echo "$1" | cut -d '=' -f2-`
258 shift
259 ;;
260 --conf)
261 MAN_CONF="$2"
262 shift 2
263 ;;
264 --verbose|-v)
265 let VERBOSE_LVL++
266 shift
267 ;;
268 --backup)
269 BACKUP=yes
270 shift
271 ;;
272 --fake|-f)
273 FAKE=yes
274 shift
275 ;;
276 --help|-h)
277 help
278 exit 0
279 ;;
280 /*)
281 MAN_DIR="${MAN_DIR} ${1}"
282 shift
283 ;;
284 -*)
285 help $1
286 exit 1
287 ;;
288 *)
289 echo "\"$1\" is not an absolute path name"
290 exit 1
291 ;;
292 esac
293done
294
295# Redirections
296case $VERBOSE_LVL in
297 0)
    # 0, be silent
299 DEST_FD0=/dev/null
300 DEST_FD1=/dev/null
301 VERBOSE_OPT=
302 ;;
303 1)
304 # 1, be a bit verbose
305 DEST_FD0=/dev/stdout
306 DEST_FD1=/dev/null
307 VERBOSE_OPT=-v
308 ;;
309 *)
310 # 2 and above, be most verbose
311 DEST_FD0=/dev/stdout
312 DEST_FD1=/dev/stdout
313 VERBOSE_OPT="-v -v"
314 ;;
315esac
316
317# Note: on my machine, 'man --path' gives /usr/share/man twice, once
318# with a trailing '/', once without.
319if [ -z "$MAN_DIR" ]; then
320 MAN_DIR=`manpath -q -C "$MAN_CONF"/man_db.conf \
321 | sed 's/:/\\n/g' \
322 | while read foo; do dirname "$foo"/.; done \
323 | sort -u \
324 | while read bar; do echo -n "$bar "; done`
325fi
326
327# If no MANDATORY_MANPATH in ${MAN_CONF}/man_db.conf, abort as well
328if [ -z "$MAN_DIR" ]; then
329 echo "No directory specified, and no directory found with \`manpath'"
330 exit 1
331fi
332
333# Check that the specified directories actually exist and are readable
334for DIR in $MAN_DIR; do
335 if [ ! -d "$DIR" -o ! -r "$DIR" ]; then
336 echo "Directory '$DIR' does not exist or is not readable"
337 exit 1
338 fi
339done
340
341# Fake?
342if [ "$FAKE" != "no" ]; then
343 echo "Actual parameters used:"
344 echo -n "Compression.......: "
345 case $COMP_METHOD in
346 --bzip2|--bz2|-b) echo -n "bzip2";;
347 --gzip|--gz|-g) echo -n "gzip";;
348 --automatic) echo -n "compressing";;
349 --decompress|-d) echo -n "decompressing";;
350 *) echo -n "unknown";;
351 esac
352 echo " ($COMP_METHOD)"
353 echo "Compression level.: $COMP_LVL"
354 echo "Compression suffix: $COMP_SUF"
355 echo -n "Force compression.: "
356 [ "foo$FORCE_OPT" = "foo-F" ] &amp;&amp; echo "yes" || echo "no"
357 echo "man_db.conf is....: ${MAN_CONF}/man_db.conf"
358 echo -n "Hard-links........: "
359 [ "foo$LN_OPT" = "foo-S" ] &amp;&amp;
360 echo "convert to soft-links" || echo "leave as is"
361 echo -n "Soft-links........: "
362 [ "foo$LN_OPT" = "foo-H" ] &amp;&amp;
363 echo "convert to hard-links" || echo "leave as is"
364 echo "Backup............: $BACKUP"
365 echo "Faking (yes!).....: $FAKE"
366 echo "Directories.......: $MAN_DIR"
367 echo "Verbosity level...: $VERBOSE_LVL"
368 exit 0
369fi
370
371# If no method was specified, print help
372if [ -z "${COMP_METHOD}" -a "${BACKUP}" = "no" ]; then
373 help
374 exit 1
375fi
376
377# In backup mode, do the backup solely
378if [ "$BACKUP" = "yes" ]; then
379 for DIR in $MAN_DIR; do
380 cd "${DIR}/.."
381 if [ ! -w "`pwd`" ]; then
382 echo "Directory '`pwd`' is not writable"
383 exit 1
384 fi
385 DIR_NAME=`basename "${DIR}"`
386 echo "Backing up $DIR..." &gt; $DEST_FD0
387 [ -f "${DIR_NAME}.tar.old" ] &amp;&amp; rm -f "${DIR_NAME}.tar.old"
388 [ -f "${DIR_NAME}.tar" ] &amp;&amp;
389 mv "${DIR_NAME}.tar" "${DIR_NAME}.tar.old"
390 tar -cvf "${DIR_NAME}.tar" "${DIR_NAME}" &gt; $DEST_FD1
391 done
392 exit 0
393fi
394
395# I know MAN_DIR has only absolute path names
396# I need to take into account the localized man, so I'm going recursive
397for DIR in $MAN_DIR; do
398 MEM_DIR=`pwd`
399 if [ ! -w "$DIR" ]; then
400 echo "Directory '$DIR' is not writable"
401 exit 1
402 fi
403 cd "$DIR"
404 for FILE in *; do
    # Fixes the case where the directory is empty
406 if [ "foo$FILE" = "foo*" ]; then continue; fi
407
408 # Fixes the case when hard-links see their compression scheme change
409 # (from not compressed to compressed, or from bz2 to gz, or from gz
410 # to bz2)
    # Also fixes the case when multiple versions of the page are present,
    # which are either compressed or not.
413 if [ ! -L "$FILE" -a ! -e "$FILE" ]; then continue; fi
414
415 # Do not compress whatis files
416 if [ "$FILE" = "whatis" ]; then continue; fi
417
418 if [ -d "$FILE" ]; then
419 # We are going recursive to that directory
420 echo "-&gt; Entering ${DIR}/${FILE}..." &gt; $DEST_FD0
421 # I need not pass --conf, as I specify the directory to work on
422 # But I need exit in case of error. We must change back to the
423 # original directory so $0 is resolved correctly.
424 (cd "$MEM_DIR" &amp;&amp; eval "$0" ${COMP_METHOD} ${COMP_LVL} ${LN_OPT} \
425 ${VERBOSE_OPT} ${FORCE_OPT} "${DIR}/${FILE}") || exit $?
426 echo "&lt;- Leaving ${DIR}/${FILE}." &gt; $DEST_FD1
427
428 else # !dir
429 if ! check_unique "$DIR" "$FILE"; then continue; fi
430
431 # With automatic compression, get the uncompressed file size of
432 # the file (dereferencing symlinks), and choose an appropriate
433 # compression method.
434 if [ "$COMP_METHOD" = "--automatic" ]; then
435 declare -i SIZE
436 case "$FILE" in
437 *.bz2)
438 SIZE=$(bzcat "$FILE" | wc -c) ;;
439 *.gz)
440 SIZE=$(zcat "$FILE" | wc -c) ;;
441 *)
442 SIZE=$(wc -c &lt; "$FILE") ;;
443 esac
444 if (( $SIZE &gt;= (5 * 2**10) )); then
445 COMP_SUF=.bz2
446 elif (( $SIZE &gt;= (1 * 2**10) )); then
447 COMP_SUF=.gz
448 else
449 COMP_SUF=
450 fi
451 fi
452
453 # Check if the file is already compressed with the specified method
454 BASE_FILE=`basename "$FILE" .gz`
455 BASE_FILE=`basename "$BASE_FILE" .bz2`
456 if [ "${FILE}" = "${BASE_FILE}${COMP_SUF}" \
457 -a "foo${FORCE_OPT}" = "foo" ]; then continue; fi
458
459 # If we have a symlink
460 if [ -h "$FILE" ]; then
461 case "$FILE" in
462 *.bz2)
463 EXT=bz2 ;;
464 *.gz)
465 EXT=gz ;;
466 *)
467 EXT=none ;;
468 esac
469
470 if [ ! "$EXT" = "none" ]; then
471 LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 \
472 | tr -d " " | sed s/\.$EXT$//`
473 NEWNAME=`echo "$FILE" | sed s/\.$EXT$//`
474 mv "$FILE" "$NEWNAME"
475 FILE="$NEWNAME"
476 else
477 LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " "`
478 fi
479
480 if [ "$LN_OPT" = "-H" ]; then
481 # Change this soft-link into a hard- one
482 rm -f "$FILE" &amp;&amp; ln "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
483 chmod --reference "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
484 else
485 # Keep this soft-link a soft- one.
486 rm -f "$FILE" &amp;&amp; ln -s "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
487 fi
488 echo "Relinked $FILE" &gt; $DEST_FD1
489
490 # else if we have a plain file
491 elif [ -f "$FILE" ]; then
492 # Take care of hard-links: build the list of files hard-linked
493 # to the one we are {de,}compressing.
        # NB. This is not optimal, as the file will eventually be
        # compressed as many times as it has hard-links. But for now,
        # that's the safe way.
497 inode=`ls -li "$FILE" | awk '{print $1}'`
498 HLINKS=`find . \! -name "$FILE" -inum $inode`
499
500 if [ -n "$HLINKS" ]; then
501 # We have hard-links! Remove them now.
502 for i in $HLINKS; do rm -f "$i"; done
503 fi
504
505 # Now take care of the file that has no hard-link
506 # We do decompress first to re-compress with the selected
507 # compression ratio later on...
508 case "$FILE" in
509 *.bz2)
            bunzip2 "$FILE"
511 FILE=`basename "$FILE" .bz2`
512 ;;
513 *.gz)
            gunzip "$FILE"
515 FILE=`basename "$FILE" .gz`
516 ;;
517 esac
518
519 # Compress the file with the given compression ratio, if needed
520 case $COMP_SUF in
521 *bz2)
522 bzip2 ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
523 echo "Compressed $FILE" &gt; $DEST_FD1
524 ;;
525 *gz)
526 gzip ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
527 echo "Compressed $FILE" &gt; $DEST_FD1
528 ;;
529 *)
530 echo "Uncompressed $FILE" &gt; $DEST_FD1
531 ;;
532 esac
533
534 # If the file had hard-links, recreate those (either hard or soft)
535 if [ -n "$HLINKS" ]; then
536 for i in $HLINKS; do
537 NEWFILE=`echo "$i" | sed s/\.gz$// | sed s/\.bz2$//`
538 if [ "$LN_OPT" = "-S" ]; then
539 # Make this hard-link a soft- one
540 ln -s "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
541 else
542 # Keep the hard-link a hard- one
543 ln "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
544 fi
545 # Really work only for hard-links. Harmless for soft-links
546 chmod 644 "${NEWFILE}$COMP_SUF"
547 done
548 fi
549
550 else
551 # There is a problem when we get neither a symlink nor a plain
552 # file. Obviously, we shall never ever come here... :-(
553 echo -n "Whaooo... \"${DIR}/${FILE}\" is neither a symlink "
554 echo "nor a plain file. Please check:"
555 ls -l "${DIR}/${FILE}"
556 exit 1
557 fi
558 fi
559 done # for FILE
560done # for DIR</literal>
561
562EOF</userinput></screen>
563
  <note>
    <para>
      Pasting a very large block of text directly into a terminal may result
      in a corrupted file. Pasting it into an editor first may avoid this
      issue.
    </para>
  </note>
570
571 <para>As <systemitem class="username">root</systemitem>, make
572 <command>compressdoc</command> executable for all users:</para>
573
574<screen><userinput>chmod -v 755 /usr/sbin/compressdoc</userinput></screen>
575
  <para>Now, as the <systemitem class="username">root</systemitem> user, run
  <command>compressdoc --bz2</command> to compress all your system man
  pages. Run <command>compressdoc --help</command> for a full description
  of the script's options.</para>
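
  <para>The size thresholds that the script's help text describes for the
  <option>--automatic</option> method can be sketched as follows.
  <function>choose_suffix</function> is a hypothetical helper written only
  for illustration; the script itself performs the comparison inline on the
  uncompressed size of each page.</para>

```shell
#!/bin/bash
# Sketch of the size thresholds applied by --automatic.
# choose_suffix is a hypothetical name, not part of compressdoc.
choose_suffix () {
  local size=$1                        # uncompressed size in bytes
  if [ "$size" -ge $((5 * 1024)) ]; then
    echo ".bz2"                        # 5 kB or more: bzip2
  elif [ "$size" -ge 1024 ]; then
    echo ".gz"                         # 1 kB up to 5 kB: gzip
  else
    echo ""                            # below 1 kB: leave uncompressed
  fi
}

# Print the suffix chosen for a few sample sizes.
for size in 300 2048 8192; do
  echo "$size bytes -> '$(choose_suffix "$size")'"
done
```

  <para>An empty suffix means the page is stored uncompressed, which matches
  how the script treats an empty <envar>COMP_SUF</envar>.</para>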
580
  <para>Don't forget that a few programs, like the <application>X Window
  System</application> and <application>XEmacs</application>, also
  install their documentation in non-standard places (such as
  <filename class="directory">/usr/X11R6/man</filename>). Be sure
  to add these locations to the file <filename>/etc/man_db.conf</filename> as
  <envar>MANDATORY_MANPATH</envar> <replaceable>&lt;/path&gt;</replaceable>
  lines.</para>
588
589 <para>Example:</para>
590
591<screen><literal> ...
592 MANDATORY_MANPATH /usr/share/man
593 MANDATORY_MANPATH /usr/X11R6/man
594 MANDATORY_MANPATH /usr/local/man
595 MANDATORY_MANPATH /opt/qt/doc/man
596 ...</literal></screen>
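
  <para>When no directories are given on the command line, the script takes
  the colon-separated list printed by <command>manpath</command>, splits it,
  and removes duplicates such as the same directory listed with and without a
  trailing slash. A rough sketch of that normalization, using a hypothetical
  helper <function>split_manpath</function> and a made-up path list:</para>

```shell
#!/bin/bash
# split_manpath is a hypothetical helper; the sample path list is made up.
split_manpath () {
  # One directory per line, trailing slashes dropped, duplicates removed.
  printf '%s\n' "$1" | tr ':' '\n' |
    while read -r dir; do dirname "$dir/."; done |
    sort -u
}

split_manpath "/usr/share/man:/usr/share/man/:/usr/local/man"
```

  <para>In real use the argument would come from
  <command>manpath -q</command>, as it does in the script.</para>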
597
  <para>Generally, package installation systems do not compress man/info pages,
  which means you will need to run the script again after installing or
  upgrading a package if you want to keep the size of your documentation as
  small as possible. Running the script after upgrading a package is safe;
  when you have several versions of a page (for example, one compressed and
  one uncompressed), the most recent one is kept and the others are
  deleted.</para>
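
  <para>The rule that the most recent version of a page is kept relies on
  sorting the candidate files by modification time. A minimal sketch, assuming
  a scratch directory and a hypothetical helper
  <function>newest_variant</function> (the script's own
  <function>check_unique</function> function additionally deletes the older
  copies):</para>

```shell
#!/bin/bash
# newest_variant is a hypothetical helper that mirrors the
# "ls -1rt ... | tail -n 1" idiom; it only reports, it deletes nothing.
newest_variant () {
  local dir=$1 base=$2
  (
    cd "$dir" || exit 1
    # ls -1rt lists oldest first, so the last line is the newest file;
    # variants that do not exist are silently ignored.
    { ls -1rt "$base" "$base.gz" "$base.bz2" 2>/dev/null || true; } | tail -n 1
  )
}

scratch=$(mktemp -d)
touch -t 202001010000 "$scratch/ls.1.gz"   # older, compressed copy
touch -t 202101010000 "$scratch/ls.1"      # newer copy left by an upgrade
newest_variant "$scratch" ls.1
```

  <para>Here the uncompressed <filename>ls.1</filename> is reported as the
  newest file, so it is the copy that would survive and then be compressed.</para>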

</sect1>