source: postlfs/config/compressdoc.xml@ f8d632a

Last change on this file since f8d632a was f8d632a, checked in by Bruce Dubbs <bdubbs@…>, 17 years ago

New XML Chapter 3

git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@2287 af4574ff-66df-0310-9fd7-8a98e5e911e0

1<?xml version="1.0" encoding="ISO-8859-1"?>
2<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
3 "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
4 <!ENTITY % general-entities SYSTEM "../../general.ent">
5 %general-entities;
6]>
7
8<sect1 id="postlfs-config-compressdoc" xreflabel="compressdoc">
9<?dbhtml filename="compressdoc.html"?>
10<title>Compressing man and info pages</title>
11
12<para>Man and info reader programs can transparently process pages compressed
13with gzip or bzip2, a feature you can use to free some disk space while keeping
14your documentation available. However, man directories tend to contain
15links (both hard and symbolic), which defeat simple approaches such as
16recursively calling <command>gzip</command> on them. A better way
17to go is to use the script below.
18</para>
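
<para>The link problem can be reproduced in a scratch directory. The following is a minimal sketch, not part of the script: the directory and the page names <filename>foo.1</filename> and <filename>bar.1</filename> are hypothetical, and GNU <command>gzip</command> is assumed. After the naive recursive run, the symbolic link is expected to point at a file that no longer exists.</para>

```shell
#!/bin/bash
# Sketch only: the scratch directory and page names are hypothetical.
tmp=$(mktemp -d)
mkdir "$tmp/man1"
printf '.TH FOO 1\n' > "$tmp/man1/foo.1"
ln -s foo.1 "$tmp/man1/bar.1"        # bar.1 is an alias pointing at foo.1

# Naive approach: compress everything below the directory.
# gzip renames foo.1 to foo.1.gz but does not compress or retarget the symlink.
gzip -r "$tmp/man1" 2>/dev/null

if [ -e "$tmp/man1/bar.1" ]; then    # -e follows the link to its target
  link_state="resolves"
else
  link_state="dangling"
fi
echo "bar.1 $link_state"
rm -rf "$tmp"
```

<para>The script avoids this by recreating each link with the new suffix on both the link name and its target.</para>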
19
20<screen><userinput><command>cat &gt; /usr/sbin/compressdoc &lt;&lt; "EOF"</command>
21#!/bin/bash
22# VERSION: 20040320.0026
23#
24# Compress (with bzip2 or gzip) all man pages in a hierarchy and
25# update symlinks - By Marc Heerdink &lt;marc @ koelkast.net&gt;
26# Modified to be able to gzip or bzip2 files as an option and to deal
27# with all symlinks properly by Mark Hymers &lt;markh @ linuxfromscratch.org&gt;
28#
29# Modified 20030930 by Yann E. Morin &lt;yann.morin.1998 @ anciens.enib.fr&gt;
30# to accept compression/decompression, to correctly handle hard-links,
31# to allow for changing hard-links into soft- ones, to specify the
32# compression level, to parse the man.conf for all occurrences of MANPATH,
33# to allow for a backup, to allow to keep the newest version of a page.
34# Modified 20040330 by Tushar Teredesai to replace $0 by the name of the script.
35# (Note: It is assumed that the script is in the user's PATH)
36#
37# TODO:
38# - choose a default compress method to be based on the available
39# tool : gzip or bzip2;
40# - offer an option to automagically choose the best compression method
41# on a per page basis (eg. check which of gzip/bzip2/whatever is the
42# most effective, page per page);
43# - when a MANPATH env var exists, use this instead of /etc/man.conf
44# (useful for users to (de)compress their man pages);
45# - offer an option to restore a previous backup;
46# - add other compression engines (compress, zip, etc?). Needed?
47
48# Funny enough, this function prints some help.
49function help ()
50{
51 if [ -n "$1" ]; then
52 echo "Unknown option : $1"
53 fi
54 ( echo "Usage: $MY_NAME &lt;comp_method&gt; [options] [dirs]" &amp;&amp; \
55 cat &lt;&lt; EOT
56Where comp_method is one of :
57 --gzip, --gz, -g
58 --bzip2, --bz2, -b
59 Compress using gzip or bzip2.
60
61 --decompress, -d
62 Decompress the man pages.
63
64 --backup Specify that a .tar backup shall be made of every directory.
65 In case a backup already exists, it is saved as .tar.old prior
66 to making the new backup. If a .tar.old backup exists, it is
67 removed prior to saving the backup.
68 In backup mode, no other action is performed.
69
70And where options are :
71 -1 to -9, --fast, --best
72 The compression level, as accepted by gzip and bzip2. When not
73 specified, uses the default compression level for the given
74 method (-6 for gzip, and -9 for bzip2). Not used when in backup
75 or decompress modes.
76
77 --force, -F Force (re-)compression, even if the previous one was the same
78 method. Useful when changing the compression ratio. By default,
79 a page will not be re-compressed if it ends with the same suffix
80 as the method adds (.bz2 for bzip2, .gz for gzip).
81
82 --soft, -S Change hard-links into soft-links. Use with _caution_ as the
83 first encountered file will be used as a reference. Not used
84 when in backup mode.
85
86 --hard, -H Change soft-links into hard-links. Not used when in backup mode.
87
88 --conf=dir, --conf dir
89 Specify the location of man.conf. Defaults to /etc.
90
91 --verbose, -v Verbose mode, print the name of the directory being processed.
92 Double the flag to turn it even more verbose, and to print the
93 name of the file being processed.
94
95 --fake, -f Fakes it. Print the actual parameters $MY_NAME will use.
96
97 dirs A list of space-separated _absolute_ pathnames to the man
98 directories.
99 When empty, and only then, parse ${MAN_CONF}/man.conf for all
100 occurrences of MANPATH.
101
102Note about compression
103 There has been a discussion on blfs-support about compression ratios of
104 both gzip and bzip2 on man pages, taking into account the hosting fs,
105 the architecture, etc... Overall, the conclusion was that gzip
106 was much more efficient on 'small' files, and bzip2 on 'big' files, small and
107 big being very dependent on the content of the files.
108
109 See the original post from Mickael A. Peters, titled "Bootable Utility CD",
110 and dated 20030409.1816(+0200), and subsequent posts:
111 http://linuxfromscratch.org/pipermail/blfs-support/2003-April/038817.html
112
113 On my system (x86, ext3), man pages were 35564kiB before compression. gzip -9
114 compressed them down to 20372kiB (57.28%), bzip2 -9 got down to 19812kiB
115 (55.71%). That is a 1.57% gain in space. YMMV.
116
117 What was not taken into consideration was the decompression speed. But does
118 it make sense to? You gain fast access with uncompressed man pages, or you
119 gain space at the expense of a slight overhead in time. Well, my P4-2.5GHz
120 does not even let me notice this... :-)
121EOT
122) | less
123}
124
125# This function checks that the man page is unique amongst bzip2'd, gzip'd and
126# uncompressed versions.
127# $1 the directory in which the file resides
128# $2 the file name for the man page
129# Returns 0 (true) if the file is the latest and must be taken care of, and 1
130# (false) if the file is not the latest (and has therefore been deleted).
131function check_unique ()
132{
133 # NB. When there are hard-links to this file, these are
134 # _not_ deleted. In fact, if there are hard-links, they
135 # all have the same date/time, thus making them ready
136 # for deletion later on.
137
138 # Build the list of all man pages with the same name
139 DIR=$1
140 BASENAME=`basename "${2}" .bz2`
141 BASENAME=`basename "${BASENAME}" .gz`
142 GZ_FILE="$BASENAME".gz
143 BZ_FILE="$BASENAME".bz2
144
145 # Look for, and keep, the most recent one
146 LATEST=`(cd "$DIR"; ls -1rt "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}" 2&gt;/dev/null | tail -n 1)`
147 for i in "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}"; do
148 [ "$LATEST" != "$i" ] &amp;&amp; rm -f "$DIR"/"$i"
149 done
150
151 # In case the specified file was the latest, return 0
152 [ "$LATEST" = "$2" ] &amp;&amp; return 0
153 # If the file was not the latest, return 1
154 return 1
155}
156
157# Name of the script
158MY_NAME=`basename "$0"`
159
160# OK, parse the command-line for arguments, and initialize to some sensible
161# state, that is : don't change links state, parse /etc/man.conf, be most
162# silent, search man.conf in /etc, and don't force (re-)compression.
163COMP_METHOD=
164COMP_SUF=
165COMP_LVL=
166FORCE_OPT=
167LN_OPT=
168MAN_DIR=
169VERBOSE_LVL=0
170BACKUP=no
171FAKE=no
172MAN_CONF=/etc
173while [ -n "$1" ]; do
174 case $1 in
175 --gzip|--gz|-g)
176 COMP_SUF=.gz
177 COMP_METHOD=$1
178 shift
179 ;;
180 --bzip2|--bz2|-b)
181 COMP_SUF=.bz2
182 COMP_METHOD=$1
183 shift
184 ;;
185 --decompress|-d)
186 COMP_SUF=
187 COMP_LVL=
188 COMP_METHOD=$1
189 shift
190 ;;
191 -[1-9]|--fast|--best)
192 COMP_LVL=$1
193 shift
194 ;;
195 --force|-F)
196 FORCE_OPT=-F
197 shift
198 ;;
199 --soft|-S)
200 LN_OPT=-S
201 shift
202 ;;
203 --hard|-H)
204 LN_OPT=-H
205 shift
206 ;;
207 --conf=*)
208 MAN_CONF=`echo $1 | cut -d '=' -f2-`
209 shift
210 ;;
211 --conf)
212 MAN_CONF="$2"
213 shift 2
214 ;;
215 --verbose|-v)
216 let VERBOSE_LVL++
217 shift
218 ;;
219 --backup)
220 BACKUP=yes
221 shift
222 ;;
223 --fake|-f)
224 FAKE=yes
225 shift
226 ;;
227 --help|-h)
228 help
229 exit 0
230 ;;
231 /*)
232 MAN_DIR="${MAN_DIR} ${1}"
233 shift
234 ;;
235 -*)
236 help $1
237 exit 1
238 ;;
239 *)
240 echo "\"$1\" is not an absolute path name"
241 exit 1
242 ;;
243 esac
244done
245
246# Redirections
247case $VERBOSE_LVL in
248 0)
249 # 0, be silent
250 DEST_FD0=/dev/null
251 DEST_FD1=/dev/null
252 VERBOSE_OPT=
253 ;;
254 1)
255 # 1, be a bit verbose
256 DEST_FD0=/dev/stdout
257 DEST_FD1=/dev/null
258 VERBOSE_OPT=-v
259 ;;
260 *)
261 # 2 and above, be most verbose
262 DEST_FD0=/dev/stdout
263 DEST_FD1=/dev/stdout
264 VERBOSE_OPT="-v -v"
265 ;;
266esac
267
268# Note: on my machine, 'man --path' gives /usr/share/man twice, once with a trailing '/', once without.
269if [ -z "$MAN_DIR" ]; then
270 MAN_DIR=`man --path -C "$MAN_CONF"/man.conf \
271 | sed 's/:/\\n/g' \
272 | while read foo; do dirname "$foo"/.; done \
273 | sort -u \
274 | while read bar; do echo -n "$bar "; done`
275fi
276
277# If no MANPATH in ${MAN_CONF}/man.conf, abort as well
278if [ -z "$MAN_DIR" ]; then
279 echo "No directory specified, and no directory found with \`man --path'"
280 exit 1
281fi
282
283# Fake?
284if [ "$FAKE" != "no" ]; then
285 echo "Actual parameters used:"
286 echo -n "Compression.......: "
287 case $COMP_METHOD in
288 --bzip2|--bz2|-b) echo -n "bzip2";;
289 --gzip|--gz|-g) echo -n "gzip";;
290 --decompress|-d) echo -n "decompressing";;
291 *) echo -n "unknown";;
292 esac
293 echo " ($COMP_METHOD)"
294 echo "Compression level.: $COMP_LVL"
295 echo "Compression suffix: $COMP_SUF"
296 echo -n "Force compression.: "
297 [ "foo$FORCE_OPT" = "foo-F" ] &amp;&amp; echo "yes" || echo "no"
298 echo "man.conf is.......: ${MAN_CONF}/man.conf"
299 echo -n "Hard-links........: "
300 [ "foo$LN_OPT" = "foo-S" ] &amp;&amp; echo "convert to soft-links" || echo "leave as is"
301 echo -n "Soft-links........: "
302 [ "foo$LN_OPT" = "foo-H" ] &amp;&amp; echo "convert to hard-links" || echo "leave as is"
303 echo "Backup............: $BACKUP"
304 echo "Faking (yes!).....: $FAKE"
305 echo "Directories.......: $MAN_DIR"
306 echo "Verbosity level...: $VERBOSE_LVL"
307 exit 0
308fi
309
310# If no method was specified, print help
311if [ -z "${COMP_METHOD}" -a "${BACKUP}" = "no" ]; then
312 help
313 exit 1
314fi
315
316# In backup mode, do the backup solely
317if [ "$BACKUP" = "yes" ]; then
318 for DIR in $MAN_DIR; do
319 cd "${DIR}/.."
320 DIR_NAME=`basename "${DIR}"`
321 echo "Backing up $DIR..." &gt; $DEST_FD0
322 [ -f "${DIR_NAME}.tar.old" ] &amp;&amp; rm -f "${DIR_NAME}.tar.old"
323 [ -f "${DIR_NAME}.tar" ] &amp;&amp; mv "${DIR_NAME}.tar" "${DIR_NAME}.tar.old"
324 tar cfv "${DIR_NAME}.tar" "${DIR_NAME}" &gt; $DEST_FD1
325 done
326 exit 0
327fi
328
329# I know MAN_DIR has only absolute path names
330# I need to take into account the localized man, so I'm going recursive
331for DIR in $MAN_DIR; do
332 MEM_DIR=`pwd`
333 cd "$DIR"
334 for FILE in *; do
335 # Fixes the case where the directory is empty
336 if [ "foo$FILE" = "foo*" ]; then continue; fi
337
338 # Fixes the case when hard-links see their compression scheme change
339 # (from not compressed to compressed, or from bz2 to gz, or from gz to bz2)
340 # Also fixes the case when multiple versions of the page are present, which
341 # are either compressed or not.
342 if [ ! -L "$FILE" -a ! -e "$FILE" ]; then continue; fi
343
344 # Do not compress whatis files
345 if [ "$FILE" = "whatis" ]; then continue; fi
346
347 if [ -d "$FILE" ]; then
348 cd "${MEM_DIR}" # Go back to where we ran "$0", in case "$0"=="./compressdoc" ...
349 # We are going recursive to that directory
350 echo "-&gt; Entering ${DIR}/${FILE}..." &gt; $DEST_FD0
351 # I need not pass --conf, as I specify the directory to work on
352 # But I need exit in case of error
353 "$MY_NAME" ${COMP_METHOD} ${COMP_LVL} ${LN_OPT} ${VERBOSE_OPT} ${FORCE_OPT} "${DIR}/${FILE}" || exit 1
354 echo "&lt;- Leaving ${DIR}/${FILE}." &gt; $DEST_FD1
355 cd "$DIR" # Needed for the next iteration of the loop
356
357 else # !dir
358 if ! check_unique "$DIR" "$FILE"; then continue; fi
359
360 # Check if the file is already compressed with the specified method
361 BASE_FILE=`basename "$FILE" .gz`
362 BASE_FILE=`basename "$BASE_FILE" .bz2`
363 if [ "${FILE}" = "${BASE_FILE}${COMP_SUF}" -a "foo${FORCE_OPT}" = "foo" ]; then continue; fi
364
365 # If we have a symlink
366 if [ -h "$FILE" ]; then
367 case "$FILE" in
368 *.bz2)
369 EXT=bz2 ;;
370 *.gz)
371 EXT=gz ;;
372 *)
373 EXT=none ;;
374 esac
375
376 if [ ! "$EXT" = "none" ]; then
377 LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " " | sed s/\.$EXT$//`
378 NEWNAME=`echo "$FILE" | sed s/\.$EXT$//`
379 mv "$FILE" "$NEWNAME"
380 FILE="$NEWNAME"
381 else
382 LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " "`
383 fi
384
385 if [ "$LN_OPT" = "-H" ]; then
386 # Change this soft-link into a hard- one
387 rm -f "$FILE" &amp;&amp; ln "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
388 chmod --reference "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
389 else
390 # Keep this soft-link a soft- one.
391 rm -f "$FILE" &amp;&amp; ln -s "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
392 fi
393 echo "Relinked $FILE" &gt; $DEST_FD1
394
395 # else if we have a plain file
396 elif [ -f "$FILE" ]; then
397 # Take care of hard-links: build the list of files hard-linked
398 # to the one we are {de,}compressing.
399 # NB. This is not optimal, as the file will eventually be compressed
400 # as many times as it has hard-links. But for now, that's the safe way.
401 inode=`ls -li "$FILE" | awk '{print $1}'`
402 HLINKS=`find . \! -name "$FILE" -inum $inode`
403
404 if [ -n "$HLINKS" ]; then
405 # We have hard-links! Remove them now.
406 for i in $HLINKS; do rm -f "$i"; done
407 fi
408
409 # Now take care of the file that has no hard-link
410 # We do decompress first to re-compress with the selected
411 # compression ratio later on...
412 case "$FILE" in
413 *.bz2)
414 bunzip2 "$FILE"
415 FILE=`basename "$FILE" .bz2`
416 ;;
417 *.gz)
418 gunzip "$FILE"
419 FILE=`basename "$FILE" .gz`
420 ;;
421 esac
422
423 # Compress the file with the given compression ratio, if needed
424 case $COMP_SUF in
425 *bz2)
426 bzip2 ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
427 echo "Compressed $FILE" &gt; $DEST_FD1
428 ;;
429 *gz)
430 gzip ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
431 echo "Compressed $FILE" &gt; $DEST_FD1
432 ;;
433 *)
434 echo "Uncompressed $FILE" &gt; $DEST_FD1
435 ;;
436 esac
437
438 # If the file had hard-links, recreate those (either hard or soft)
439 if [ -n "$HLINKS" ]; then
440 for i in $HLINKS; do
441 NEWFILE=`echo "$i" | sed s/\.gz$// | sed s/\.bz2$//`
442 if [ "$LN_OPT" = "-S" ]; then
443 # Make this hard-link a soft- one
444 ln -s "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
445 else
446 # Keep the hard-link a hard- one
447 ln "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
448 fi
449 chmod 644 "${NEWFILE}$COMP_SUF" # Really work only for hard-links. Harmless for soft-links
450 done
451 fi
452
453 else
454 # There is a problem when we get neither a symlink nor a plain file
455 # Obviously, we shall never ever come here... :-(
456 echo "Whaooo... \"${DIR}/${FILE}\" is neither a symlink nor a plain file. Please check:"
457 ls -l "${DIR}/${FILE}"
458 exit 1
459 fi
460 fi
461 done # for FILE
462done # for DIR
463<command>EOF
464chmod 755 /usr/sbin/compressdoc</command></userinput></screen>
465
466<para>Now, as root, run
467<command>compressdoc --bz2</command> to compress all your system man
468pages. You can also run <command>compressdoc --help</command> to get
469comprehensive help about what the script is able to do.</para>
470
471<para>Don't forget that a few programs, like the <application>X</application>
472Window System and <application>XEmacs</application>, also install their
473documentation in non-standard places (such as
474<filename class="directory">/usr/X11R6/man</filename>). Be sure to add these locations to the
475file <filename>/etc/man.conf</filename>, as a
476<envar>MANPATH</envar>=<replaceable>/path</replaceable> section.</para>
477<para> Example:</para><screen><userinput>
478 ...
479 MANPATH=/usr/share/man
480 MANPATH=/usr/local/man
481 MANPATH=/usr/X11R6/man
482 MANPATH=/opt/qt/doc/man
483 ...</userinput></screen>
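
<para>When no directory is given on the command line, the script turns the output of <command>man --path</command> into a de-duplicated directory list. The sketch below illustrates that normalisation step on a made-up path string. The sample value is an assumption, and <command>tr</command> stands in for the <command>sed</command> call used in the script.</para>

```shell
#!/bin/bash
# Sketch only: a made-up colon-separated path standing in for `man --path` output.
sample_path="/usr/share/man:/usr/share/man/:/usr/local/man"

# Split on ':', normalise each entry via dirname "$d"/. so that a trailing
# slash no longer makes two spellings of one directory look different,
# then drop the duplicates.
dirs=$(echo "$sample_path" \
  | tr ':' '\n' \
  | while read -r d; do dirname "$d"/.; done \
  | sort -u)
echo "$dirs"
```

<para>Each directory is therefore processed once, even if it is listed both with and without a trailing slash.</para>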
484
485<para>Generally, package installation systems do not compress man/info pages,
486which means you will need to run the script again after installing a package
487if you want to keep the size of your documentation as small as possible. Running the script
488after upgrading a package is also safe: when you have several versions of a page
489(for example, one compressed and one uncompressed), the most recent one is kept
490and the others are deleted.</para>
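
<para>The "most recent one is kept" rule relies on sorting the candidate files by modification time, as the <command>check_unique</command> function in the script does. Here is a minimal sketch of that selection in a scratch directory. The page names and timestamps are hypothetical, and nothing is deleted.</para>

```shell
#!/bin/bash
# Sketch only: hypothetical page names in a scratch directory.
tmp=$(mktemp -d)
touch -t 200401010000 "$tmp/foo.1.gz"   # older, compressed copy
touch -t 200403010000 "$tmp/foo.1"      # newer, uncompressed copy from an upgrade

# List the candidates oldest-first (-rt) and keep the last line, i.e. the newest.
# foo.1.bz2 does not exist here, so the error from ls is discarded.
latest=$(cd "$tmp"; ls -1rt foo.1 foo.1.gz foo.1.bz2 2>/dev/null | tail -n 1)
echo "most recent: $latest"
rm -rf "$tmp"
```

<para>In the script itself, every candidate other than the most recent one is then removed before compression continues.</para>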
491
492</sect1>
493