source: postlfs/config/compressdoc.xml@ b614eb09

Last change on this file since b614eb09 was b614eb09, checked in by Larry Lawrence <larry@…>, 18 years ago

compressdoc patch, intro expansion

git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@1356 af4574ff-66df-0310-9fd7-8a98e5e911e0

<sect1 id="postlfs-config-compressdoc" xreflabel="compressdoc">
<?dbhtml filename="compressdoc.html" dir="postlfs"?>
<title>Compressing man and info pages</title>

<para>Man and info reader programs can transparently process gzip'ed or
bzip2'ed pages, a feature you can use to free some disk space while keeping
your documentation available. However, things are not that simple: man
directories tend to contain links, both hard and symbolic, which defeat simple
ideas like recursively calling <command>gzip</command> on them.
<command>gzip</command> skips symbolic links and leaves files that have other
hard links unchanged. Also, once a page has been renamed to
<filename>page.gz</filename>, any symbolic link still pointing at the old name
dangles. A better way to go is to use the script below.
</para>
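<para>You can spot the links mentioned above before running anything. The following minimal sketch is not part of <command>compressdoc</command>. It lists the symbolic links and the files with more than one hard link under a man directory. It assumes GNU <command>find</command> (for <option>-printf</option>), and the function name <function>list_man_links</function> is hypothetical.</para>

```shell
#!/bin/bash
# Sketch only, not part of compressdoc: report the entries below a man
# directory that a plain "gzip -r" would not handle cleanly.
# Assumes GNU find (-printf); list_man_links is a hypothetical name.
list_man_links() {
  local dir="$1"
  # Symbolic links: gzip skips them, and once their target is renamed to
  # *.gz they point at a name that no longer exists.
  find "$dir" -type l -printf 'symlink %P -> %l\n'
  # Regular files with more than one name: gzip leaves them unchanged
  # unless forced with -f, which compresses one name and leaves the
  # other names as uncompressed copies.
  find "$dir" -type f -links +1 -printf 'hardlink %P (inode %i)\n'
}
```

<para>For instance, <command>list_man_links /usr/share/man/man1</command> shows which pages a plain <command>gzip -r</command> would skip.</para>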

<screen><userinput><command>cat &gt; /usr/bin/compressdoc &lt;&lt; "EOF"</command>
#!/bin/bash
# VERSION: 20031009.1920
#
# Compress (with bzip2 or gzip) all man pages in a hierarchy and
# update symlinks - By Marc Heerdink &lt;marc @ koelkast.net&gt;
# Modified to be able to gzip or bzip2 files as an option and to deal
# with all symlinks properly by Mark Hymers &lt;markh @ linuxfromscratch.org&gt;
#
# Modified 20030930 by Yann E. Morin &lt;yann.morin.1998 @ anciens.enib.fr&gt;
# to accept compression/decompression, to correctly handle hard-links,
# to allow for changing hard-links into soft- ones, to specify the
# compression level, to parse the man.conf for all occurrences of MANPATH,
# to allow for a backup, to allow to keep the newest version of a page.
#
# TODO:
#     - choose a default compress method to be based on the available
#       tool : gzip or bzip2;
#     - when a MANPATH env var exists, use this instead of /etc/man.conf
#       (useful for users to (de)compress their man pages);
#     - offer an option to restore a previous backup;
#     - add other compression engines (compress, zip, etc?). Needed?

# Funny enough, this function prints some help.
function help ()
{
  if [ -n "$1" ]; then
    echo "Unknown option : $1"
  fi
  ( echo "Usage: $0 &lt;comp_method&gt; [options] [dirs]" &amp;&amp; \
  cat &lt;&lt; EOT
Where comp_method is one of :
  --gzip, --gz, -g
  --bzip2, --bz2, -b
                Compress using gzip or bzip2.

  --decompress, -d
                Decompress the man pages.

  --backup      Specify that a .tar backup shall be done for every directory.
                In case a backup already exists, it is saved as .tar.old prior
                to making the new backup. If a .tar.old backup exists, it is
                removed prior to saving the backup.
                In backup mode, no other action is performed.

And where options are :
  -1 to -9, --fast, --best
                The compression level, as accepted by gzip and bzip2. When not
                specified, uses the default compression level for the given
                method (-6 for gzip, and -9 for bzip2). Not used when in backup
                or decompress modes.

  --force, -F   Force (re-)compression, even if the previous one was the same
                method. Useful when changing the compression level. By default,
                a page will not be re-compressed if it ends with the same suffix
                as the method adds (.bz2 for bzip2, .gz for gzip).

  --soft, -S    Change hard-links into soft-links. Use with _caution_ as the
                first encountered file will be used as a reference. Not used
                when in backup mode.

  --hard, -H    Change soft-links into hard-links. Not used when in backup mode.

  --conf=dir, --conf dir
                Specify the location of man.conf. Defaults to /etc.

  --verbose, -v Verbose mode: print the name of the directory being processed.
                Double the flag to make it even more verbose, and to print the
                name of the file being processed.

  --fake, -f    Fakes it. Print the actual parameters compressdoc will use.

  dirs          A list of space-separated _absolute_ pathnames to the man
                directories.
                When empty, and only then, parse ${MAN_CONF}/man.conf for all
                occurrences of MANPATH.

Note about compression
  There has been a discussion on blfs-support about compression ratios of
  both gzip and bzip2 on man pages, taking into account the hosting fs,
  the architecture, etc... Overall, the conclusion was that gzip was much
  more efficient on 'small' files, and bzip2 on 'big' files, small and
  big being very dependent on the content of the files.

  See the original post from Mickael A. Peters, titled "Bootable Utility CD",
  and dated 20030409.1816(+0200), and subsequent posts:
  http://linuxfromscratch.org/pipermail/blfs-support/2003-April/038817.html

  On my system (x86, ext3), man pages were 35564kiB before compression. gzip -9
  compressed them down to 20372kiB (57.28%), bzip2 -9 got down to 19812kiB
  (55.71%). That is a further 560kiB, or 1.57% of the original size. YMMV.

  What was not taken into consideration was the decompression speed. But does
  it make sense to? You gain fast access with uncompressed man pages, or you
  gain space at the expense of a slight overhead in time. Well, my P4-2.5GHz
  does not even let me notice this... :-)
EOT
) | less
}

# This function checks that the man page is unique amongst bzip2'd, gzip'd and
# uncompressed versions.
#  $1 the directory in which the file resides
#  $2 the file name for the man page
# Returns 0 (true) if the file is the latest and must be taken care of, and 1
# (false) if the file is not the latest (and has therefore been deleted).
function check_unique ()
{
  # NB. When there are hard-links to this file, these are
  # _not_ deleted. In fact, if there are hard-links, they
  # all have the same date/time, thus making them ready
  # for deletion later on.

  # Build the list of all man pages with the same name
  DIR=$1
  BASENAME=`basename "${2}" .bz2`
  BASENAME=`basename "${BASENAME}" .gz`
  GZ_FILE="$BASENAME".gz
  BZ_FILE="$BASENAME".bz2

  # Look for, and keep, the most recent one
  LATEST=`(cd "$DIR"; ls -1rt "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}" 2&gt;/dev/null | tail -n 1)`
  for i in "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}"; do
    [ "$LATEST" != "$i" ] &amp;&amp; rm -f "$DIR"/"$i"
  done

  # In case the specified file was the latest, return 0
  [ "$LATEST" = "$2" ] &amp;&amp; return 0
  # If the file was not the latest, return 1
  return 1
}

# OK, parse the command-line for arguments, and initialize to some sensible
# state, that is : don't change links state, parse /etc/man.conf, be most
# silent, search man.conf in /etc, and don't force (re-)compression.
COMP_METHOD=
COMP_SUF=
COMP_LVL=
FORCE_OPT=
LN_OPT=
MAN_DIR=
VERBOSE_LVL=0
BACKUP=no
FAKE=no
MAN_CONF=/etc
while [ -n "$1" ]; do
  case $1 in
    --gzip|--gz|-g)
      COMP_SUF=.gz
      COMP_METHOD=$1
      shift
      ;;
    --bzip2|--bz2|-b)
      COMP_SUF=.bz2
      COMP_METHOD=$1
      shift
      ;;
    --decompress|-d)
      COMP_SUF=
      COMP_LVL=
      COMP_METHOD=$1
      shift
      ;;
    -[1-9]|--fast|--best)
      COMP_LVL=$1
      shift
      ;;
    --force|-F)
      FORCE_OPT=-F
      shift
      ;;
    --soft|-S)
      LN_OPT=-S
      shift
      ;;
    --hard|-H)
      LN_OPT=-H
      shift
      ;;
    --conf=*)
      MAN_CONF="${1#--conf=}"
      shift
      ;;
    --conf)
      MAN_CONF="$2"
      shift 2
      ;;
    --verbose|-v)
      let VERBOSE_LVL++
      shift
      ;;
    --backup)
      BACKUP=yes
      shift
      ;;
    --fake|-f)
      FAKE=yes
      shift
      ;;
    --help|-h)
      help
      exit 0
      ;;
    /*)
      MAN_DIR="${MAN_DIR} ${1}"
      shift
      ;;
    -*)
      help $1
      exit 1
      ;;
    *)
      echo "\"$1\" is not an absolute path name"
      exit 1
      ;;
  esac
done

# Redirections
case $VERBOSE_LVL in
  0)
    # 0, be silent
    DEST_FD0=/dev/null
    DEST_FD1=/dev/null
    VERBOSE_OPT=
    ;;
  1)
    # 1, be a bit verbose
    DEST_FD0=/dev/stdout
    DEST_FD1=/dev/null
    VERBOSE_OPT=-v
    ;;
  *)
    # 2 and above, be most verbose
    DEST_FD0=/dev/stdout
    DEST_FD1=/dev/stdout
    VERBOSE_OPT="-v -v"
    ;;
esac

# Note: on my machine, 'man --path' gives /usr/share/man twice, once with a
# trailing '/', once without.
if [ -z "$MAN_DIR" ]; then
  MAN_DIR=`man --path -C "$MAN_CONF"/man.conf \
            | sed 's/:/\\n/g' \
            | while read foo; do dirname "$foo"/.; done \
            | sort -u \
            | while read bar; do echo -n "$bar "; done`
fi

# If no MANPATH in ${MAN_CONF}/man.conf, abort as well
if [ -z "$MAN_DIR" ]; then
  echo "No directory specified, and no directory found with \`man --path'"
  exit 1
fi

# Fake?
if [ "$FAKE" != "no" ]; then
  echo "Actual parameters used:"
  echo -n "Compression.......: "
  case $COMP_METHOD in
    --bzip2|--bz2|-b) echo -n "bzip2";;
    --gzip|--gz|-g) echo -n "gzip";;
    --decompress|-d) echo -n "decompressing";;
    *) echo -n "unknown";;
  esac
  echo " ($COMP_METHOD)"
  echo "Compression level.: $COMP_LVL"
  echo "Compression suffix: $COMP_SUF"
  echo -n "Force compression.: "
  [ "foo$FORCE_OPT" = "foo-F" ] &amp;&amp; echo "yes" || echo "no"
  echo "man.conf is.......: ${MAN_CONF}/man.conf"
  echo -n "Hard-links........: "
  [ "foo$LN_OPT" = "foo-S" ] &amp;&amp; echo "convert to soft-links" || echo "leave as is"
  echo -n "Soft-links........: "
  [ "foo$LN_OPT" = "foo-H" ] &amp;&amp; echo "convert to hard-links" || echo "leave as is"
  echo "Backup............: $BACKUP"
  echo "Faking (yes!).....: $FAKE"
  echo "Directories.......: $MAN_DIR"
  echo "Verbosity level...: $VERBOSE_LVL"
  exit 0
fi

# If no method was specified, print help
if [ -z "${COMP_METHOD}" -a "${BACKUP}" = "no" ]; then
  help
  exit 1
fi

# In backup mode, do the backup solely
if [ "$BACKUP" = "yes" ]; then
  for DIR in $MAN_DIR; do
    cd "${DIR}/.."
    DIR_NAME=`basename "${DIR}"`
    echo "Backing up $DIR..." &gt; $DEST_FD0
    [ -f "${DIR_NAME}.tar.old" ] &amp;&amp; rm -f "${DIR_NAME}.tar.old"
    [ -f "${DIR_NAME}.tar" ] &amp;&amp; mv "${DIR_NAME}.tar" "${DIR_NAME}.tar.old"
    tar cfv "${DIR_NAME}.tar" "${DIR_NAME}" &gt; $DEST_FD1
  done
  exit 0
fi

# I know MAN_DIR has only absolute path names
# I need to take into account the localized man, so I'm going recursive
for DIR in $MAN_DIR; do
  MEM_DIR=`pwd`
  cd "$DIR"
  for FILE in *; do
    # Fixes the case where the directory is empty
    if [ "foo$FILE" = "foo*" ]; then continue; fi

    # Fixes the case when hard-links see their compression scheme change
    # (from not compressed to compressed, or from bz2 to gz, or from gz to bz2)
    # Also fixes the case when multiple versions of the page are present, which
    # are either compressed or not.
    if [ ! -L "$FILE" -a ! -e "$FILE" ]; then continue; fi

    if [ -d "$FILE" ]; then
      cd "${MEM_DIR}" # Go back to where we ran "$0", in case "$0"=="./compressdoc" ...
      # We are going recursive to that directory
      echo "-&gt; Entering ${DIR}/${FILE}..." &gt; $DEST_FD0
      # I need not pass --conf, as I specify the directory to work on
      # But I need to exit in case of error
      "$0" ${COMP_METHOD} ${COMP_LVL} ${LN_OPT} ${VERBOSE_OPT} ${FORCE_OPT} "${DIR}/${FILE}" || exit 1
      echo "&lt;- Leaving ${DIR}/${FILE}." &gt; $DEST_FD1
      cd "$DIR" # Needed for the next iteration of the loop

    else # !dir
      if ! check_unique "$DIR" "$FILE"; then continue; fi

      # Check if the file is already compressed with the specified method
      BASE_FILE=`basename "$FILE" .gz`
      BASE_FILE=`basename "$BASE_FILE" .bz2`
      if [ "${FILE}" = "${BASE_FILE}${COMP_SUF}" -a "foo${FORCE_OPT}" = "foo" ]; then continue; fi

      # If we have a symlink
      if [ -h "$FILE" ]; then
        case "$FILE" in
          *.bz2)
            EXT=bz2 ;;
          *.gz)
            EXT=gz ;;
          *)
            EXT=none ;;
        esac

        if [ ! "$EXT" = "none" ]; then
          LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " " | sed "s/\.$EXT$//"`
          NEWNAME=`echo "$FILE" | sed "s/\.$EXT$//"`
          mv "$FILE" "$NEWNAME"
          FILE="$NEWNAME"
        else
          LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " "`
        fi

        if [ "$LN_OPT" = "-H" ]; then
          # Change this soft-link into a hard- one
          rm -f "$FILE" &amp;&amp; ln "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
          chmod --reference "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
        else
          # Keep this soft-link a soft- one.
          rm -f "$FILE" &amp;&amp; ln -s "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
        fi
        echo "Relinked $FILE" &gt; $DEST_FD1

      # else if we have a plain file
      elif [ -f "$FILE" ]; then
        # Take care of hard-links: build the list of files hard-linked
        # to the one we are {de,}compressing.
        # NB. This is not optimal, as the file will eventually be compressed
        # as many times as it has hard-links. But for now, that's the safe way.
        inode=`ls -li "$FILE" | awk '{print $1}'`
        HLINKS=`find . \! -name "$FILE" -inum $inode`

        if [ -n "$HLINKS" ]; then
          # We have hard-links! Remove them now.
          for i in $HLINKS; do rm -f "$i"; done
        fi

        # Now take care of the file that has no hard-link
        # We do decompress first to re-compress with the selected
        # compression ratio later on...
        case "$FILE" in
          *.bz2)
            bunzip2 "$FILE"
            FILE=`basename "$FILE" .bz2`
            ;;
          *.gz)
            gunzip "$FILE"
            FILE=`basename "$FILE" .gz`
            ;;
        esac

        # Compress the file with the given compression ratio, if needed
        case $COMP_SUF in
          *bz2)
            bzip2 ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
            echo "Compressed $FILE" &gt; $DEST_FD1
            ;;
          *gz)
            gzip ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
            echo "Compressed $FILE" &gt; $DEST_FD1
            ;;
          *)
            echo "Uncompressed $FILE" &gt; $DEST_FD1
            ;;
        esac

        # If the file had hard-links, recreate those (either hard or soft)
        if [ -n "$HLINKS" ]; then
          for i in $HLINKS; do
            NEWFILE=`echo "$i" | sed -e 's/\.gz$//' -e 's/\.bz2$//'`
            if [ "$LN_OPT" = "-S" ]; then
              # Make this hard-link a soft- one
              ln -s "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
            else
              # Keep the hard-link a hard- one
              ln "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
            fi
            chmod 644 "${NEWFILE}$COMP_SUF" # Really works only for hard-links. Harmless for soft-links
          done
        fi

      else
        # There is a problem when we get neither a symlink nor a plain file
        # Obviously, we shall never ever come here... :-(
        echo "Whaooo... \"${DIR}/${FILE}\" is neither a symlink nor a plain file. Please check:"
        ls -l "${DIR}/${FILE}"
        exit 1
      fi
    fi
  done # for FILE
done # for DIR
<command>EOF
chmod 755 /usr/bin/compressdoc</command></userinput></screen>
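<para>In its symbolic-link branch, the script recovers each link's target by parsing the output of <command>ls -l</command> and deleting every space from it. That works for ordinary page names but would mangle a target containing spaces. The following minimal sketch shows the same re-pointing step built on <command>readlink</command> instead. The function name <function>relink_page</function> is hypothetical, and unlike the script it does not handle the <option>--hard</option> conversion.</para>

```shell
#!/bin/bash
# Hypothetical helper, sketch only: replace the symlink LINK by one named
# after LINK with its old .gz/.bz2 suffix swapped for SUF (".gz", ".bz2",
# or "" to go back to uncompressed), pointing at the target's name with
# the same suffix swap. Returns 1 if LINK is not a symlink.
relink_page() {
  local link="$1" suf="$2" target base
  # readlink prints the stored target verbatim (possibly relative).
  target=$(readlink -- "$link") || return 1
  # Strip any existing compression suffix from both names.
  base=${link%.gz};     base=${base%.bz2}
  target=${target%.gz}; target=${target%.bz2}
  rm -f -- "$link"
  ln -s -- "${target}${suf}" "${base}${suf}"
}
```

<para>For example, once <filename>bar.1</filename> has been compressed to <filename>bar.1.gz</filename>, <command>relink_page foo.1 .gz</command> replaces the symlink <filename>foo.1</filename> with <filename>foo.1.gz</filename>, pointing at <filename>bar.1.gz</filename>.</para>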

<para>Now, as root, you can run
<command>/usr/bin/compressdoc --bz2</command> to compress all your system man
pages with bzip2. You can also run <command>/usr/bin/compressdoc --help</command>
to get comprehensive help about what the script is able to do.</para>
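<para>The script's help text summarizes a discussion comparing <command>gzip</command> and <command>bzip2</command> on man pages, and the outcome depends on your own pages. Before choosing between <option>--gz</option> and <option>--bz2</option>, a minimal sketch like the following can total the compressed sizes for one directory. It modifies nothing, because both tools write to standard output with <option>-c</option>. It assumes GNU <command>stat</command>, and <function>compare_ratios</function> is a hypothetical name. It counts bytes, not disk blocks, so it ignores the file system effects mentioned in the help text. On a file system with 4 kiB blocks, for instance, a page that already fits in one block saves no space when compressed.</para>

```shell
#!/bin/bash
# Hypothetical helper, sketch only: print the total size, in bytes, of the
# uncompressed pages directly inside DIR, and what gzip -9 and bzip2 -9
# would shrink them to. Nothing is written to disk; symlinks,
# subdirectories and already-compressed pages are skipped.
# Assumes GNU stat (-c %s).
compare_ratios() {
  local dir="$1" f raw=0 gz=0 bz=0
  for f in "$dir"/*; do
    # Skip symlinks (they would count their target twice) and non-files.
    if [ -L "$f" ] || [ ! -f "$f" ]; then continue; fi
    case "$f" in *.gz|*.bz2) continue ;; esac
    raw=$(( raw + $(stat -c %s "$f") ))
    gz=$((  gz  + $(gzip  -9 -c "$f" | wc -c) ))
    bz=$((  bz  + $(bzip2 -9 -c "$f" | wc -c) ))
  done
  echo "uncompressed: $raw bytes"
  echo "gzip -9:      $gz bytes"
  echo "bzip2 -9:     $bz bytes"
}
```

<para>Running <command>compare_ratios /usr/share/man/man1</command> then gives a per-directory answer for the pages actually installed.</para>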

<para>Note that a few programs, like the <application>X</application>
Window System and <application>XEmacs</application>, also install their
documentation in non-standard places (such as
<filename class="directory">/usr/X11R6/man</filename>). Be sure to add those
locations to the file <filename>/etc/man.conf</filename>, as
<envar>MANPATH</envar>=<replaceable>/path</replaceable> lines.</para>
<para> Example:<screen><userinput>
 ...
 MANPATH=/usr/share/man
 MANPATH=/usr/local/man
 MANPATH=/usr/X11R6/man
 MANPATH=/opt/qt/doc/man
 ...</userinput></screen></para>
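<para>When no directory is given on the command line, <command>compressdoc</command> relies on <command>man --path</command> to expand the <envar>MANPATH</envar> entries. To check which directories a given <filename>man.conf</filename> lists, you can use a rough sketch such as the following. It accepts the <envar>MANPATH</envar>=<replaceable>/path</replaceable> form shown in the example. As an assumption about other <filename>man.conf</filename> layouts, it also accepts a whitespace-separated <envar>MANPATH</envar> <replaceable>/path</replaceable> form. The function name <function>manpath_dirs</function> is hypothetical, and the sketch only approximates how <command>man</command> itself reads the file.</para>

```shell
#!/bin/bash
# Hypothetical helper, sketch only: list the directories named on MANPATH
# lines of a man.conf (default /etc/man.conf), one per line, sorted and
# without duplicates. Trailing slashes are dropped so that /usr/share/man
# and /usr/share/man/ count once. MANPATH_MAP lines and comments are ignored.
manpath_dirs() {
  sed -n 's/^[[:space:]]*MANPATH[[:space:]=][[:space:]=]*\([^[:space:]][^[:space:]]*\).*/\1/p' \
      "${1:-/etc/man.conf}" \
    | sed 's:\(.\)/*$:\1:' \
    | sort -u
}
```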

<para>Generally, package installation systems do not compress man/info pages,
which means you will need to run the script again if you want to keep the size
of your documentation as small as possible. Also, note that running the script
after upgrading a package is safe: when you have several versions of a page
(for example, one compressed and one uncompressed), the most recent one is kept
and the others deleted.</para>
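<para>The "most recent one is kept" rule is implemented by the script's <function>check_unique</function> function, which compares modification times with <command>ls -t</command>. The following minimal read-only sketch of that comparison shows in advance which copy of a page would survive. The name <function>newest_version</function> is hypothetical, and unlike the script it deletes nothing.</para>

```shell
#!/bin/bash
# Hypothetical helper, sketch only: given a directory DIR and a page name
# (with or without .gz/.bz2), print which of PAGE, PAGE.gz and PAGE.bz2 is
# the most recently modified, i.e. the copy compressdoc would keep.
# Nothing is removed.
newest_version() {
  local dir="$1" base="$2"
  base=${base%.bz2}; base=${base%.gz}
  # ls -t sorts newest first; names that do not exist are reported on
  # stderr, which is discarded.
  ( cd "$dir" || exit 1
    ls -1t -- "$base" "$base.gz" "$base.bz2" 2>/dev/null | head -n 1 )
}
```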

</sect1>