source: postlfs/config/compressdoc.xml@ 2fa79e3b

Last change on this file since 2fa79e3b was 2fa79e3b, checked in by Larry Lawrence <larry@…>, 18 years ago

applied compressdoc patch

git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@1295 af4574ff-66df-0310-9fd7-8a98e5e911e0

1<sect1 id="postlfs-config-compressdoc" xreflabel="compressdoc">
2<?dbhtml filename="compressdoc.html" dir="postlfs"?>
3<title>Compressing man and info pages</title>
4
5<para>Man and info reader programs can transparently process gzip'ed or
6bzip2'ed pages, a feature you can use to free some disk space while keeping
7your documentation available. However, things are not that simple: man
8directories tend to contain links - hard and symbolic - which defeat simple
9ideas like recursively calling <command>gzip</command> on them. A better way
10to go is to use the script below.
11</para>
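<para>As a rough illustration of the problem described above, the sketch below
builds a throwaway directory holding one page and one symbolic link to it, then
runs <command>gzip -r</command> on it. It assumes GNU <command>gzip</command>,
which skips symbolic links; the function name
<command>demo_naive_gzip</command> and the file names are made up for this
example and are not part of the script.</para>

```shell
# Illustration only: a naive "gzip -r" leaves symbolic links dangling.
# Assumes GNU gzip; demo_naive_gzip and the file names are hypothetical.
demo_naive_gzip() {
    work=$(mktemp -d) || return 1
    mkdir "$work/man1"
    printf '.TH PAGE 1\n' > "$work/man1/page.1"
    ln -s page.1 "$work/man1/alias.1"      # alias.1 -> page.1

    # gzip compresses the regular file and refuses the symlink; it may
    # return a non-zero status for the skipped link, which is ignored here.
    gzip -r "$work/man1" 2>/dev/null || true

    # page.1 has become page.1.gz, so alias.1 now points at a missing name.
    result=intact
    if [ -L "$work/man1/alias.1" ]; then
        if [ ! -e "$work/man1/alias.1" ]; then
            result=dangling
        fi
    fi
    rm -rf "$work"
    echo "$result"
}

demo_naive_gzip
```

<para>With GNU <command>gzip</command> the link is expected to end up dangling,
because the page it pointed to has been renamed with a
<filename>.gz</filename> suffix while the link itself was left untouched.</para>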
12
13<screen><userinput><command>cat &gt; /usr/bin/compressdoc &lt;&lt; "EOF"</command>
14
15#!/bin/bash
16# VERSION: 20031004.0245
17#
18# Compress (with bzip2 or gzip) all man pages in a hierarchy and
19# update symlinks - By Marc Heerdink &lt;marc @ koelkast.net&gt;
20# Modified to be able to gzip or bzip2 files as an option and to deal
21# with all symlinks properly by Mark Hymers &lt;markh @ linuxfromscratch.org&gt;
22#
23# Modified 20030930 by Yann E. Morin &lt;yann.morin.1998 @ anciens.enib.fr&gt;
24# to accept compression/decompression, to correctly handle hard-links,
25# to allow for changing hard-links into soft- ones, to specify the
26# compression level, to parse the man.conf for all occurrences of MANPATH,
27# to allow for a backup, to allow to keep the newest version of a page.
28#
29# TODO:
30# - choose a default compress method to be based on the available
31# tool : gzip or bzip2;
32# - when a MANPATH env var exists, use this instead of /etc/man.conf
33# (useful for users to (de)compress their man pages);
34# - offer an option to restore a previous backup;
35# - add other compression engines (compress, zip, etc?). Needed?
36
37# Funny enough, this function prints some help.
38function help ()
39{
40 if [ -n "$1" ]; then
41 echo "Unknown option : $1"
42 fi
43 ( echo "Usage: $0 &lt;comp_method&gt; [options] [dirs]" &amp;&amp; \
44 cat &lt;&lt; EOT
45Where comp_method is one of :
46 --gzip, --gz, -g
47 --bzip2, --bz2, -b
48 Compress using gzip or bzip2.
49
50 --decompress, -d
51 Decompress the man pages.
52
53 --backup Make a .tar backup of every directory.
54 If a backup already exists, it is saved as .tar.old before
55 the new backup is made. If a .tar.old backup exists, it is
56 removed before the current backup is saved.
57 In backup mode, no other action is performed.
58
59And where options are :
60 -1 to -9, --fast, --best
61 The compression level, as accepted by gzip and bzip2. When not
62 specified, uses the default compression level for the given
63 method (-6 for gzip, and -9 for bzip2). Not used when in backup
64 or decompress modes.
65
66 --force, -F Force (re-)compression, even if the previous one was the same
67 method. Useful when changing the compression ratio. By default,
68 a page will not be re-compressed if it ends with the same suffix
69 as the method adds (.bz2 for bzip2, .gz for gzip).
70
71 --soft, -S Change hard-links into soft-links. Use with _caution_ as the
72 first encountered file will be used as a reference. Not used
73 when in backup mode.
74
75 --hard, -H Change soft-links into hard-links. Not used when in backup mode.
76
77 --conf=dir, --conf dir
78 Specify the location of man.conf. Defaults to /etc.
79
80 --verbose, -v Verbose mode, print the name of the directory being processed.
81 Double the flag to turn it even more verbose, and to print the
82 name of the file being processed.
83
84 --fake, -f Fake it. Print the actual parameters compressdoc will use.
85
86 dirs A list of space-separated _absolute_ pathnames to the man
87 directories.
88 When empty, and only then, parse ${MAN_CONF}/man.conf for all
89 occurrences of MANPATH.
90
91Note about compression
92 There has been a discussion on blfs-support about compression ratios of
93 both gzip and bzip2 on man pages, taking into account the hosting fs,
94 the architecture, etc... Overall, the conclusion was that gzip
95 was more efficient on 'small' files, and bzip2 on 'big' files, small and
96 big being very dependent on the content of the files.
97
98 See the original post from Mickael A. Peters, titled "Bootable Utility CD",
99 and dated 20030409.1816(+0200), and subsequent posts:
100 http://linuxfromscratch.org/pipermail/blfs-support/2003-April/038817.html
101
102 On my system (x86, ext3), man pages were 35564kiB before compression. gzip -9
103 compressed them down to 20372kiB (57.28%), bzip2 -9 got down to 19812kiB
104 (55.71%). That is a 1.57% gain in space. YMMV.
105
106 What was not taken into consideration was the decompression speed. But does
107 it make sense to? You gain fast access with uncompressed man pages, or you
108 gain space at the expense of a slight overhead in time. Well, my P4-2.5GHz
109 does not even let me notice this... :-)
110EOT
111) | less
112}
113
114# This function checks that the man page is unique amongst bzip2'd, gzip'd and
115# uncompressed versions.
116# $1 the directory in which the file resides
117# $2 the file name for the man page
118# Returns 0 (true) if the file is the latest and must be taken care of, and 1
119# (false) if the file is not the latest (and has therefore been deleted).
120function check_unique ()
121{
122 # NB. When there are hard-links to this file, these are
123 # _not_ deleted. In fact, if there are hard-links, they
124 # all have the same date/time, thus making them ready
125 # for deletion later on.
126
127 # Build the list of all man pages with the same name
128 DIR=$1
129 BASENAME=`basename "${2}" .bz2`
130 BASENAME=`basename "${BASENAME}" .gz`
131 LIST=
132 [ -f "$DIR"/"${BASENAME}" -o -L "$DIR"/"${BASENAME}" ] &amp;&amp; LIST="${LIST} ${BASENAME}"
133 [ -f "$DIR"/"${BASENAME}".gz -o -L "$DIR"/"${BASENAME}".gz ] &amp;&amp; LIST="${LIST} ${BASENAME}.gz"
134 [ -f "$DIR"/"${BASENAME}".bz2 -o -L "$DIR"/"${BASENAME}".bz2 ] &amp;&amp; LIST="${LIST} ${BASENAME}.bz2"
135
136 # Look for, and keep, the most recent one
137 LATEST=`(cd "$DIR"; ls -1rt $LIST | tail -1)`
138 for i in $LIST; do
139 [ "$LATEST" != "$i" ] &amp;&amp; rm -f "$DIR"/"$i"
140 done
141
142 # In case the specified file was the latest, return 0
143 [ "$LATEST" = "$2" ] &amp;&amp; return 0
144 # If the file was not the latest, return 1
145 return 1
146}
147
148# OK, parse the command-line for arguments, and initialize to some sensible
149# state, that is : don't change links state, parse /etc/man.conf, be most
150# silent, search man.conf in /etc, and don't force (re-)compression.
151COMP_METHOD=
152COMP_SUF=
153COMP_LVL=
154FORCE_OPT=
155LN_OPT=
156MAN_DIR=
157VERBOSE_LVL=0
158BACKUP=no
159FAKE=no
160MAN_CONF=/etc
161while [ -n "$1" ]; do
162 case $1 in
163 --gzip|--gz|-g)
164 COMP_SUF=.gz
165 COMP_METHOD=$1
166 shift
167 ;;
168 --bzip2|--bz2|-b)
169 COMP_SUF=.bz2
170 COMP_METHOD=$1
171 shift
172 ;;
173 --decompress|-d)
174 COMP_SUF=
175 COMP_LVL=
176 COMP_METHOD=$1
177 shift
178 ;;
179 -[1-9]|--fast|--best)
180 COMP_LVL=$1
181 shift
182 ;;
183 --force|-F)
184 FORCE_OPT=-F
185 shift
186 ;;
187 --soft|-S)
188 LN_OPT=-S
189 shift
190 ;;
191 --hard|-H)
192 LN_OPT=-H
193 shift
194 ;;
195 --conf=*)
196 MAN_CONF=`echo $1 | cut -d '=' -f2-`
197 shift
198 ;;
199 --conf)
200 MAN_CONF="$2"
201 shift 2
202 ;;
203 --verbose|-v)
204 let VERBOSE_LVL++
205 shift
206 ;;
207 --backup)
208 BACKUP=yes
209 shift
210 ;;
211 --fake|-f)
212 FAKE=yes
213 shift
214 ;;
215 --help|-h)
216 help
217 exit 0
218 ;;
219 /*)
220 MAN_DIR="${MAN_DIR} ${1}"
221 shift
222 ;;
223 -*)
224 help $1
225 exit 1
226 ;;
227 *)
228 echo "\"$1\" is not an absolute path name"
229 exit 1
230 ;;
231 esac
232done
233
234# Redirections
235case $VERBOSE_LVL in
236 0)
237 # 0, be silent
238 DEST_FD0=/dev/null
239 DEST_FD1=/dev/null
240 VERBOSE_OPT=
241 ;;
242 1)
243 # 1, be a bit verbose
244 DEST_FD0=/dev/stdout
245 DEST_FD1=/dev/null
246 VERBOSE_OPT=-v
247 ;;
248 *)
249 # 2 and above, be most verbose
250 DEST_FD0=/dev/stdout
251 DEST_FD1=/dev/stdout
252 VERBOSE_OPT="-v -v"
253 ;;
254esac
255
256# Note: on my machine, 'man --path' gives /usr/share/man twice, once with a trailing '/', once without.
257if [ -z "$MAN_DIR" ]; then
258 MAN_DIR=`man --path -C "$MAN_CONF"/man.conf \
259 | sed 's/:/\\n/g' \
260 | while read foo; do dirname "$foo"/.; done \
261 | sort -u \
262 | while read bar; do echo -n "$bar "; done`
263fi
264
265# If no MANPATH in ${MAN_CONF}/man.conf, abort as well
266if [ -z "$MAN_DIR" ]; then
267 echo "No directory specified, and no directory found with \`man --path'"
268 exit 1
269fi
270
271# Fake?
272if [ "$FAKE" != "no" ]; then
273 echo "Actual parameters used:"
274 echo -n "Compression.......: "
275 case $COMP_METHOD in
276 --bzip2|--bz2|-b) echo -n "bzip2";;
277 --gzip|--gz|-g) echo -n "gzip";;
278 --decompress|-d) echo -n "decompressing";;
279 *) echo -n "unknown";;
280 esac
281 echo " ($COMP_METHOD)"
282 echo "Compression level.: $COMP_LVL"
283 echo "Compression suffix: $COMP_SUF"
284 echo -n "Force compression.: "
285 [ "foo$FORCE_OPT" = "foo-F" ] &amp;&amp; echo "yes" || echo "no"
286 echo "man.conf is.......: ${MAN_CONF}/man.conf"
287 echo -n "Hard-links........: "
288 [ "foo$LN_OPT" = "foo-S" ] &amp;&amp; echo "convert to soft-links" || echo "leave as is"
289 echo -n "Soft-links........: "
290 [ "foo$LN_OPT" = "foo-H" ] &amp;&amp; echo "convert to hard-links" || echo "leave as is"
291 echo "Backup............: $BACKUP"
292 echo "Faking (yes!).....: $FAKE"
293 echo "Directories.......: $MAN_DIR"
294 echo "Verbosity level...: $VERBOSE_LVL"
295 exit 0
296fi
297
298# If no method was specified, print help
299if [ -z "${COMP_METHOD}" -a "${BACKUP}" = "no" ]; then
300 help
301 exit 1
302fi
303
304# In backup mode, do the backup solely
305if [ "$BACKUP" = "yes" ]; then
306 for DIR in $MAN_DIR; do
307 cd "${DIR}/.."
308 DIR_NAME=`basename "${DIR}"`
309 echo "Backing up $DIR..." &gt; $DEST_FD0
310 [ -f "${DIR_NAME}.tar.old" ] &amp;&amp; rm -f "${DIR_NAME}.tar.old"
311 [ -f "${DIR_NAME}.tar" ] &amp;&amp; mv "${DIR_NAME}.tar" "${DIR_NAME}.tar.old"
312 tar cfv "${DIR_NAME}.tar" "${DIR_NAME}" &gt; $DEST_FD1
313 done
314 exit 0
315fi
316
317# I know MAN_DIR has only absolute path names
318# I need to take into account the localized man, so I'm going recursive
319for DIR in $MAN_DIR; do
320 MEM_DIR=`pwd`
321 cd "$DIR"
322 for FILE in *; do
323 # Fixes the case where the directory is empty
324 if [ "foo$FILE" = "foo*" ]; then continue; fi
325
326 # Fixes the case when hard-links see their compression scheme change
327 # (from not compressed to compressed, or from bz2 to gz, or from gz to bz2)
328 # Also fixes the case when multiple versions of the page are present, which
329 # are either compressed or not.
330 if [ ! -L "$FILE" -a ! -e "$FILE" ]; then continue; fi
331
332 if [ -d "$FILE" ]; then
333 cd "${MEM_DIR}" # Go back to where we ran "$0", in case "$0"=="./compressdoc" ...
334 # We are going recursive to that directory
335 echo "-&gt; Entering ${DIR}/${FILE}..." &gt; $DEST_FD0
336 # I need not pass --conf, as I specify the directory to work on
337 # But I need exit in case of error
338 "$0" ${COMP_METHOD} ${COMP_LVL} ${LN_OPT} ${VERBOSE_OPT} ${FORCE_OPT} "${DIR}/${FILE}" || exit 1
339 echo "&lt;- Leaving ${DIR}/${FILE}." &gt; $DEST_FD1
340 cd "$DIR" # Needed for the next iteration of the loop
341
342 else # !dir
343 if ! check_unique "$DIR" "$FILE"; then continue; fi
344
345 # Check if the file is already compressed with the specified method
346 BASE_FILE=`basename \`basename "$FILE" .bz2\` .gz`
347 if [ "${FILE}" = "${BASE_FILE}${COMP_SUF}" -a "foo${FORCE_OPT}" = "foo" ]; then continue; fi
348
349 # If we have a symlink
350 if [ -h "$FILE" ]; then
351 case $FILE in
352 *.bz2)
353 EXT=bz2 ;;
354 *.gz)
355 EXT=gz ;;
356 *)
357 EXT=none ;;
358 esac
359
360 if [ ! "$EXT" = "none" ]; then
361 LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " " | sed "s/\.$EXT$//"`
362 NEWNAME=`echo "$FILE" | sed "s/\.$EXT$//"`
363 mv "$FILE" "$NEWNAME"
364 FILE="$NEWNAME"
365 else
366 LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " "`
367 fi
368
369 if [ "$LN_OPT" = "-H" ]; then
370 # Change this soft-link into a hard- one
371 rm -f "$FILE" &amp;&amp; ln "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
372 chmod --reference "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
373 else
374 # Keep this soft-link a soft- one.
375 rm -f "$FILE" &amp;&amp; ln -s "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
376 fi
377 echo "Relinked $FILE" &gt; $DEST_FD1
378
379 # else if we have a plain file
380 elif [ -f "$FILE" ]; then
381 # Take care of hard-links: build the list of files hard-linked
382 # to the one we are {de,}compressing.
383 # NB. This is not optimal, as the file will eventually be compressed
384 # as many times as it has hard-links. But for now, that's the safe way.
385 inode=`ls -li "$FILE" | awk '{print $1}'`
386 HLINKS=`find . \! -name "$FILE" -inum $inode`
387
388 if [ -n "$HLINKS" ]; then
389 # We have hard-links! Remove them now.
390 for i in $HLINKS; do rm -f "$i"; done
391 fi
392
393 # Now take care of the file that has no hard-link
394 # We do decompress first to re-compress with the selected
395 # compression ratio later on...
396 case $FILE in
397 *.bz2)
398 bunzip2 "$FILE"
399 FILE=`basename "$FILE" .bz2`
400 ;;
401 *.gz)
402 gunzip "$FILE"
403 FILE=`basename "$FILE" .gz`
404 ;;
405 esac
406
407 # Compress the file with the given compression ratio, if needed
408 case $COMP_SUF in
409 *bz2)
410 bzip2 ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
411 echo "Compressed $FILE" &gt; $DEST_FD1
412 ;;
413 *gz)
414 gzip ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
415 echo "Compressed $FILE" &gt; $DEST_FD1
416 ;;
417 *)
418 echo "Uncompressed $FILE" &gt; $DEST_FD1
419 ;;
420 esac
421
422 # If the file had hard-links, recreate those (either hard or soft)
423 if [ -n "$HLINKS" ]; then
424 for i in $HLINKS; do
425 NEWFILE=`echo "$i" | sed 's/\.gz$//' | sed 's/\.bz2$//'`
426 if [ "$LN_OPT" = "-S" ]; then
427 # Make this hard-link a soft- one
428 ln -s "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
429 else
430 # Keep the hard-link a hard- one
431 ln "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
432 fi
433 chmod 644 "${NEWFILE}$COMP_SUF" # Really work only for hard-links. Harmless for soft-links
434 done
435 fi
436
437 else
438 # There is a problem when we get neither a symlink nor a plain file
439 # Obviously, we shall never ever come here... :-(
440 echo "Whaooo... \"${DIR}/${FILE}\" is neither a symlink nor a plain file. Please check:"
441 ls -l "${DIR}/${FILE}"
442 exit 1
443 fi
444 fi
445 done # for FILE
446done # for DIR
447<command>EOF
448chmod 755 /usr/bin/compressdoc</command></userinput></screen>
449
450<para>Now, as root, you can run
451<command>/usr/bin/compressdoc --bz2</command> to compress all your system man
452pages. You can also run <command>/usr/bin/compressdoc --help</command> to get
453comprehensive help on what the script can do.</para>
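<para>Because the script rewrites pages in place, a cautious first run might
preview the settings with <option>--fake</option> and take a backup with
<option>--backup</option> before compressing. The wrapper below is only a
sketch of that sequence: the function name <command>safe_compress</command> and
the <envar>COMPRESSDOC</envar> variable are hypothetical, while the options are
the ones listed in the script's own help text.</para>

```shell
# Sketch only: preview, back up, then compress one man hierarchy.
# safe_compress and COMPRESSDOC are hypothetical names for this example.
COMPRESSDOC=${COMPRESSDOC:-/usr/bin/compressdoc}

safe_compress() {
    dir=$1

    # 1. Print the parameters that would be used, without touching anything.
    "$COMPRESSDOC" --fake --bz2 -9 "$dir" || return 1

    # 2. Save a .tar of the directory next to it (backup mode does nothing else).
    "$COMPRESSDOC" --backup "$dir" || return 1

    # 3. Compress for real.
    "$COMPRESSDOC" --bz2 -9 "$dir"
}

# Example call, to be run as root:
#   safe_compress /usr/share/man
```

<para>Note that the directory must be given as an absolute path, since the
script rejects anything else.</para>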
454
455<para>Don't forget that a few programs, like the <application>X</application>
456Window System and <application>XEmacs</application>, install their
457documentation in non-standard places (such as
458<filename class="directory">/usr/X11R6/man</filename>). Add those locations to the
459file <filename>/etc/man.conf</filename>, as a
460<envar>MANPATH</envar>=<replaceable>/path</replaceable> section.</para>
461<para> Example:<screen><userinput>
462 ...
463 MANPATH=/usr/share/man
464 MANPATH=/usr/local/man
465 MANPATH=/usr/X11R6/man
466 MANPATH=/opt/qt/doc/man
467 ...</userinput></screen></para>
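<para>When no directory is given, the script asks <command>man --path</command>
for the search path and turns it into a list of unique directories. The helper
below is a minimal sketch of that splitting step, useful for checking which
directories would be picked up after editing the configuration. The function
name <command>split_manpath</command> is hypothetical, and the sketch does not
reproduce the script's code exactly.</para>

```shell
# Sketch only: turn a colon-separated man path into unique directories.
# split_manpath is a hypothetical helper, not part of compressdoc.
split_manpath() {
    printf '%s\n' "$1" | tr ':' '\n' | while read -r entry; do
        if [ -n "$entry" ]; then
            # "dirname path/." drops a trailing slash, so that
            # /usr/share/man and /usr/share/man/ count as the same directory.
            dirname "$entry/."
        fi
    done | sort -u
}
```

<para>A possible use is
<command>split_manpath "$(man --path -C /etc/man.conf)"</command>, which should
print one directory per line.</para>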
468
469<para>Generally, package installation systems do not compress man/info pages,
470which means you will need to run the script again after installing a package if
471you want to keep your documentation as small as possible. Also, note that running
472the script after upgrading a package is safe: when you have several versions of a page
473(for example, one compressed and one uncompressed), the most recent one is kept
474and the others deleted.</para>
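<para>To see in advance which copy of a page would survive, the read-only
sketch below reports the most recently modified file among the uncompressed,
<filename>.gz</filename> and <filename>.bz2</filename> variants. It is loosely
modelled on the script's <command>check_unique</command> function but, unlike
it, deletes nothing. The name <command>newest_variant</command> is
hypothetical.</para>

```shell
# Sketch only: report which variant of a page is the most recent.
# newest_variant is a hypothetical helper; it never removes files.
#   $1  directory holding the page
#   $2  page name without compression suffix (for example ls.1)
newest_variant() {
    dir=$1
    base=$2
    (
        cd "$dir" || exit 1
        # ls -t lists newest first; variants that do not exist only
        # produce an error message, which is discarded.
        ls -1t "$base" "$base.gz" "$base.bz2" 2>/dev/null | head -n 1
    )
}
```

<para>For instance, <command>newest_variant /usr/share/man/man1 ls.1</command>
would print the file name the script is expected to keep in that
directory.</para>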
475
476</sect1>
477