source: postlfs/config/compressdoc.xml@ a72f9b7

Last change on this file since a72f9b7 was a72f9b7, checked in by Igor Živković <igor@…>, 18 years ago

applied Yann's patch

git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@1456 af4574ff-66df-0310-9fd7-8a98e5e911e0

<sect1 id="postlfs-config-compressdoc" xreflabel="compressdoc">
<?dbhtml filename="compressdoc.html" dir="postlfs"?>
<title>Compressing man and info pages</title>

<para>Man and info reader programs can transparently process gzipped or
bzip2-compressed pages, a feature you can use to free some disk space while
keeping your documentation available. However, man directories tend to contain
links, both hard and symbolic, which defeat simple ideas like recursively
calling <command>gzip</command> on them: <command>gzip</command> skips symbolic
links and, unless forced, refuses files that have several hard links, so
aliases end up dangling or pages stay uncompressed. A better way to go is to
use the script below.
</para>

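<para>Before turning to the script, the problem with the naive approach can be
reproduced in a scratch directory. The following is only an illustrative
sketch: the function name <command>demo_broken_link</command> and the page
names are made up for the example, and it relies on <command>gzip</command>
ignoring symbolic links, as it does by default.</para>

```shell
# Illustrative only: demo_broken_link, foo.1 and bar.1 are hypothetical names.
# Everything happens in a temporary directory, never in a real man directory.
demo_broken_link ()
{
  local dir
  dir=$(mktemp -d) || return 1
  touch "$dir/foo.1"            # stands in for a real page (empty is enough here)
  ln -s foo.1 "$dir/bar.1"      # an alias page, installed as a symbolic link
  gzip -rq "$dir" || true       # the naive approach; the symlink itself is skipped
  # foo.1 has become foo.1.gz, so bar.1 now points at a file that is gone
  if [ -L "$dir/bar.1" ]; then
    if [ ! -e "$dir/bar.1" ]; then
      echo "bar.1 is a dangling symlink"
    fi
  fi
  rm -rf "$dir"
}
```

<para>Calling <command>demo_broken_link</command> reports the dangling alias;
the script avoids this by re-creating every link under its new name.</para>
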
<screen><userinput><command>cat &gt; /usr/bin/compressdoc &lt;&lt; "EOF"</command>
#!/bin/bash
# VERSION: 20031029.0025
#
# Compress (with bzip2 or gzip) all man pages in a hierarchy and
# update symlinks - By Marc Heerdink &lt;marc @ koelkast.net&gt;
# Modified to be able to gzip or bzip2 files as an option and to deal
# with all symlinks properly by Mark Hymers &lt;markh @ linuxfromscratch.org&gt;
#
# Modified 20030930 by Yann E. Morin &lt;yann.morin.1998 @ anciens.enib.fr&gt;
# to accept compression/decompression, to correctly handle hard-links,
# to allow for changing hard-links into soft- ones, to specify the
# compression level, to parse the man.conf for all occurrences of MANPATH,
# to allow for a backup, to allow to keep the newest version of a page.
#
# TODO:
#     - choose a default compress method to be based on the available
#       tool : gzip or bzip2;
#     - offer an option to automagically choose the best compression method
#       on a per page basis (eg. check which of gzip/bzip2/whatever is the
#       most effective, page per page);
#     - when a MANPATH env var exists, use this instead of /etc/man.conf
#       (useful for users to (de)compress their man pages);
#     - offer an option to restore a previous backup;
#     - add other compression engines (compress, zip, etc?). Needed?

# Funny enough, this function prints some help.
function help ()
{
  if [ -n "$1" ]; then
    echo "Unknown option : $1"
  fi
  ( echo "Usage: $0 &lt;comp_method&gt; [options] [dirs]" &amp;&amp; \
  cat &lt;&lt; EOT
Where comp_method is one of :
  --gzip, --gz, -g
  --bzip2, --bz2, -b
                Compress using gzip or bzip2.

  --decompress, -d
                Decompress the man pages.

  --backup      Specify a .tar backup shall be done for all directories.
                In case a backup already exists, it is saved as .tar.old prior
                to making the new backup. If a .tar.old backup exists, it is
                removed prior to saving the backup.
                In backup mode, no other action is performed.

And where options are :
  -1 to -9, --fast, --best
                The compression level, as accepted by gzip and bzip2. When not
                specified, uses the default compression level for the given
                method (-6 for gzip, and -9 for bzip2). Not used when in backup
                or decompress modes.

  --force, -F   Force (re-)compression, even if the previous one was the same
                method. Useful when changing the compression ratio. By default,
                a page will not be re-compressed if it ends with the same suffix
                as the method adds (.bz2 for bzip2, .gz for gzip).

  --soft, -S    Change hard-links into soft-links. Use with _caution_ as the
                first encountered file will be used as a reference. Not used
                when in backup mode.

  --hard, -H    Change soft-links into hard-links. Not used when in backup mode.

  --conf=dir, --conf dir
                Specify the location of man.conf. Defaults to /etc.

  --verbose, -v Verbose mode, print the name of the directory being processed.
                Double the flag to turn it even more verbose, and to print the
                name of the file being processed.

  --fake, -f    Fakes it. Print the actual parameters compressdoc will use.

  dirs          A list of space-separated _absolute_ pathnames to the man
                directories.
                When empty, and only then, parse ${MAN_CONF}/man.conf for all
                occurrences of MANPATH.

Note about compression
  There has been a discussion on blfs-support about compression ratios of
  both gzip and bzip2 on man pages, taking into account the hosting fs,
  the architecture, etc... On the whole, the conclusion was that gzip
  was much more efficient on 'small' files, and bzip2 on 'big' files, small and
  big being very dependent on the content of the files.

  See the original post from Mickael A. Peters, titled "Bootable Utility CD",
  and dated 20030409.1816(+0200), and subsequent posts:
  http://linuxfromscratch.org/pipermail/blfs-support/2003-April/038817.html

  On my system (x86, ext3), man pages were 35564kiB before compression. gzip -9
  compressed them down to 20372kiB (57.28%), bzip2 -9 got down to 19812kiB
  (55.71%). That is a 1.57% gain in space. YMMV.

  What was not taken into consideration was the decompression speed. But does
  it make sense to? You gain fast access with uncompressed man pages, or you
  gain space at the expense of a slight overhead in time. Well, my P4-2.5GHz
  does not even let me notice this... :-)
EOT
) | less
}

# This function checks that the man page is unique amongst bzip2'd, gzip'd and
# uncompressed versions.
#  $1 the directory in which the file resides
#  $2 the file name for the man page
# Returns 0 (true) if the file is the latest and must be taken care of, and 1
# (false) if the file is not the latest (and has therefore been deleted).
function check_unique ()
{
  # NB. When there are hard-links to this file, these are
  # _not_ deleted. In fact, if there are hard-links, they
  # all have the same date/time, thus making them ready
  # for deletion later on.

  # Build the list of all man pages with the same name
  DIR=$1
  BASENAME=`basename "${2}" .bz2`
  BASENAME=`basename "${BASENAME}" .gz`
  GZ_FILE="$BASENAME".gz
  BZ_FILE="$BASENAME".bz2

  # Look for, and keep, the most recent one
  LATEST=`(cd "$DIR"; ls -1rt "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}" 2&gt;/dev/null | tail -1)`
  for i in "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}"; do
    [ "$LATEST" != "$i" ] &amp;&amp; rm -f "$DIR"/"$i"
  done

  # In case the specified file was the latest, return 0
  [ "$LATEST" = "$2" ] &amp;&amp; return 0
  # If the file was not the latest, return 1
  return 1
}

# OK, parse the command-line for arguments, and initialize to some sensible
# state, that is : don't change links state, parse /etc/man.conf, be most
# silent, search man.conf in /etc, and don't force (re-)compression.
COMP_METHOD=
COMP_SUF=
COMP_LVL=
FORCE_OPT=
LN_OPT=
MAN_DIR=
VERBOSE_LVL=0
BACKUP=no
FAKE=no
MAN_CONF=/etc
while [ -n "$1" ]; do
  case $1 in
    --gzip|--gz|-g)
      COMP_SUF=.gz
      COMP_METHOD=$1
      shift
      ;;
    --bzip2|--bz2|-b)
      COMP_SUF=.bz2
      COMP_METHOD=$1
      shift
      ;;
    --decompress|-d)
      COMP_SUF=
      COMP_LVL=
      COMP_METHOD=$1
      shift
      ;;
    -[1-9]|--fast|--best)
      COMP_LVL=$1
      shift
      ;;
    --force|-F)
      FORCE_OPT=-F
      shift
      ;;
    --soft|-S)
      LN_OPT=-S
      shift
      ;;
    --hard|-H)
      LN_OPT=-H
      shift
      ;;
    --conf=*)
      MAN_CONF=`echo $1 | cut -d '=' -f2-`
      shift
      ;;
    --conf)
      MAN_CONF="$2"
      shift 2
      ;;
    --verbose|-v)
      let VERBOSE_LVL++
      shift
      ;;
    --backup)
      BACKUP=yes
      shift
      ;;
    --fake|-f)
      FAKE=yes
      shift
      ;;
    --help|-h)
      help
      exit 0
      ;;
    /*)
      MAN_DIR="${MAN_DIR} ${1}"
      shift
      ;;
    -*)
      help $1
      exit 1
      ;;
    *)
      echo "\"$1\" is not an absolute path name"
      exit 1
      ;;
  esac
done

# Redirections
case $VERBOSE_LVL in
  0)
    # 0, be silent
    DEST_FD0=/dev/null
    DEST_FD1=/dev/null
    VERBOSE_OPT=
    ;;
  1)
    # 1, be a bit verbose
    DEST_FD0=/dev/stdout
    DEST_FD1=/dev/null
    VERBOSE_OPT=-v
    ;;
  *)
    # 2 and above, be most verbose
    DEST_FD0=/dev/stdout
    DEST_FD1=/dev/stdout
    VERBOSE_OPT="-v -v"
    ;;
esac

# Note: on my machine, 'man --path' gives /usr/share/man twice, once with a trailing '/', once without.
if [ -z "$MAN_DIR" ]; then
  MAN_DIR=`man --path -C "$MAN_CONF"/man.conf \
            | sed 's/:/\\n/g' \
            | while read foo; do dirname "$foo"/.; done \
            | sort -u \
            | while read bar; do echo -n "$bar "; done`
fi

# If no MANPATH in ${MAN_CONF}/man.conf, abort as well
if [ -z "$MAN_DIR" ]; then
  echo "No directory specified, and no directory found with \`man --path'"
  exit 1
fi

# Fake?
if [ "$FAKE" != "no" ]; then
  echo "Actual parameters used:"
  echo -n "Compression.......: "
  case $COMP_METHOD in
    --bzip2|--bz2|-b) echo -n "bzip2";;
    --gzip|--gz|-g) echo -n "gzip";;
    --decompress|-d) echo -n "decompressing";;
    *) echo -n "unknown";;
  esac
  echo " ($COMP_METHOD)"
  echo "Compression level.: $COMP_LVL"
  echo "Compression suffix: $COMP_SUF"
  echo -n "Force compression.: "
  [ "foo$FORCE_OPT" = "foo-F" ] &amp;&amp; echo "yes" || echo "no"
  echo "man.conf is.......: ${MAN_CONF}/man.conf"
  echo -n "Hard-links........: "
  [ "foo$LN_OPT" = "foo-S" ] &amp;&amp; echo "convert to soft-links" || echo "leave as is"
  echo -n "Soft-links........: "
  [ "foo$LN_OPT" = "foo-H" ] &amp;&amp; echo "convert to hard-links" || echo "leave as is"
  echo "Backup............: $BACKUP"
  echo "Faking (yes!).....: $FAKE"
  echo "Directories.......: $MAN_DIR"
  echo "Verbosity level...: $VERBOSE_LVL"
  exit 0
fi

# If no method was specified, print help
if [ -z "${COMP_METHOD}" -a "${BACKUP}" = "no" ]; then
  help
  exit 1
fi

# In backup mode, do the backup solely
if [ "$BACKUP" = "yes" ]; then
  for DIR in $MAN_DIR; do
    cd "${DIR}/.."
    DIR_NAME=`basename "${DIR}"`
    echo "Backing up $DIR..." &gt; $DEST_FD0
    [ -f "${DIR_NAME}.tar.old" ] &amp;&amp; rm -f "${DIR_NAME}.tar.old"
    [ -f "${DIR_NAME}.tar" ] &amp;&amp; mv "${DIR_NAME}.tar" "${DIR_NAME}.tar.old"
    tar cfv "${DIR_NAME}.tar" "${DIR_NAME}" &gt; $DEST_FD1
  done
  exit 0
fi

# I know MAN_DIR has only absolute path names
# I need to take into account the localized man, so I'm going recursive
for DIR in $MAN_DIR; do
  MEM_DIR=`pwd`
  cd "$DIR"
  for FILE in *; do
    # Fixes the case where the directory is empty
    if [ "foo$FILE" = "foo*" ]; then continue; fi

    # Fixes the case when hard-links see their compression scheme change
    # (from not compressed to compressed, or from bz2 to gz, or from gz to bz2)
    # Also fixes the case when multiple versions of the page are present, which
    # are either compressed or not.
    if [ ! -L "$FILE" -a ! -e "$FILE" ]; then continue; fi

    # Do not compress whatis files
    if [ "$FILE" = "whatis" ]; then continue; fi

    if [ -d "$FILE" ]; then
      cd "${MEM_DIR}"  # Go back to where we ran "$0", in case "$0"=="./compressdoc" ...
      # We are going recursive to that directory
      echo "-&gt; Entering ${DIR}/${FILE}..." &gt; $DEST_FD0
      # I need not pass --conf, as I specify the directory to work on
      # But I need exit in case of error
      "$0" ${COMP_METHOD} ${COMP_LVL} ${LN_OPT} ${VERBOSE_OPT} ${FORCE_OPT} "${DIR}/${FILE}" || exit 1
      echo "&lt;- Leaving ${DIR}/${FILE}." &gt; $DEST_FD1
      cd "$DIR"  # Needed for the next iteration of the loop

    else # !dir
      if ! check_unique "$DIR" "$FILE"; then continue; fi

      # Check if the file is already compressed with the specified method
      BASE_FILE=`basename "$FILE" .gz`
      BASE_FILE=`basename "$BASE_FILE" .bz2`
      if [ "${FILE}" = "${BASE_FILE}${COMP_SUF}" -a "foo${FORCE_OPT}" = "foo" ]; then continue; fi

      # If we have a symlink
      if [ -h "$FILE" ]; then
        case "$FILE" in
          *.bz2)
            EXT=bz2 ;;
          *.gz)
            EXT=gz ;;
          *)
            EXT=none ;;
        esac

        if [ ! "$EXT" = "none" ]; then
          LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " " | sed s/\.$EXT$//`
          NEWNAME=`echo "$FILE" | sed s/\.$EXT$//`
          mv "$FILE" "$NEWNAME"
          FILE="$NEWNAME"
        else
          LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " "`
        fi

        if [ "$LN_OPT" = "-H" ]; then
          # Change this soft-link into a hard- one
          rm -f "$FILE" &amp;&amp; ln "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
          chmod --reference "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
        else
          # Keep this soft-link a soft- one.
          rm -f "$FILE" &amp;&amp; ln -s "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
        fi
        echo "Relinked $FILE" &gt; $DEST_FD1

      # else if we have a plain file
      elif [ -f "$FILE" ]; then
        # Take care of hard-links: build the list of files hard-linked
        # to the one we are {de,}compressing.
        # NB. This is not optimum as the file will eventually be compressed
        # as many times as it has hard-links. But for now, that's the safe way.
        inode=`ls -li "$FILE" | awk '{print $1}'`
        HLINKS=`find . \! -name "$FILE" -inum $inode`

        if [ -n "$HLINKS" ]; then
          # We have hard-links! Remove them now.
          for i in $HLINKS; do rm -f "$i"; done
        fi

        # Now take care of the file that has no hard-link
        # We do decompress first to re-compress with the selected
        # compression ratio later on...
        case "$FILE" in
          *.bz2)
            bunzip2 "$FILE"
            FILE=`basename "$FILE" .bz2`
            ;;
          *.gz)
            gunzip "$FILE"
            FILE=`basename "$FILE" .gz`
            ;;
        esac

        # Compress the file with the given compression ratio, if needed
        case $COMP_SUF in
          *bz2)
            bzip2 ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
            echo "Compressed $FILE" &gt; $DEST_FD1
            ;;
          *gz)
            gzip ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
            echo "Compressed $FILE" &gt; $DEST_FD1
            ;;
          *)
            echo "Uncompressed $FILE" &gt; $DEST_FD1
            ;;
        esac

        # If the file had hard-links, recreate those (either hard or soft)
        if [ -n "$HLINKS" ]; then
          for i in $HLINKS; do
            NEWFILE=`echo "$i" | sed s/\.gz$// | sed s/\.bz2$//`
            if [ "$LN_OPT" = "-S" ]; then
              # Make this hard-link a soft- one
              ln -s "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
            else
              # Keep the hard-link a hard- one
              ln "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
            fi
            chmod 644 "${NEWFILE}$COMP_SUF" # Really work only for hard-links. Harmless for soft-links
          done
        fi

      else
        # There is a problem when we get neither a symlink nor a plain file
        # Obviously, we shall never ever come here... :-(
        echo "Whaooo... \"${DIR}/${FILE}\" is neither a symlink nor a plain file. Please check:"
        ls -l "${DIR}/${FILE}"
        exit 1
      fi
    fi
  done # for FILE
done # for DIR
<command>EOF
chmod 755 /usr/bin/compressdoc</command></userinput></screen>

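<para>The <command>check_unique</command> function obtains a page's canonical
name by stripping a trailing <filename>.bz2</filename> and then a trailing
<filename>.gz</filename> with <command>basename</command>, the second call
working on the result of the first. As a minimal sketch of that idiom in
isolation (the helper name <command>page_basename</command> is hypothetical and
is not part of the script):</para>

```shell
# Hypothetical helper, not part of compressdoc: strip a compression
# suffix from a man page name, whichever of the two is present.
# Like basename itself, it also drops any leading directory.
page_basename ()
{
  local name
  name=$(basename "$1" .bz2)    # no-op unless the name ends in .bz2
  basename "$name" .gz          # no-op unless the name ends in .gz
}
```

<para>Feeding the second <command>basename</command> the original name instead
of the intermediate result would leave one of the two suffixes in place, so the
chaining matters.</para>
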
<para>Now, as root, you can run
<command>/usr/bin/compressdoc --bz2</command> to compress all your system man
pages. You can also run <command>/usr/bin/compressdoc --help</command> to get
comprehensive help on what the script is able to do.</para>

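<para>A cautious first run might preview the settings, take a backup, and only
then compress. The sketch below strings together options documented in the
script's help text; the wrapper name <command>careful_compress</command> and
its optional argument (the command to run) are assumptions made for this
example.</para>

```shell
# Hypothetical wrapper around compressdoc; run it as root.
# $1 (optional) is the command to run, /usr/bin/compressdoc by default.
careful_compress ()
{
  local tool=${1:-/usr/bin/compressdoc}
  "$tool" --fake --bz2 -9 || return 1   # only print the parameters that would be used
  "$tool" --backup || return 1          # tar up each man directory, nothing else
  "$tool" --bz2 -9 -v                   # compress, naming each directory as it goes
}
```

<para>You may prefer to run the three commands by hand so that the output of
<option>--fake</option> can be checked before anything is changed.</para>
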
<para>Don't forget that a few programs, such as the <application>X</application>
Window System and <application>XEmacs</application>, install their
documentation in non-standard places (such as
<filename class="directory">/usr/X11R6/man</filename>). Be sure to add those
locations to the file <filename>/etc/man.conf</filename>, as a
<envar>MANPATH</envar>=<replaceable>/path</replaceable> section.</para>
<para>Example:</para><screen><userinput>
    ...
    MANPATH=/usr/share/man
    MANPATH=/usr/local/man
    MANPATH=/usr/X11R6/man
    MANPATH=/opt/qt/doc/man
    ...</userinput></screen>

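<para>To see which directories are declared before running the script, the
<envar>MANPATH</envar> lines can be pulled out of the configuration file. This
is a rough sketch that assumes the
<envar>MANPATH</envar>=<replaceable>/path</replaceable> layout shown in the
example above; the function name <command>list_manpaths</command> is
hypothetical.</para>

```shell
# Hypothetical helper: print the directories named on MANPATH= lines.
# $1 (optional) is the configuration file, /etc/man.conf by default.
list_manpaths ()
{
  local conf=${1:-/etc/man.conf}
  # Keep only lines that start with MANPATH= (leading blanks allowed)
  # and print what follows the equals sign.
  sed -n 's/^[[:space:]]*MANPATH=//p' "$conf"
}
```

<para>Any documentation directory missing from that output can either be added
to <filename>/etc/man.conf</filename> or passed to
<command>compressdoc</command> as an absolute path on the command line.</para>
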
<para>Generally, package installation systems do not compress man/info pages,
which means you will need to run the script again after installing or
upgrading a package if you want to keep the size of your documentation as small
as possible. Also, note that running the script after upgrading a package is
safe: when you have several versions of a page (for example, one compressed and
one uncompressed), the most recent one is kept and the others deleted.</para>

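<para>One way to tell whether another run is due is to look for pages that are
still uncompressed. A minimal sketch, assuming the pages live under
<filename class="directory">/usr/share/man</filename> unless another directory
is given (the function name <command>list_uncompressed</command> is made up for
this example):</para>

```shell
# Hypothetical helper: list regular files that carry neither a .gz nor a
# .bz2 suffix, leaving out the whatis database that compressdoc also skips.
# $1 (optional) is the directory to scan, /usr/share/man by default.
list_uncompressed ()
{
  local dir=${1:-/usr/share/man}
  find "$dir" -type f ! -name '*.gz' ! -name '*.bz2' ! -name whatis | sort
}
```

<para>An empty result suggests there is nothing left for the script to
compress in that directory.</para>
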
</sect1>