source: postlfs/config/compressdoc.xml@ 1dad4a4

Last change on this file since 1dad4a4 was 1dad4a4, checked in by Igor Živković <igor@…>, 20 years ago

mdash entity fixes

git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@2237 af4574ff-66df-0310-9fd7-8a98e5e911e0

<sect1 id="postlfs-config-compressdoc" xreflabel="compressdoc">
<?dbhtml filename="compressdoc.html"?>
<title>Compressing man and info pages</title>

<para>Man and info reader programs can transparently process pages compressed
with <command>gzip</command> or <command>bzip2</command>, a feature you can use
to free some disk space while keeping your documentation available. However,
man directories tend to contain links, both hard and symbolic, which defeat
simple approaches such as recursively calling <command>gzip</command> on them:
a symbolic link is left pointing at a file name that no longer exists once its
target gains a <filename>.gz</filename> suffix. A better way to go is to use
the script below.
</para>
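<para>The link problem can be reproduced in a scratch directory. The following
is only a minimal sketch, assuming <command>gzip</command> and
<command>mktemp</command> are available; the file names
<filename>page.1</filename> and <filename>alias.1</filename> are hypothetical
and not taken from any real package.</para>

```shell
# Work in a throwaway directory (hypothetical file names).
demo=$(mktemp -d)
printf '.TH DEMO 1\n' > "$demo/page.1"
ln -s page.1 "$demo/alias.1"      # alias.1 is a symbolic link to page.1

gzip "$demo/page.1"               # page.1 becomes page.1.gz

# The link still names page.1, which no longer exists.
if [ -e "$demo/alias.1" ]; then before=ok; else before=dangling; fi
echo "after plain gzip: alias.1 is $before"

# What a link-aware tool has to do instead: recreate the link under a
# suffixed name, pointing at the compressed target.
rm -f "$demo/alias.1"
ln -s page.1.gz "$demo/alias.1.gz"
if [ -e "$demo/alias.1.gz" ]; then after=ok; else after=dangling; fi
echo "after relinking: alias.1.gz is $after"

rm -rf "$demo"
```

<para>The relinking step mirrors, in simplified form, what the script does for
symbolic links; hard links need comparable care, which this sketch does not
cover.</para>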

<screen><userinput><command>cat &gt; /usr/sbin/compressdoc &lt;&lt; "EOF"</command>
#!/bin/bash
# VERSION: 20040320.0026
#
# Compress (with bzip2 or gzip) all man pages in a hierarchy and
# update symlinks - By Marc Heerdink &lt;marc @ koelkast.net&gt;
# Modified to be able to gzip or bzip2 files as an option and to deal
# with all symlinks properly by Mark Hymers &lt;markh @ linuxfromscratch.org&gt;
#
# Modified 20030930 by Yann E. Morin &lt;yann.morin.1998 @ anciens.enib.fr&gt;
# to accept compression/decompression, to correctly handle hard-links,
# to allow for changing hard-links into soft- ones, to specify the
# compression level, to parse the man.conf for all occurrences of MANPATH,
# to allow for a backup, to allow to keep the newest version of a page.
# Modified 20040330 by Tushar Teredesai to replace $0 by the name of the script.
# (Note: It is assumed that the script is in the user's PATH)
#
# TODO:
#     - choose a default compress method to be based on the available
#       tool : gzip or bzip2;
#     - offer an option to automagically choose the best compression method
#       on a per page basis (eg. check which of gzip/bzip2/whatever is the
#       most effective, page per page);
#     - when a MANPATH env var exists, use this instead of /etc/man.conf
#       (useful for users to (de)compress their man pages);
#     - offer an option to restore a previous backup;
#     - add other compression engines (compress, zip, etc?). Needed?

# Funny enough, this function prints some help.
function help ()
{
  if [ -n "$1" ]; then
    echo "Unknown option : $1"
  fi
  ( echo "Usage: $MY_NAME &lt;comp_method&gt; [options] [dirs]" &amp;&amp; \
  cat &lt;&lt; EOT
Where comp_method is one of :
  --gzip, --gz, -g
  --bzip2, --bz2, -b
                Compress using gzip or bzip2.

  --decompress, -d
                Decompress the man pages.

  --backup      Specify that a .tar backup shall be made for every directory.
                In case a backup already exists, it is saved as .tar.old prior
                to making the new backup. If a .tar.old backup exists, it is
                removed prior to saving the backup.
                In backup mode, no other action is performed.

And where options are :
  -1 to -9, --fast, --best
                The compression level, as accepted by gzip and bzip2. When not
                specified, uses the default compression level for the given
                method (-6 for gzip, and -9 for bzip2). Not used when in backup
                or decompress modes.

  --force, -F   Force (re-)compression, even if the previous one was the same
                method. Useful when changing the compression ratio. By default,
                a page will not be re-compressed if it ends with the same suffix
                as the method adds (.bz2 for bzip2, .gz for gzip).

  --soft, -S    Change hard-links into soft-links. Use with _caution_ as the
                first encountered file will be used as a reference. Not used
                when in backup mode.

  --hard, -H    Change soft-links into hard-links. Not used when in backup mode.

  --conf=dir, --conf dir
                Specify the location of man.conf. Defaults to /etc.

  --verbose, -v Verbose mode, print the name of the directory being processed.
                Double the flag to turn it even more verbose, and to print the
                name of the file being processed.

  --fake, -f    Fakes it. Print the actual parameters compressdoc will use.

  dirs          A list of space-separated _absolute_ pathnames to the man
                directories.
                When empty, and only then, parse ${MAN_CONF}/man.conf for all
                occurrences of MANPATH.

Note about compression
  There has been a discussion on blfs-support about compression ratios of
  both gzip and bzip2 on man pages, taking into account the hosting fs,
  the architecture, etc... Overall, the conclusion was that gzip
  was much more efficient on 'small' files, and bzip2 on 'big' files, small and
  big being very dependent on the content of the files.

  See the original post from Mickael A. Peters, titled "Bootable Utility CD",
  and dated 20030409.1816(+0200), and subsequent posts:
  http://linuxfromscratch.org/pipermail/blfs-support/2003-April/038817.html

  On my system (x86, ext3), man pages were 35564kiB before compression. gzip -9
  compressed them down to 20372kiB (57.28%), bzip2 -9 got down to 19812kiB
  (55.71%). That is a gain of 1.57 percentage points in space. YMMV.

  What was not taken into consideration was the decompression speed. But does
  it make sense to? You gain fast access with uncompressed man pages, or you
  gain space at the expense of a slight overhead in time. Well, my P4-2.5GHz
  does not even let me notice this... :-)
EOT
) | less
}

# This function checks that the man page is unique amongst bzip2'd, gzip'd and
# uncompressed versions.
#  $1 the directory in which the file resides
#  $2 the file name for the man page
# Returns 0 (true) if the file is the latest and must be taken care of, and 1
# (false) if the file is not the latest (and has therefore been deleted).
function check_unique ()
{
  # NB. When there are hard-links to this file, these are
  # _not_ deleted. In fact, if there are hard-links, they
  # all have the same date/time, thus making them ready
  # for deletion later on.

  # Build the list of all man pages with the same name
  DIR=$1
  BASENAME=`basename "${2}" .bz2`
  BASENAME=`basename "${BASENAME}" .gz`
  GZ_FILE="$BASENAME".gz
  BZ_FILE="$BASENAME".bz2

  # Look for, and keep, the most recent one
  LATEST=`(cd "$DIR"; ls -1rt "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}" 2&gt;/dev/null | tail -n 1)`
  for i in "${BASENAME}" "${GZ_FILE}" "${BZ_FILE}"; do
    [ "$LATEST" != "$i" ] &amp;&amp; rm -f "$DIR"/"$i"
  done

  # In case the specified file was the latest, return 0
  [ "$LATEST" = "$2" ] &amp;&amp; return 0
  # If the file was not the latest, return 1
  return 1
}

# Name of the script
MY_NAME=`basename "$0"`

# OK, parse the command-line for arguments, and initialize to some sensible
# state, that is : don't change links state, parse /etc/man.conf, be most
# silent, search man.conf in /etc, and don't force (re-)compression.
COMP_METHOD=
COMP_SUF=
COMP_LVL=
FORCE_OPT=
LN_OPT=
MAN_DIR=
VERBOSE_LVL=0
BACKUP=no
FAKE=no
MAN_CONF=/etc
while [ -n "$1" ]; do
  case $1 in
    --gzip|--gz|-g)
      COMP_SUF=.gz
      COMP_METHOD=$1
      shift
      ;;
    --bzip2|--bz2|-b)
      COMP_SUF=.bz2
      COMP_METHOD=$1
      shift
      ;;
    --decompress|-d)
      COMP_SUF=
      COMP_LVL=
      COMP_METHOD=$1
      shift
      ;;
    -[1-9]|--fast|--best)
      COMP_LVL=$1
      shift
      ;;
    --force|-F)
      FORCE_OPT=-F
      shift
      ;;
    --soft|-S)
      LN_OPT=-S
      shift
      ;;
    --hard|-H)
      LN_OPT=-H
      shift
      ;;
    --conf=*)
      MAN_CONF=`echo $1 | cut -d '=' -f2-`
      shift
      ;;
    --conf)
      MAN_CONF="$2"
      shift 2
      ;;
    --verbose|-v)
      let VERBOSE_LVL++
      shift
      ;;
    --backup)
      BACKUP=yes
      shift
      ;;
    --fake|-f)
      FAKE=yes
      shift
      ;;
    --help|-h)
      help
      exit 0
      ;;
    /*)
      MAN_DIR="${MAN_DIR} ${1}"
      shift
      ;;
    -*)
      help $1
      exit 1
      ;;
    *)
      echo "\"$1\" is not an absolute path name"
      exit 1
      ;;
  esac
done

# Redirections
case $VERBOSE_LVL in
  0)
     # 0, be silent
     DEST_FD0=/dev/null
     DEST_FD1=/dev/null
     VERBOSE_OPT=
     ;;
  1)
     # 1, be a bit verbose
     DEST_FD0=/dev/stdout
     DEST_FD1=/dev/null
     VERBOSE_OPT=-v
     ;;
  *)
     # 2 and above, be most verbose
     DEST_FD0=/dev/stdout
     DEST_FD1=/dev/stdout
     VERBOSE_OPT="-v -v"
     ;;
esac

# Note: on my machine, 'man --path' gives /usr/share/man twice, once with a trailing '/', once without.
if [ -z "$MAN_DIR" ]; then
  MAN_DIR=`man --path -C "$MAN_CONF"/man.conf \
            | sed 's/:/\\n/g' \
            | while read foo; do dirname "$foo"/.; done \
            | sort -u \
            | while read bar; do echo -n "$bar "; done`
fi

# If no MANPATH in ${MAN_CONF}/man.conf, abort as well
if [ -z "$MAN_DIR" ]; then
  echo "No directory specified, and no directory found with \`man --path'"
  exit 1
fi

# Fake?
if [ "$FAKE" != "no" ]; then
  echo "Actual parameters used:"
  echo -n "Compression.......: "
  case $COMP_METHOD in
    --bzip2|--bz2|-b) echo -n "bzip2";;
    --gzip|--gz|-g) echo -n "gzip";;
    --decompress|-d) echo -n "decompressing";;
    *) echo -n "unknown";;
  esac
  echo " ($COMP_METHOD)"
  echo "Compression level.: $COMP_LVL"
  echo "Compression suffix: $COMP_SUF"
  echo -n "Force compression.: "
  [ "foo$FORCE_OPT" = "foo-F" ] &amp;&amp; echo "yes" || echo "no"
  echo "man.conf is.......: ${MAN_CONF}/man.conf"
  echo -n "Hard-links........: "
  [ "foo$LN_OPT" = "foo-S" ] &amp;&amp; echo "convert to soft-links" || echo "leave as is"
  echo -n "Soft-links........: "
  [ "foo$LN_OPT" = "foo-H" ] &amp;&amp; echo "convert to hard-links" || echo "leave as is"
  echo "Backup............: $BACKUP"
  echo "Faking (yes!).....: $FAKE"
  echo "Directories.......: $MAN_DIR"
  echo "Verbosity level...: $VERBOSE_LVL"
  exit 0
fi

# If no method was specified, print help
if [ -z "${COMP_METHOD}" -a "${BACKUP}" = "no" ]; then
  help
  exit 1
fi

# In backup mode, do the backup solely
if [ "$BACKUP" = "yes" ]; then
  for DIR in $MAN_DIR; do
    cd "${DIR}/.."
    DIR_NAME=`basename "${DIR}"`
    echo "Backing up $DIR..." &gt; $DEST_FD0
    [ -f "${DIR_NAME}.tar.old" ] &amp;&amp; rm -f "${DIR_NAME}.tar.old"
    [ -f "${DIR_NAME}.tar" ] &amp;&amp; mv "${DIR_NAME}.tar" "${DIR_NAME}.tar.old"
    tar cfv "${DIR_NAME}.tar" "${DIR_NAME}" &gt; $DEST_FD1
  done
  exit 0
fi

# I know MAN_DIR has only absolute path names
# I need to take into account the localized man, so I'm going recursive
for DIR in $MAN_DIR; do
  MEM_DIR=`pwd`
  cd "$DIR"
  for FILE in *; do
    # Fixes the case where the directory is empty
    if [ "foo$FILE" = "foo*" ]; then continue; fi

    # Fixes the case when hard-links see their compression scheme change
    # (from not compressed to compressed, or from bz2 to gz, or from gz to bz2)
    # Also fixes the case when multiple versions of the page are present, which
    # are either compressed or not.
    if [ ! -L "$FILE" -a ! -e "$FILE" ]; then continue; fi

    # Do not compress whatis files
    if [ "$FILE" = "whatis" ]; then continue; fi

    if [ -d "$FILE" ]; then
      cd "${MEM_DIR}"  # Go back to where we ran "$0", in case "$0"=="./compressdoc" ...
      # We are going recursive to that directory
      echo "-&gt; Entering ${DIR}/${FILE}..." &gt; $DEST_FD0
      # I need not pass --conf, as I specify the directory to work on
      # But I need exit in case of error
      "$MY_NAME" ${COMP_METHOD} ${COMP_LVL} ${LN_OPT} ${VERBOSE_OPT} ${FORCE_OPT} "${DIR}/${FILE}" || exit 1
      echo "&lt;- Leaving ${DIR}/${FILE}." &gt; $DEST_FD1
      cd "$DIR"  # Needed for the next iteration of the loop

    else # !dir
      if ! check_unique "$DIR" "$FILE"; then continue; fi

      # Check if the file is already compressed with the specified method
      BASE_FILE=`basename "$FILE" .gz`
      BASE_FILE=`basename "$BASE_FILE" .bz2`
      if [ "${FILE}" = "${BASE_FILE}${COMP_SUF}" -a "foo${FORCE_OPT}" = "foo" ]; then continue; fi

      # If we have a symlink
      if [ -h "$FILE" ]; then
        case "$FILE" in
          *.bz2)
            EXT=bz2 ;;
          *.gz)
            EXT=gz ;;
          *)
            EXT=none ;;
        esac

        if [ ! "$EXT" = "none" ]; then
          LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " " | sed s/\.$EXT$//`
          NEWNAME=`echo "$FILE" | sed s/\.$EXT$//`
          mv "$FILE" "$NEWNAME"
          FILE="$NEWNAME"
        else
          LINK=`ls -l "$FILE" | cut -d "&gt;" -f2 | tr -d " "`
        fi

        if [ "$LN_OPT" = "-H" ]; then
          # Change this soft-link into a hard- one
          rm -f "$FILE" &amp;&amp; ln "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
          chmod --reference "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
        else
          # Keep this soft-link a soft- one.
          rm -f "$FILE" &amp;&amp; ln -s "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
        fi
        echo "Relinked $FILE" &gt; $DEST_FD1

      # else if we have a plain file
      elif [ -f "$FILE" ]; then
        # Take care of hard-links: build the list of files hard-linked
        # to the one we are {de,}compressing.
        # NB. This is not optimum as the file will eventually be compressed
        # as many times as it has hard-links. But for now, that's the safe way.
        inode=`ls -li "$FILE" | awk '{print $1}'`
        HLINKS=`find . \! -name "$FILE" -inum $inode`

        if [ -n "$HLINKS" ]; then
          # We have hard-links! Remove them now.
          for i in $HLINKS; do rm -f "$i"; done
        fi

        # Now take care of the file that has no hard-link
        # We do decompress first to re-compress with the selected
        # compression ratio later on...
        case "$FILE" in
          *.bz2)
            bunzip2 "$FILE"
            FILE=`basename "$FILE" .bz2`
          ;;
          *.gz)
            gunzip "$FILE"
            FILE=`basename "$FILE" .gz`
          ;;
        esac

        # Compress the file with the given compression ratio, if needed
        case $COMP_SUF in
          *bz2)
            bzip2 ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
            echo "Compressed $FILE" &gt; $DEST_FD1
            ;;
          *gz)
            gzip ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
            echo "Compressed $FILE" &gt; $DEST_FD1
            ;;
          *)
            echo "Uncompressed $FILE" &gt; $DEST_FD1
            ;;
        esac

        # If the file had hard-links, recreate those (either hard or soft)
        if [ -n "$HLINKS" ]; then
          for i in $HLINKS; do
            NEWFILE=`echo "$i" | sed s/\.gz$// | sed s/\.bz2$//`
            if [ "$LN_OPT" = "-S" ]; then
              # Make this hard-link a soft- one
              ln -s "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
            else
              # Keep the hard-link a hard- one
              ln "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
            fi
            chmod 644 "${NEWFILE}$COMP_SUF" # Really work only for hard-links. Harmless for soft-links
          done
        fi

      else
        # There is a problem when we get neither a symlink nor a plain file
        # Obviously, we shall never ever come here... :-(
        echo "Whaooo... \"${DIR}/${FILE}\" is neither a symlink nor a plain file. Please check:"
        ls -l "${DIR}/${FILE}"
        exit 1
      fi
    fi
  done # for FILE
done # for DIR
<command>EOF
chmod 755 /usr/sbin/compressdoc</command></userinput></screen>

<para>Now, as root, you can run
<command>compressdoc --bz2</command> to compress all your system man
pages. You can also run <command>compressdoc --help</command> to get
comprehensive help about what the script is able to do.</para>
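<para>A typical session might look like the sketch below. The
<command>compressdoc</command> options shown are the ones listed in the
script's own help text; the helper <command>tree_size_kib</command> is a
hypothetical addition, not part of the script, and
<filename class="directory">/usr/share/man</filename> is only an assumed
location. The commands that change the system are left commented out.</para>

```shell
# Hypothetical helper (not part of compressdoc): size of a directory
# tree in KiB, as reported by du.
tree_size_kib() {
  du -sk "$1" | cut -f1
}

# Possible session, to be run as root. Uncomment to use it.
# before=$(tree_size_kib /usr/share/man)
# compressdoc --fake --bz2 -9 /usr/share/man   # print the parameters, change nothing
# compressdoc --backup /usr/share/man          # optional: .tar backup in the parent directory
# compressdoc --bz2 -9 /usr/share/man          # compress for real
# after=$(tree_size_kib /usr/share/man)
# echo "man pages: ${before} KiB before, ${after} KiB after"
```

<para>Running with <option>--fake</option> first is a cheap way to confirm
which directories and which compression settings will be used.</para>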

<para>Don't forget that a few programs, like the <application>X</application>
Window System and <application>XEmacs</application>, also install their
documentation in non-standard places (such as
<filename class="directory">/usr/X11R6/man</filename>). Be sure to add these
locations to the file <filename>/etc/man.conf</filename>, as
<envar>MANPATH</envar> <replaceable>/path</replaceable> lines.</para>
<para>Example:</para><screen><userinput>
    ...
    MANPATH /usr/share/man
    MANPATH /usr/local/man
    MANPATH /usr/X11R6/man
    MANPATH /opt/qt/doc/man
    ...</userinput></screen>
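<para>Before running the script it can help to check that every directory
listed in <filename>man.conf</filename> actually exists. The following is a
rough sketch only: <command>list_manpaths</command> and
<command>missing_manpaths</command> are hypothetical helpers, not part of the
script, and they simply pick out lines beginning with
<envar>MANPATH</envar>. The script itself obtains its directory list from
<command>man --path</command>.</para>

```shell
# Hypothetical helper: print each directory named on a MANPATH line of a
# man.conf-style file. The separator may be whitespace or '='.
# Lines such as MANPATH_MAP are not matched.
list_manpaths() {
  sed -n 's/^MANPATH[[:space:]=][[:space:]]*//p' "$1"
}

# Hypothetical helper: print the listed directories that do not exist.
missing_manpaths() {
  list_manpaths "$1" | while read -r dir; do
    if [ ! -d "$dir" ]; then echo "$dir"; fi
  done
}

# Example use, assuming the configuration lives in /etc/man.conf:
# missing_manpaths /etc/man.conf
```

<para>An empty result means every listed directory is present.</para>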

<para>Generally, package installation systems do not compress man/info pages,
which means you will need to run the script again after installing or upgrading
a package if you want to keep the size of your documentation as small as
possible. Also, note that running the script after upgrading a package is safe:
when you have several versions of a page (for example, one compressed and one
uncompressed), the most recent one is kept and the others are deleted.</para>
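<para>The "most recent one is kept" rule can be illustrated with a small
sketch. The helper name <command>newest_variant</command> is hypothetical; it
only reports which of the plain, <filename>.gz</filename> and
<filename>.bz2</filename> variants of a page was modified last and, unlike the
script's <command>check_unique</command> function, deletes nothing.</para>

```shell
# Hypothetical helper: among page, page.gz and page.bz2 in directory $1,
# print the name of the most recently modified variant that exists.
#   $1  directory holding the page
#   $2  uncompressed page name, e.g. foo.1
newest_variant() {
  ( cd "$1" || exit 1
    ls -1t "$2" "$2.gz" "$2.bz2" 2>/dev/null | head -n 1 )
}

# Example use (hypothetical page name):
# newest_variant /usr/share/man/man1 foo.1
```

<para>After an upgrade installs a fresh uncompressed page next to an older
compressed one, the uncompressed file is the newest, so it is the one the
script keeps and compresses.</para>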

</sect1>