<sect1 id="postlfs-config-compressdoc" xreflabel="compressdoc">
<?dbhtml filename="compressdoc.html" dir="postlfs"?>
<title>Compressing man and info pages</title>

<para>Man and info reader programs can transparently process gzip'ed or
bzip2'ed pages, a feature you can use to free some disk space while keeping
your documentation available. However, things are not that simple: man
directories tend to contain links (hard and symbolic) which defeat simple
ideas like recursively calling <command>gzip</command> on them. A symbolic
link keeps pointing to the uncompressed name of its target once that target
has been renamed with a <filename>.gz</filename> suffix, and by default
<command>gzip</command> refuses to compress a file that has other hard links.
A better approach is to use the script below.</para>

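<para>The following throw-away script illustrates the problem. It is only a
sketch. It assumes <command>bash</command>, GNU <command>gzip</command> and
<command>mktemp</command>, and works in a temporary directory that is removed
on exit. The file names (<filename>bar.1</filename>,
<filename>foo.1</filename> and so on) are made up for the example, and nothing
on the system is modified. Save it to a file and run it with
<command>bash</command>.</para>

<screen><userinput>#!/bin/bash
# Illustration only: why "gzip -r" is not enough for a man directory.
DEMO=$(mktemp -d)
trap 'rm -rf "$DEMO"' EXIT
mkdir "$DEMO/man1"
cd "$DEMO/man1" || exit 1

echo '.TH BAR 1' > bar.1
ln -s bar.1 foo.1        # symbolic link to another page
echo '.TH BAZ 1' > baz.1
ln baz.1 qux.1           # hard link: two names, one file

# gzip skips the symbolic link and leaves the hard-linked file alone,
# reporting warnings or errors; ignore them for the demonstration
gzip -r . 2>/dev/null || true

ls -l
if [ -h foo.1 -a ! -e foo.1 ]; then
  echo "foo.1 now points to bar.1, which was renamed to bar.1.gz"
fi
if [ -f baz.1 ]; then
  echo "baz.1 and qux.1 are still uncompressed"
fi</userinput></screen>

<para>Fixing this by hand means re-creating every symbolic link with the new
suffix and re-creating hard links after compression. That is the bookkeeping
the script takes care of.</para>
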
<screen><userinput><command>cat &gt; /usr/bin/compressdoc &lt;&lt; "EOF"</command>
#!/bin/bash
#
# Compress (with bzip2 or gzip) all man pages in a hierarchy and
# update symlinks - By Marc Heerdink &lt;marc@koelkast.net&gt;.
# Modified to be able to gzip or bzip2 files as an option and to deal
# with all symlinks properly by Mark Hymers &lt;markh@linuxfromscratch.org&gt;
#
# Modified 20030925 by Yann E. Morin &lt;yann.morin.1998 @ anciens.enib.fr&gt;
# to accept compression/decompression, to correctly handle hard links,
# to allow for changing hard links into soft ones, to specify the
# compression level, to parse the man.conf for all occurrences of MANPATH,
# to allow for a backup, and to allow keeping the newest version of a page.
#
# TODO:
#   - invert the quiet option into a verbose one, so as to be silent
#     by default;
#   - choose a default compression method based on the available
#     tool: gzip or bzip2;
#   - when a MANPATH environment variable exists, use it instead of
#     /etc/man.conf (useful for users to (de)compress their man pages);
#   - offer an option to restore a previous backup;
#   - add other compression engines (compress, zip, etc.?). Needed?

# Funny enough, this function prints some help.
function help ()
{
  if [ -n "$1" ]; then
    echo "Unknown option: $1"
  fi
  echo "Usage: $0 &lt;comp_method&gt; [options] [dirs]"
  cat &lt;&lt; EOT
Where comp_method is one of:

  --gzip, --gz, -g
  --bzip2, --bz2, -b
                Compress using gzip or bzip2.

  --decompress, -d
                Decompress the man pages.

  --backup      Make a .tar backup of every directory.
                If a backup already exists, it is saved as .tar.old
                prior to making the new backup. If a .tar.old backup
                exists, it is removed prior to saving the backup.
                In backup mode, no other action is performed.

And where options are:

  -1 to -9, --fast, --best
                The compression level, as accepted by gzip and bzip2.
                When not specified, the default compression level of
                the given method is used (-6 for gzip, -9 for bzip2).
                Not used in backup or decompress modes.

  --soft, -s    Change hard links into soft links. Use with _caution_,
                as the first file encountered is used as the reference.
                Not used in backup mode.

  --conf=dir, --conf dir
                Specify the location of man.conf. Defaults to /etc.

  --quiet, -q   Quiet mode: only print the name of the directory being
                processed. Add another -q flag to make it completely
                silent.

  --fake, -f    Fake it: print the actual parameters compressdoc would
                use, then exit without doing anything.

  --help, -h    Print this help and exit.

  dirs          A space-separated list of _absolute_ pathnames of the
                man directories. When empty, and only then, parse
                ${MAN_CONF}/man.conf for all occurrences of MANPATH.

Note about compression:
  There has been a discussion on blfs-support about the compression ratios
  of gzip and bzip2 on man pages, taking into account the hosting
  filesystem, the architecture, etc. Overall, the conclusion was that gzip
  was more efficient on 'small' files, and bzip2 on 'big' files, 'small'
  and 'big' being very dependent on the content of the files.

  See the original thread beginning at:
  http://archive.linuxfromscratch.org/mail-archives/blfs-support/2003/04/0424.html

  On my system (x86, ext3), man pages were 35564 kiB before compression.
  gzip -9 compressed them down to 20372 kiB (57.28%), bzip2 -9 got down
  to 19812 kiB (55.71%). That is a further saving of 560 kiB, or 1.57% of
  the original size. YMMV.

  What was not taken into consideration was the decompression speed. But
  does it make sense to? You either gain fast access with uncompressed man
  pages, or you gain space at the expense of a slight overhead in time.
  Well, my P4-2.5GHz does not even let me notice this... :-)
EOT
}

# This function checks that the path is absolute
#   $1: the path to check
#   $2: path to man.conf if $1 was extracted from it
function check_path ()
{
  case "$1" in
    /*)
      ;;
    *)
      echo "Path \"$1\" is not absolute."
      [ -n "$2" ] &amp;&amp; echo "Check your $2"
      exit 1
      ;;
  esac
}

# This function checks that the man page is unique amongst the bzip2'd,
# gzip'd and uncompressed versions; only the most recent one is kept.
#   $1: the directory in which the file resides
#   $2: the file name for the man page
# Returns 0 (true) if the file was an older version and has been deleted,
# so that the caller skips it; returns 1 (false) if it is the most recent
# version and must be processed.
function check_unique ()
{
  # NB. When there are hard links to this file, these are
  # _not_ deleted. In fact, if there are hard links, they
  # all have the same date/time, thus making them ready
  # for deletion later on.

  # Build the list of all man pages with the same name
  # (-h also catches symlinks whose target was already renamed)
  BASENAME=`basename "${2}" .bz2`
  BASENAME=`basename "${BASENAME}" .gz`
  LIST=
  for i in "${BASENAME}" "${BASENAME}.gz" "${BASENAME}.bz2"; do
    [ -f "${1}/${i}" -o -h "${1}/${i}" ] &amp;&amp; LIST="${LIST} ${i}"
  done

  # Nothing left (the page was already handled earlier in this run): skip it
  [ -z "$LIST" ] &amp;&amp; return 0

  # Look for, and keep, the most recent one (listed last by ls -rt)
  LATEST=`(cd "$1"; ls -1rt $LIST | tail -n 1)`
  for i in $LIST; do
    [ "$LATEST" != "$i" ] &amp;&amp; rm -f "${1}/${i}"
  done

  # If the specified file was not the latest, it has just been deleted
  [ "$LATEST" != "$2" ] &amp;&amp; return 0
  # Otherwise the file is the latest and must be taken care of
  return 1
}

# OK, parse the command line for arguments, and initialize to some sensible
# state, that is: keep hard links, parse /etc/man.conf, be most verbose, and
# search man.conf in /etc
COMP_METHOD=
COMP_SUF=
COMP_LVL=
LN_OPT=
MAN_DIR=
QUIET_OPT=
QUIET_LVL=0
BACKUP=no
FAKE=no
MAN_CONF=/etc
while [ -n "$1" ]; do
  case $1 in
    --gzip|--gz|-g)
      COMP_SUF=.gz
      COMP_METHOD=$1
      shift
      ;;
    --bzip2|--bz2|-b)
      COMP_SUF=.bz2
      COMP_METHOD=$1
      shift
      ;;
    --decompress|-d)
      COMP_SUF=
      COMP_LVL=
      COMP_METHOD=$1
      shift
      ;;
    -[1-9]|--fast|--best)
      COMP_LVL=$1
      shift
      ;;
    --soft|-s)
      LN_OPT=-s
      shift
      ;;
    --conf=*)
      MAN_CONF="${1#--conf=}"
      shift
      ;;
    --conf)
      # Without an argument, "shift 2" would fail and loop forever
      if [ -z "$2" ]; then
        echo "Option $1 requires a directory argument"
        exit 1
      fi
      MAN_CONF="$2"
      shift 2
      ;;
    --quiet|-q)
      QUIET_LVL=$((QUIET_LVL + 1))
      QUIET_OPT="$QUIET_OPT -q"
      shift
      ;;
    --backup)
      BACKUP=yes
      shift
      ;;
    --fake|-f)
      FAKE=yes
      shift
      ;;
    --help|-h)
      help
      exit 0
      ;;
    /*)
      MAN_DIR="${MAN_DIR} ${1}"
      shift
      ;;
    -*)
      help "$1"
      exit 1
      ;;
    *)
      check_path "$1"
      # We shall never return in that case! Nonetheless, do exit
      exit 1
      ;;
  esac
done

# Redirections
case $QUIET_LVL in
  0)
    DEST_FD0=/dev/stdout
    DEST_FD1=/dev/stdout
    ;;
  1)
    DEST_FD0=/dev/stdout
    DEST_FD1=/dev/null
    ;;
  *)
    # 2 and above: be silent
    DEST_FD0=/dev/null
    DEST_FD1=/dev/null
    ;;
esac

# Note: on my machine, 'man --path' gives /usr/share/man twice, once with a
# trailing '/', once without; dirname and sort -u remove the duplicate.
if [ -z "$MAN_DIR" ]; then
  MAN_DIR=$(man --path -C "$MAN_CONF"/man.conf \
            | tr ':' '\n' \
            | while read foo; do dirname "$foo"/.; done \
            | sort -u \
            | while read bar; do echo -n "$bar "; done)
fi

# If no MANPATH in ${MAN_CONF}/man.conf, abort as well
if [ -z "$MAN_DIR" ]; then
  echo "No directory specified, and no directory found in \"${MAN_CONF}/man.conf\""
  exit 1
fi

# Fake?
if [ "$FAKE" != "no" ]; then
  echo "Actual parameters used:"
  echo -n "Compression.......: "
  case $COMP_METHOD in
    --bzip2|--bz2|-b) echo -n "bzip2";;
    --gzip|--gz|-g) echo -n "gzip";;
    --decompress|-d) echo -n "decompressing";;
    *) echo -n "unknown";;
  esac
  echo " ($COMP_METHOD)"
  echo "Compression level.: $COMP_LVL"
  echo "Compression suffix: $COMP_SUF"
  echo "man.conf is.......: ${MAN_CONF}/man.conf ($MAN_CONF)"
  echo -n "Hard links........: "
  [ "$LN_OPT" = "-s" -o "$LN_OPT" = "--soft" ] &amp;&amp; echo -n "Convert to symlinks" || echo -n "Keep hard links"
  echo " ($LN_OPT)"
  echo "Backup............: $BACKUP"
  echo "Faking (yes!).....: $FAKE"
  echo "Directories.......: $MAN_DIR"
  echo "Silence level.....: $QUIET_LVL ($QUIET_OPT)"
  exit 0
fi

# If no method was specified, print help
if [ -z "${COMP_METHOD}" -a "${BACKUP}" = "no" ]; then
  help
  exit 1
fi

# In backup mode, only do the backup
if [ "$BACKUP" = "yes" ]; then
  for DIR in $MAN_DIR; do
    cd "${DIR}/.." || exit 1
    DIR_NAME=`basename "${DIR}"`
    echo "Backing up $DIR..." > $DEST_FD0
    [ -f "${DIR_NAME}.tar.old" ] &amp;&amp; rm -f "${DIR_NAME}.tar.old"
    [ -f "${DIR_NAME}.tar" ] &amp;&amp; mv "${DIR_NAME}.tar" "${DIR_NAME}.tar.old"
    tar cfv "${DIR_NAME}.tar" "${DIR_NAME}" > $DEST_FD1
  done
  exit 0
fi

# I know MAN_DIR has only absolute path names
# I need to take into account the localized man pages, so I'm going recursive
for DIR in $MAN_DIR; do
  cd "$DIR" || exit 1
  for FILE in *; do
    # Skip the unexpanded pattern of an empty directory
    if [ "foo$FILE" = "foo*" ]; then continue; fi
    if [ -d "$FILE" ]; then
      # We are going recursive to that directory
      echo "-> Entering ${DIR}/${FILE}..." > $DEST_FD0
      # I need not pass --conf, as I specify the directory to work on
      # But I need to exit in case of error
      "$0" ${COMP_METHOD} ${COMP_LVL} ${LN_OPT} ${QUIET_OPT} "${DIR}/${FILE}" || exit 1
      echo "&lt;- Leaving ${DIR}/${FILE}." > $DEST_FD1
    else # !dir
      if check_unique "$DIR" "$FILE"; then continue; fi

      # If we have a symlink
      if [ -h "$FILE" ]; then
        # Target of the link, as stored in the link itself
        LINK=`readlink "$FILE"`
        case $FILE in
          *.bz2)
            EXT=bz2 ;;
          *.gz)
            EXT=gz ;;
          *)
            EXT=none ;;
        esac

        if [ "$EXT" != "none" ]; then
          # Strip the old compression suffix from both the link and its target
          LINK="${LINK%.$EXT}"
          NEWNAME="${FILE%.$EXT}"
          mv "$FILE" "$NEWNAME"
          FILE="$NEWNAME"
        fi

        # Re-create the link, pointing at the target with the new suffix
        rm -f "$FILE" &amp;&amp; ln -s "${LINK}$COMP_SUF" "${FILE}$COMP_SUF"
        echo "Relinked $FILE" > $DEST_FD1

      # else if we have a plain file
      elif [ -f "$FILE" ]; then
        # Take care of hard links: build the list of files hard-linked
        # to the one we are {de,}compressing.
        # NB. This is not optimal, as the file will eventually be compressed
        # as many times as it has hard links. But for now, that's the safe way.
        inode=`ls -li "$FILE" | awk '{print $1}'`
        HLINKS=`find . \! -name "$FILE" -inum $inode`

        if [ -n "$HLINKS" ]; then
          # We have hard links! Remove them now.
          for i in $HLINKS; do rm -f "$i"; done
        fi

        # Now take care of the file that has no hard link
        # We do decompress first to recompress with the selected
        # compression level later on...
        case $FILE in
          *.bz2)
            bunzip2 "$FILE"
            FILE="${FILE%.bz2}"
            ;;
          *.gz)
            gunzip "$FILE"
            FILE="${FILE%.gz}"
            ;;
        esac

        # Compress the file with the selected compression level, if needed
        case $COMP_SUF in
          *bz2)
            bzip2 ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
            echo "Compressed $FILE" > $DEST_FD1
            ;;
          *gz)
            gzip ${COMP_LVL} "$FILE" &amp;&amp; chmod 644 "${FILE}${COMP_SUF}"
            echo "Compressed $FILE" > $DEST_FD1
            ;;
          *)
            echo "Uncompressed $FILE" > $DEST_FD1
            ;;
        esac

        # If the file had hard links, recreate those (either hard or soft)
        if [ -n "$HLINKS" ]; then
          for i in $HLINKS; do
            NEWFILE="${i%.gz}"
            NEWFILE="${NEWFILE%.bz2}"
            ln ${LN_OPT} "${FILE}$COMP_SUF" "${NEWFILE}$COMP_SUF"
            # Really works only for hard links; harmless for soft links
            chmod 644 "${NEWFILE}$COMP_SUF"
          done
        fi

      else
        # There is a problem when we get neither a symlink nor a plain file
        # Obviously, we shall never ever come here... :-(
        echo "Whaooo... \"${DIR}/${FILE}\" is neither a symlink nor a plain file. Please check:"
        ls -l "${DIR}/${FILE}"
        exit 1
      fi
    fi
  done # for FILE
done # for DIR

<command>EOF
chmod 755 /usr/bin/compressdoc</command></userinput></screen>

<para>Now, as root, you can run <command>/usr/bin/compressdoc --bz2</command>
to compress all your system man pages. You can also run
<command>/usr/bin/compressdoc --help</command> to get comprehensive help about
what the script is able to do.</para>

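<para>For instance, a cautious first run might look like the following. This
is only a suggested sequence built from options documented in the script's
help text; adjust the compression method and the paths to your system.</para>

<screen><userinput># Show the settings and directories that would be used, change nothing
/usr/bin/compressdoc --bz2 --fake
# Save each man directory as a .tar archive next to it (backup only)
/usr/bin/compressdoc --backup
# Compress with bzip2 at level 9, printing only the directory names
/usr/bin/compressdoc --bz2 -9 -q
# Undo: decompress the pages of a single hierarchy, given by absolute path
/usr/bin/compressdoc --decompress /usr/local/man</userinput></screen>

<para>Because <option>--backup</option> performs no other action, it has to be
run separately, before the actual compression.</para>
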
<para>Don't forget that a few programs, like the <application>X</application>
Window System and <application>XEmacs</application>, also install their
documentation in nonstandard places (such as <filename class="directory">/usr/X11R6/man</filename>,
etc.). Be sure to add those locations to the file
<filename>/etc/man.conf</filename>, as MANPATH=/path lines. Example:
<screen><userinput>...
MANPATH=/usr/share/man
MANPATH=/usr/local/man
MANPATH=/usr/X11R6/man
MANPATH=/opt/qt/doc/man
...</userinput></screen></para>

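<para>To check which directories a <filename>man.conf</filename> file
declares, a small helper such as the one below can be used. The function name
<function>list_manpaths</function> is made up for this example, and it is only
a sketch. It prints the path that follows each line beginning with MANPATH and
then an equals sign or whitespace. It ignores other keywords such as
MANPATH_MAP.</para>

<screen><userinput># Hypothetical helper, not part of compressdoc:
# print the directory of every MANPATH line of a man.conf file
list_manpaths ()
{
  # "MANPATH=/dir" and "MANPATH /dir" are both accepted;
  # "MANPATH_MAP ..." does not match because "_" is not a separator
  sed -n 's/^MANPATH[[:space:]=][[:space:]=]*//p' "${1:-/etc/man.conf}"
}</userinput></screen>

<para>After defining it in your shell, <command>list_manpaths /etc/man.conf</command>
prints one directory per line. With the MANPATH lines shown above, those four
directories would be listed.</para>
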
<para>Generally, package installation systems do not compress man/info pages,
which means you will need to run the script again if you want to keep the size
of your documentation as small as possible. Also, note that running the script
after upgrading a package is safe: when you have several versions of a page
(for example, one compressed and one uncompressed), the most recent one is kept
and the others are deleted.</para>

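<para>The "keep the most recent version" rule can be illustrated with a
simplified, stand-alone sketch. The function <function>keep_newest</function>
is hypothetical and is not the script's own code. It only compares the
modification times of <replaceable>name</replaceable>,
<replaceable>name</replaceable>.gz and <replaceable>name</replaceable>.bz2 in
the current directory, and does not handle links. It assumes
<command>bash</command>, <command>gzip</command> and
<command>mktemp</command>, and works in a temporary directory.</para>

<screen><userinput>#!/bin/bash
# Hypothetical helper: among NAME, NAME.gz and NAME.bz2 in the current
# directory, keep the most recently modified file and delete the others.
keep_newest ()
{
  local candidates= f newest
  for f in "$1" "$1.gz" "$1.bz2"; do
    if [ -e "$f" ]; then candidates="$candidates $f"; fi
  done
  [ -n "$candidates" ] || return 0
  # ls -t lists the newest file first
  newest=$(ls -1t $candidates | head -n 1)
  for f in $candidates; do
    if [ "$f" != "$newest" ]; then rm -f "$f"; fi
  done
}

# Demonstration in a scratch directory
DEMO=$(mktemp -d)
trap 'rm -rf "$DEMO"' EXIT
cd "$DEMO" || exit 1
echo 'old page' > foo.1
gzip foo.1                      # foo.1.gz, as left by an earlier run
touch -t 200001010000 foo.1.gz  # make the compressed copy clearly older
echo 'new page' > foo.1         # fresh, uncompressed page from an upgrade
keep_newest foo.1
ls                              # only foo.1 is left</userinput></screen>
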
</sect1>
