Opened 13 months ago

Closed 9 months ago

#17870 closed enhancement (fixed)

Possible: provide options to install less of texlive

Reported by: ken@…
Owned by: ken@…
Priority: normal
Milestone: 12.0
Component: BOOK
Version: git
Severity: normal
Keywords:
Cc:

Description

It would be nice to allow users to install a smaller part of the current year's texlive.

Looking at Gentoo, they have a lot of different configure options to allow conditional compilation of various parts, but at this stage I have no idea how they handle the related texmf parts, and it is texmf which uses the space. I suspect they include texmf in their source, and I assume that would make it unmanageably large for us and for our mirrors.

Looking at Arch, all their builds start from the binary install-tl-unix and install parts of it into /usr, with various things to support their packaging. There is also a full texlive package in the AUR which installs in /opt, but it too uses the binary.

My understanding is that Debian (and derivatives) compile from source and install parts, as does Fedora.

If we provide instructions for a smaller build in BLFS, the builder will apparently need space for the untarred texmf tarball and will then copy the necessary parts. That is true even for a minimal (non-latex) install.

Change History (9)

comment:1 by ken@…, 12 months ago

I've spent a little time looking at this, in particular at installing only plain TeX. Along the way I've learned the following:

  1. The build assumes latex is the main area of interest. There are options to disable various programs, but they are not always straightforward (e.g. if you use --disable-all-pkgs then you need --enable-web2c even to get the TeX program, a lot of other things will come along for the ride, and xetex will still be installed).
  1. 'make install' overwrites some things in texmf-dist. I found this by doing a DESTDIR install without texmf-dist present, using --enable-texlive to get the symlinks to scripts: it installed all the symlinks to scripts AND created a texmf-dist/scripts directory containing at least some of the scripts. I think it also installed man pages and info files. For info, note that the dir file contains all the entries, i.e. the full texmf-dist/info/dir is the same as the texmf-dist/info/dir of the minimal plain TeX (binary) install, even though some of those items are not present in non-full binary installs.
  1. As already noted, the main unwanted items for an install of fewer packages are in texmf-dist. But they are all over the place, and often the full install puts many more files in a directory than the smaller installs do. Also, the various hyphen* files from the full install are a lot larger than for minimal installs. Further, I was trying to replicate the contents of the binary installs, and these include a lot of source (e.g. for fonts, the source can include scripts and data files used in originally generating what is shipped). In particular, smaller binary installs can remove files/directories at varying depths in texmf-dist. Replicating that after a full install can mostly be done, but with decreasing returns (e.g. for a medium install I guess I could eventually do most of this, but with an extra overhead of perhaps 0.3GB installed size). Since the full texmf-dist needs to be installed initially, there is little benefit in chasing down every unwanted item (e.g. an extra 100MB is neither here nor there).
  1. Based on that, attempting to get towards the items installed in the various binary schemes (particularly medium, which covers most uses) will be a quest of diminishing returns.
  1. A lot of things are included in texmf-dist for use with legacy tex or latex scripts, e.g. omega (a way of handling large glyph codepoints, initially replaced by aleph and now replaced by luatex). Most people building from source will not need these.
  1. There are many other things in texmf-dist which many people will not need, e.g. the ConTeXt files such as /opt/texlive/2023/texmf-dist/doc/context/ (texlive ships MkIV; the current version, using LuaMetaTeX, should be installed from the ConTeXt Garden, probably in /usr, and for extra fun its build uses cmake).

Therefore my current view is that I should review all the texlive --disable configure switches to determine what they remove and eventually document them. Then I should work out what people might wish to remove after the full install, before removing broken symlinks (find (directory) -xtype l | xargs rm -v) and then rerunning mktexlsr. (On this machine the 'l' in '-xtype l' renders as if it is a vertical bar; it is actually a lowercase L.)
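A minimal sketch of that clean-up step, assuming GNU find (for -xtype) and that mktexlsr from the installed texlive is on PATH. It uses -exec ... + rather than piping into xargs, so names containing spaces are handled; prune_broken_links is a hypothetical helper name, not something shipped by texlive.

```shell
# Hypothetical helper: delete dangling symlinks under a directory, then
# refresh the ls-R databases if mktexlsr is available. Needs GNU find.
prune_broken_links() {
    dir=$1
    # -xtype l matches symlinks whose target does not exist; -exec ... +
    # passes the names straight to rm, so spaces in names are safe.
    find "$dir" -xtype l -exec rm -v {} +
    if command -v mktexlsr >/dev/null 2>&1; then
        mktexlsr
    fi
}
```

For example, prune_broken_links /opt/texlive/2023 after deleting unwanted parts of texmf-dist. Reading the rm -v output before trusting this on a real install is still advisable.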

At the moment I'm thinking that some things should always be disabled in the BLFS configure (e.g. legacy parts of omega, aleph), that some others should be added as optional configure switches, and that post-install removal of unwanted things should be documented. But that might upset our users (you never know who is using BLFS while everything works for them), so I'll post on blfs-dev and then, depending on the response there, on blfs-support.

Obsessive, me? Yes, I think I am. Although my current (25GB+) systems usually have space for installing texlive from source, I begrudge all the items I don't really need, and they add pressure to my backups.

comment:2 by ken@…, 12 months ago

In the end, I came to understand that there is no easy way in a BLFS source build to disable most individual programs, let alone prevent the associated texmf-dist files from being installed. (Doing a DESTDIR install in the absence of texmf showed that some scripts were still installed, probably those where there is a symlink from the program directory.)

The tlpdb database can be parsed, at least in theory, to establish which packages are in which scheme and what is in each package (there are a couple of operations I do not grok for packaging less than all of the hyphen files, but keeping all of them will make only a very small difference). BUT texlive is written on GNU principles: it ships all the source and scripts needed for recreating things. Unless you are deep into TeX development you will not usually want things such as the source for tfm fonts.

I have some notes. When I come back to this I will think about providing a separate file, probably plain text in ~/ken, of some considerations for removing items after the install. Some people might prefer to remove all documentation (fine only if you are always online and your preferred search engine is always working). Many people will not need all the fonts. But what to remove is a very individual decision, and people will need to review individual items before deciding.

comment:3 by ken@…, 10 months ago

Status update:

Since my example is only useful for my exact usage (although it will hopefully contain enough detail to guide anyone who wants to take a similar approach to removing unnecessary things), I've been trying to parse the tlpdb. This is certainly straining my brain (I'm using perl, because bash is just too slow). Documenting the current state of play now.

My intention was to work out what is needed, then create working copies of the required files, tar those up, and use them to replace the full install.
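One way to do the "tar those up" step, as a sketch assuming GNU tar and a keep-list holding one path per line, relative to the install root. pack_kept_files and the file names used with it are hypothetical.

```shell
# Hypothetical helper: archive only the paths named in a keep-list.
#   $1 = install root, e.g. /opt/texlive/2023
#   $2 = keep-list, one path per line relative to $1
#   $3 = output tarball (an absolute path is safest)
pack_kept_files() {
    root=$1 list=$2 out=$3
    # "-T -" makes tar read member names from stdin, so the shell opens the
    # list file before tar's -C changes directory.
    tar -cf "$out" -C "$root" -T - < "$list"
}
```

Unpacking the result into an empty prefix would then give the reduced tree, after which mktexlsr would need to be rerun.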

There is documentation on it at https://tug.org/TUGboat/tb34-3/tb108preining-distro.pdf but it might be out of date. My current understanding:

  1. There are TLCore items which provide files and programs.
  1. There are a number of schemes. I have been working on scheme-medium (there are a couple of larger schemes, as well as scheme-full, which is what we build from source).
  1. A scheme depends on a number of collections.
  1. A collection may depend on other collections (e.g. collection-langchinese depends on collection-langcjk). A couple of collections in scheme-medium have such dependencies, but in fact the collections they depend on were also among the scheme's direct dependencies. Looking at the file, it seems two passes should always be enough.
  1. A collection depends on packages. Every package is in exactly one collection, but what I had not expected was that a package can depend on other packages. I'm currently stalled here; there might be too many items to review.
  1. Apart from this, a package may contain texmf-dist/ items OR RELOC/ items which appear to be for texmf-dist. It can also depend on programs for an ARCH (x86_64-linux in our case).
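The scheme, collection and package chain described above can be walked with a short awk program rather than a full perl parser. This is a sketch, assuming only that a texlive.tlpdb record starts with a "name X" line and lists its dependencies as "depend Y" lines, with a ".ARCH" suffix standing for the platform. tlpdb_closure is a hypothetical helper and has not been checked against real scheme-medium results. A breadth-first walk with a "seen" set avoids fixing the number of passes in advance.

```shell
# Hypothetical helper: print every tlpdb record reachable from a root record
# (e.g. a scheme) by following "depend" lines, breadth first.
#   $1 = path to texlive.tlpdb, $2 = root name, $3 = platform (default x86_64-linux)
tlpdb_closure() {
    tlpdb=$1 root=$2 arch=${3:-x86_64-linux}
    awk -v root="$root" -v arch="$arch" '
        $1 == "name"   { cur = $2 }
        $1 == "depend" {
            dep = $2
            sub(/\.ARCH$/, "." arch, dep)   # "foo.ARCH" becomes "foo.x86_64-linux"
            deps[cur] = deps[cur] " " dep
        }
        END {
            queue[1] = root; head = 1; tail = 1; seen[root] = 1
            while (head <= tail) {
                name = queue[head++]
                print name
                n = split(deps[name], list, " ")
                for (i = 1; i <= n; i++)
                    if (!(list[i] in seen)) { seen[list[i]] = 1; queue[++tail] = list[i] }
            }
        }' "$tlpdb"
}
```

For example: tlpdb_closure path/to/texlive.tlpdb scheme-medium > medium-names.txt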

There are certain limitations to cutting down after doing a full install, in particular:

(i.) the full install includes all the hyphenation files; other schemes include fewer of them.

(ii.) the shipped updmap.cfg contains all fonts, instead of only those that are present. My cut-down version similarly does this, but since the formats have already been created it does no harm.

(iii.) tlmgr cannot be used, obviously.

(iv.) Only copying the specified programs from the full build into the work area might result in broken symlinks. Care will be needed. Also, removing compiled programs risks having to reinstall all of texlive if too much is accidentally removed.

I'm going to look at the first few packages which were reported multiple times in my files, to see if they all come from the same collection (or from TLCore).

comment:4 by ken@…, 10 months ago

Continuing, with the assumption that packages depend on packages that are either in the same collection, or are in TLCore. I think the main reason for the depend items is to enable people to add packages and automatically pull in missing dependencies.
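That assumption can be checked mechanically. The sketch below, under the same assumed tlpdb layout ("name", "category" and "depend" lines per record), records which collection lists each name and then prints any package-to-package dependency that points into a different collection, skipping TLCore targets and .ARCH binaries. cross_collection_deps is a hypothetical name; records not listed in any collection show up with empty parentheses.

```shell
# Hypothetical helper: report package "depend" lines that cross collections.
#   $1 = path to texlive.tlpdb
cross_collection_deps() {
    awk '
        $1 == "name"     { cur = $2 }
        $1 == "category" { kind[cur] = $2 }
        $1 == "depend"   { n = ++ndep[cur]; dep[cur, n] = $2 }
        END {
            # Pass 1: remember which collection lists each name.
            for (c in kind)
                if (kind[c] == "Collection")
                    for (i = 1; i <= ndep[c] + 0; i++) owner[dep[c, i]] = c
            # Pass 2: flag package deps owned by a different collection.
            for (p in kind) {
                if (kind[p] != "Package") continue
                for (i = 1; i <= ndep[p] + 0; i++) {
                    d = dep[p, i]
                    if (d ~ /\.ARCH$/) continue
                    if ((d in kind) && kind[d] == "TLCore") continue
                    if (owner[d] != owner[p])
                        print p " (" owner[p] ") -> " d " (" owner[d] ")"
                }
            }
        }' "$1"
}
```

An empty result on the real texlive.tlpdb would support the assumption; any lines printed are the exceptions to review.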

comment:5 by ken@…, 10 months ago

Got to the point where I can list the texmf-dist files and the programs my script reports as part of scheme-medium, and compare those to what is in the binary for scheme-medium from when TL2023 was released.
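Listing the files and programs for a set of record names could look like this sketch. It assumes that, in texlive.tlpdb, file lists follow a runfiles/docfiles/srcfiles/binfiles header and that each file line starts with a space, optionally followed by attributes such as details="..."; binaries are assumed to live in separate "name.x86_64-linux" records, so those names must be passed too. tlpdb_files is a hypothetical helper, and paths containing spaces would be truncated.

```shell
# Hypothetical helper: print the file paths listed under the named records.
#   $1 = path to texlive.tlpdb, remaining args = record names
tlpdb_files() {
    tlpdb=$1; shift
    printf '%s\n' "$@" | awk '
        NR == FNR { want[$1] = 1; next }            # stdin: the wanted names
        /^name /  { keep = ($2 in want); infiles = 0; next }
        /^[a-z]+files / { infiles = keep; next }    # runfiles/docfiles/srcfiles/binfiles
        /^ /      { if (infiles) print $1; next }   # indented path; $1 drops attributes
                  { infiles = 0 }                   # any other field ends the file list
    ' - "$tlpdb"
}
```

For example: tlpdb_files path/to/texlive.tlpdb pkg-a pkg-a.x86_64-linux > files.txt, where the names come from whichever scheme walk you trust.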

I have more programs (329) than are in the binary (125) so something is seriously wrong in that part.

For the files, I have occasional missing files, an unwanted doc/generic/elhyphen (not in the binary), and all of doc/generic/pgf missing. I stopped comparing the lists of files at that point. My attempt to parse the tlpdb to reduce the full source install towards an arbitrary scheme install is going nowhere and is now abandoned.
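Comparing such a list with what a binary scheme-medium install contains is a job for sort and comm. A sketch, assuming both lists hold one path per line relative to the same root (for example produced with find . -type f in each tree); compare_lists is a hypothetical helper.

```shell
# Hypothetical helper: show paths present in only one of two lists.
# comm needs sorted input, so both lists are sorted into a temp dir first.
compare_lists() {
    mine=$1 ref=$2
    tmp=$(mktemp -d)
    sort -u "$mine" > "$tmp/mine"
    sort -u "$ref" > "$tmp/ref"
    echo "only in $mine:"
    comm -23 "$tmp/mine" "$tmp/ref"   # lines unique to the first list
    echo "only in $ref:"
    comm -13 "$tmp/mine" "$tmp/ref"   # lines unique to the second list
    rm -rf "$tmp"
}
```

For example: compare_lists my-medium-files.txt binary-medium-files.txt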

comment:6 by Bruce Dubbs, 10 months ago

I generally do a binary install that is relatively small because I really only want things like tex, latex, dvips, etc. Here are some stats:

$ du -sh /mnt/texlive/
1.3G    /mnt/texlive/

$ find /mnt/texlive/ |wc -l
  36234 

$ ls -l /mnt/texlive/2022/bin/x86_64-linux/|wc -l
249

Of that last number, 164 were symlinks to places like ../../texmf-dist/scripts. That directory has a size of 110 MB.
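A note on the last figure: ls -l starts a directory listing with a "total" line, so ls -l | wc -l reports one more than the number of entries. A sketch that counts regular files and symlinks separately, assuming find supports -mindepth/-maxdepth (GNU and BSD find both do); count_bin_entries is a hypothetical helper.

```shell
# Hypothetical helper: count regular files and symlinks directly inside a
# TeX Live bin directory (no recursion into subdirectories).
count_bin_entries() {
    bindir=$1
    files=$(find "$bindir" -mindepth 1 -maxdepth 1 -type f | wc -l)
    links=$(find "$bindir" -mindepth 1 -maxdepth 1 -type l | wc -l)
    # Arithmetic expansion drops any padding that some wc versions add.
    echo "files=$((files)) symlinks=$((links))"
}
```

For example: count_bin_entries /mnt/texlive/2022/bin/x86_64-linux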

in reply to: 6 comment:7 by ken@…, 10 months ago

Replying to Bruce Dubbs:


I was aware of why you use that approach. I'd hoped to be able to offer something similar by removing things from the full source build, but doing that is currently beyond me: I mostly script in bash, which is far too slow to read all of texlive.tlpdb, and my rusty perl has led to some fatal bugs, omitting some files and adding others.

My use-case is documenting what fonts can do (if I ever get back to that) and trying to maintain our source builds; for that I have gradually increased my collection of TeX documents. There is a lot I don't need, and other things I will only look at if I'm online and stumble across explanations of why or how to use them.

A hint is forthcoming. I've just uploaded my detailed thoughts on what *I* need and what I can do without, plus comments on testing and on the diminishing returns from looking at removing certain items: https://www.linuxfromscratch.org/~ken/TL2023/reduced-2023-texmf.txt

comment:8 by ken@…, 9 months ago

Milestone: x-future → 99-Waiting

An updated and spell-checked version of the hint has been submitted; keeping this open until the hint is accepted.

comment:9 by ken@…, 9 months ago

Milestone: 99-Waiting → 12.0
Resolution: fixed
Status: assigned → closed