Opened 2 years ago

Closed 2 years ago

#5027 closed defect (fixed)

Sysv: Bootscripts are not adapted to runlevel 1

Reported by: pierre
Owned by: pierre
Priority: normal
Milestone: 11.2
Component: Bootscripts
Version: git
Severity: normal
Keywords:
Cc:

Description

From https://wiki.linuxfromscratch.org/blfs/ticket/16177 comment 29:

  • when typing control-D in runlevel 1, you get:
    sulogin: cannot read /dev/tty1: Operation not permitted
    INIT: no more processes left in this runlevel
    

and then there is no way to recover other than typing alt-ctrl-delete.

  • when rebooting (or shutting down) from runlevel 1, the system tries to kill a lot of applications (all those that have a Kxxx file in rc6.d (or rc0.d)), although most of them are not running.

Change History (24)

comment:1 by Douglas R. Reno, 2 years ago

While we're here, we might want to update the email address as well. I think the new one is "lfs-support@…", but a report on the lists recently had bootscripts that said "lfs-support@…"

comment:2 by pierre, 2 years ago

Owner: changed from lfs-book to pierre
Status: new → assigned

I'm not sure about the error about /dev/tty1, but since sulogin is a "once" process in runlevel 1, init thinks the runlevel is terminated when sulogin exits. sulogin should be "respawn" in runlevel 1. Or we could use agetty on only one vt.

Note that runlevel S is special in the sense that when it is finished, it starts the default runlevel. This is not the case for runlevel 1.

For the second issue, the problem is that runlevels 0 and 6 do not test whether a daemon is started before killing it (if there is a Kxxx file in those runlevels).
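To see which action init applies to sulogin in a given runlevel, the inittab fields (id:runlevels:action:process) can be listed with a small helper. This is only a sketch: the function name inittab_actions is made up for illustration and is not part of the bootscripts.

```shell
#!/bin/sh
# Hypothetical helper: print "id action" for every inittab entry whose
# runlevels field contains the given runlevel.
# inittab line format: id:runlevels:action:process
inittab_actions() {
    file=$1
    level=$2
    awk -F: -v lvl="$level" '
        /^[[:space:]]*#/ { next }                      # skip comment lines
        NF >= 4 && index($2, lvl) > 0 { print $1, $3 } # id and action
    ' "$file"
}
```

Running "inittab_actions /etc/inittab 1" would show whether the sulogin entry for runlevel 1 is listed as "once" (not restarted when it exits) or "respawn" (restarted by init).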

in reply to:  1 comment:3 by pierre, 2 years ago

Replying to Douglas R. Reno:

While we're here, we might want to update the email address as well. I think the new one is "lfs-support@…", but a report on the lists recently had bootscripts that said "lfs-support@…"

Thanks, will look at this too

comment:4 by pierre, 2 years ago

Email addresses fixed at fb651b911. Note that the tarball will not be regenerated until the lfs bootscripts version is changed in packages.ent. I do not want to do this now, since I think I'll make other changes to the bootscripts soon.

The control-D in level 1 issue is fixed at bb9cb3c0d8.

Now, for the second issue, I'd like to understand why some of the scripts that have "Default-Stop: 0 6" are installed by the Makefile as "Sxxx" in those runlevels. This interacts badly with install_initd (that we ship in blfs), which wants them as "Kxxx", and so does not remove the corresponding Sxxx symlinks. This also has us use this weird workaround of running "<script> stop" for scripts that are marked as Sxxx in this runlevel. Then true start scripts (things that must be started in those runlevels just before rebooting/halting, and the reboot/start scripts themselves) start something in the "stop" case, which is some kind of inverted logic IMO. Furthermore install_initd is unable to order them correctly, it seems, see https://github.com/lfs-book/LSB-Tools/issues/11.

comment:5 by pierre, 2 years ago

See also BLFS ticket #16277 for some problems in runlevel 2, and some proposals for runlevel policy.

comment:6 by pierre, 2 years ago

I'd like to be able to treat runlevels 0 and 6 like the others, that is: if a symlink is of the form Sxxyyy, run "yyy start". If a symlink is of the form Kxxyyy, run "yyy stop". This means "Sxxyyy" in rc{0,6}.d would point to a script that should be only run when halting/rebooting the machine.

The problem is with scripts started in another runlevel (S for example), that should run after some script which is only started in runlevel 0/6. We have such an example in the LFS bootscripts: sendsignals must be run before umount'ing (non root) filesystems and stopping swap, otherwise umount may complain that the fs is busy, and/or some OOM may occur if one of the running processes uses a lot of memory.

But with my proposal, we would have Kxxmount, Kxxswap, and Sxxsendsignals, so that signals would be sent after trying to umount, and there may be a deadlock...

One solution for umount is to use "lazy" umount (umount -l), which only umounts the filesystem(s) when it is not busy anymore. I'm not sure about swapoff...

Another solution is to have exceptionally a Kxxsendsignals file (sending signals when called as "sendsignals stop"), properly ordered with respect to Kxxmount and Kxxswap. But this is counter-intuitive to me.

Note that with the /usr merge, all applications may be expected to be on the root fs, so umounting everything else shouldn't be a problem in this respect.

comment:7 by Bruce Dubbs, 2 years ago

I can think of an issue where programs are running, but not on /. In the past, I've put /opt on a separate partition, so if halting from a system using kde apps, libreoffice, java, texlive, fop, rustc, qt, etc they should be stopped before trying to umount /opt.

Some things to note:

  1. If going to run levels 0 or 6, all scripts (doesn't matter if starting with a K or S) are sent a stop command.
  2. The K scripts are always stopped first. They are always sent a stop command.
  3. The S scripts are run second. They are sent a start command at run levels 1-5, but a stop command for run levels 0 and 6.
  4. There is no case where the previous run level is 0 or 6.

It appears that the problem is not the code itself, but the viewer's interpretation of what K and S do.

What we have now is:

K90sysklogd
S60sendsignals
S65swap
S70mountfs
S90localnet
S99reboot or S99halt

Perhaps we could rename S60sendsignals to K91killprogs (or similar) and then have K93swap, K95mountfs, K97localnet, S99reboot or S99halt.
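The four rules listed above can be sketched as a function that returns the argument a symlink receives. This is an illustration of the described behaviour, not the code of the rc script; the name rc_argument is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: which argument does a symlink get under the current
# rules?  $1 = symlink name (e.g. S70mountfs), $2 = target runlevel
rc_argument() {
    link=$1
    runlevel=$2
    case $link in
        K*) echo stop ;;                  # K links are always sent "stop"
        S*) case $runlevel in
                0|6) echo stop ;;         # halt/reboot: S links get "stop"
                *)   echo start ;;        # other runlevels: "start"
            esac ;;
        *)  return 1 ;;                   # not a K or S symlink
    esac
}
```

For example, "rc_argument S70mountfs 6" prints "stop", while "rc_argument S70mountfs 3" prints "start".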

comment:8 by pierre, 2 years ago

Maybe another possibility is to have specific scripts for:

  • unmounting the fs (let's suppose it is named unmount): would umount the fs when called as "Sxxunmount start"
  • shutdown swap (same with an appropriate name)
  • shutdown the local network interface (same with an appropriate name)

I'd like to get rid of the specificity of runlevels 0 and 6 in the rc script: that would make LSB-tools easier too.

I'd also like to get rid of the limitation that a daemon can be killed only if it was started in the previous runlevel (a limitation that does not apply to rl 0 and 6, where daemons are killed unconditionally). Kill a daemon if pidofproc reports that it is running, period...

So the semantics of K and S files would be:

  • S: start in that runlevel unless already running (as determined by pidofproc or statusproc)
  • K: stop in that runlevel if already running (as determined by pidofproc or statusproc)

whatever the previous runlevel is... Always run K files before S files. Note that pidofproc (or statusproc) should be called from the script (the only place where the name of the daemon is known), not from the rc script.

This could be written in a README file in /etc/init.d (debian's way), and the template modified to call pidofproc (or statusproc) before actually calling killproc or start_daemon.
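A minimal sketch of these S and K semantics, as they could look inside a script. To stay self-contained it does not use pidofproc, statusproc, killproc or start_daemon; is_running is a stand-in based on a pid file, and all three function names are hypothetical.

```shell
#!/bin/sh
# Stand-in for pidofproc/statusproc: succeeds if the pid stored in the
# pid file ($1) belongs to a live process.
is_running() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# K semantics: stop only if running; otherwise report it, without an error.
guarded_stop() {
    pidfile=$1
    if is_running "$pidfile"; then
        kill -TERM "$(cat "$pidfile")"
        rm -f "$pidfile"
        echo stopped
    else
        echo "not running"
    fi
}

# S semantics: start only if not already running.
# $1 = pid file, remaining arguments = command to run in the background
guarded_start() {
    pidfile=$1
    shift
    if is_running "$pidfile"; then
        echo "already running"
    else
        "$@" &
        echo $! > "$pidfile"
        echo started
    fi
}
```

With this shape, calling the stop branch when switching from runlevel 1 to 0/6 is harmless for a daemon that was never started.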

comment:9 by Bruce Dubbs, 2 years ago

I'm really not happy with the idea of separate scripts. The code in the start and stop sections is quite short and very easy to understand. The complexity is in the rc script, but even that is only 236 lines of code with comments. It also incorporates the ability for a user to step through each script one at a time (see IPROMPT) and does logging.

The scripts in rcS.d can assume that the function is not active as they are only run at system initialization.

The start/stop section starts at line 149 and ends at line 216.

I agree that individual scripts should check to see if they are running when starting or stopping. We do that now for ifup and ifdown.

The auxiliary boot script functions in /usr/lib/services/init-functions do some of the common work. For instance start_daemon() does check to see if a program is already running before trying to start.

comment:10 by pierre, 2 years ago

Maybe I should state (for myself as well as for others) what I try to achieve: If we only use the Makefile for installing, we can do whatever we want. But if we want to use the LSB-tools, this is another story: I'd like to have something that can indifferently use "make install-xxx" in the {b,}lfs-bootscripts directory or use "install_initd <path-to script>". Unless we remove completely the LSB-tools from BLFS, we have to be compatible, otherwise you can get a real mess (install_initd for some script renumbers the S/K files, so that if the Makefile is used again afterwards, scripts can end up wrongly sorted).

This has several implications:

  • install_initd does not understand that a file with Sxx can be called with "stop". This can be solved by using sendsignals only in stop mode, as Bruce has proposed above.
  • the Makefile should use install_initd if present, so that sorting is always done dynamically after install_initd has run once. This would involve some code like the following (I have not found a way to include tabs in trac, so be careful: the indentation shown is 8 spaces and must be a tab in a real Makefile):
    install-xxx:
            # do the usual createdir, create files, and cp script
            if type install_initd >/dev/null 2>&1; then \
                install_initd script; \
            else \
                : usual linking goes here; \
            fi
    

As for why LSB-tools should be used, I am not sure, but some users seem to want LSB compatibility (see BLFS ticket #16277).

comment:11 by Bruce Dubbs, 2 years ago

LSB_Tools was written by DJ. Change it to conform to what we have.

We do not invoke LSB_Tools in LFS or BLFS.

If a bootscript needs to be updated or the bootscripts Makefile needs to be updated to fix the title of this ticket, then OK. If we need to change /etc/inittab, then that's OK too, but William just did that.

Actually, I'd prefer removing LSB_Tools. At one time I spent a lot of effort getting the order of the scripts correct for each of the run levels. If I made a mistake, then let's fix that, but I don't want to do a major reorganization of the boot scripts every time a new script is added and have every user use a different setup.

comment:12 by pierre, 2 years ago

I think I am the one who changed /etc/inittab :) but not a big deal. If I concentrate on the issue reported in the description, I think the problem comes from the fact that the scripts directly send signals to the running process without first testing that it is running... For example the postgresql script uses:

[...]
   stop)
      log_info_msg "Stopping PostgreSQL daemon..."
      su - postgres -c "/usr/bin/pg_ctl stop -m smart -s -D /srv/pgsql/data"
      evaluate_retval
      ;;
[...]

and a message is written to the console if postgresql is not running.

comment:13 by pierre, 2 years ago

Note that we have no way to test the pid list using the script name (that is, inside rc), since:

  • the script is named postgresql while the daemon is named postgres
  • there is no pid file...

So the test has to be done inside the script.

Note also that obviously, killproc cannot be called since the command to run does not contain the name of the daemon.
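When neither the daemon name nor a pid file is available, one way to do the test inside the script is to ask the service's own control command for its status first. The sketch below is an assumption about how that could look, not the fix that was committed; stop_if_running is a hypothetical helper. It relies on "pg_ctl status" returning a non-zero exit status when no server is running in the data directory.

```shell
#!/bin/sh
# Hypothetical helper: run a stop command only when a status command
# reports the service as up.
# $1 = status command line, $2 = stop command line (both run via sh -c)
stop_if_running() {
    status_cmd=$1
    stop_cmd=$2
    if sh -c "$status_cmd" >/dev/null 2>&1; then
        sh -c "$stop_cmd"
    else
        return 0    # not running: nothing to do, and not an error
    fi
}

# Possible use in the stop) branch of the postgresql script (untested):
# stop_if_running \
#   'su - postgres -c "/usr/bin/pg_ctl status -D /srv/pgsql/data"' \
#   'su - postgres -c "/usr/bin/pg_ctl stop -m smart -s -D /srv/pgsql/data"'
```

The status and stop commands are passed as strings so that the same helper can wrap any service that ships its own control program.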

comment:14 by Bruce Dubbs, 2 years ago

I thought you were the one to change inittab also, but trac thinks differently. Looking more closely, it looks like that change was for the arm branch.

As for PostgreSQL, the script should shut it down, not sendsignals. If it is not running, then it must have been stopped by other means (e.g. manually). I do not think a message that it is not running is wrong.

sendsignals uses killall5 which sends a signal to all processes. First a -15 (SIGTERM) and then a -9 (SIGKILL).
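The TERM-then-KILL sequence can be tried safely on a single process. The sketch below is not the sendsignals script: it replaces killall5 with plain kill on one pid, and the function name term_then_kill and the grace period are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: send SIGTERM to one process, wait up to $2 seconds,
# then send SIGKILL if it is still there.
term_then_kill() {
    pid=$1
    grace=${2:-3}
    kill -TERM "$pid" 2>/dev/null || return 0   # no such process
    while [ "$grace" -gt 0 ]; do
        # kill -0 only probes for existence; note that an unreaped child
        # of this shell still counts as present.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null
    return 0
}
```

A daemon such as postgres that needs a clean shutdown should still be stopped by its own script before this step, since it may not finish cleaning up within the grace period.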

comment:15 by pierre, 2 years ago

William uses cherry-pick for the arm branch, and it seems the original committer name is lost (not a big deal either).

Well, the problem for postgresql (it is an example of what happens in the second "bullet" of the "Description" above), is that switching from runlevel 1 (where postgres is not running) to runlevel 0/6 unconditionally runs "/etc/init.d/postgresql stop", because runlevels 0/6 have a Kxxpostgresql in their rc directory. I think this is ok, but there shouldn't be an error message. Note that killing postgres with sendsignals would leave unwanted files on the disk, so that there would be a message that postgresql is already running at next restart...

comment:16 by pierre, 2 years ago

Let me summarize what is to be done for the bootscripts as I understand it:

  • do not reject a Kxx file if it has no corresponding Sxx in the previous runlevel: this prevents using some scripts that are never started in runlevel 0 (or any runlevel: for example, in a recent ticket for blfs, killing elogind (which is now done in another way) when switching to runlevel 1 couldn't be done because it was never started by us (it is started by dbus)).
  • this implies that all boot scripts should be reviewed to see if they need to test whether the daemon is running before starting or stopping it. Using start_daemon and killproc allows that, but it is not always possible to use those functions. Not all bootscripts run a daemon, so in this case, I guess they do not need to be changed.
  • a README should be added to the /etc/init.d directory, with some indications on how to write/use bootscripts. The template should be upgraded too. Note that having a README in /etc/init.d is advised by the manual page of init(8).
  • Have a uniform policy for all runlevels:
    • a Kxxscript is called as "script stop"
    • a Sxxscript is called as "script start", after all the Kxxscripts
    • special case in runlevel 0/6: since sendsignals needs to be run before "swap stop", "mount stop", and "localnet stop", sendsignals should do its job in the "stop)" case.
  • in order for users to be able to run the lsb tools, all the scripts should use the LSB headers (I think DJ has done this already)
  • The new runlevel policy for LFS should be (see the ticket in BLFS, also discussed in irc):
    runlevel S: maintenance mode
    runlevel 0: halt
    runlevel 1: single user with no network (note that nothing prevents running "ifup")
    runlevel 2: customized by users, same as 3 by default
    runlevel 3: multiuser without graphics (but nothing prevents running "startx")
    runlevel 4: customized by users, same as 3 by default
    runlevel 5: multiuser with a display manager
    runlevel 6: reboot
    

This should be changed in the LFS page about inittab. Also, we shouldn't reference xdm and kdm, that are both obsolete. We could replace with gdm and lightdm, which are both in BLFS.
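The uniform policy for all runlevels summarized above can be sketched as a loop over a runlevel directory. This is an illustration only, not the rc script: the function name run_runlevel_dir is hypothetical, and logging, IPROMPT and error handling are left out.

```shell
#!/bin/sh
# Hypothetical sketch of the uniform policy: in any runlevel directory, run
# every K link with "stop" first, then every S link with "start". Each group
# runs in the shell's sorted glob order, and no check against the previous
# runlevel is made.
# $1 = runlevel directory (e.g. /etc/rc.d/rc6.d)
run_runlevel_dir() {
    dir=$1
    for link in "$dir"/K*; do
        if [ -x "$link" ]; then
            "$link" stop
        fi
    done
    for link in "$dir"/S*; do
        if [ -x "$link" ]; then
            "$link" start
        fi
    done
    return 0
}
```

Under this scheme the ordering problem in runlevels 0/6 is solved purely by numbering: a sendsignals link that acts in its "stop)" case only needs a K number lower than those of swap, mountfs and localnet.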

Last edited 2 years ago by pierre

comment:17 by jlocash, 2 years ago

The only question I would ask is why is there a runlevel S in your proposal? If 0 is halt you can't have a level before that. Wouldn't RL-S be the same as RL-1?

🇺🇦

comment:18 by Bruce Dubbs, 2 years ago

Run level S is for startup only. It mounts virtual file systems first and then localnet, udev, swap, partitions, console, etc. It is a prerequisite for all other run levels.

comment:19 by jlocash, 2 years ago

Is there a reason that couldn't be done in RL-1?

🇺🇦

in reply to:  19 ; comment:20 by Bruce Dubbs, 2 years ago

Replying to jlocash:

Is there a reason that couldn't be done in RL-1?

What would happen if the default run level is 3?

in reply to:  20 comment:21 by jlocash, 2 years ago

Replying to Bruce Dubbs:

Replying to jlocash

Is there a reason that couldn't be done in RL-1?

What would happen if the default run level is 3?

I'm not sure what you mean. If everything were mounted the system would boot like normal?

comment:22 by Bruce Dubbs, 2 years ago

The point is that run level S does the mounting and other essential tasks. Look at /etc/rc.d/rcS.d and view the scripts in order.

Also look at /etc/inittab.

si::sysinit:/etc/rc.d/init.d/rc S

starts things up. Then

l3:3:wait:/etc/rc.d/init.d/rc 3

or

l5:5:wait:/etc/rc.d/init.d/rc 5

does the rest. Also see 'man inittab'
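The two-step boot sequence described above can be read directly from an inittab file. A minimal sketch, assuming the default runlevel is declared with an "initdefault" entry; the function name boot_sequence is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the command run at sysinit, then the "wait"
# command of the default runlevel, in the order they appear in the file.
# inittab line format: id:runlevels:action:process
boot_sequence() {
    file=$1
    # The runlevels field of the initdefault entry is the default runlevel.
    default=$(awk -F: '$3 == "initdefault" { print $2; exit }' "$file")
    awk -F: -v lvl="$default" '
        /^[[:space:]]*#/ { next }                         # skip comments
        $3 == "sysinit" { print $4 }                      # e.g. rc S
        $3 == "wait" && index($2, lvl) > 0 { print $4 }   # e.g. rc 3
    ' "$file"
}
```

With a default runlevel of 3 and the entries quoted above, this would print the "rc S" command followed by the "rc 3" command.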

comment:23 by pierre, 2 years ago

RL S is defined by sysv init program. From the man page:

A runlevel is a software configuration of the system which allows only
a selected group of processes to exist. The processes spawned by init
for each of these runlevels are defined in the /etc/inittab file. Init
can be in one of eight runlevels: 0–6 and S (a.k.a. s). The runlevel
is changed by having a privileged user run telinit, which sends
appropriate signals to init, telling it which runlevel to change to.

Runlevels S, 0, 1, and 6 are reserved. Runlevel S is used to initialize
the system on boot. When starting runlevel S (on boot) or runlevel
1 (switching from a multi-user runlevel) the system is entering
``single-user mode'', after which the current runlevel is S. Runlevel 0 is
used to halt the system; runlevel 6 is used to reboot the system.

After booting through S the system automatically enters one of the
multi-user runlevels 2 through 5, unless there was some problem that
needs to be fixed by the administrator in single-user mode. Normally
after entering single-user mode the administrator performs maintenance
and then reboots the system.
[...]

So IIUC, they say that running telinit 1 or telinit S is the same. But experimenting with this shows a slight difference: when runlevel S terminates (exiting sulogin), init switches to the default runlevel, whereas when runlevel 1 terminates, init just stops there ("no more processes left in this runlevel"). When the manual says that after running "telinit 1", the rl is S, it does not seem to be completely true.

Note that nothing prevents the administrator from entering "telinit S" when the system is in multiuser mode, so saying that runlevel S is only for system init is not quite true. It's for system maintenance, as I wrote. But what is true is that "rc S" is only run at system init with our inittab. When typing "telinit S", our inittab runs "rc 1".

Anyway, I'll begin to work along the lines of comment:16. I think it will be easier to discuss with something concrete...

comment:24 by pierre, 2 years ago

Resolution: fixed
Status: assigned → closed

Fixed at 827cc05c3
