Opened 2 years ago

Closed 2 years ago

#16177 closed defect (fixed)

elogind not getting killed when dbus is killed

Reported by: Joe Locash
Owned by: pierre
Priority: normal
Milestone: 11.2
Component: BOOK
Version: git
Severity: normal
Keywords:
Cc:

Description

In sysvinit builds (obviously) elogind isn't getting killed in rc1 as it should. It's only a noticeable problem when switching from a higher init level to 1 and then back to a higher level, i.e. booting into init 3, switching to 1, and then back to 3.

When switching back to a higher init level dbus will try to start elogind because of elogind's dbus service file. Since elogind is already running it detects that and won't start another process. Even though there is an elogind process running it isn't registered with the currently running dbus process so logins can take some time to process because dbus starts elogind and waits for it to register.

To reproduce:

Boot to any init level and switch to init 1. Do a ps and you will see that elogind is still running. Switch back to a higher init level (for testing I would recommend 2 or 3 to take X/Wayland out of the picture). On the console you should see a message that elogind is already running as PID xyz. Trying to log in will result in a 45 second pause because dbus is waiting for elogind to register, which it can't because it's already running.

I suggest starting/stopping elogind in the init system instead of having dbus starting it and nothing shutting it down. The other distros do this.

I've attached a patch for blfs-bootscripts-20210826 that I use that has the init script I made. The elogind dbus service file should also be changed with this sed:

sed -i "s|Exec=.*|Exec=/bin/true|" src/login/org.freedesktop.login1.service.in

for a source build or just apply it to /usr/share/dbus-1/system-services/org.freedesktop.login1.service
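
A minimal sketch of what that sed does, run against a scratch copy rather than the installed file. The function name neutralise_exec and the sample file contents are assumptions for illustration; only the sed expression comes from the report.

```shell
#!/bin/sh
# Sketch: make the Exec= line of a dbus service file a no-op, so that dbus
# activation no longer launches the daemon.
# neutralise_exec is a hypothetical helper name.
neutralise_exec() {
    sed -i "s|Exec=.*|Exec=/bin/true|" "$1"
}

# Try it on a scratch file. The contents are an assumed example, not the
# file shipped by elogind.
svc=$(mktemp)
cat > "$svc" <<'EOF'
[D-BUS Service]
Name=org.freedesktop.login1
Exec=/usr/lib/elogind/elogind --daemon
User=root
EOF

neutralise_exec "$svc"
cat "$svc"
rm -f "$svc"
```

Working on a copy first makes it easy to check the result before touching /usr/share/dbus-1/system-services/org.freedesktop.login1.service.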

I've tagged this for 11.2 since 11.1 is due out soon.

Attachments (2)

blfs-bootscripts_add-elogind.patch (3.6 KB) - added by Joe Locash 2 years ago.
elogind (1.5 KB) - added by Joe Locash 2 years ago.
Updated elogind init script


Change History (32)

by Joe Locash, 2 years ago

comment:1 by pierre, 2 years ago

I can reproduce this. I'd suggest a slightly different solution: if elogind is killed while in runlevel 1 and then the system is switched to runlevel 3, there is no need to start elogind, since dbus starts and registers it. So I think we'd need only the Kxxx symlinks, not the Sxxx ones. This way, there is no need to change the dbus service file.

comment:2 by Joe Locash, 2 years ago

The problem with doing that, Pierre, is that the LFS init system won't shut down elogind because it didn't start it. The Kxx scripts won't get run because the Sxx scripts didn't get run.
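
The behaviour described here can be illustrated with a simplified sketch. It is not the actual LFS rc code; the function name, the scratch directory layout and the S29 number are assumptions made for the example.

```shell
#!/bin/sh
# Simplified sketch: on a runlevel change, a K script is only run if a
# matching S symlink existed in the previous runlevel (or in rcS.d).
# Arguments: <rc.d root> <previous runlevel> <service name>
kill_script_would_run() {
    rcroot=$1
    previous=$2
    name=$3
    for link in "$rcroot/rc$previous.d"/S[0-9][0-9]"$name" \
                "$rcroot/rcS.d"/S[0-9][0-9]"$name"; do
        # An unmatched glob is left as a literal string, so test for existence.
        if [ -e "$link" ] || [ -L "$link" ]; then
            return 0
        fi
    done
    return 1
}

# Demo in a scratch tree: dbus has an S symlink in rc3.d, elogind has none.
demo=$(mktemp -d)
mkdir -p "$demo/rc3.d" "$demo/rcS.d"
ln -s ../init.d/dbus "$demo/rc3.d/S29dbus"
for svc in dbus elogind; do
    if kill_script_would_run "$demo" 3 "$svc"; then
        echo "$svc: K script would run"
    else
        echo "$svc: K script would be skipped"
    fi
done
rm -rf "$demo"
```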

comment:3 by pierre, 2 years ago

Ok, so "elogind start" should exist but just do nothing, maybe...

comment:4 by Joe Locash, 2 years ago

In the init script, under start), change /usr/lib/elogind/elogind --daemon to /usr/bin/true. Leave the Sxx symlinks in place and remove the sed.

comment:5 by pierre, 2 years ago

Owner: changed from blfs-book to pierre
Status: new → assigned

comment:6 by pierre, 2 years ago

I think I will do what has been said at comment:4, and also remove the log_info_message and evaluate_retval calls: there is no point in telling the user that "/usr/bin/true" has been started, and elogind itself is not started.

comment:7 by pierre, 2 years ago

Well, trying to do sensible things with reload and restart seems to be difficult: if start becomes /usr/bin/true, then restart (as written) will just stop the daemon, and not restart it. With reload, the problem is that it may happen that the daemon is not running, but in this case killproc -HUP will generate an error, which is maybe not what it should do...

Furthermore, we have other daemons that are automatically started by dbus (accountsservice at least), and they may suffer the same problems. For example, I guess managing accounts in a DE (using the accountsservice dbus interface) would fail or be rather slow after descending to runlevel 1 and going back to runlevel 5.

So actually, the proposed solution might be better (have a script and do not rely on dbus to start the daemon), and should be applied to other daemons as well...

The problem is that dbus can start daemons, but is not a daemon monitoring tool. If we had a way to know which daemons have been started by dbus, then we could kill those daemons before killing dbus, but I do not think there is an easy way to find the list of daemons started by dbus.

in reply to:  7 comment:8 by Xi Ruoyao, 2 years ago

Replying to pierre:

The problem is that dbus can start daemons, but is not a daemon monitoring tool. If we had a way to know which daemons have been started by dbus, then we could kill those daemons before killing dbus, but I do not think there is an easy way to find the list of daemons started by dbus.

In theory we can put dbus into a cgroup and track every process in the cgroup.
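
A rough sketch of the tracking half of that idea, assuming a cgroup v2 hierarchy in which dbus-daemon has already been placed in its own cgroup. The function name and the /sys/fs/cgroup/dbus path mentioned in the comment are hypothetical; the only kernel interface relied on is the cgroup.procs file, which lists one PID per line.

```shell
#!/bin/sh
# Sketch: print the PIDs recorded in a cgroup's cgroup.procs file.
# Argument: path to the cgroup directory (hypothetical example:
# /sys/fs/cgroup/dbus).
cgroup_pids() {
    procs="$1/cgroup.procs"
    [ -r "$procs" ] || return 1
    while read -r pid; do
        [ -n "$pid" ] && echo "$pid"
    done < "$procs"
    return 0
}

# Demo against a scratch directory; the PIDs are made-up stand-ins.
demo=$(mktemp -d)
printf '%s\n' 812 845 901 > "$demo/cgroup.procs"
cgroup_pids "$demo"
rm -rf "$demo"
```

Because a forked child starts in its parent's cgroup, daemons launched by dbus would show up in this list unless they move themselves elsewhere.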

comment:9 by Douglas R. Reno, 2 years ago

I know of ModemManager and upower that are also started via dbus activation on SysV

comment:10 by pierre, 2 years ago

Actually, I found a method to list all the processes activated by dbus, using dbus calls (ATM with gdbus for the POC, but I guess it is feasible with dbus-send):

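# List every dbus-activatable name that currently has an owner, together
# with the Exec= line of its service file.
for i in $(gdbus call --system \
                      --dest org.freedesktop.DBus \
                      --object-path /org/freedesktop/DBus \
                      --method org.freedesktop.DBus.ListActivatableNames | \
                 sed 's/[][(),]//g'); do
    # NameHasOwner prints (true,) or (false,). Once the sed has stripped the
    # punctuation, the result is run as the command "true" or "false".
    if $(gdbus call --system \
                    --dest org.freedesktop.DBus \
                    --object-path /org/freedesktop/DBus \
                    --method org.freedesktop.DBus.NameHasOwner "$i" | \
               sed 's/[][(),]//g'); then
        echo -n "$i is running, daemon: "
        # $i still carries the single quotes printed by gdbus; strip them.
        j=$(eval echo "$i")
        grep Exec= "/usr/share/dbus-1/system-services/${j}.service" | sed 's/.*=//'
    fi
done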

On my system running in GNOME (I hope there would be many less running after closing the DE):

'org.freedesktop.DBus' is running, daemon: grep: /usr/share/dbus-1/system-services/org.freedesktop.DBus.service: No such file or directory
'org.freedesktop.login1' is running, daemon: /lib/elogind/elogind --daemon
'org.freedesktop.ColorManager' is running, daemon: /usr/libexec/colord
'org.freedesktop.PolicyKit1' is running, daemon: /usr/lib/polkit-1/polkitd --no-debug
'org.freedesktop.ModemManager1' is running, daemon: /usr/sbin/ModemManager
'org.freedesktop.UPower' is running, daemon: /usr/libexec/upowerd
'org.freedesktop.UDisks2' is running, daemon: /usr/libexec/udisks2/udisksd
'fi.w1.wpa_supplicant1' is running, daemon: /usr/sbin/wpa_supplicant -u
'org.freedesktop.Accounts' is running, daemon: /usr/libexec/accounts-daemon
'org.freedesktop.locale1' is running, daemon: /usr/libexec/blocaled

comment:11 by pierre, 2 years ago

Could do this with dbus-send (no point in putting the code here, I guess). But the problem is that elogind is not known as elogind to the /proc system, but as elogind-daemon. So pidof cannot find it...
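
One possible way around the process-name mismatch, offered only as a sketch: ask the bus for the PID that owns a name instead of looking the process up with pidof. The helper names extract_uint32 and owner_pid are hypothetical; the method called, org.freedesktop.DBus.GetConnectionUnixProcessID, is part of the message bus interface.

```shell
#!/bin/sh
# Sketch: get the PID of the process that owns a bus name.

# Read a "dbus-send --print-reply" reply on stdin and print the first
# uint32 value found in it.
extract_uint32() {
    awk '$1 == "uint32" { print $2; exit }'
}

# Argument: a bus name, e.g. org.freedesktop.login1
owner_pid() {
    dbus-send --system --print-reply \
              --dest=org.freedesktop.DBus \
              /org/freedesktop/DBus \
              org.freedesktop.DBus.GetConnectionUnixProcessID \
              "string:$1" | extract_uint32
}

# Example (needs a running system bus):
#   owner_pid org.freedesktop.login1
```

This only works while the daemon is still connected to the bus being queried, so it would have to run before dbus itself is stopped.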

comment:12 by pierre, 2 years ago

I'm thinking of something: what would happen if we'd run:

sudo /etc/init.d/dbus restart

I guess elogind would not be killed, but the new dbus wouldn't know about it... So all daemons using dbus (whether started by dbus or started by a script) should be killed when dbus is killed. And the restart action should memorize those killed daemons and restart them!

I begin to understand the limitations of sysv, I think :)

comment:13 by Joe Locash, 2 years ago

elogind would get killed by doing a restart. However, I'm not sure dbus would restart it again until dbus received a login request so there would be no guarantee that elogind would start immediately. In the scenario you propose: having dbus start elogind, in the restart) part of the script change "$0 start" to "/usr/lib/elogind/elogind --daemon". This way elogind is started immediately after stopping it. It will register w/ dbus and dbus won't try to start it when needed.

Last edited 2 years ago by Joe Locash

in reply to:  13 comment:14 by pierre, 2 years ago

Replying to Joe Locash:

elogind would get killed by doing a restart.

I may be mistaken here, but if elogind is not killed when switching to runlevel 1, why should it be killed when dbus is restarted (that is killed then started)?

However, I'm not sure dbus would restart it again until dbus received a login request so there would be no guarantee that elogind would start immediately. In the scenario you propose: having dbus start elogind, in the restart) part of the script change "$0 start" to "/usr/lib/elogind/elogind --daemon". This way elogind is started immediately after stopping it. It will register w/ dbus and dbus won't try to start it when needed.

Yes, the problem is that elogind does not seem to be the only daemon in this situation.

comment:15 by Bruce Dubbs, 2 years ago

This whole issue seems to be losing the big picture. In most cases LFS is running for a single user. In that case the only run level changes that are needed are between levels 3 and 5.

In the case of a server like rivendell, dbus and elogind are not even installed.

So the question is: What is the use case of going to run level 1 or 2? It would only be needed when multiple users are using a graphical interface on the same system. Is that still a thing?

comment:16 by Joe Locash, 2 years ago

The use case for run level 1 is updating a package that could be in use for other run levels. The reason I found this issue was updating expat. In multi-user mode (run level >=2) that library was open by other processes. This is how I found the issue.

comment:17 by pierre, 2 years ago

Going to runlevel 1, I am not sure of the use case (Joe answered after I wrote this), but restarting dbus after a configuration change may be needed... We have the possibility to do that with the dbus init.d file. But my guess is that it would make such a mess that the only possibility would be to reset the computer...

comment:18 by Bruce Dubbs, 2 years ago

Did you try just updating in place?

In the case of expat, the libraries on my system are:

lrwxrwxrwx 1 root root     17 Feb 23 15:19 /usr/lib/libexpat.so -> libexpat.so.1.8.6
lrwxrwxrwx 1 root root     17 Feb 23 15:19 /usr/lib/libexpat.so.1 -> libexpat.so.1.8.6
-rwxr-xr-x 1 root root 190664 Feb 15 00:33 /usr/lib/libexpat.so.1.8.4
-rwxr-xr-x 1 root root 559304 Feb 23 15:19 /usr/lib/libexpat.so.1.8.6

The .so file is for linking a new program/package. The .so.1 file is for loading a running program. If updating, a running program will continue to use the old version until it is restarted. Then it will use the new version.

Stopping the running programs is not necessary.
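
To see which running processes still map a particular version of a library, something along these lines could be used. It is a sketch only: the function name is hypothetical, and reading the maps files of other users' processes generally requires root.

```shell
#!/bin/sh
# Sketch: print the PIDs whose memory maps mention a given library file.
# Arguments: <library file name> [proc root, default /proc]
pids_mapping() {
    lib=$1
    root=${2:-/proc}
    for maps in "$root"/[0-9]*/maps; do
        [ -r "$maps" ] || continue
        if grep -qF "$lib" "$maps" 2>/dev/null; then
            pid=${maps%/maps}      # drop the trailing /maps
            echo "${pid##*/}"      # keep only the PID component
        fi
    done
}

# Which processes still have the old expat version loaded?
pids_mapping libexpat.so.1.8.4
```

An empty result after an update suggests that nothing is using the old file any more.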

comment:19 by Joe Locash, 2 years ago

"If updating, a running program will continue to use the old version until it is restarted. Then it will use the new version."

Which is exactly my reason for switching to init 1, updating, and then switching back to init 5. expat was a security update. Processes in init 2 were using libexpat so I switched to init 1 and updated. When switching back to 5 things went to hell because elogind never got killed in init 1.

Edit: meant to say processes in init 2 were using libexpat.so.1.8.4

Last edited 2 years ago by Joe Locash

comment:20 by pierre, 2 years ago

After some experiments in a VM, it seems that the only daemon that is not killed when dbus is killed is elogind. I wonder whether this comes from the change of name: elogind is known to "ps" as elogind-daemon, while it is launched as /usr/lib/elogind/elogind...

comment:21 by Joe Locash, 2 years ago

Pierre, would you please do me the favor of applying the changes I suggested in the initial report of this ticket and report your findings?

in reply to:  21 comment:22 by pierre, 2 years ago

Replying to Joe Locash:

Pierre, would you please do me the favor of applying the changes I suggested in the initial report of this ticket and report your findings?

Sorry, I thought I had answered: it works. But the problem is it does not solve the fact that elogind is not killed when dbus is killed (all other daemons started by dbus are killed in this case). Would you mind if I changed the title of the ticket to "elogind not getting killed when dbus is killed"? This includes your report, but is not limited to the use of runlevel 1.

comment:23 by Joe Locash, 2 years ago

I don't mind at all.

comment:24 by pierre, 2 years ago

Summary: elogind not getting killed → elogind not getting killed when dbus is killed

by Joe Locash, 2 years ago

Attachment: elogind added

Updated elogind init script

comment:25 by Joe Locash, 2 years ago

I added an updated elogind init script. This one leaves seats and session in place when doing a restart.

comment:26 by Xi Ruoyao, 2 years ago

Some observations: d-bus does not really kill anything on exit. A d-bus service will receive a "Disconnect" d-bus signal when the d-bus daemon exits. Most d-bus services will exit on their own once they receive this signal, but elogind does not.

We should either (a) collaborate with the elogind maintainers to make elogind exit in response to the "Disconnect" signal, or (b) start/stop it via a bootscript.

comment:27 by pierre, 2 years ago

In view of (a), I've filed https://github.com/elogind/elogind/issues/224

Note that for (b), the start/stop of elogind has to be inside the dbus boot script.

comment:28 by pierre, 2 years ago

I've submitted a PR upstream: https://github.com/elogind/elogind/pull/225. A sed in the book can make an equivalent change:

sed -i '/request_name/i\
        r = sd_bus_set_exit_on_disconnect(m->bus, true);\
        if (r < 0)\
                return log_error_errno(r, "Failed to set exit on disconnect: %m");' \
src/login/logind.c

comment:29 by pierre, 2 years ago

In the course of my investigations, I've found that the lfs bootscripts are not well adapted to runlevel 1:

  • when typing control-D in runlevel 1, you get:
    sulogin: cannot read /dev/tty1: Operation not permitted
    INIT: no more processes left in this runlevel
    
    and then there is no way to recover other than typing alt-ctrl-delete.

Note that the error from sulogin is not the problem. The problem is that the init system does not automatically switch to the default runlevel when the last job exits in runlevel 1. This occurs only in runlevel S, which has a special behavior.

  • when rebooting (or shutting down) from runlevel 1, the system tries to kill a lot of applications (all those that have a Kxxx file in rc6.d (or rc0.d)), although most of them are not running.
Last edited 2 years ago by pierre

comment:30 by pierre, 2 years ago

Resolution: fixed
Status: assigned → closed

Fixed at 27e5cd4c3. Note that the bootscripts may need another ticket, but I think the reported issue in this ticket is fixed.
