Opened 16 years ago

Closed 16 years ago

#2160 closed defect (fixed)

Bootscript functions handle stale pid files poorly

Reported by: dnicholson@…
Owned by: DJ Lucas
Priority: normal
Milestone: 6.4
Component: Bootscripts
Version: SVN
Severity: critical
Keywords:
Cc:

Description

When the bootscript specifies a pid file to use with the "-p pidfile" argument, *proc functions currently bail out when the referenced file contains an invalid pid. This causes big problems on startup when loadproc returns successfully without actually starting the service.

This might sound obscure, but it usually hurts me after I've had to do a hard boot (they happen) and all my pid files are stale. Then most services (my scripts always use -p when a pidfile is available) fail to start and I'm left with garbage. The only way to fix it is to manually hunt down and remove the stale pid files.

See BLFS ticket #2408 (http://wiki.linuxfromscratch.org/blfs/ticket/2408) for an example.

I'm attaching a patch which changes loadproc, killproc and reloadproc to warn and remove the stale pid files when they are encountered. loadproc then continues instead of bailing out. I ran some testcases with the patch, but I haven't applied the changes to my system yet.
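The behavior described above (warn, remove the stale file, then carry on) can be sketched roughly as follows. This is not the code from stale-pidfiles.patch; `check_pidfile` is a hypothetical helper name, and the sketch assumes the pid file holds a bare decimal pid on its first line and that the caller runs as root, as bootscripts do.

```shell
#!/bin/sh
# Hypothetical helper, NOT taken from stale-pidfiles.patch.
# Returns 0 if the pid file names a live process.
# Returns 1 if there is no pid file, or if the file was stale or invalid;
# in the latter case a warning is printed and the file is removed so the
# caller can go on to start the service.
check_pidfile() {
    pidfile=$1

    # No pid file at all: nothing to clean up.
    [ -f "$pidfile" ] || return 1

    # Assumption: the first line is a bare decimal pid.
    pid=$(head -n 1 "$pidfile" 2>/dev/null)

    case "$pid" in
        '' | 0* | *[!0-9]*)
            # Empty, zero, or non-numeric content: treat as invalid.
            ;;
        *)
            # kill -0 sends no signal; it only tests whether the process
            # can be signalled (reliable when running as root).
            if kill -0 "$pid" 2>/dev/null; then
                return 0
            fi
            ;;
    esac

    echo "WARNING: removing stale pid file $pidfile" >&2
    rm -f "$pidfile"
    return 1
}
```

A loadproc-style caller would skip starting the service only when `check_pidfile` returns 0, and would otherwise proceed with the start as if no pid file had existed.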

Attachments (1)

stale-pidfiles.patch (2.4 KB ) - added by dnicholson@… 16 years ago.
Remove stale pid files when encountered


Change History (12)

by dnicholson@…, 16 years ago

Attachment: stale-pidfiles.patch added

Remove stale pid files when encountered

comment:1 by bryan@linuxfromscratch.org, 16 years ago

I'm a bit surprised that the existing cleanfs file didn't remove those stale .pid files, actually. Or weren't they in /var/run?

Or is S50 too late to clean up the .pid files that you use?

comment:2 by bdubbs@…, 16 years ago

One way to help, but not necessarily fix everything, would be to add:

rm /var/run/*.pid

to the /etc/rc.d/init.d/cleanfs start procedure.

(Written and posted at the same time as the above)

comment:3 by DJ Lucas, 16 years ago

That's not correct, nor was my simple patch. Dan's patch is complete. You need to be warned if a program did not exit correctly and then clean up the mess. I'll have to make a similar modification to mine.

in reply to:  1 comment:4 by dnicholson@…, 16 years ago

Replying to Bryan Kadzban:

> I'm a bit surprised that the existing cleanfs file didn't remove those stale .pid files, actually. Or weren't they in /var/run?
>
> Or is S50 too late to clean up the .pid files that you use?

I hadn't actually thought of that at all, and now I'm surprised, too. They are in /var/run, and looking at cleanfs, I don't have any idea why they wouldn't be cleaned up.

I'll try to check it out the next time the system doesn't shut down cleanly, but I would think we don't want to rely on cleanfs for this.

comment:5 by dnicholson@…, 16 years ago

I think I figured out the reason why cleanfs isn't doing the right thing for me on some occasions. My TZ is UTC-8 (US Pacific, UTC-7 during DST) and the hwclock is stored in localtime. When mountkernfs runs, setclock has not been run yet. So, /proc and /sys both have modification times 8 hours (7 w/DST) prior to the current time. cleanfs checks for files in /var/run that are older than /proc. In this case, the faulty file is within that 7-hour window because I'd rebooted this morning, too:

$ ls -ld /proc /sys /var/run/NetworkManager.pid
dr-xr-xr-x 127 root root 0 2008-05-25 08:22 /proc
drwxr-xr-x  11 root root 0 2008-05-25 08:22 /sys
-rw-r--r--   1 root root 4 2008-05-25 10:09 /var/run/NetworkManager.pid

So, cleanfs doesn't remove the stale file. I don't know the correct solution to this, but a couple things I think might help cleanfs:

  1. Don't use /proc as the marker. Since mounting it (along with /sys and /dev) is the very first thing done, it's very likely the system clock is not accurate at that point. Maybe /etc/mtab is a good file to use as the marker since we know we've just run checkfs and mountfs.
  2. Don't use any time marker, i.e., drop the ! -newer in the find command. This would make the assumption that nothing useful would be put in /var/run before cleanfs has been run. FWIW, on Fedora they do this.
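To illustrate why the marker test skips the stale file, here is a small sketch that recreates the timestamps from the ls output in a scratch directory. The `marker` file stands in for /proc, and the find expressions are simplified stand-ins, not the actual command from cleanfs; nothing here touches the real /var/run.

```shell
#!/bin/sh
# Scratch directory so the demonstration never touches /var/run.
demo=$(mktemp -d)

# Recreate the modification times shown in the ls output above.
touch -t 200805250822 "$demo/marker"              # stands in for /proc
touch -t 200805251009 "$demo/NetworkManager.pid"  # stale file from the previous boot

# With the marker test: the stale file is newer than the marker,
# so "! -newer" excludes it and nothing is selected for removal.
with_marker=$(find "$demo" -type f -name '*.pid' ! -newer "$demo/marker")

# Second suggestion: no time marker, so every pid file is selected.
without_marker=$(find "$demo" -type f -name '*.pid')

echo "with marker:    ${with_marker:-<nothing selected>}"
echo "without marker: $without_marker"

rm -rf "$demo"
```

With the marker in place, the stale NetworkManager.pid survives because the skewed clock made the marker look older than the file. Dropping the test selects it, at the cost of assuming nothing has written to the directory before cleanfs runs.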

comment:6 by gerard@…, 16 years ago

What was the rationale behind using a marker of any kind? In a previous version of cleanfs it was assumed that when it ran, it would be safe to clean out /var/run (and other directories). I don't remember the reasons for implementing this change.

If keeping such a marker is desirable we could move 'setclock' up earlier in the sequence - run it as early as possible. Likely after modules and udev.

in reply to:  6 comment:7 by dnicholson@…, 16 years ago

Replying to gerard@linuxfromscratch.org:

> What was the rationale behind using a marker of any kind? In a previous version of cleanfs it was assumed that when it ran, it would be safe to clean out /var/run (and other directories). I don't remember the reasons for implementing this change.

Well, I wasn't around at the time, but I imagine there are two reasons:

  1. If you accidentally run cleanfs at some later time after the initial boot, it would prevent you from blowing away /var/run and /var/lock.
  2. In case any of the early bootscripts want to write to a file (or open a socket) in /var/run, then cleanfs would notice that it was a file created since boot, instead of a stale file from a previous boot.

I think 1. is nice, but not a critical feature I'd miss. I think you can drop 2. if you assume that cleanfs is run right after mountfs. Since /var/run may not be mounted until mountfs anyway, there shouldn't be any new files to care about.

> If keeping such a marker is desirable we could move 'setclock' up earlier in the sequence - run it as early as possible. Likely after modules and udev.

I think moving clock up to right after the devices are set up is a good thing regardless. The fewer things we have happening before the system clock is set, the better.

comment:8 by gerard@…, 16 years ago

Moving the clock up in the sequence also makes the boot logging more useful, since the times would actually be correct instead of potentially hours off. I'll move this to lfs-dev for discussion.

comment:9 by bdubbs@…, 16 years ago

Milestone: changed from 7.0 to 6.4

comment:10 by DJ Lucas, 16 years ago

Owner: changed from lfs-book@… to DJ Lucas
Status: changed from new to assigned

comment:11 by DJ Lucas, 16 years ago

Resolution: fixed
Status: changed from assigned to closed

Fixed in r8701. Holding bootscripts release for #2189.
