Opened 17 years ago
Closed 16 years ago
#2160 closed defect (fixed)
Bootscript functions handle stale pid files poorly
Reported by: | | Owned by: | DJ Lucas |
---|---|---|---|
Priority: | normal | Milestone: | 6.4 |
Component: | Bootscripts | Version: | SVN |
Severity: | critical | Keywords: | |
Cc: | | | |
Description
When a bootscript specifies a pid file with the "-p pidfile" argument, the *proc functions currently bail out if the referenced file contains an invalid pid. This causes big problems at startup, because loadproc returns success without actually starting the service.
This might sound obscure, but it usually bites me after a hard reboot (they happen), when all my pid files are stale. Most services then fail to start (my scripts always use -p when a pid file is available), and I'm left with garbage. The only fix is to hunt down and remove the stale pid files manually.
See BLFS ticket #2408 (http://wiki.linuxfromscratch.org/blfs/ticket/2408) for an example.
I'm attaching a patch that changes loadproc, killproc and reloadproc to warn about and remove stale pid files when they encounter them. loadproc then continues instead of bailing out. I ran some test cases with the patch, but I haven't applied the changes to my system yet.
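For illustration, the "warn, remove and continue" behavior described above can be sketched as a small shell function. This is not the code in stale-pidfiles.patch: the function name check_pidfile is hypothetical, and the sketch assumes that a pid file holds a single numeric pid and that a live process shows up as a /proc/<pid> directory.

```shell
#!/bin/bash
# Hypothetical helper, NOT the code from stale-pidfiles.patch. It only
# illustrates the "warn, remove, continue" idea from the description.
# Returns 0 if the pid file names a running process; otherwise removes a
# garbage or stale pid file (if present) and returns 1 so the caller can
# go ahead and start the daemon.
check_pidfile() {
    local pidfile="$1" pid

    # No pid file at all: nothing to clean up, nothing known to be running.
    [ -f "$pidfile" ] || return 1

    # Use the first whitespace-separated word of the file as the pid.
    read -r pid _ < "$pidfile"

    # Anything that is not a plain number is garbage.
    case "$pid" in
        ''|*[!0-9]*)
            echo "WARNING: $pidfile does not contain a valid pid, removing it" >&2
            rm -f "$pidfile"
            return 1
            ;;
    esac

    # /proc/<pid> exists only while that process is alive.
    if [ -d "/proc/$pid" ]; then
        return 0
    fi

    echo "WARNING: stale pid file $pidfile (pid $pid not running), removing it" >&2
    rm -f "$pidfile"
    return 1
}
```

In this sketch a caller would start the service whenever check_pidfile returns non-zero. Note that after a reboot a pid can be reused by an unrelated process, so an existing /proc/<pid> is a heuristic, not proof that the original daemon is running.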
Attachments (1)
Change History (12)
by , 17 years ago
Attachment: | stale-pidfiles.patch added |
---|
follow-up: 4 comment:1 by , 17 years ago
I'm a bit surprised that the existing cleanfs file didn't remove those stale .pid files, actually. Or weren't they in /var/run?
Or is S50 too late to clean up the .pid files that you use?
comment:2 by , 17 years ago
One way to help, though not necessarily fix everything, would be to add:
rm /var/run/*.pid
to the /etc/rc.d/init.d/cleanfs start procedure.
(Written and posted at the same time as the above)
comment:3 by , 17 years ago
That's not correct, and neither was my simple patch. Dan's patch is complete: you need to be warned if a program did not exit correctly, and then the mess needs to be cleaned up. I'll have to make a similar modification to mine.
comment:4 by , 17 years ago
Replying to Bryan Kadzban:
> I'm a bit surprised that the existing cleanfs file didn't remove those stale .pid files, actually. Or weren't they in /var/run?
> Or is S50 too late to clean up the .pid files that you use?

I hadn't actually thought of that at all, and now I'm surprised, too. They are in /var/run, and looking at cleanfs, I don't have any idea why they wouldn't be cleaned up.
I'll try to check it out the next time the system doesn't shut down cleanly, but I would think we don't want to rely on cleanfs for this.
comment:5 by , 17 years ago
I think I figured out why cleanfs sometimes doesn't do the right thing for me. My TZ is UTC-8 (PST, or UTC-7 while PDT is in effect) and the hwclock is stored in localtime. When mountkernfs runs, setclock has not been run yet, so /proc and /sys both get modification times 8 hours (7 with DST) earlier than the current time. cleanfs only removes files in /var/run that are not newer than /proc, so any pid file written during that window before the reboot looks newer than /proc and is kept. That is what happened here, because I'd rebooted this morning, too:

    $ ls -ld /proc /sys /var/run/NetworkManager.pid
    dr-xr-xr-x 127 root root 0 2008-05-25 08:22 /proc
    drwxr-xr-x 11 root root 0 2008-05-25 08:22 /sys
    -rw-r--r-- 1 root root 4 2008-05-25 10:09 /var/run/NetworkManager.pid

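The comparison that lets the stale file survive can be reproduced in a scratch directory. This is a sketch under stated assumptions: a temporary file stands in for /proc, the timestamps are the ones from the ls output above, and the find test mirrors the "! -newer" check discussed in this ticket rather than quoting cleanfs verbatim.

```shell
#!/bin/bash
# Reproduces the timestamp comparison from comment:5 in a temporary
# directory instead of the real /proc and /var/run (assumed layout).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/run"

# Stand-in for /proc: stamped while the system clock still reads the
# uncorrected localtime hwclock, i.e. several hours behind.
touch -t 200805250822 "$demo_dir/proc-marker"

# Stale pid file written earlier that morning, before the hard reboot.
touch -t 200805251009 "$demo_dir/run/NetworkManager.pid"

# cleanfs-style test: only files NOT newer than the marker are removed.
find "$demo_dir/run" -type f ! -newer "$demo_dir/proc-marker" -exec rm -f {} +

# The stale file survives because its mtime is later than the marker's.
ls "$demo_dir/run"    # prints: NetworkManager.pid
```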
So, cleanfs doesn't remove the stale file. I don't know the correct solution to this, but here are a couple of things that I think might help cleanfs:
- Don't use /proc as the marker. Since mounting it (and /sys and /dev) is the very first thing done at boot, the system clock is very likely not accurate yet. Maybe /etc/mtab is a good file to use as the marker, since we know we've just run checkfs and mountfs.
- Don't use any time marker, i.e., drop the "! -newer" test from the find command. This assumes that nothing useful is put in /var/run before cleanfs runs. FWIW, Fedora does this.
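Both suggestions can be expressed as one hypothetical helper, clean_run_dir, sketched below. It is not the LFS cleanfs script; the function name and its optional marker argument are illustrative assumptions. Passing /etc/mtab as the marker corresponds to the first option, and omitting the marker corresponds to the second.

```shell
#!/bin/bash
# Hypothetical sketch of the two cleanfs alternatives above; NOT the
# actual LFS cleanfs code.
#   clean_run_dir DIR MARKER -> remove regular files in DIR that are
#                               not newer than MARKER (e.g. /etc/mtab)
#   clean_run_dir DIR        -> no time marker: remove every regular file
# Directories are left alone in both cases.
clean_run_dir() {
    local dir="$1" marker="${2:-}"

    if [ -n "$marker" ]; then
        # Option 1: keep the age test, but compare against a marker
        # touched later in the boot than /proc.
        find "$dir" -xdev -type f ! -newer "$marker" -exec rm -f {} +
    else
        # Option 2: assume nothing useful lands in DIR before cleanfs.
        find "$dir" -xdev -type f -exec rm -f {} +
    fi
}
```

Usage would be "clean_run_dir /var/run /etc/mtab" for the first option or "clean_run_dir /var/run" for the second. The trade-off is the one raised in comment:7: dropping the marker is simpler, but it also deletes anything an earlier bootscript may have written there.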
follow-up: 7 comment:6 by , 17 years ago
What was the rationale behind using a marker of any kind? In a previous version of cleanfs it was assumed that when it ran, it would be safe to clean out /var/run (and other directories). I don't remember the reasons for implementing this change.
If keeping such a marker is desirable, we could move 'setclock' earlier in the sequence and run it as early as possible, likely after modules and udev.
comment:7 by , 17 years ago
Replying to gerard@linuxfromscratch.org:
> What was the rationale behind using a marker of any kind? In a previous version of cleanfs it was assumed that when it ran, it would be safe to clean out /var/run (and other directories). I don't remember the reasons for implementing this change.

Well, I wasn't around at the time, but I imagine there are two reasons:
1. If you accidentally run cleanfs at some later time after the initial boot, the marker would prevent you from blowing away /var/run and /var/lock.
2. If any of the early bootscripts write a file (or open a socket) in /var/run, cleanfs would recognize it as created since boot rather than as a stale file from a previous boot.
I think 1 is nice, but it's not critical and I wouldn't miss it. You can drop 2 if you assume that cleanfs runs right after mountfs: since /var/run may not be mounted until mountfs anyway, there shouldn't be any new files to care about.
> If keeping such a marker is desirable we could move 'setclock' up earlier in the sequence - run it as early as possible. Likely after modules and udev.

I think moving setclock up to right after the devices are set up is a good thing regardless. The fewer things that happen before the system clock is set, the better.
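Moving setclock earlier amounts to renaming its start link so that it sorts sooner. The sketch below shows the idea in a scratch directory. The link names and sequence numbers are hypothetical, not the real LFS rcsysinit.d layout, and the sketch assumes the rc script starts the S* links in lexical order, the way a shell glob expands them.

```shell
#!/bin/bash
# Illustration only: directory, link names and numbers are hypothetical.
# Assumes S* links are started in lexical (glob) order.
sysinit=$(mktemp -d)
for link in S00mountkernfs S05modules S10udev S40mountfs S45cleanfs S60setclock; do
    touch "$sysinit/$link"
done

# Move setclock to right after udev by giving it a number between
# udev (S10) and mountfs (S40).
mv "$sysinit/S60setclock" "$sysinit/S15setclock"

# Print the resulting start order, one link per line.
for link in "$sysinit"/S*; do
    basename "$link"
done
```

With this ordering, mountfs and cleanfs both run after the clock has been corrected, so any timestamp they compare against reflects the real time.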
comment:8 by , 17 years ago
Moving setclock earlier in the sequence also makes the boot log more useful, since its timestamps will be correct rather than potentially hours off. I'll move this to lfs-dev for discussion.
comment:9 by , 16 years ago
Milestone: | 7.0 → 6.4 |
---|
comment:10 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:11 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Remove stale pid files when encountered