Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#8759 closed enhancement (wontfix)

Fix clocksource for ntp

Reported by: ken@… Owned by: blfs-book@…
Priority: normal Milestone: 8.0
Component: BOOK Version: SVN
Severity: normal Keywords:
Cc:

Description

On one of my machines I had to manipulate the system date/time to ensure that PDFs of different versions of a TTF font (produced by 'fret', which includes the date and time in the heading) matched. That caused me to look at my syslog, and I discovered that not everything was rosy.

I saw several messages like

frequency error 1726 PPM exceeds tolerance 500 PPM

These had not been apparent previously, so perhaps something in the current kernel dropped the ball after a lot of software suspends and wake-ups. Either way, the box was no longer synchronising.

This is apparently fixable by:

  1. Check the available clock sources with cat /sys/devices/system/clocksource/clocksource0/available_clocksource
  2. The default is now tsc; it used (years ago!) to be acpi_pm. Assuming acpi_pm is available, echo that to /sys/devices/system/clocksource/clocksource0/current_clocksource
  3. If things improve, add that as a boot option in grub, i.e.

clocksource=acpi_pm
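The manual steps above can be sketched as a pair of shell functions. This is only an illustration, not something from the Red Hat article: the names show_clocksource and set_clocksource and the CS_DIR override are made up here, and on a real system the write needs root.

```shell
#!/bin/sh
# Sketch: inspect and (optionally) switch the kernel clocksource.
# CS_DIR is a hypothetical override so the functions can be tried
# against a scratch directory instead of the real sysfs path.
CS_DIR="${CS_DIR:-/sys/devices/system/clocksource/clocksource0}"

# Print the available and current clock sources.
show_clocksource() {
    printf 'available: %s\n' "$(cat "$CS_DIR/available_clocksource")"
    printf 'source: %s\n' "$(cat "$CS_DIR/current_clocksource")"
}

# set_clocksource NAME: switch only if NAME is listed as available.
set_clocksource() {
    want="$1"
    for cs in $(cat "$CS_DIR/available_clocksource"); do
        if [ "$cs" = "$want" ]; then
            echo "$want" > "$CS_DIR/current_clocksource"
            return 0
        fi
    done
    echo "clocksource '$want' is not available" >&2
    return 1
}
```

Typical use would be show_clocksource, then set_clocksource acpi_pm as root, then show_clocksource again. The change does not survive a reboot, which is why the grub option is still needed.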

At the moment I've been running with acpi_pm for a bit over a day, but I haven't rebooted to force it from grub.

Change History (9)

comment:1 by ken@…, 7 years ago

Also, thinking back to Roger Koehler's problem at http://lists.linuxfromscratch.org/pipermail/blfs-support/2016-December/078699.html (his system time on recent kernels is apparently binary zero), changing

ntpd -q

to

ntpd -gq

(to allow an excessive initial adjustment) would have perhaps helped him (if the -g was explained, and this could be suggested as an initial step if the system date is way off).

comment:2 by bdubbs@…, 7 years ago

Interesting issue. On my development system I have

available: tsc hpet acpi_pm
source: tsc

On my workstation:

available: hpet acpi_pm
source: hpet

I have no idea why the difference, but ntp syncs to -0.752 and -5.593 ms respectively.
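For anyone wanting to compare figures like these, here is a minimal sketch for pulling the offset out of ntpq -p output. It assumes the offset is read from the ninth column of the line marked with '*' (the peer ntpd has selected); the ticket does not say how the numbers above were obtained, and the function name selected_offset is hypothetical.

```shell
#!/bin/sh
# selected_offset (hypothetical helper): read `ntpq -p` output on stdin and
# print the offset, in milliseconds, of the currently selected peer.
# The selected peer's line starts with '*'; offset is the 9th column.
# Exits non-zero if no peer has been selected yet.
selected_offset() {
    awk '/^\*/ { print $9; found = 1 } END { if (!found) exit 1 }'
}
```

Usage would be along the lines of ntpq -p | selected_offset; a non-zero exit status suggests ntpd has not yet chosen a peer.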

Last edited 7 years ago by bdubbs@…

comment:3 by bdubbs@…, 7 years ago

This URL may be helpful:

https://software.intel.com/en-us/blogs/2013/06/20/eliminate-the-dreaded-clocksource-is-unstable-message-switch-to-tsc-for-a-stable

I'm inclined to mark this ticket as wontfix as it seems to be specific to some HW failure.

comment:4 by ken@…, 7 years ago

That link appears to be very specific to a particular high-end CPU. In my particular case I suspect that the specific kernel version (4.9.0) and repeated swsuspend (s2ram) cycles have upset the kernel.

BOTH items in my suggestion came from what for me is the top-ranked result on Google for 'ppm exceeds tolerance 500 ppm', https://access.redhat.com/solutions/35640, which was updated in April last year.

comment:5 by bdubbs@…, 7 years ago

Could you update your kernel to 4.9.3 and see if you get a recurrence?

comment:6 by ken@…, 7 years ago (in reply to: 5)

Replying to bdubbs@…:

Could you update your kernel to 4.9.3 and see if you get a recurrence?

Not for a day or three. Also, the box had been sleeping at least once every day for at least a week before I noticed the problem.

comment:7 by ken@…, 7 years ago

I had to close most of my desktop applications so that I could keep bouncing Xorg to look at fontconfig options, so I've taken the opportunity to build and boot 4.9.4 using the default tsc clocksource. About to do the first swsuspend (this is on my haswell).

comment:8 by ken@…, 7 years ago

Resolution: wontfix
Status: new → closed

So far, the problem has failed to recur. I'll put a few details in the wiki. Closing as wontfix.

comment:9 by ken@…, 7 years ago

For the little it is worth, an update to this. After assuming the time was again accurate, I happened to notice it was 5 minutes slow. The cause appears to be frequent suspend to RAM. I tried various workarounds, e.g. reset it daily with ntpd -gq, retry the other clocksource. Eventually I noticed that after waking the box, it could have large offsets (several thousand milliseconds), but it did eventually sync. My thoughts now are that I had left it sleeping for several days and that caused the offset to be too great to sync.

I've added a script into /usr/lib/pm-utils/sleep.d/48ntpd to stop ntpd when going to sleep, and when waking to run ntpd -gq and then start it normally. That seems to keep it within less than a second (checking with /usr/sbin/ntpq -p). I'll add that to the wiki under ntpd.
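A rough sketch of what such a hook could look like follows. It is not the actual 48ntpd script, which is not shown in this ticket. It relies on pm-utils passing the action (suspend, hibernate, resume or thaw) as the first argument; the bootscript path /etc/rc.d/init.d/ntpd and the function name ntpd_sleep_hook are assumptions and will need adjusting to the local setup.

```shell
#!/bin/sh
# Hypothetical sketch of /usr/lib/pm-utils/sleep.d/48ntpd.
# NTPD_INIT is an assumed path to the ntpd bootscript; NTPD is the daemon binary.
NTPD_INIT="${NTPD_INIT:-/etc/rc.d/init.d/ntpd}"
NTPD="${NTPD:-ntpd}"

ntpd_sleep_hook() {
    case "$1" in
        suspend|hibernate)
            # Going to sleep: stop the daemon.
            "$NTPD_INIT" stop
            ;;
        resume|thaw)
            # Waking: -q sets the clock once and exits; -g allows that
            # first correction to be larger than ntpd would normally accept.
            "$NTPD" -gq
            "$NTPD_INIT" start
            ;;
        *)
            # Any other action: nothing to do.
            ;;
    esac
}

# pm-utils passes the action as the first argument.
if [ $# -gt 0 ]; then
    ntpd_sleep_hook "$1"
fi
```

The file has to be executable for pm-utils to run it. Running ntpd -gq before restarting the daemon is what lets a large post-sleep offset be corrected in one step.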
