Opened 13 years ago

Closed 13 years ago

#2820 closed defect (fixed)

glibc issues with --enable-kernel=2.6.22.5

Reported by: bigorneault
Owned by: bdubbs@…
Priority: normal
Milestone: 6.8
Component: Book
Version: SVN
Severity: normal
Keywords:
Cc:

Description (last modified by bdubbs@…)

When building glibc 2.12.1 and 2.12.2 on x86_64 linux with --enable-kernel=2.6.22.5, the following tests fail:

make[2]: *** [/build/glibc-build/nptl/tst-rwlock6.out] Error 1
make[2]: *** [/build/glibc-build/nptl/tst-rwlock7.out] Error 1
make[2]: *** [/build/glibc-build/nptl/tst-rwlock9.out] Error 1
make[2]: *** [/build/glibc-build/nptl/tst-rwlock11.out] Error 1
make[2]: *** [/build/glibc-build/nptl/tst-rwlock12.out] Error 11
make[2]: *** [/build/glibc-build/nptl/tst-rwlock14.out] Error 1
make[2]: *** [/build/glibc-build/nptl/tst-abstime.out] Error 1

This issue does not occur on i686. If I build glibc with --enable-kernel=2.6.29 or later, all tests pass.

Attachments (1)

glibc-2.12.2_private_futexes.patch (623 bytes) - added by bigorneault 13 years ago.
kill ASSUME_PRIVATE_FUTEX

Change History (16)

comment:1 by bigorneault, 13 years ago

killing the ASSUME_PRIVATE_FUTEX define seems to fix the problem. See the patch.

Attachment glibc-2.12.2_private_futexes.patch added by bigorneault, 13 years ago

kill ASSUME_PRIVATE_FUTEX

comment:2 by Bryan Kadzban, 13 years ago

Can you tell what the ASSUME_PRIVATE_FUTEX define is changing? I assume that since you say it works with 2.6.29, there's some other defined ASSUME_* symbol that affects the generated code... Can you tell why the tests are failing? (The logs should be helpful.)

Does it work to use 2.6.22 instead of 2.6.22.5? (Why are we using 2.6.22.5 anyway? That's not functionally any different from 2.6.22, it just has some bugfixes.)

comment:3 by bdubbs@…, 13 years ago

Description: modified (diff)

I'm just guessing, but I suspect the failed tests depend on the kernel configuration.

Symbol: FUTEX [=y]
Prompt: Enable futex support
  Defined at init/Kconfig:903
  Depends on: EMBEDDED [=y]
  Location:
    -> General setup
      -> Configure standard kernel features (for small systems) (EMBEDDED [=y])
  Selects: RT_MUTEXES [=y]

The 2.6.22.5 version is what we specify as the minimum.

comment:4 by bigorneault, 13 years ago

Support for private futexes was added in 2.6.22, while support for the FUTEX_CLOCK_REALTIME flag was added in 2.6.29.

To confirm that this is an issue with the futex support, I built glibc with --enable-kernel=2.6.22.5 and manually adjusted the following #defines:

1 - 2.6.22 to 2.6.28: tests fail.

#define __ASSUME_PRIVATE_FUTEX 1
#undef __ASSUME_FUTEX_CLOCK_REALTIME

2 - anything before 2.6.22: All tests pass.

#undef __ASSUME_PRIVATE_FUTEX
#undef __ASSUME_FUTEX_CLOCK_REALTIME

3 - 2.6.29 or later: All tests pass.

#define __ASSUME_PRIVATE_FUTEX 1
#define __ASSUME_FUTEX_CLOCK_REALTIME 1

Maybe removing futex support from the kernel configuration could also fix this issue (this needs to be tested).
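
The three cases above can be reproduced as a small lookup. This is only a sketch: it assumes the usual packing of a kernel version into one integer (major*65536 + minor*256 + patch), and the function names kver_code and assumed_macros are made up for illustration. The two thresholds (2.6.22 and 2.6.29) and the symbol names come from this thread, not from reading the glibc headers again.

```shell
#!/bin/sh
# Sketch only: models the two thresholds quoted above, not glibc's real header.

# Pack MAJOR.MINOR.PATCH into one integer (major*65536 + minor*256 + patch).
# Any fourth component, such as the ".5" in 2.6.22.5, is ignored here.
kver_code() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the version string on dots
    IFS=$old_ifs
    echo $(( ${1:-0} * 65536 + ${2:-0} * 256 + ${3:-0} ))
}

# Print the futex-related assumptions implied by a given --enable-kernel value.
assumed_macros() {
    code=$(kver_code "$1")
    out=""
    if [ "$code" -ge "$(kver_code 2.6.22)" ]; then
        out="$out __ASSUME_PRIVATE_FUTEX"
    fi
    if [ "$code" -ge "$(kver_code 2.6.29)" ]; then
        out="$out __ASSUME_FUTEX_CLOCK_REALTIME"
    fi
    echo "${out# }"
}

for v in 2.6.18 2.6.22.5 2.6.30; do
    echo "$v: $(assumed_macros "$v")"
done
```

For 2.6.22.5 only the first symbol is reported, which is the combination in case 1 (the failing one). In this sketch 2.6.22 and 2.6.22.5 give the same result, because the fourth version component is dropped.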

comment:5 by bigorneault, 13 years ago

According to http://cateee.net/lkddb/web-lkddb/FUTEX.html

"Disabling this option will cause the kernel to be built without support for "fast userspace mutexes". The resulting kernel may not run glibc-based applications correctly."

So setting CONFIG_FUTEX=n does not look like a good idea.

in reply to:  4 comment:6 by bdubbs@…, 13 years ago

Replying to bigorneault:

Support for private futexes was added in 2.6.22, while support for the FUTEX_CLOCK_REALTIME flag was added in 2.6.29.

To confirm that this is an issue with the futex support, I built glibc with --enable-kernel=2.6.22.5 and manually adjusted the following #defines:

3 - 2.6.29 or later: All tests pass.

   #define __ASSUME_PRIVATE_FUTEX    1
   #define __ASSUME_FUTEX_CLOCK_REALTIME   1

I built glibc-2.12.2 on an LFS-6.7 system using:

../glibc-2.12.2/configure --prefix=/usr --disable-profile \
    --enable-add-ons --enable-kernel=2.6.30 --libexecdir=/usr/lib/glibc
make
cp -v ../glibc-2.12.2/iconvdata/gconv-modules iconvdata
make -k check 2>&1 | tee glibc-check-log

The errors definitely changed, but I still had errors:

[/sources/glibc-build/posix/annexc.out] Error 1 (ignored)
[/sources/glibc-build/nptl/tst-attr3.out] Error 1
[/sources/glibc-build/debug/tst-chk3.out] Error 1
[/sources/glibc-build/debug/tst-lfschk3.out] Error 1
[/sources/glibc-build/debug/tst-chk6.out] Error 1
[/sources/glibc-build/debug/tst-lfschk6.out] Error 1
[/sources/glibc-build/c++-types-check.out] Error 1

I'll have to rebuild a SVN system and recheck, but I don't ever recall seeing any of these errors before.

debug/tst-lfschk3.c is only

#define _FILE_OFFSET_BITS 64
#include "tst-chk3.c"

Same for tst-lfschk6.cc

Checking debug/tst-chk3.c, it is

#define _FORTIFY_SOURCE 2
#include "tst-chk1.c"

tst-chk1 is doing some printf testing. Something about the format string being writable when it should not be.

comment:7 by Bryan Kadzban, 13 years ago

bigorneault or bdubbs -- What was the test's output when it failed, versus when it succeeded? Say, for nptl/tst-rwlock6 (the first one that failed)? I'm curious where it's failing, and don't really have the time to build a copy of chapter 5 to try to reproduce myself at the moment. Unfortunately. :-)

in reply to:  7 comment:8 by bdubbs@…, 13 years ago

Replying to bryan@…:

bigorneault or bdubbs -- What was the test's output when it failed, versus when it succeeded? Say, for nptl/tst-rwlock6 (the first one that failed)?

Sorry, that's in binutils-build and I deleted that. I've seen these before. See #2683.

In my message to -dev on 6/27/10, I said they were seg faults.

comment:9 by bigorneault, 13 years ago

As udev >= 147 requires a minimum kernel version of 2.6.27 and is not tested with versions older than 2.6.31 (http://www.spinics.net/lists/hotplug/msg04507.html), I suggest that we follow Fedora 14, which moved from 2.6.18 to 2.6.32.

comment:10 by Gilles Espinasse, 13 years ago

Don't confuse the requirements on the host (the build machine) with those for the target (the code you will run). You could perfectly well build from Debian 5 (lenny), which has a 2.6.26 kernel and an old udev. The newly compiled udev will only run once you boot the new kernel. You will not be able to build on a machine whose running kernel is older than the version given to --enable-kernel.
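
The host-side constraint in the last sentence can be checked before starting a build. The following is a minimal sketch, not part of the book: kernel_at_least is a hypothetical helper, it only compares version strings, and it relies on the version sort (sort -V) provided by GNU coreutils.

```shell
#!/bin/sh
# Sketch only: compares version strings; it does not inspect any glibc build.

# Succeeds when kernel release $1 is at least version $2.
# Distribution suffixes such as "-194.26.1.el5" are stripped first.
# Relies on "sort -V" (version sort) from GNU coreutils.
kernel_at_least() {
    have=${1%%-*}
    want=$2
    lowest=$(printf '%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

running=$(uname -r)
target=2.6.22.5   # the --enable-kernel value discussed in this ticket
if kernel_at_least "$running" "$target"; then
    echo "host kernel $running satisfies --enable-kernel=$target"
else
    echo "host kernel $running is older than $target"
fi
```

With this comparison a 2.6.26 host passes for a 2.6.22.5 target, while a 2.6.18 host does not.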

RH-5/CentOS-5 has a 2.6.18 kernel by default. I just tested my build system on a CentOS-5.5 machine with SELinux enabled and saw only a few more binutils test errors than on more recent distributions.

2.6.18, 2.6.26 or 2.6.32 all make sense. It all depends on how strongly the LFS book wants to require a not too old / mostly recent / recent / bleeding-edge kernel.

Fedora is known to be on the bleeding edge and is renewed every six months.

In my mind, 2.6.32 would qualify as mostly recent, but how many Linux machines already run that kernel? A year ago, the Linux world probably had far less than 1% of machines with that kernel (that was 40 days after that version's release).

in reply to:  10 comment:11 by bdubbs@…, 13 years ago

Replying to gespinasse:

In my mind, 2.6.32 would qualify as mostly recent, but how many Linux machines already run that kernel? A year ago, the Linux world probably had far less than 1% of machines with that kernel (that was 40 days after that version's release).

I agree. My wife's system runs 2.6.20 and there is really no need to upgrade. At work, we are running CentOS 2.6.18-194.26.1.el5. Quantum is running 2.6.18, anduin 2.6.27.4, and one other system I administer 2.6.9-89.0.16.EL.

The nice thing about Linux is that you are the only one to decide that you need an upgrade. LFS is pretty much on the leading edge, but once it's built, there is very little external pressure to upgrade.

comment:12 by bigorneault, 13 years ago

The best option would still be to avoid the problem by applying my patch. I have been running with this patch for 8 days without any problem.

comment:13 by Bryan Kadzban, 13 years ago

That patch will re-enable several workarounds for kernels that aren't supported, slowing down glibc in the process. (Probably not noticeable, but who knows.) It also won't fix the "logical bug" (that the stack is imbalanced at retq time due to mismatched #ifdef checks on entry and exit), only work around it by forcing both symbols to be the same.

Last, if we *do* change --enable-kernel at a later date for some other reason, the same code will break in the opposite direction if we just yank ASSUME_PRIVATE_FUTEX now: more data will be popped off the stack at function exit than was pushed on at function entry. Fixing the #ifdef checks in the two broken files will fix all of these.
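
The imbalance described above can be pictured with a toy model. This is a sketch under a stated assumption: it supposes that the extra push at function entry is guarded by the FUTEX_CLOCK_REALTIME symbol and the matching pop at function exit by the PRIVATE_FUTEX symbol, which is inferred from the behaviour reported in this ticket rather than copied from the assembly. The function name stack_delta is made up for illustration.

```shell
#!/bin/sh
# Toy model only: the real code is x86_64 assembly, not shell.

# Arguments: $1 = 1 if __ASSUME_PRIVATE_FUTEX is defined, else 0
#            $2 = 1 if __ASSUME_FUTEX_CLOCK_REALTIME is defined, else 0
# Prints the net stack depth left behind: 0 is balanced, 1 means a push
# without a pop, -1 means a pop without a push.
stack_delta() {
    private=$1
    clock_rt=$2
    depth=0
    # Entry (assumed): the extra push is guarded by one symbol...
    if [ "$clock_rt" -eq 0 ]; then depth=$((depth + 1)); fi
    # Exit (assumed): ...but the matching pop is guarded by the other one.
    if [ "$private" -eq 0 ]; then depth=$((depth - 1)); fi
    echo "$depth"
}

echo "neither defined:            $(stack_delta 0 0)"
echo "only PRIVATE_FUTEX:         $(stack_delta 1 0)"
echo "both defined:               $(stack_delta 1 1)"
echo "only FUTEX_CLOCK_REALTIME:  $(stack_delta 0 1)"
```

Under that assumption the model is balanced when both symbols agree, leaves one item on the stack for the 2.6.22 to 2.6.28 range, and pops one item too many if ASSUME_PRIVATE_FUTEX is removed while --enable-kernel is later raised to 2.6.29 or above. Making both guards test the same symbol removes both failure modes.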

On -dev, I think we've probably decided on using a sed to do this:

http://linuxfromscratch.org/pipermail/lfs-dev/2011-January/064523.html

Something like that should be committed soon, I think.

(Sorry, the discussion moved there, and I don't think anyone updated this bug at the time. Should have done that...)

comment:14 by bdubbs@…, 13 years ago

Owner: changed from lfs-book@… to bdubbs@…
Status: new → assigned

I'm running one more full build to test the sed. If all goes well, I'll commit tomorrow.

comment:15 by bdubbs@…, 13 years ago

Resolution: fixed
Status: assigned → closed

Fixed in revision 9452.
