Opened 12 months ago

Closed 11 months ago

Last modified 9 months ago

#18051 closed defect (fixed)

Extreme screen flickering on Intel GPUs with WebKit-2.40 in GTK-4 mode

Reported by: Douglas R. Reno Owned by: Douglas R. Reno
Priority: normal Milestone: 12.0
Component: BOOK Version: git
Severity: normal Keywords:
Cc:

Description

Filing a report here to help me track my progress on fixing this problem, since it's holding me up from getting GNOME done.

There is currently an issue affecting GTK-4 based applications which use WebKit, including its own MiniBrowser and Epiphany: the window flickers heavily. This makes the window unusable, and dragging the window around the screen will cause the application to hang.

No console output is present, and no messages are found in 'dmesg'. At this point, I think it's either an issue in Mesa or an issue in WebKit. There is a chance, albeit low, that it's related to GTK-4 only. Note that the GTK-3 version works perfectly and has no issues.

The original report was from Stephan Berman, see https://lists.linuxfromscratch.org/sympa/arc/blfs-dev/2023-04/msg00097.html

I'm going to try the new intel-media-driver next to see if that resolves anything, and then test this on Wayland. I'll also grab a screen recording. That'll give me more information to submit to upstream.

Note that this only seems to affect GPUs from 6th-9th gen Intel CPUs. It does not affect i965/crocus. It's most likely related to the 'iris' driver if Mesa is at fault.

Change History (45)

comment:1 by Douglas R. Reno, 12 months ago

Owner: changed from blfs-book to Douglas R. Reno
Status: new → assigned
Type: enhancement → defect

comment:2 by Xi Ruoyao, 12 months ago

Hmm, if it may be related to intel-media-driver, how about just uninstalling it (moving /usr/lib/dri/iHD_drv_video.so out of the way) and trying again?

I've not installed intel-media-driver on my current system yet because it FTBFS with GCC 13, and upstream is still working on a proper fix (some -Werror cases indicate real bugs in the code).

comment:3 by Xi Ruoyao, 12 months ago

Or, if it's a Mesa issue, maybe it's worth trying Mesa 23.1.0?

comment:4 by Douglas R. Reno, 12 months ago

I'll try the new Mesa shortly. Moving intel-media-driver out (as well as updating it) has no effect.

Note that epiphany gives:

renodr [ /sources ]$ epiphany

(epiphany:20206): Gtk-WARNING **: 11:12:40.991: Allocating size to EphyWindow 0x55dac1add470 without calling gtk_widget_measure(). How does the code know the size to allocate?
**
Gsk:ERROR:../gsk/gl/gskgldriver.c:682:gsk_gl_driver_cache_texture: assertion failed: (texture_id > 0)
Bail out! Gsk:ERROR:../gsk/gl/gskgldriver.c:682:gsk_gl_driver_cache_texture: assertion failed: (texture_id > 0)

comment:5 by Douglas R. Reno, 12 months ago

MiniBrowser does not have that console output though, and is still broken as well

in reply to:  4 comment:6 by Xi Ruoyao, 12 months ago

Replying to Douglas R. Reno:

I'll try the new Mesa shortly. Moving intel-media-driver out (as well as updating it) has no effect

Note that epiphany gives:

renodr [ /sources ]$ epiphany

(epiphany:20206): Gtk-WARNING **: 11:12:40.991: Allocating size to EphyWindow 0x55dac1add470 without calling gtk_widget_measure(). How does the code know the size to allocate?
**
Gsk:ERROR:../gsk/gl/gskgldriver.c:682:gsk_gl_driver_cache_texture: assertion failed: (texture_id > 0)
Bail out! Gsk:ERROR:../gsk/gl/gskgldriver.c:682:gsk_gl_driver_cache_texture: assertion failed: (texture_id > 0)

Hmm, to me it looks like a GTK4 issue... And IIRC Stephan said even gtk4-demo did not work properly.

comment:7 by Douglas R. Reno, 12 months ago

That's definitely still possible. Let's see what happens after updating Mesa on my laptop...

gtk4-demo works fine for me as well, until you try to resize the window. As a result, I think that's unrelated to the screen flickering problem, but I'll run it in another window and check for suspicious output. I can't reproduce either problem on a system that's not Intel Iris-based, though: my NVIDIA and AMDGPU systems both seem to be unaffected.

Based on that, I'm wondering if something that GTK-4 is doing is triggering a bug in Mesa's driver for the Iris-family GPUs.

comment:8 by Xi Ruoyao, 12 months ago

I'm wondering why I don't see it (I'm using an Iris driver)... Did my CFLAGS cover up the issue again? :(

comment:9 by Douglas R. Reno, 12 months ago

Mesa-23.1.0 honestly seems really broken...

renodr [ /sources ]$ /usr/libexec/webkitgtk-6.0/MiniBrowser 
libEGL warning: failed to get driver name for fd -1

libEGL warning: MESA-LOADER: failed to retrieve device information

libEGL warning: failed to get driver name for fd -1

libva info: VA-API version 1.18.0
libva info: User environment variable requested driver 'iHD'
libva info: Trying to open /usr/lib/dri/iHD_drv_video.so
libva info: va_openDriver() returns -1
libva info: VA-API version 1.18.0
libva info: User environment variable requested driver 'iHD'
libva info: Trying to open /usr/lib/dri/iHD_drv_video.so
libva info: va_openDriver() returns -1
libEGL warning: failed to get driver name for fd -1

libEGL warning: MESA-LOADER: failed to retrieve device information

libEGL warning: failed to get driver name for fd -1

libEGL warning: failed to get driver name for fd -1

libEGL warning: MESA-LOADER: failed to retrieve device information

libEGL warning: failed to get driver name for fd -1

Same results in the window as well

comment:10 by Xi Ruoyao, 12 months ago

Wow, "fd -1" looks like a result from failure to open /dev/dri/card1. Is the permission correct?

With systemd/elogind the permission should be like:

$ getfacl /dev/dri/card1
getfacl: Removing leading '/' from absolute path names
# file: dev/dri/card1
# owner: root
# group: video
user::rw-
user:xry111:rw-
group::rw-
mask::rw-
other::---

Note that the logged in user should have rw permission.

W/o systemd/elogind the user must be in video group to use the GPU.
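
A minimal sketch of checking this from a shell, assuming a POSIX sh. The helper name check_dri_node is hypothetical, and card and render node numbering varies between systems, so the loop simply globs over /dev/dri:

```shell
# check_dri_node is a hypothetical helper, not part of any package.
# It reports whether the current user can read and write a device node.
check_dri_node() {
    node=$1
    if [ ! -e "$node" ]; then
        echo "$node: missing"
    elif [ -r "$node" ] && [ -w "$node" ]; then
        echo "$node: rw ok"
    else
        echo "$node: no rw access"
    fi
}

# If a glob matches nothing, the pattern itself is reported as missing.
for node in /dev/dri/card* /dev/dri/renderD*; do
    check_dri_node "$node"
done
```

Note that root passes these checks regardless of the ACL, so this is only meaningful when run as the logged in user.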

comment:11 by Douglas R. Reno, 12 months ago

The permissions are correct:

renodr [ /sources/linux-6.2.12 ]$ getfacl /dev/dri/card1
getfacl: Removing leading '/' from absolute path names
# file: dev/dri/card1
# owner: root
# group: video
user::rw-
user:renodr:rw-
group::rw-
mask::rw-
other::---

comment:12 by Xi Ruoyao, 12 months ago

Then I guess we need strace or gdb to see why the "open" call failed and returned -1...
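
One possible way to do that with strace, as a rough sketch. The helper name failed_dri_opens and the log path are made up for illustration, and the MiniBrowser path is the one used earlier in this ticket:

```shell
# failed_dri_opens is a hypothetical helper: it prints the lines of an
# strace log where open()/openat() on a /dev/dri node returned -1.
failed_dri_opens() {
    grep -E 'open(at)?\(.*"/dev/dri/[^"]*".*= -1 ' "$1"
}

# Example use (assumes strace is installed; the log path is arbitrary):
#   strace -f -e trace=open,openat -o /tmp/minibrowser.strace \
#       /usr/libexec/webkitgtk-6.0/MiniBrowser
#   failed_dri_opens /tmp/minibrowser.strace
```

The errno printed after the -1 (EACCES, ENOENT, ...) should narrow down whether this is a permission problem or a missing node.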

comment:13 by Xi Ruoyao, 12 months ago

Does /dev/dri/renderD128 have the correct permissions too? On my system MiniBrowser tries to open it:

#0  __libc_open64 (file=0x555555664d90 "/dev/dri/renderD128", oflag=524290)
    at ../sysdeps/unix/sysv/linux/open64.c:30
#1  0x00007fffedb3bf13 in default_dmabuf_feedback_main_device.lto_priv ()
    at /usr/lib/libEGL.so.1
#2  0x00007fffefee4052 in ffi_call_unix64 () at /usr/lib/libffi.so.8
#3  0x00007fffefedf69d in ffi_call_int.lto_priv () at /usr/lib/libffi.so.8
#4  0x00007fffefee336d in ffi_call () at /usr/lib/libffi.so.8
#5  0x00007ffff0553cfe in wl_closure_invoke.constprop ()
    at /usr/lib/libwayland-client.so.0
#6  0x00007ffff055403e in dispatch_event.isra ()
    at /usr/lib/libwayland-client.so.0
#7  0x00007ffff0558697 in wl_display_dispatch_queue ()
    at /usr/lib/libwayland-client.so.0
#8  0x00007ffff05589bf in wl_display_roundtrip_queue ()
    at /usr/lib/libwayland-client.so.0
#9  0x00007fffedb25030 in dri2_initialize.lto_priv () at /usr/lib/libEGL.so.1
#10 0x00007fffedb19704 in eglInitialize () at /usr/lib/libEGL.so.1
#11 0x00007ffff62865ab in WebCore::PlatformDisplay::initializeEGLDisplay() ()
    at /usr/lib/libwebkitgtk-6.0.so.4
#12 0x00007ffff626f688 in std::once_flag::_Prepare_execution::_Prepare_execution<std::call_once<WebCore::PlatformDisplay::sharedDisplay()::{lambda()#1}>(std::once_flag&, WebCore::PlatformDisplay::sharedDisplay()::{lambda()#1}&&)::{lambda()#1}>(WebCore::PlatformDisplay::sharedDisplay()::{lambda()#1}&)::{lambda()#1}::_FUN() () at /usr/lib/libwebkitgtk-6.0.so.4
#13 0x00007ffff1e9d93c in __pthread_once_slow
    (once_control=0x7ffff7c51b38 <WebCore::PlatformDisplay::sharedDisplay()::onceFlag>, init_routine=0x7ffff02f8060 <std::__once_proxy()>)
    at pthread_once.c:116
#14 0x00007ffff6285d3b in WebCore::PlatformDisplay::sharedDisplay() [clone .localalias] [clone .lto_priv.0] () at /usr/lib/libwebkitgtk-6.0.so.4
#15 0x00007ffff4e03ccb in WebKit::HardwareAccelerationManager::singleton() [clone .part.0] () at /usr/lib/libwebkitgtk-6.0.so.4
#16 0x00007ffff4c91301 in WebKit::WebPreferences::WebPreferences(WTF::String const&, WTF::String const&, WTF::String const&) ()
    at /usr/lib/libwebkitgtk-6.0.so.4
#17 0x00007ffff4d5ce57 in webkit_settings_init(_WebKitSettings*, void*) ()
    at /usr/lib/libwebkitgtk-6.0.so.4
#18 0x00007ffff7cb1951 in g_type_create_instance ()
    at /usr/lib/libgobject-2.0.so.0
#19 0x00007ffff7cb8129 in g_object_new_internal.part.0.constprop ()
    at /usr/lib/libgobject-2.0.so.0
#20 0x00007ffff7cb8625 in g_object_new_with_properties.constprop ()
    at /usr/lib/libgobject-2.0.so.0
#21 0x00007ffff7c8d765 in g_object_new () at /usr/lib/libgobject-2.0.so.0
#22 0x00005555555620e1 in main ()

And on my system its permission is 0666.

comment:14 by Douglas R. Reno, 12 months ago

If you use something like "MESA_LOADER_DRIVER_OVERRIDE= /usr/libexec/webkitgtk-6.0/MiniBrowser", you'll get output about a missing driver (there isn't a blank driver LOL), but it'll fall back to swrast. When falling back to swrast, the window flickering does not happen. If you use MESA_LOADER_DRIVER_OVERRIDE=iris, the window flickering does happen.
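
The comparison can be sketched as a small wrapper, assuming a POSIX sh. The name run_with_driver is hypothetical; it only sets the variable for the command it launches:

```shell
# run_with_driver is a hypothetical wrapper: it runs a command with
# MESA_LOADER_DRIVER_OVERRIDE set to the given value in that command's
# environment.
run_with_driver() {
    driver=$1
    shift
    MESA_LOADER_DRIVER_OVERRIDE="$driver" "$@"
}

# Example use, with the two values tried in this comment:
#   run_with_driver ""   /usr/libexec/webkitgtk-6.0/MiniBrowser   # ends up on swrast
#   run_with_driver iris /usr/libexec/webkitgtk-6.0/MiniBrowser   # flickers
```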

comment:15 by Xi Ruoyao, 12 months ago

There is https://gitlab.freedesktop.org/mesa/mesa/-/issues/8194, but its fix is already in 23.1.0...

I guess that's enough to justify opening another issue with Mesa.

comment:16 by Douglas R. Reno, 12 months ago

I've tried the latest Cairo (1.17.8) and the development version of GTK-4 and saw no difference. I've also tried downgrading to the versions of Mesa and GTK-4 that we shipped BLFS 11.3 with (22.3.5 and 4.8.3 respectively).

On a whim, I'm going to try WebKitGTK-2.41.3, the GTK-4 version only of course since it's all that we really need to test this. There have been some changes with GBM and direct GPU access in WebKit recently.

comment:17 by Douglas R. Reno, 12 months ago

Using the development version of WebKit seems to be working properly on this system.

comment:19 by Xi Ruoyao, 12 months ago

Maybe we can try the latest 2.40 branch (https://github.com/WebKit/WebKit/commits/webkitglib/2.40) which contains a bunch of backported patches on top of 2.40.1.

comment:20 by Douglas R. Reno, 12 months ago

That sounds good to me, just started a build up of that from a git checkout. We shall see in the morning.

comment:21 by Douglas R. Reno, 12 months ago

Fresh git checkout built and runs fine from the webkitglib/2.40 branch! Now the question is... what commit fixes it?

I've been thinking about maybe the two regarding painting. I'll try to backport those in a bit, have some stuff on my dev system that I want to do

comment:23 by Bruce Dubbs, 12 months ago

If it's fixed, either wait for a release or upload a working version to anduin. There is no good reason to spend time trying to find exactly what commit broke/fixed the package.

comment:24 by Douglas R. Reno, 12 months ago

Milestone: 11.4 → hold

Ok. Guess we'll wait then until upstream puts out a new release, because I really do not want to have to take a huge diff between the current webkitglib/2.40 branch and 2.40.1.

Moving to hold.

comment:25 by Douglas R. Reno, 12 months ago

Updated SA-11.3-022 to note that, if you have an Intel GPU in that family, it's not recommended to update your system to this version of WebKit until we have a new version.

While this may not impact the GTK-3 version in exactly the same way, it does still have instability problems with painting which are fixed by the stable backports branch as well.

in reply to:  25 comment:26 by Xi Ruoyao, 12 months ago

Replying to Douglas R. Reno:

Updated SA-11.3-022 to note that it's not recommended to update your system to this version of WebKit until we have a new version, if you have an Intel GPU between that family of GPUs.

While this may not impact the GTK-3 version in exactly the same way, it does still have instability problems with painting which are fixed by the stable backports branch as well.

Ah, I guess I've seen the instability with both GTK3 and GTK4 viewing some web pages...

comment:27 by Douglas R. Reno, 11 months ago

Milestone: hold → 11.4

Moved back now that WebKitGTK+-2.40.2 is available (#18099)

comment:28 by Douglas R. Reno, 11 months ago

Now that the update has been dropped in, I will verify this is fixed tomorrow and close it :)

comment:29 by Douglas R. Reno, 11 months ago

Milestone: 11.4 → hold

... and it doesn't work, though it does invoke libva now so that is a step in the correct direction. I'll report back to upstream and move this over to hold, and will focus on GNOME instead.

comment:30 by Douglas R. Reno, 11 months ago

Screen recording of the issue can be found at https://linuxfromscratch.org/~renodr/webkit-issue-1.mp4

comment:31 by Douglas R. Reno, 11 months ago

Milestone: hold → 11.4

in reply to:  18 ; comment:32 by Xi Ruoyao, 11 months ago

in reply to:  32 comment:33 by Douglas R. Reno, 11 months ago

Replying to Xi Ruoyao:

Replying to Douglas R. Reno:

https://bugs.webkit.org/show_bug.cgi?id=256802 :)

Should we reopen it?

Just reopened it with the screen recording and some new information

comment:34 by Douglas R. Reno, 11 months ago

A new version of WebKit's unstable branch came out this morning - 2.41.5. Because of the severity of the problem, we will update to 2.41.5 temporarily until the maintainers of WebKit handle the bug in the stable branch, at which point we can revert.

The 2.41.5 release does have the most current security patches in it, so the two patches that we would've needed to backport have been applied.

comment:35 by Douglas R. Reno, 11 months ago

2.41.5 adds a dependency on libjxl (which is used in ImageMagick and KDE Frameworks). We can turn that off with -DUSE_JPEGXL=OFF, though we might want to consider adding the package in the future.

libjxl requires three packages which aren't in the book currently - gperftools (optional for libmypaint), OpenEXR (optional for gegl, opencv, ImageMagick, KF5, kio-extras, gst-plugins-bad, and GIMP), and Highway (not used by anything in the book). OpenEXR needs a package called imath. That brings a total of four new packages for this capability, though none of them are specific to WebKit and can be used in KF5, gst-plugins-bad, GIMP, kio-extras, OpenCV, ImageMagick, and GEGL.

For now I'm leaning towards passing -DUSE_JPEGXL=OFF to get this problem resolved, but if desired I can add the packages.
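
A sketch of what that configure step might look like. Only -DUSE_JPEGXL=OFF comes from this ticket; the function name configure_webkit_gtk4 is hypothetical, and the other options are assumed to be typical for a GTK-4 build rather than taken from the book:

```shell
# configure_webkit_gtk4 is a hypothetical wrapper around the cmake call.
# -DUSE_JPEGXL=OFF is the switch discussed in this ticket; the remaining
# options are assumptions and may differ from what the book actually uses.
configure_webkit_gtk4() {
    cmake -DCMAKE_BUILD_TYPE=Release \
          -DPORT=GTK \
          -DUSE_GTK4=ON \
          -DUSE_JPEGXL=OFF \
          -G Ninja "$@"
}

# Example use, from an empty build directory inside the source tree:
#   configure_webkit_gtk4 ..
```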

comment:36 by Bruce Dubbs, 11 months ago

Let's get the current problem fixed. We can consider enhancements later. What does libjxl provide?

comment:37 by Douglas R. Reno, 11 months ago

Sounds good to me. :)

libjxl provides support for JPEG-XL, which is a new version of the JPEG standard that supports extremely high compression with much better image quality. The official page at https://jpeg.org/jpegxl/ has some more information on it, but it's especially useful in web contexts.

comment:38 by Xi Ruoyao, 11 months ago

I'm wondering if the issue is reproducible on other distros as well. If not maybe we are doing something wrong.

comment:39 by Douglas R. Reno, 11 months ago

Resolution: fixed
Status: assigned → closed

Fixed at f6e07a25824e74ec22493b7c1d61223cc3aceb9e

Will continue monitoring upstream.

comment:40 by Xi Ruoyao, 11 months ago

Well, now I'm getting a crash playing videos :(.

in reply to:  40 comment:41 by Xi Ruoyao, 11 months ago

Replying to Xi Ruoyao:

Well, now I'm getting a crash playing videos :(.

If I remove the custom user-agent setting the crash is gone, but then I cannot tell Bilibili to use AV1 instead of H264.

comment:42 by Xi Ruoyao, 11 months ago

Hmm, it looks like the libdav1d gstreamer plugin does not work well with DMABuf for some reason.

in reply to:  42 comment:43 by Xi Ruoyao, 11 months ago

Replying to Xi Ruoyao:

Hmm, it looks like the libdav1d gstreamer plugin does not work well with DMABuf for some reason.

Reported as https://bugs.webkit.org/show_bug.cgi?id=258200. "My own issue" anyway (dav1d is not a BLFS package), but I'm quite unhappy about "fixing others' issue with my regression".

comment:45 by Bruce Dubbs, 9 months ago

Milestone: 11.4 → 12.0

Milestone renamed
