Opened 2 years ago

Closed 2 years ago

#16033 closed defect (fixed)

Fix nouveau multithreading issues causing crashes and kernel panics

Reported by: Douglas R. Reno
Owned by: Douglas R. Reno
Priority: normal
Milestone: 11.1
Component: BOOK
Version: git
Severity: critical
Keywords:
Cc:

Description

As those in IRC will remember, I began encountering issues on my development machine with an NVIDIA GPU installed. Today I made some major progress in figuring out what is happening here, but before I discuss that, let's get into some context.

This issue has exhibited itself in two different ways.

1: Running gnome-maps in a Wayland session of GNOME instantly crashes gnome-maps (before the window can show), and in 2 runs out of 10, immediately causes a kernel panic.

2: Executing 'startplasma-wayland' to start a Wayland session of Plasma causes a GPU hang due to an issue in Mesa or the kernel. This issue can exhibit itself in several different ways. In one case, the start menu popped up and immediately closed, which then shifted the taskbar icons to the far left. After that, my keyboard and mouse stopped working. In another attempt, the system immediately started blinking the Num Lock and Scroll Lock LEDs after the KDE initialization screen came up. In yet another attempt (trying to isolate the issue), plasmashell immediately core dumped and then the kernel panicked again.

The coredumpctl gdb output with 'bt full' for gnome-maps is:

#0  0x00007f5059c22c40 in dri2_query_image () from /usr/lib/dri/nouveau_dri.so
[Current thread is 1 (Thread 0x7f50e0e459c0 (LWP 172837))]
(gdb) bt full
#0  0x00007f5059c22c40 in dri2_query_image () at /usr/lib/dri/nouveau_dri.so
#1  0x00007f50e14694eb in create_wl_buffer () at /usr/lib/libEGL.so.1
#2  0x00007f50e146a12b in dri2_wl_swap_buffers_with_damage () at /usr/lib/libEGL.so.1
#3  0x00007f50e145b40e in dri2_swap_buffers () at /usr/lib/libEGL.so.1
#4  0x00007f50e144feda in eglSwapBuffers () at /usr/lib/libEGL.so.1
#5  0x00007f50a78b91cd in _cogl_winsys_onscreen_swap_buffers_with_damage (onscreen=0x3911050, rectangles=0x7fff01747e60, n_rectangles=0)
    at winsys/cogl-winsys-egl.c:848
        context = <optimized out>
        renderer = <optimized out>
        egl_renderer = 0x2e60940
        egl_onscreen = 0x2f20c80
#6  0x00007f50a78a35a6 in cogl_onscreen_swap_buffers_with_damage
    (onscreen=0x3911050, rectangles=rectangles@entry=0x7fff01747e60, n_rectangles=n_rectangles@entry=0) at cogl-onscreen.c:319
        framebuffer = 0x3911050
        winsys = <optimized out>
        info = <optimized out>
        __func__ = "cogl_onscreen_swap_buffers_with_damage"
#7  0x00007f50a793f371 in clutter_stage_cogl_redraw (stage_window=0x33b04c0) at cogl/clutter-stage-cogl.c:639
        stage_cogl = 0x33b04c0 [ClutterStageGdk]
        geom = {x = 0, y = 0, width = 820, height = 652}
        have_clip = <optimized out>
        may_use_clipped_redraw = <optimized out>
        use_clipped_redraw = <optimized out>
        can_blit_sub_buffer = <optimized out>
        has_buffer_age = <optimized out>
        wrapper = <optimized out>
        clip_region = <optimized out>
        damage = {0, 0, 0, 0}
        ndamage = 0
        force_swap = <optimized out>
        window_scale = <optimized out>
#8  0x00007f50a794280b in clutter_stage_gdk_redraw (stage_window=0x33b04c0) at gdk/clutter-stage-gdk.c:675
        stage_gdk = 0x33b04c0 [ClutterStageGdk]
        clock = 0x36bc120 [GdkFrameClockIdle]
#9  0x00007f50a79a99d4 in clutter_stage_do_redraw (stage=0x34faaf0 [ClutterStage]) at clutter-stage.c:1130
        backend = <optimized out>
        actor = 0x34faaf0 [ClutterStage]
        priv = 0x34fa4f0
        priv = 0x34fa4f0
#10 _clutter_stage_do_update (stage=stage@entry=0x34faaf0 [ClutterStage]) at clutter-stage.c:1186
        priv = 0x34fa4f0
#11 0x00007f50a7941bb5 in master_clock_update_stage (master_clock=0x35e3900 [ClutterMasterClockGdk], stage=0x34faaf0 [ClutterStage])
    at gdk/clutter-master-clock-gdk.c:249
        stage_updated = 0
        stage = 0x34faaf0 [ClutterStage]
        stages = <optimized out>
        l = 0x2ef9ca0 = {0x34faaf0}
#12 clutter_master_clock_gdk_update (frame_clock=0x36bc120 [GdkFrameClockIdle], master_clock=0x35e3900 [ClutterMasterClockGdk])
    at gdk/clutter-master-clock-gdk.c:306
        stage = 0x34faaf0 [ClutterStage]
        stages = <optimized out>
        l = 0x2ef9ca0 = {0x34faaf0}
#16 0x00007f50e4fc3852 in <emit signal ??? on instance 0x36bc120 [GdkFrameClockIdle]>
    (instance=instance@entry=0x36bc120, signal_id=<optimized out>, detail=detail@entry=0) at ../gobject/gsignal.c:3553
        var_args = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7fff017483d0, reg_save_area = 0x7fff01748310}}
    #13 0x00007f50e4fab4cf in g_closure_invoke
    (closure=0x391d310, return_value=return_value@entry=0x0, n_param_values=1, param_values=param_values@entry=0x7fff01748180, invocation_hint=invocation_hint@entry=0x7fff01748120) at ../gobject/gclosure.c:830
                marshal = 0x7f50e4fad420 <g_cclosure_marshal_VOID__VOID>
                marshal_data = 0x0
                in_marshal = 0
                real_closure = 0x391d2f0
                __func__ = "g_closure_invoke"
    #14 0x00007f50e4fbcfa6 in signal_emit_unlocked_R
    (node=node@entry=0x2d99480, detail=detail@entry=0, instance=instance@entry=0x36bc120, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x7fff01748180) at ../gobject/gsignal.c:3742
                tmp = <optimized out>
                handler = 0x3917740
                accumulator = 0x0
                emission = 
                  {next = 0x0, instance = 0x36bc120, ihint = {signal_id = 310, detail = 0, run_type = (G_SIGNAL_RUN_FIRST | G_SIGNAL_ACCUMULATOR_FIRST_RUN)}, state = EMISSION_RUN, chain_type = 0x4 [void]}
                class_closure = 0x0
                hlist = <optimized out>
                handler_list = 0x38c04c0
                return_accu = 0x0
                accu = 
                      {g_type = 0x0, data = {{v_int = 0, v_uint = 0, v_long = 0, v_ulong = 0, v_int64 = 0, v_uint64 = 0, v_float = 0, v_double = 0, v_pointer = 0x0}, {v_int = 0, v_uint = 0, v_long = 0, v_ulong = 0, v_int64 = 0, v_uint64 = 0, v_float = 0, v_double = 0, v_pointer = 0x0}}}
                signal_id = 310
                max_sequential_handler_number = 3851
                return_value_altered = 1
    #15 0x00007f50e4fc3329 in g_signal_emit_valist
    (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7fff017482f8) at ../gobject/gsignal.c:3497
                instance_and_params = 0x7fff01748180
                signal_return_type = <optimized out>
                param_values = 0x7fff01748198
                node = <optimized out>
                i = <optimized out>
                n_params = <optimized out>
                __func__ = "g_signal_emit_valist"
#17 0x00007f50ccb1919f in _gdk_frame_clock_emit_paint (frame_clock=frame_clock@entry=0x36bc120 [GdkFrameClockIdle]) at gdkframeclock.c:657
#18 0x00007f50ccb19de2 in gdk_frame_clock_paint_idle (data=0x36bc120) at gdkframeclockidle.c:597
        clock = 0x36bc120 [GdkFrameClockIdle]
        clock_idle = 0x36bc120 [GdkFrameClockIdle]
        priv = 0x36bc020
        skip_to_resume_events = 0
        timings = 0x33f5650
        __func__ = "gdk_frame_clock_paint_idle"
#19 0x00007f50ccb04d69 in gdk_threads_dispatch (data=0x2f21760, data@entry=<error reading variable: value has been optimized out>) at gdk.c:769
        dispatch = 0x2f21760
        ret = 0
#20 0x00007f50e50460a4 in g_timeout_dispatch (source=0x390eb70, callback=<optimized out>, user_data=<optimized out>) at ../glib/gmain.c:4933
        timeout_source = 0x390eb70
        again = <optimized out>
#21 0x00007f50e5045594 in g_main_dispatch (context=0x230c500) at ../glib/gmain.c:3381
        dispatch = 0x7f50e5046090 <g_timeout_dispatch>
        prev_source = 0x0
        begin_time_nsec = 0
        was_in_call = 0
        user_data = 0x2f21760
        callback = 0x7f50ccb04d40 <gdk_threads_dispatch>
        cb_funcs = <optimized out>
        cb_data = <optimized out>
        need_destroy = <optimized out>
        source = 0x390eb70
        current = 0x2318070
        i = 0
        __func__ = "g_main_dispatch"
#22 g_main_context_dispatch (context=0x230c500) at ../glib/gmain.c:4099
#23 0x00007f50e50458f8 in g_main_context_iterate (context=context@entry=0x230c500, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at ../glib/gmain.c:4175
        max_priority = 120
        timeout = 0
        some_ready = 1
        nfds = <optimized out>
        allocated_nfds = <optimized out>
        fds = 0x2dda5e0
#24 0x00007f50e504599f in g_main_context_iteration (context=context@entry=0x230c500, may_block=may_block@entry=1) at ../glib/gmain.c:4240
        retval = <optimized out>
#25 0x00007f50e4e94c4d in g_application_run (application=0x24ca0f0 [Gjs_Application], argc=24413516, argv=<optimized out>) at ../gio/gapplication.c:2569
        arguments = 0x24cce60
        status = 0
        context = 0x230c500
        acquired_context = <optimized out>
        __func__ = "g_application_run"
#26 0x00007f50e494355a in  () at /usr/lib/libffi.so.8
#27 0x00007f50e4942753 in  () at /usr/lib/libffi.so.8
#28 0x00007f50e517d94c in Gjs::Function::invoke(JSContext*, JS::CallArgs const&, JS::Handle<JSObject*>, _GIArgument*)
    (this=0x24ccb70, context=0x2328330, args=..., this_obj=..., r_value=0x0) at ../gi/function.cpp:948
        __PRETTY_FUNCTION__ = "bool Gjs::Function::invoke(JSContext*, const JS::CallArgs&, JS::HandleObject, GIArgument*)"
        return_value_p = 0x7fff01748878
        return_value = 
          {v_boolean = -459184751, v_int8 = -111 '\221', v_uint8 = 145 '\221', v_int16 = 26001, v_uint16 = 26001, v_int32 = -459184751, v_uint32 = 3835782545, v_int64 = 139985409893777, v_uint64 = 139985409893777, v_float = -2.38179554e+22, v_double = 6.9161981947520557e-310, v_short = 26001, v_ushort = 26001, v_int = -459184751, v_uint = 3835782545, v_long = 139985409893777, v_ulong = 139985409893777, v_ssize = 139985409893777, v_size = 139985409893777, v_string = 0x7f50e4a16591 <__GI___libc_free+81> "d\211+H\203\304\030[]\303\017\037D", v_pointer = 0x7f50e4a16591 <__GI___libc_free+81>}
        ffi_argc = 3
        state = 
          {m_in_cvalues = 0x24cc7e0, m_out_cvalues = 0x24cc940, m_inout_original_cvalues = 0x24ccdd0, ignore_release = std::unordered_set with 0 elements, instance_object = {<js::RootedBase<JSObject*, JS::Rooted<JSObject*> >> = {<js::MutableWrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<js::WrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328348, prev = 0x7fff01748a70, ptr = 0x1333e8c8ad00}, return_values = {<JS::Rooted<JS::StackGCVector<JS::Value, js::TempAllocPolicy> >> = {<js::RootedBase<JS::StackGCVector<JS::Value, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JS::Value, js::TempAllocPolicy> > >> = {<js::MutableWrappedPtrOperations<JS::StackGCVector<JS::Value, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JS::Value, js::TempAllocPolicy> > >> = {<js::MutableWrappedPtrOperations<JS::GCVector<JS::Value, 8, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JS::Value, js::TempAllocPolicy> > >> = {<js::WrappedPtrOperations<JS::GCVector<JS::Value, 8, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JS::Value, js::TempAllocPolicy> > >> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328398, prev = 0x7fff017490a8, ptr = {<js::VirtualTraceable> = {_vptr.VirtualTraceable = 0x7f50e5310350 <vtable for js::RootedTraceable<JS::StackGCVector<JS::Value, js::TempAllocPolicy> >+16>}, ptr = {<JS::GCVector<JS::Value, 8, js::TempAllocPolicy>> = {vector = {<js::TempAllocPolicy> = {<js::AllocPolicyBase> = {<No data fields>}, cx_ = 0x2328330}, static kElemIsPod = false, static kMaxInlineBytes = 992, static kInlineCapacity = 8, mBegin = 0x7fff01748810,
mLength = 0, mTail = {<mozilla::Vector<JS::Value, 8, js::TempAllocPolicy>::CapacityAndReserved> = {mCapacity = 8}, mBytes = "0\210t\001\377\177\000\000*\317\026\345P\177\000\000\350ID\002\000\000\000\000؈t\001\377\177\000\000\060\211t\001\377\177\000\000\000s\026\366\037f\242\"\037\212t\001\377\177\000\000`\377\377\377\377\377\377\377"}}}, <No data fields>}}}, <No data fields>}, local_error = {m_ptr = 0x0}, info = 0x2ce6ca0, gi_argc = 2, processed_c_args = 3, failed = false, can_throw_gerror = false, is_method = true}
        ffi_arg_pointers = std::unique_ptr<void *[]> = {get() = {<No data fields>}}
        gi_arg_pos = 2
        ffi_arg_pos = 3
        js_arg_pos = 1
        obj = 
          {<js::RootedBase<JSObject*, JS::Rooted<JSObject*> >> = {<js::MutableWrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<js::WrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328348, prev = 0x7fff017487c0, ptr = 0x1333e8c8ad00}
        dynamicString = "Gjs_Application.method Gio.Application.run"
        label = {m_stack = 0x0}
        errorp = 0x7fff01748850
#29 0x00007f50e517e26d in Gjs::Function::call(JSContext*, unsigned int, JS::Value*) (context=0x2328330, js_argc=1, vp=0x24449e0)
    at ../gi/function.cpp:1090
        js_argv = 
            {<JS::detail::CallArgsBase<JS::detail::IncludeUsedRval>> = {argv_ = 0x24449f0, argc_ = 1, constructing_ = false, ignoresReturnValue_ = false}, <No data fields>}
        callee = 
          {<js::RootedBase<JSObject*, JS::Rooted<JSObject*> >> = {<js::MutableWrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<js::WrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328348, prev = 0x7fff01748de0, ptr = 0x1713db0b82e0}
        priv = 0x24ccb70
#30 0x00007f50e3db18ee in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) () at /usr/lib/libmozjs-78.so
#31 0x00007f50e3da48ec in Interpret(JSContext*, js::RunState&) () at /usr/lib/libmozjs-78.so
#32 0x00007f50e3db10ae in js::RunScript(JSContext*, js::RunState&) () at /usr/lib/libmozjs-78.so
#33 0x00007f50e3db35c5 in js::Execute(JSContext*, JS::Handle<JSScript*>, JS::Handle<JSObject*>, JS::MutableHandle<JS::Value>) ()
    at /usr/lib/libmozjs-78.so
#34 0x00007f50e3ec7b13 in bool EvaluateSourceBuffer<char16_t>(JSContext*, js::ScopeKind, JS::Handle<JSObject*>, JS::ReadOnlyCompileOptions const&, JS::SourceText<char16_t>&, JS::MutableHandle<JS::Value>) () at /usr/lib/libmozjs-78.so
#35 0x00007f50e3ec7c6a in JS::Evaluate(JSContext*, JS::Handle<JS::StackGCVector<JSObject*, js::TempAllocPolicy> >, JS::ReadOnlyCompileOptions const&, JS::SourceText<char16_t>&, JS::MutableHandle<JS::Value>) () at /usr/lib/libmozjs-78.so
#36 0x00007f50e51cd08a in GjsContextPrivate::eval_with_scope(JS::Handle<JSObject*>, char const*, long, char const*, JS::MutableHandle<JS::Value>)
    (this=0x2329060, scope_object=..., script=0x2323880 "#!/usr/bin/gjs\nimports.package.start({ name: \"gnome-maps\",\n", ' ' <repeats 24 times>, "version: \"40.4\",\n", ' ' <repeats 24 times>, "prefix: \"/usr\",\n", ' ' <repeats 24 times>, "libdir: \"/usr/lib\" });\n", script_len=187, filename=0x22f3310 "/usr/bin/gnome-maps", retval=...) at ../gjs/context.cpp:1467
        eval_obj = 
          {<js::RootedBase<JSObject*, JS::Rooted<JSObject*> >> = {<js::MutableWrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<js::WrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328348, prev = 0x0, ptr = 0xe295e5240e0}
        items_written = 187
        error = 0x7fff01749e90
        utf16_string = {m_ptr = 0x2352000}
        buf = 
          {units_ = 0x2352000 u"#!/usr/bin/gjs\nimports.package.start({ name: \"gnome-maps\",\n", ' ' <repeats 24 times>, "version: \"40.4\",\n", ' ' <repeats 24 times>, "prefix: \"/usr\",\n", ' ' <repeats 24 times>, "libdir: \"/usr/lib\" });\n", length_ = 187, ownsUnits_ = false}
        scope_chain = 
                      {<JS::Rooted<JS::StackGCVector<JSObject*, js::TempAllocPolicy> >> = {<js::RootedBase<JS::StackGCVector<JSObject*, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JSObject*, js::TempAllocPolicy> > >> = {<js::MutableWrappedPtrOperations<JS::StackGCVector<JSObject*, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JSObject*, js::TempAllocPolicy> > >> = {<js::MutableWrappedPtrOperations<JS::GCVector<JSObject*, 8, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JSObject*, js::TempAllocPolicy> > >> = {<js::WrappedPtrOperations<JS::GCVector<JSObject*, 8, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JSObject*, js::TempAllocPolicy> > >> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328398, prev = 0x0, ptr = {<js::VirtualTraceable> = {_vptr.VirtualTraceable = 0x7f50e5310d30 <vtable for js::RootedTraceable<JS::StackGCVector<JSObject*, js::TempAllocPolicy> >+16>}, ptr = {<JS::GCVector<JSObject*, 8, js::TempAllocPolicy>> = {vector = {<js::TempAllocPolicy> = {<js::AllocPolicyBase> = {<No data fields>}, cx_ = 0x2328330}, static kElemIsPod = true, static kMaxInlineBytes = 992, static kInlineCapacity = 8, mBegin = 0x7fff01749df8, mLength = 1, mTail = {<mozilla::Vector<JSObject*, 8, js::TempAllocPolicy>::CapacityAndReserved> = {mCapacity = 8}, mBytes = "\340@R^)\016\000\000\060\203\062\002\f\000\000\000\060\203\062\002\000\000\000\000\060\236t\001\377\177\000\000/\315\025\345P\177\000\000\060\203\062\002\000\000\000\000@\237t\001\377\177\000\000`\236t\001\377\177\000"}}}, <No data fields>}}}, <No data fields>}
        options = 
              {<JS::ReadOnlyCompileOptions> = {<JS::TransitiveCompileOptions> = {_vptr.TransitiveCompileOptions = 0x7f50e492c598 <vtable for JS::CompileOptions+16>, mutedErrors_ = false, forceFullParse_ = false, forceStrictMode_ = false, sourcePragmas_ = true, filename_ = 0x22f3310 "/usr/bin/gnome-maps", introducerFilename_ = 0x0, sourceMapURL_ = 0x0, skipFilenameValidation_ = false, selfHostingMode = false, asmJSOption = JS::AsmJSOption::Enabled, throwOnAsmJSValidationFailureOption = false, forceAsync = false, discardSource = false, sourceIsLazy = false, allowHTMLComments = true, hideScriptFromDebugger = false, nonSyntacticScope = false, introductionType = 0x0, introductionLineno = 0, introductionOffset = 0, hasIntroductionInfo = false, instrumentationKinds = 0}, lineno = 1, column = 0, scriptSourceOffset = 0, isRunOnce = false, noScriptRval = false}, elementAttributeNameRoot = {<js::RootedBase<JSString*, JS::Rooted<JSString*> >> = {<js::MutableWrappedPtrOperations<JSString*, JS::Rooted<JSString*> >> = {<js::WrappedPtrOperations<JSString*, JS::Rooted<JSString*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328368, prev = 0x0, ptr = 0x0}, introductionScriptRoot = {<js::RootedBase<JSScript*, JS::Rooted<JSScript*> >> = {<js::MutableWrappedPtrOperations<JSScript*, JS::Rooted<JSScript*> >> = {<js::WrappedPtrOperations<JSScript*, JS::Rooted<JSScript*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328358, prev = 0x0, ptr = 0x0}, scriptOrModuleRoot = {<js::RootedBase<JSScript*, JS::Rooted<JSScript*> >> = {<js::MutableWrappedPtrOperations<JSScript*, JS::Rooted<JSScript*> >> = {<js::WrappedPtrOperations<JSScript*, JS::Rooted<JSScript*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328358, prev = 0x7fff01749d78, ptr = 0x0}, privateValueRoot = {<js::RootedBase<JS::Value, JS::Rooted<JS::Value> >> = {<js::MutableWrappedPtrOperations<JS::Value, JS::Rooted<JS::Value> >> = 
{<js::WrappedPtrOperations<JS::Value, JS::Rooted<JS::Value> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328390, prev = 0x7fff01749f40, ptr = {asBits_ = 18446196694595028864}}}
        file = {m_ptr = 0x23cc3a0}
        uri = {m_ptr = 0x23282a0 "file:///usr/bin/gnome-maps"}
        priv = 
          {<js::RootedBase<JSObject*, JS::Rooted<JSObject*> >> = {<js::MutableWrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<js::WrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328348, prev = 0x7fff01749e70, ptr = 0xe295e524780}
#37 0x00007f50e51cc2ee in GjsContextPrivate::eval(char const*, long, char const*, int*, _GError**)
    (this=0x2329060, script=0x2323880 "#!/usr/bin/gjs\nimports.package.start({ name: \"gnome-maps\",\n", ' ' <repeats 24 times>, "version: \"40.4\",\n", ' ' <repeats 24 times>, "prefix: \"/usr\",\n", ' ' <repeats 24 times>, "libdir: \"/usr/lib\" });\n", script_len=187, filename=0x22f3310 "/usr/bin/gnome-maps", exit_status_p=0x7fff0174a084, error=0x7fff0174a088) at ../gjs/context.cpp:1270
        reset = {m_self = 0x2329060}
        auto_profile = false
        ar = {cx_ = 0x2328330, oldRealm_ = 0x0}
        retval = 
            {<js::RootedBase<JS::Value, JS::Rooted<JS::Value> >> = {<js::MutableWrappedPtrOperations<JS::Value, JS::Rooted<JS::Value> >> = {<js::WrappedPtrOperations<JS::Value, JS::Rooted<JS::Value> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328390, prev = 0x0, ptr = {asBits_ = 18444914486360932352}}
        ok = false
#38 0x00007f50e51cbd85 in gjs_context_eval(GjsContext*, char const*, gssize, char const*, int*, GError**)
    (js_context=0x23291c0 [GjsContext], script=0x2323880 "#!/usr/bin/gjs\nimports.package.start({ name: \"gnome-maps\",\n", ' ' <repeats 24 times>, "version: \"40.4\",\n", ' ' <repeats 24 times>, "prefix: \"/usr\",\n", ' ' <repeats 24 times>, "libdir: \"/usr/lib\" });\n", script_len=187, filename=0x22f3310 "/usr/bin/gnome-maps", exit_status_p=0x7fff0174a084, error=0x7fff0174a088) at ../gjs/context.cpp:1192
        __PRETTY_FUNCTION__ = "bool gjs_context_eval(GjsContext*, const char*, gssize, const char*, int*, GError**)"
        js_context_ref = {m_ptr = 0x23291c0}
        gjs = 0x2329060
#39 0x0000000000402c65 in define_argv_and_eval_script(_GjsContext*, int, char* const*, char const*, unsigned long, char const*)
    (js_context=0x23291c0 [GjsContext], argc=0, argv=0x7fff0174a318, script=0x2323880 "#!/usr/bin/gjs\nimports.package.start({ name: \"gnome-maps\",\n", ' ' <repeats 24 times>, "version: \"40.4\",\n", ' ' <repeats 24 times>, "prefix: \"/usr\",\n", ' ' <repeats 24 times>, "libdir: \"/usr/lib\" });\n", len=187, filename=0x22f3310 "/usr/bin/gnome-maps") at ../gjs/console.cpp:191
        error = 0x0
        code = 0
#40 0x00000000004035fc in main(int, char**) (argc=2, argv=0x7fff0174a308) at ../gjs/console.cpp:384
        context = 0x22f3010
        error = 0x0
        js_context = 0x23291c0 [GjsContext]
        coverage = 0x0
        script = 0x2323880 "#!/usr/bin/gjs\nimports.package.start({ name: \"gnome-maps\",\n", ' ' <repeats 24 times>, "version: \"40.4\",\n", ' ' <repeats 24 times>, "prefix: \"/usr\",\n", ' ' <repeats 24 times>, "libdir: \"/usr/lib\" });\n"
        filename = 0x22f3310 "/usr/bin/gnome-maps"
        program_name = 0x22f3310 "/usr/bin/gnome-maps"
        len = 187
        gjs_argc = 2
        script_argc = 0
        ix = 1
        argv_copy = 0x22ed060
        argv_copy_addr = 0x22ed060
        gjs_argv = 0x22f32b0
        gjs_argv_addr = 0x22f32b0
        script_argv = 0x7fff0174a318
        env_coverage_output_path = 0x0
        interactive_mode = false
        argc_copy = 2
        program_path = {m_ptr = 0x23206c0 "/usr/bin/gnome-maps"}
        __PRETTY_FUNCTION__ = "int main(int, char**)"
        env_tracefd = 0x0
        tracefd = -1
        env_coverage_prefixes = 0x0
        code = 0

And in the case of the kernel panic and GPU hang on Plasma:

[ 7584.027896] nouveau 0000:03:00.0: gr: TRAP ch 14 [003f924000 plasmashell[172982]]
[ 7584.027905] nouveau 0000:03:00.0: gr: GPC0/TPC0/TEX: 80000049
[ 7584.027908] nouveau 0000:03:00.0: gr: GPC0/TPC1/TEX: 80000049
[ 7584.027912] nouveau 0000:03:00.0: gr: GPC0/TPC2/TEX: 80000049
[ 7584.027915] nouveau 0000:03:00.0: gr: GPC0/TPC3/TEX: 80000049
[ 7584.027921] nouveau 0000:03:00.0: fifo: read fault at 000025b000 engine 00 [PGRAPH] client 01 [GPC0/TEX] reason 02 [PAGE_NOT_PRESENT] on channel 14 [003f924000 plasmashell[172982]]
[ 7584.027924] nouveau 0000:03:00.0: fifo: gr engine fault on channel 14, recovering...
[ 7584.028137] nouveau 0000:03:00.0: plasmashell[172982]: channel 14 killed!
[ 7588.608886] nouveau 0000:03:00.0: plasmashell[173197]: multiple instances of buffer 11 on validation list
[ 7588.608891] nouveau 0000:03:00.0: plasmashell[173197]: validate_init
[ 7588.608893] nouveau 0000:03:00.0: plasmashell[173197]: validate: -22
[ 7590.896912] nouveau 0000:03:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD] ch 16 [003f8d6000 plasmashell[173241]] subc 0 class 9097 mthd 1b0c data 80500904
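
For anyone comparing logs from their own machine, here is a minimal sketch of pulling the process, PID, and fault reason out of a "fifo: read fault" line like the one above. The regular expression is inferred only from the log lines quoted in this ticket, so treat the field layout as an assumption; other kernel versions may format these messages differently. The name parse_fault is made up for this example.

```python
import re

# Field layout inferred from the log excerpt in this ticket (an assumption,
# not a documented kernel format).
FAULT_RE = re.compile(
    r"nouveau (?P<pci>[0-9a-f:.]+): fifo: (?P<access>read|write) fault at "
    r"(?P<addr>[0-9a-f]+) engine (?P<engine_id>[0-9a-f]+) \[(?P<engine>[^\]]+)\] "
    r"client (?P<client_id>[0-9a-f]+) \[(?P<client>[^\]]+)\] "
    r"reason (?P<reason_id>[0-9a-f]+) \[(?P<reason>[^\]]+)\] "
    r"on channel (?P<channel>\d+) "
    r"\[[0-9a-f]+ (?P<process>[^\[\]]+)\[(?P<pid>\d+)\]\]"
)


def parse_fault(line):
    """Return a dict describing a nouveau fifo fault line, or None if no match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "pci": m.group("pci"),
        "access": m.group("access"),
        "addr": int(m.group("addr"), 16),   # faulting address, parsed as hex
        "engine": m.group("engine"),
        "client": m.group("client"),
        "reason": m.group("reason"),
        "channel": m.group("channel"),      # kept as text; the radix is not stated in the log
        "process": m.group("process"),
        "pid": int(m.group("pid")),
    }


sample = (
    "[ 7584.027921] nouveau 0000:03:00.0: fifo: read fault at 000025b000 "
    "engine 00 [PGRAPH] client 01 [GPC0/TEX] reason 02 [PAGE_NOT_PRESENT] "
    "on channel 14 [003f924000 plasmashell[172982]]"
)
fault = parse_fault(sample)
print(fault["process"], fault["pid"], fault["reason"])
```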

When the issue first came up, Xi and I worked on it a little bit in IRC. The clutter tests also fail on this system, with an "Invalid operation" error in COGL (GL error 1282), which again points towards an issue in Mesa. This shows up as: "(lt-actor-pick:3619853): Cogl-WARNING : 19:30:35.549: driver/gl/cogl-framebuffer-gl.c:1554: GL error (1282): Invalid operation". Line 1554 in driver/gl/cogl-framebuffer-gl.c calls glReadPixels (GE( ctx, glReadPixels (x, y, ). On an Intel Skylake system and on a machine with AMDGPU graphics, this error does not occur.
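
As a quick aid for reading these warnings, here is a small sketch that maps the numeric code printed by Cogl back to the standard OpenGL error name. The table holds the standard glGetError() values; the helper name gl_error_name is made up for this example.

```python
# Standard OpenGL error codes as returned by glGetError().
GL_ERRORS = {
    0x0500: "GL_INVALID_ENUM",
    0x0501: "GL_INVALID_VALUE",
    0x0502: "GL_INVALID_OPERATION",
    0x0503: "GL_STACK_OVERFLOW",
    0x0504: "GL_STACK_UNDERFLOW",
    0x0505: "GL_OUT_OF_MEMORY",
    0x0506: "GL_INVALID_FRAMEBUFFER_OPERATION",
}


def gl_error_name(code):
    """Translate a decimal GL error code (as printed by Cogl) to its name."""
    return GL_ERRORS.get(code, f"unknown (0x{code:04x})")


print(gl_error_name(1282))  # GL_INVALID_OPERATION
```

So the 1282 in the Cogl warning is 0x0502, which is the "Invalid operation" text shown next to it.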

Over the weekend, I had Bruce try a build of GNOME (including gnome-maps) and KDE Plasma on his Haswell system, with an NVIDIA GeForce GT210. Note that the GPU in my case is an NVIDIA Quadro 2000, which belongs to a later family. In his case, gnome-maps worked properly, as did Plasma. Various forum threads online tell me that this is a recurring problem that became more common after the last KDE release (in the case of Plasma at least). There were multiple reports of LFS users having this issue on LinuxQuestions, as well as some Slackware, openSUSE Tumbleweed, Gentoo, and Arch users.

Note that GL Error 1282 is very common in Minecraft as well - which is another clue. This only occurs in multithreaded applications, such as Minecraft! This clue came up during a discussion about multithreaded applications today in Systems Programming.

Continuing down the rabbit hole, I looked at the difference in generations between Bruce's GT210 and my Quadro 2000. I then looked on the Mesa bug tracker specifically for Minecraft-related issues (just because I knew it would use multithreaded operations). I came across https://gitlab.freedesktop.org/mesa/mesa/-/issues/5871 which covers someone's recent Minecraft crashes with Mesa-21.3.4. In that bug report, there was a link to a set of patches to make the Nouveau driver thread-safe. Those patches can be found here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10752

Now is a good time to bring up the generational differences. NV50 covers the GT 210 and various cards from 2009-2011. https://nouveau.freedesktop.org/CodeNames.html#nv50familytesla is a good link to see an overview of those generations. My Quadro 2000 is an NVC0 card. I currently have a GT210 in the other machine, but I will swap it out for the Quadro 2000 to test the changes. I also have several other lower-end NVIDIA GT series cards from later families, all purchased between 2016 and 2019. While I could pull those out to test, I think the Quadro 2000 and the GT 210 will be sufficient for this particular problem.

Google and the forum results show that almost every generation from NVC0 onward has issues with multithreaded OpenGL applications. This makes sense, because Mesa uses the newer NIR compiler internally instead of LLVM for those families of GPUs, so multithreading is forced on by default. In this case, it will cause issues with Plasma and gnome-maps as well, since gnome-maps uses Clutter and Clutter is heavily multithreaded.
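
To illustrate the general class of problem (this is not Mesa's code, and it is not how the upstream merge request implements its fix), here is a toy sketch of several threads writing multi-word commands into one shared buffer. The PushBuffer class and all names in it are hypothetical. Holding a lock for the duration of each command keeps its words contiguous; without that lock, words from different threads could interleave and the consumer would see a malformed command stream.

```python
import threading


class PushBuffer:
    """Toy stand-in for a command buffer shared between threads (hypothetical)."""

    def __init__(self):
        self.words = []
        self.lock = threading.Lock()

    def emit(self, tag, count):
        # Hold the lock for the whole command so that its words stay
        # contiguous even when other threads are emitting at the same time.
        with self.lock:
            for i in range(count):
                self.words.append((tag, i))


def run(n_threads=4, commands=200, words_per_command=8):
    """Have several threads emit commands into one buffer; return its contents."""
    buf = PushBuffer()

    def worker(tid):
        for c in range(commands):
            buf.emit((tid, c), words_per_command)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buf.words


def commands_are_contiguous(words, words_per_command):
    """Check that every command's words appear together and in order."""
    for start in range(0, len(words), words_per_command):
        chunk = words[start:start + words_per_command]
        tags = {tag for tag, _ in chunk}
        indices = [i for _, i in chunk]
        if len(tags) != 1 or indices != list(range(words_per_command)):
            return False
    return True


words = run()
print(commands_are_contiguous(words, 8))
```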

My test plan is as follows (and I hope to have this done in the next day or two):

  • Test Plasma (Wayland), gnome-maps, GNOME (Wayland), and Minecraft on my separate system, once with the Quadro 2000 installed and once with the GT210.
  • With the GT210 installed, also attempt playing a video using VLC to test the claim regarding hardware acceleration being broken with those patches upstream (only affecting the NV50 family of GPUs). If this turns out to be the case, I will attempt to pull the nv50* chunks from the upstream changes, since the NV50 series doesn't seem to be affected by this issue. Note that the particular VDPAU-related extensions on NV50 GPUs also don't seem to be enabled without firmware... which we can't carry, so this might be a non-issue for NV50 GPUs on LFS.

My debug system also has a bunch of kernel options enabled for additional debugging features, and has debug symbols for all packages built in (no stripping). Most, if not all, packages using meson are built with "--buildtype=debug" over there, which I couldn't do on my main development machine without a massive rebuild.

If the issues get fixed on the debug machine, I will install the new versions on my development system to verify the issue is fixed over there as well and then drop the patch into the book. After this is over, I will immediately turn my attention towards the remainder of my tickets. I would like to get this resolved as soon as possible.

Change History (7)

comment:1 by Douglas R. Reno, 2 years ago

Owner: changed from blfs-book to Douglas R. Reno
Status: new → assigned

comment:2 by Douglas R. Reno, 2 years ago

I've discovered some additional crashes in gst-plugins-base when running the tests (note that I'm installing things at the moment on my debugging system, haven't tested the Mesa fix yet). It looks like it was the gstgl* tests.

[62893.838541] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 6 [gstglcontext[601]] get 00000162d4 put 00000167c4 ib_get 00000003 ib_put 00000008 state c000765c (err: MEM_FAULT) push 00400040
[62893.838561] nouveau 0000:01:00.0: fb: trapped read at 0000a2a3b0 on channel 6 [3f85c000 gstglcontext[601]] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 00000002 [PAGE_NOT_PRESENT]
[62893.838589] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 6 [gstglcontext[601]] get 0000a2a3b0 put 0000a2a3bc ib_get 00000004 ib_put 00000008 state c0000000 (err: MEM_FAULT) push 00400040
[62893.838612] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 6 [gstglcontext[601]] get 00000167c4 put 000001685c ib_get 00000005 ib_put 00000008 state c0000000 (err: MEM_FAULT) push 00400040
[62893.838622] nouveau 0000:01:00.0: fb: trapped read at 0000a2a3b0 on channel 6 [3f85c000 gstglcontext[601]] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 00000002 [PAGE_NOT_PRESENT]
[62893.838640] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 6 [gstglcontext[601]] get 0000a2a3b0 put 0000a2a3bc ib_get 00000006 ib_put 00000008 state c0000000 (err: MEM_FAULT) push 00400040
[62893.952167] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 7 [gstglcontext[616]] get 0000016244 put 000001624c ib_get 00000003 ib_put 00000008 state c00075f0 (err: MEM_FAULT) push 00400040
[62893.952188] nouveau 0000:01:00.0: fb: trapped read at 000062b3b0 on channel 7 [3f6f0000 gstglcontext[616]] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 00000002 [PAGE_NOT_PRESENT]
[62893.952204] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 7 [gstglcontext[616]] get 000062b3b0 put 000062b3bc ib_get 00000004 ib_put 00000008 state c0000000 (err: MEM_FAULT) push 00400040
[62893.952221] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 7 [gstglcontext[616]] get 000001624c put 00000162e4 ib_get 00000005 ib_put 00000008 state c0000000 (err: MEM_FAULT) push 00400040
[62893.952231] nouveau 0000:01:00.0: fb: trapped read at 000062b3b0 on channel 7 [3f6f0000 gstglcontext[616]] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 00000002 [PAGE_NOT_PRESENT]
[62893.952245] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 7 [gstglcontext[616]] get 000062b3b0 put 000062b3bc ib_get 00000006 ib_put 00000008 state c0000000 (err: MEM_FAULT) push 00400040

comment:3 by Douglas R. Reno, 2 years ago

Before submitting the patch, I have been working on reproducing the problem reliably. Here is the kernel output immediately before the panic when running 'startplasma-wayland' on the GT 210 (NV50 series), again without the patch applied:

[332778.703482] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 8 [ksplashqml[28868]] get 000010f560 put 00001108d8 ib_get 0000038a ib_put 0000038e state 80007098 (err: INVALID_CMD) push 00400040
[332778.715000] nouveau 0000:01:00.0: gr: DATA_ERROR 00000005 [INVALID_ENUM]
[332778.715009] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 130c data 00000000
[332778.715025] nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD]
[332778.715028] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 0fc0 data 00047454
[332778.715039] nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD]
[332778.715041] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 0fc8 data 000477b8
[332778.719764] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 8 [ksplashqml[28868]] get 000010f560 put 00001108d8 ib_get 0000038c ib_put 0000038e state 80007098 (err: INVALID_CMD) push 00400040
[332778.731262] nouveau 0000:01:00.0: gr: DATA_ERROR 00000005 [INVALID_ENUM]
[332778.731272] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 130c data 00000000
[332778.731297] nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD]
[332778.731301] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 0fc0 data 00047454
[332778.731312] nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD]
[332778.731314] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 0fc8 data 000477b8
[332779.813288] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 8 [ksplashqml[28868]] get 000016e508 put 000016f988 ib_get 000000da ib_put 000000de state 80007550 (err: INVALID_CMD) push 00400040
[332779.826753] nouveau 0000:01:00.0: gr: DATA_ERROR 00000005 [INVALID_ENUM]
[332779.826763] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 12e8 data 08020402
[332779.829522] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 8 [ksplashqml[28868]] get 000016e508 put 000016f988 ib_get 000000dc ib_put 000000e2 state 80007550 (err: INVALID_CMD) push 00400040
[332779.831921] nouveau 0000:01:00.0: gr: DATA_ERROR 00000005 [INVALID_ENUM]
[332779.831930] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 12e8 data 08020402
[332781.485324] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 9 [plasmashell[28911]] get 00000325f0 put 0000034024 ib_get 0000004a ib_put 00000055 state 80007228 (err: INVALID_CMD) push 00400040
[332781.536167] nouveau 0000:01:00.0: gr: TRAP_MP_EXEC - TP 0 MP 0: 00000010 [INVALID_OPCODE] at 07fdc0 warp 4, opcode 00e5e5e5 00e5e5e5
[332781.536181] nouveau 0000:01:00.0: gr: TRAP_MP_EXEC - TP 0 MP 1: 00000010 [INVALID_OPCODE] at 07fdc0 warp 1, opcode 00e5e5e5 00e5e5e5
[332781.536186] nouveau 0000:01:00.0: gr: 00200000 [] ch 9 [003f894000 plasmashell[28911]] subc 3 class 8597 mthd 1b0c data 1000f010
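
When comparing logs like the one above before and after a patch, it can help to tally the errors per process rather than reading them line by line. The following is a minimal sketch and not part of the original report: the function name `summarize`, the `SAMPLE` list, and both regular expressions are assumptions derived only from the line formats shown above, and other nouveau message formats are simply ignored.

```python
import re
from collections import Counter

# Assumed format, taken from the log above:
#   "fifo: DMA_PUSHER - ch 8 [ksplashqml[28868]] ... (err: INVALID_CMD) ..."
FIFO_RE = re.compile(
    r"fifo: DMA_PUSHER - ch (\d+) \[(\S+?)\[(\d+)\]\].*?\(err: (\w+)\)"
)
# Assumed format, taken from the log above:
#   "gr: DATA_ERROR 00000005 [INVALID_ENUM]"
#   "gr: TRAP_MP_EXEC - TP 0 MP 0: 00000010 [INVALID_OPCODE] at ..."
GR_RE = re.compile(r"gr: (DATA_ERROR|TRAP_MP_EXEC)\b.*?\[(\w+)\]")


def summarize(lines):
    """Count fifo errors per (process, error) and gr errors per (kind, reason)."""
    fifo = Counter()
    gr = Counter()
    for line in lines:
        m = FIFO_RE.search(line)
        if m:
            fifo[(m.group(2), m.group(4))] += 1
            continue
        m = GR_RE.search(line)
        if m:
            gr[(m.group(1), m.group(2))] += 1
    return fifo, gr


# A few lines copied from the log above, used only as sample input.
SAMPLE = [
    "[332778.703482] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 8 [ksplashqml[28868]] get 000010f560 put 00001108d8 ib_get 0000038a ib_put 0000038e state 80007098 (err: INVALID_CMD) push 00400040",
    "[332778.715000] nouveau 0000:01:00.0: gr: DATA_ERROR 00000005 [INVALID_ENUM]",
    "[332778.715009] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 130c data 00000000",
    "[332781.485324] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 9 [plasmashell[28911]] get 00000325f0 put 0000034024 ib_get 0000004a ib_put 00000055 state 80007228 (err: INVALID_CMD) push 00400040",
    "[332781.536167] nouveau 0000:01:00.0: gr: TRAP_MP_EXEC - TP 0 MP 0: 00000010 [INVALID_OPCODE] at 07fdc0 warp 4, opcode 00e5e5e5 00e5e5e5",
]

fifo_counts, gr_counts = summarize(SAMPLE)
for key, count in sorted(fifo_counts.items()):
    print("fifo", key, count)
for key, count in sorted(gr_counts.items()):
    print("gr", key, count)
```

To run it against a full log, the lines of a saved `dmesg` capture can be passed to `summarize` in place of `SAMPLE`.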

comment:6 by Douglas R. Reno, 2 years ago

Tested with:

KDE Plasma/Wayland
GNOME Wayland
VLC Media Player
MPlayer
Kdenlive
Xine
Discord
Zoom
Minecraft

I've also tested this on a GT 610, GT 210, GT 730, Quadro 2000, and GT 1030.

comment:7 by Douglas R. Reno, 2 years ago

Resolution: fixed
Status: assigned → closed