Opened 3 years ago
Closed 3 years ago
#16033 closed defect (fixed)
Fix nouveau multithreading issues causing crashes and kernel panics
Reported by: | Douglas R. Reno | Owned by: | Douglas R. Reno |
---|---|---|---|
Priority: | normal | Milestone: | 11.1 |
Component: | BOOK | Version: | git |
Severity: | critical | Keywords: | |
Cc: |
Description
As those in IRC will remember, I began encountering issues on my development machine, which has an NVIDIA GPU installed. Today I made major progress in figuring out what is happening here, but before I discuss that, let's get into some context.
This issue has exhibited itself in two different ways.
1: Running gnome-maps in a Wayland session of GNOME instantly crashes gnome-maps (before the window can show), and in 2 runs out of 10, immediately causes a kernel panic.
2: Executing 'startplasma-wayland' to start a Wayland session of Plasma causes a GPU hang due to an issue in Mesa/the kernel. This issue can exhibit itself in several different ways. In one case, the start menu popped up and immediately closed, which then shifted the taskbar icons to the far left; after that, my keyboard and mouse stopped working. In another attempt, the system started blinking the Num Lock and Scroll Lock LEDs immediately after the KDE initialization screen came up. In yet another attempt (while trying to isolate the issue), plasmashell immediately dumped core and then the kernel panicked again.
The coredumpctl gdb output with 'bt full' for gnome-maps is:
#0 0x00007f5059c22c40 in dri2_query_image () from /usr/lib/dri/nouveau_dri.so [Current thread is 1 (Thread 0x7f50e0e459c0 (LWP 172837))] (gdb) bt full #0 0x00007f5059c22c40 in dri2_query_image () at /usr/lib/dri/nouveau_dri.so #1 0x00007f50e14694eb in create_wl_buffer () at /usr/lib/libEGL.so.1 #2 0x00007f50e146a12b in dri2_wl_swap_buffers_with_damage () at /usr/lib/libEGL.so.1 #3 0x00007f50e145b40e in dri2_swap_buffers () at /usr/lib/libEGL.so.1 #4 0x00007f50e144feda in eglSwapBuffers () at /usr/lib/libEGL.so.1 #5 0x00007f50a78b91cd in _cogl_winsys_onscreen_swap_buffers_with_damage (onscreen=0x3911050, rectangles=0x7fff01747e60, n_rectangles=0) at winsys/cogl-winsys-egl.c:848 context = <optimized out> renderer = <optimized out> egl_renderer = 0x2e60940 egl_onscreen = 0x2f20c80 #6 0x00007f50a78a35a6 in cogl_onscreen_swap_buffers_with_damage (onscreen=0x3911050, rectangles=rectangles@entry=0x7fff01747e60, n_rectangles=n_rectangles@entry=0) at cogl-onscreen.c:319 framebuffer = 0x3911050 winsys = <optimized out> info = <optimized out> __func__ = "cogl_onscreen_swap_buffers_with_damage" #7 0x00007f50a793f371 in clutter_stage_cogl_redraw (stage_window=0x33b04c0) at cogl/clutter-stage-cogl.c:639 stage_cogl = 0x33b04c0 [ClutterStageGdk] geom = {x = 0, y = 0, width = 820, height = 652} have_clip = <optimized out> may_use_clipped_redraw = <optimized out> use_clipped_redraw = <optimized out> can_blit_sub_buffer = <optimized out> has_buffer_age = <optimized out> wrapper = <optimized out> clip_region = <optimized out> damage = {0, 0, 0, 0} ndamage = 0 force_swap = <optimized out> window_scale = <optimized out> #8 0x00007f50a794280b in clutter_stage_gdk_redraw (stage_window=0x33b04c0) at gdk/clutter-stage-gdk.c:675 stage_gdk = 0x33b04c0 [ClutterStageGdk] clock = 0x36bc120 [GdkFrameClockIdle] #9 0x00007f50a79a99d4 in clutter_stage_do_redraw (stage=0x34faaf0 [ClutterStage]) at clutter-stage.c:1130 backend = <optimized out> actor = 0x34faaf0 [ClutterStage] priv = 0x34fa4f0 priv = 
0x34fa4f0 #10 _clutter_stage_do_update (stage=stage@entry=0x34faaf0 [ClutterStage]) at clutter-stage.c:1186 priv = 0x34fa4f0 #11 0x00007f50a7941bb5 in master_clock_update_stage (master_clock=0x35e3900 [ClutterMasterClockGdk], stage=0x34faaf0 [ClutterStage]) at gdk/clutter-master-clock-gdk.c:249 stage_updated = 0 stage = 0x34faaf0 [ClutterStage] stages = <optimized out> l = 0x2ef9ca0 = {0x34faaf0} #12 clutter_master_clock_gdk_update (frame_clock=0x36bc120 [GdkFrameClockIdle], master_clock=0x35e3900 [ClutterMasterClockGdk]) at gdk/clutter-master-clock-gdk.c:306 stage = 0x34faaf0 [ClutterStage] stages = <optimized out> l = 0x2ef9ca0 = {0x34faaf0} #16 0x00007f50e4fc3852 in <emit signal ??? on instance 0x36bc120 [GdkFrameClockIdle]> (instance=instance@entry=0x36bc120, signal_id=<optimized out>, detail=detail@entry=0) at ../gobject/gsignal.c:3553 var_args = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7fff017483d0, reg_save_area = 0x7fff01748310}} #13 0x00007f50e4fab4cf in g_closure_invoke (closure=0x391d310, return_value=return_value@entry=0x0, n_param_values=1, param_values=param_values@entry=0x7fff01748180, invocation_hint=invocation_hint@entry=0x7fff01748120) at ../gobject/gclosure.c:830 marshal = 0x7f50e4fad420 <g_cclosure_marshal_VOID__VOID> marshal_data = 0x0 in_marshal = 0 real_closure = 0x391d2f0 __func__ = "g_closure_invoke" #14 0x00007f50e4fbcfa6 in signal_emit_unlocked_R (node=node@entry=0x2d99480, detail=detail@entry=0, instance=instance@entry=0x36bc120, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x7fff01748180) at ../gobject/gsignal.c:3742 tmp = <optimized out> handler = 0x3917740 accumulator = 0x0 emission = {next = 0x0, instance = 0x36bc120, ihint = {signal_id = 310, detail = 0, run_type = (G_SIGNAL_RUN_FIRST | G_SIGNAL_ACCUMULATOR_FIRST_RUN)}, state = EMISSION_RUN, chain_type = 0x4 [void]} class_closure = 0x0 hlist = <optimized out> handler_list = 0x38c04c0 return_accu = 0x0 accu = {g_type = 0x0, 
data = {{v_int = 0, v_uint = 0, v_long = 0, v_ulong = 0, v_int64 = 0, v_uint64 = 0, v_float = 0, v_double = 0, v_pointer = 0x0}, {v_int = 0, v_uint = 0, v_long = 0, v_ulong = 0, v_int64 = 0, v_uint64 = 0, v_float = 0, v_double = 0, v_pointer = 0x0}}} signal_id = 310 max_sequential_handler_number = 3851 return_value_altered = 1 #15 0x00007f50e4fc3329 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7fff017482f8) at ../gobject/gsignal.c:3497 instance_and_params = 0x7fff01748180 signal_return_type = <optimized out> param_values = 0x7fff01748198 node = <optimized out> i = <optimized out> n_params = <optimized out> __func__ = "g_signal_emit_valist" #17 0x00007f50ccb1919f in _gdk_frame_clock_emit_paint (frame_clock=frame_clock@entry=0x36bc120 [GdkFrameClockIdle]) at gdkframeclock.c:657 #18 0x00007f50ccb19de2 in gdk_frame_clock_paint_idle (data=0x36bc120) at gdkframeclockidle.c:597 clock = 0x36bc120 [GdkFrameClockIdle] clock_idle = 0x36bc120 [GdkFrameClockIdle] priv = 0x36bc020 skip_to_resume_events = 0 timings = 0x33f5650 __func__ = "gdk_frame_clock_paint_idle" #19 0x00007f50ccb04d69 in gdk_threads_dispatch (data=0x2f21760, data@entry=<error reading variable: value has been optimized out>) at gdk.c:769 dispatch = 0x2f21760 ret = 0 #20 0x00007f50e50460a4 in g_timeout_dispatch (source=0x390eb70, callback=<optimized out>, user_data=<optimized out>) at ../glib/gmain.c:4933 timeout_source = 0x390eb70 again = <optimized out> #21 0x00007f50e5045594 in g_main_dispatch (context=0x230c500) at ../glib/gmain.c:3381 dispatch = 0x7f50e5046090 <g_timeout_dispatch> prev_source = 0x0 begin_time_nsec = 0 was_in_call = 0 user_data = 0x2f21760 callback = 0x7f50ccb04d40 <gdk_threads_dispatch> cb_funcs = <optimized out> cb_data = <optimized out> need_destroy = <optimized out> source = 0x390eb70 current = 0x2318070 i = 0 __func__ = "g_main_dispatch" #22 g_main_context_dispatch (context=0x230c500) at 
../glib/gmain.c:4099 #23 0x00007f50e50458f8 in g_main_context_iterate (context=context@entry=0x230c500, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4175 max_priority = 120 timeout = 0 some_ready = 1 nfds = <optimized out> allocated_nfds = <optimized out> fds = 0x2dda5e0 #24 0x00007f50e504599f in g_main_context_iteration (context=context@entry=0x230c500, may_block=may_block@entry=1) at ../glib/gmain.c:4240 retval = <optimized out> #25 0x00007f50e4e94c4d in g_application_run (application=0x24ca0f0 [Gjs_Application], argc=24413516, argv=<optimized out>) at ../gio/gapplication.c:2569 arguments = 0x24cce60 status = 0 context = 0x230c500 acquired_context = <optimized out> __func__ = "g_application_run" #26 0x00007f50e494355a in () at /usr/lib/libffi.so.8 #27 0x00007f50e4942753 in () at /usr/lib/libffi.so.8 #28 0x00007f50e517d94c in Gjs::Function::invoke(JSContext*, JS::CallArgs const&, JS::Handle<JSObject*>, _GIArgument*) (this=0x24ccb70, context=0x2328330, args=..., this_obj=..., r_value=0x0) at ../gi/function.cpp:948 __PRETTY_FUNCTION__ = "bool Gjs::Function::invoke(JSContext*, const JS::CallArgs&, JS::HandleObject, GIArgument*)" return_value_p = 0x7fff01748878 return_value = {v_boolean = -459184751, v_int8 = -111 '\221', v_uint8 = 145 '\221', v_int16 = 26001, v_uint16 = 26001, v_int32 = -459184751, v_uint32 = 3835782545, v_int64 = 139985409893777, v_uint64 = 139985409893777, v_float = -2.38179554e+22, v_double = 6.9161981947520557e-310, v_short = 26001, v_ushort = 26001, v_int = -459184751, v_uint = 3835782545, v_long = 139985409893777, v_ulong = 139985409893777, v_ssize = 139985409893777, v_size = 139985409893777, v_string = 0x7f50e4a16591 <__GI___libc_free+81> "d\211+H\203\304\030[]\303\017\037D", v_pointer = 0x7f50e4a16591 <__GI___libc_free+81>} ffi_argc = 3 state = {m_in_cvalues = 0x24cc7e0, m_out_cvalues = 0x24cc940, m_inout_original_cvalues = 0x24ccdd0, ignore_release = std::unordered_set with 0 elements, 
instance_object = {<js::RootedBase<JSObject*, JS::Rooted<JSObject*> >> = {<js::MutableWrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<js::WrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328348, prev = 0x7fff01748a70, ptr = 0x1333e8c8ad00}, return_values = {<JS::Rooted<JS::StackGCVector<JS::Value, js::TempAllocPolicy> >> = {<js::RootedBase<JS::StackGCVector<JS::Value, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JS::Value, js::TempAllocPolicy> > >> = {<js::MutableWrappedPtrOperations<JS::StackGCVector<JS::Value, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JS::Value, js::TempAllocPolicy> > >> = {<js::MutableWrappedPtrOperations<JS::GCVector<JS::Value, 8, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JS::Value, js::TempAllocPolicy> > >> = {<js::WrappedPtrOperations<JS::GCVector<JS::Value, 8, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JS::Value, js::TempAllocPolicy> > >> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328398, prev = 0x7fff017490a8, ptr = {<js::VirtualTraceable> = {_vptr.VirtualTraceable = 0x7f50e5310350 <vtable for js::RootedTraceable<JS::StackGCVector<JS::Value, js::TempAllocPolicy> >+16>}, ptr = {<JS::GCVector<JS::Value, 8, js::TempAllocPolicy>> = {vector = {<js::TempAllocPolicy> = {<js::AllocPolicyBase> = {<No data fields>}, cx_ = 0x2328330}, static kElemIsPod = false, static kMaxInlineBytes = 992, static kInlineCapacity = 8, mBegin = 0x7fff01748810, mLength = 0, mTail = {<mozilla::Vector<JS::Value, 8, js::TempAllocPolicy>::CapacityAndReserved> = {mCapacity = 8}, mBytes = "0\210t\001\377\177\000\000*\317\026\345P\177\000\000\350ID\002\000\000\000\000؈t\001\377\177\000\000\060\211t\001\377\177\000\000\000s\026\366\037f\242\"\037\212t\001\377\177\000\000`\377\377\377\377\377\377\377"}}}, <No data fields>}}}, <No data fields>}, local_error = {m_ptr = 0x0}, info = 0x2ce6ca0, gi_argc = 2, 
processed_c_args = 3, failed = false, can_throw_gerror = false, is_method = true} ffi_arg_pointers = std::unique_ptr<void *[]> = {get() = {<No data fields>}} gi_arg_pos = 2 ffi_arg_pos = 3 js_arg_pos = 1 obj = {<js::RootedBase<JSObject*, JS::Rooted<JSObject*> >> = {<js::MutableWrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<js::WrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328348, prev = 0x7fff017487c0, ptr = 0x1333e8c8ad00} dynamicString = "Gjs_Application.method Gio.Application.run" label = {m_stack = 0x0} errorp = 0x7fff01748850 #29 0x00007f50e517e26d in Gjs::Function::call(JSContext*, unsigned int, JS::Value*) (context=0x2328330, js_argc=1, vp=0x24449e0) at ../gi/function.cpp:1090 js_argv = {<JS::detail::CallArgsBase<JS::detail::IncludeUsedRval>> = {argv_ = 0x24449f0, argc_ = 1, constructing_ = false, ignoresReturnValue_ = false}, <No data fields>} callee = {<js::RootedBase<JSObject*, JS::Rooted<JSObject*> >> = {<js::MutableWrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<js::WrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328348, prev = 0x7fff01748de0, ptr = 0x1713db0b82e0} priv = 0x24ccb70 #30 0x00007f50e3db18ee in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) () at /usr/lib/libmozjs-78.so #31 0x00007f50e3da48ec in Interpret(JSContext*, js::RunState&) () at /usr/lib/libmozjs-78.so #32 0x00007f50e3db10ae in js::RunScript(JSContext*, js::RunState&) () at /usr/lib/libmozjs-78.so #33 0x00007f50e3db35c5 in js::Execute(JSContext*, JS::Handle<JSScript*>, JS::Handle<JSObject*>, JS::MutableHandle<JS::Value>) () at /usr/lib/libmozjs-78.so #34 0x00007f50e3ec7b13 in bool EvaluateSourceBuffer<char16_t>(JSContext*, js::ScopeKind, JS::Handle<JSObject*>, JS::ReadOnlyCompileOptions const&, JS::SourceText<char16_t>&, 
JS::MutableHandle<JS::Value>) () at /usr/lib/libmozjs-78.so #35 0x00007f50e3ec7c6a in JS::Evaluate(JSContext*, JS::Handle<JS::StackGCVector<JSObject*, js::TempAllocPolicy> >, JS::ReadOnlyCompileOptions const&, JS::SourceText<char16_t>&, JS::MutableHandle<JS::Value>) () at /usr/lib/libmozjs-78.so #36 0x00007f50e51cd08a in GjsContextPrivate::eval_with_scope(JS::Handle<JSObject*>, char const*, long, char const*, JS::MutableHandle<JS::Value>) (this=0x2329060, scope_object=..., script=0x2323880 "#!/usr/bin/gjs\nimports.package.start({ name: \"gnome-maps\",\n", ' ' <repeats 24 times>, "version: \"40.4\",\n", ' ' <repeats 24 times>, "prefix: \"/usr\",\n", ' ' <repeats 24 times>, "libdir: \"/usr/lib\" });\n", script_len=187, filename=0x22f3310 "/usr/bin/gnome-maps", retval=...) at ../gjs/context.cpp:1467 eval_obj = {<js::RootedBase<JSObject*, JS::Rooted<JSObject*> >> = {<js::MutableWrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<js::WrappedPtrOp erations<JSObject*, JS::Rooted<JSObject*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328348, prev = 0x0, ptr = 0xe295e5240e0} items_written = 187 error = 0x7fff01749e90 utf16_string = {m_ptr = 0x2352000} buf = {units_ = 0x2352000 u"#!/usr/bin/gjs\nimports.package.start({ name: \"gnome-maps\",\n", ' ' <repeats 24 times>, "version: \"40.4\",\n", ' ' <repeats 24 times>, "prefix: \"/usr\",\n", ' ' <repeats 24 times>, "libdir: \"/usr/lib\" });\n", length_ = 187, ownsUnits_ = false} scope_chain = {<JS::Rooted<JS::StackGCVector<JSObject*, js::TempAllocPolicy> >> = {<js::RootedBase<JS::StackGCVector<JSObject*, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JSObject*, js::TempAllocPolicy> > >> = {<js::MutableWrappedPtrOperations<JS::StackGCVector<JSObject*, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JSObject*, js::TempAllocPolicy> > >> = {<js::MutableWrappedPtrOperations<JS::GCVector<JSObject*, 8, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JSObject*, js::TempAllocPolicy> > >> = 
{<js::WrappedPtrOperations<JS::GCVector<JSObject*, 8, js::TempAllocPolicy>, JS::Rooted<JS::StackGCVector<JSObject*, js::TempAllocPolicy> > >> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328398, prev = 0x0, ptr = {<js::VirtualTraceable> = {_vptr.VirtualTraceable = 0x7f50e5310d30 <vtable for js::RootedTraceable<JS::StackGCVector<JSObject*, js::TempAllocPolicy> >+16>}, ptr = {<JS::GCVector<JSObject*, 8, js::TempAllocPolicy>> = {vector = {<js::TempAllocPolicy> = {<js::AllocPolicyBase> = {<No data fields>}, cx_ = 0x2328330}, static kElemIsPod = true, static kMaxInlineBytes = 992, static kInlineCapacity = 8, mBegin = 0x7fff01749df8, mLength = 1, mTail = {<mozilla::Vector<JSObject*, 8, js::TempAllocPolicy>::CapacityAndReserved> = {mCapacity = 8}, mBytes = "\340@R^)\016\000\000\060\203\062\002\f\000\000\000\060\203\062\002\000\000\000\000\060\236t\001\377\177\000\000/\315\025\345P\177\000\000\060\203\062\002\000\000\000\000@\237t\001\377\177\000\000`\236t\001\377\177\000"}}}, <No data fields>}}}, <No data fields>} options = {<JS::ReadOnlyCompileOptions> = {<JS::TransitiveCompileOptions> = {_vptr.TransitiveCompileOptions = 0x7f50e492c598 <vtable for JS::CompileOptions+16>, mutedErrors_ = false, forceFullParse_ = false, forceStrictMode_ = false, sourcePragmas_ = true, filename_ = 0x22f3310 "/usr/bin/gnome-maps", introducerFilename_ = 0x0, sourceMapURL_ = 0x0, skipFilenameValidation_ = false, selfHostingMode = false, asmJSOption = JS::AsmJSOption::Enabled, throwOnAsmJSValidationFailureOption = false, forceAsync = false, discardSource = false, sourceIsLazy = false, allowHTMLComments = true, hideScriptFromDebugger = false, nonSyntacticScope = false, introductionType = 0x0, introductionLineno = 0, introductionOffset = 0, hasIntroductionInfo = false, instrumentationKinds = 0}, lineno = 1, column = 0, scriptSourceOffset = 0, isRunOnce = false, noScriptRval = false}, elementAttributeNameRoot = {<js::RootedBase<JSString*, 
JS::Rooted<JSString*> >> = {<js::MutableWrappedPtrOperations<JSString*, JS::Rooted<JSString*> >> = {<js::WrappedPtrOperations<JSString*, JS::Rooted<JSString*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328368, prev = 0x0, ptr = 0x0}, introductionScriptRoot = {<js::RootedBase<JSScript*, JS::Rooted<JSScript*> >> = {<js::MutableWrappedPtrOperations<JSScript*, JS::Rooted<JSScript*> >> = {<js::WrappedPtrOperations<JSScript*, JS::Rooted<JSScript*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328358, prev = 0x0, ptr = 0x0}, scriptOrModuleRoot = {<js::RootedBase<JSScript*, JS::Rooted<JSScript*> >> = {<js::MutableWrappedPtrOperations<JSScript*, JS::Rooted<JSScript*> >> = {<js::WrappedPtrOperations<JSScript*, JS::Rooted<JSScript*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328358, prev = 0x7fff01749d78, ptr = 0x0}, privateValueRoot = {<js::RootedBase<JS::Value, JS::Rooted<JS::Value> >> = {<js::MutableWrappedPtrOperations<JS::Value, JS::Rooted<JS::Value> >> = {<js::WrappedPtrOperations<JS::Value, JS::Rooted<JS::Value> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328390, prev = 0x7fff01749f40, ptr = {asBits_ = 18446196694595028864}}} file = {m_ptr = 0x23cc3a0} uri = {m_ptr = 0x23282a0 "file:///usr/bin/gnome-maps"} priv = {<js::RootedBase<JSObject*, JS::Rooted<JSObject*> >> = {<js::MutableWrappedPtrOperations<JSObject*, JS::Rooted<JSObject*> >> = {<js::WrappedPtrOp erations<JSObject*, JS::Rooted<JSObject*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328348, prev = 0x7fff01749e70, ptr = 0xe295e524780} #37 0x00007f50e51cc2ee in GjsContextPrivate::eval(char const*, long, char const*, int*, _GError**) (this=0x2329060, script=0x2323880 "#!/usr/bin/gjs\nimports.package.start({ name: \"gnome-maps\",\n", ' ' <repeats 24 times>, "version: \"40.4\",\n", ' ' <repeats 24 times>, "prefix: \"/usr\",\n", ' ' <repeats 24 times>, "libdir: 
\"/usr/lib\" });\n", script_len=187, filename=0x22f3310 "/usr/bin/gnome-maps", exit_status_p=0x7fff0174a084, error=0x7fff0174a088) at ../gjs/context.cpp:1270 reset = {m_self = 0x2329060} auto_profile = false ar = {cx_ = 0x2328330, oldRealm_ = 0x0} retval = {<js::RootedBase<JS::Value, JS::Rooted<JS::Value> >> = {<js::MutableWrappedPtrOperations<JS::Value, JS::Rooted<JS::Value> >> = {<js::WrappedPtrOperations<JS::Value, JS::Rooted<JS::Value> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x2328390, prev = 0x0, ptr = {asBits_ = 18444914486360932352}} ok = false #38 0x00007f50e51cbd85 in gjs_context_eval(GjsContext*, char const*, gssize, char const*, int*, GError**) (js_context=0x23291c0 [GjsContext], script=0x2323880 "#!/usr/bin/gjs\nimports.package.start({ name: \"gnome-maps\",\n", ' ' <repeats 24 times>, "version: \"40.4\",\n", ' ' <repeats 24 times>, "prefix: \"/usr\",\n", ' ' <repeats 24 times>, "libdir: \"/usr/lib\" });\n", script_len=187, filename=0x22f3310 "/usr/bin/gnome-maps", exit_status_p=0x7fff0174a084, error=0x7fff0174a088) at ../gjs/context.cpp:1192 __PRETTY_FUNCTION__ = "bool gjs_context_eval(GjsContext*, const char*, gssize, const char*, int*, GError**)" js_context_ref = {m_ptr = 0x23291c0} gjs = 0x2329060 #39 0x0000000000402c65 in define_argv_and_eval_script(_GjsContext*, int, char* const*, char const*, unsigned long, char const*) (js_context=0x23291c0 [GjsContext], argc=0, argv=0x7fff0174a318, script=0x2323880 "#!/usr/bin/gjs\nimports.package.start({ name: \"gnome-maps\",\n", ' ' <repeats 24 times>, "version: \"40.4\",\n", ' ' <repeats 24 times>, "prefix: \"/usr\",\n", ' ' <repeats 24 times>, "libdir: \"/usr/lib\" });\n", len=187, filename=0x22f3310 "/usr/bin/gnome-maps") at ../gjs/console.cpp:191 error = 0x0 code = 0 #40 0x00000000004035fc in main(int, char**) (argc=2, argv=0x7fff0174a308) at ../gjs/console.cpp:384 context = 0x22f3010 error = 0x0 js_context = 0x23291c0 [GjsContext] coverage = 0x0 script = 0x2323880 
"#!/usr/bin/gjs\nimports.package.start({ name: \"gnome-maps\",\n", ' ' <repeats 24 times>, "version: \"40.4\",\n", ' ' <repeats 24 times>, "prefix: \"/usr\",\n", ' ' <repeats 24 times>, "libdir: \"/usr/lib\" });\n" filename = 0x22f3310 "/usr/bin/gnome-maps" program_name = 0x22f3310 "/usr/bin/gnome-maps" len = 187 gjs_argc = 2 script_argc = 0 ix = 1 argv_copy = 0x22ed060 argv_copy_addr = 0x22ed060 gjs_argv = 0x22f32b0 gjs_argv_addr = 0x22f32b0 script_argv = 0x7fff0174a318 env_coverage_output_path = 0x0 interactive_mode = false argc_copy = 2 program_path = {m_ptr = 0x23206c0 "/usr/bin/gnome-maps"} __PRETTY_FUNCTION__ = "int main(int, char**)" env_tracefd = 0x0 tracefd = -1 env_coverage_prefixes = 0x0 code = 0
And in the case of the kernel panic and GPU hang on Plasma:
[ 7584.027896] nouveau 0000:03:00.0: gr: TRAP ch 14 [003f924000 plasmashell[172982]]
[ 7584.027905] nouveau 0000:03:00.0: gr: GPC0/TPC0/TEX: 80000049
[ 7584.027908] nouveau 0000:03:00.0: gr: GPC0/TPC1/TEX: 80000049
[ 7584.027912] nouveau 0000:03:00.0: gr: GPC0/TPC2/TEX: 80000049
[ 7584.027915] nouveau 0000:03:00.0: gr: GPC0/TPC3/TEX: 80000049
[ 7584.027921] nouveau 0000:03:00.0: fifo: read fault at 000025b000 engine 00 [PGRAPH] client 01 [GPC0/TEX] reason 02 [PAGE_NOT_PRESENT] on channel 14 [003f924000 plasmashell[172982]]
[ 7584.027924] nouveau 0000:03:00.0: fifo: gr engine fault on channel 14, recovering...
[ 7584.028137] nouveau 0000:03:00.0: plasmashell[172982]: channel 14 killed!
[ 7588.608886] nouveau 0000:03:00.0: plasmashell[173197]: multiple instances of buffer 11 on validation list
[ 7588.608891] nouveau 0000:03:00.0: plasmashell[173197]: validate_init
[ 7588.608893] nouveau 0000:03:00.0: plasmashell[173197]: validate: -22
[ 7590.896912] nouveau 0000:03:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD] ch 16 [003f8d6000 plasmashell[173241]] subc 0 class 9097 mthd 1b0c data 80500904
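Kernel output like this is easier to triage once it is split into individual entries and grouped by the process that triggered them. The following is a minimal sketch, assuming the `[timestamp] nouveau <pci-address>: <message>` layout seen in this ticket; the function names and regular expressions are made up for illustration and are not part of any nouveau tooling. It works whether the entries are one per line or run together in a single string.

```python
import re
from collections import Counter

# Matches the start of an entry, e.g. "[ 7584.027896] nouveau 0000:03:00.0: "
ENTRY_RE = re.compile(r"\[\s*(\d+\.\d+)\] nouveau (\S+): ")
# Matches a process tag inside a message, e.g. "plasmashell[172982]"
PROC_RE = re.compile(r"([A-Za-z_][\w.-]*)\[(\d+)\]")


def split_entries(log_text):
    """Split dmesg text into (timestamp, device, message) tuples."""
    matches = list(ENTRY_RE.finditer(log_text))
    entries = []
    for i, m in enumerate(matches):
        # A message runs until the next entry header (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        message = log_text[m.end():end].strip()
        entries.append((float(m.group(1)), m.group(2), message))
    return entries


def count_by_process(entries):
    """Count how many entries mention each process name."""
    counts = Counter()
    for _, _, message in entries:
        m = PROC_RE.search(message)
        if m:
            counts[m.group(1)] += 1
    return counts


# A few entries copied from the log above, deliberately run together.
SAMPLE = (
    "[ 7584.027896] nouveau 0000:03:00.0: gr: TRAP ch 14 [003f924000 plasmashell[172982]] "
    "[ 7584.027905] nouveau 0000:03:00.0: gr: GPC0/TPC0/TEX: 80000049 "
    "[ 7584.028137] nouveau 0000:03:00.0: plasmashell[172982]: channel 14 killed! "
    "[ 7588.608893] nouveau 0000:03:00.0: plasmashell[173197]: validate: -22"
)

entries = split_entries(SAMPLE)
print(count_by_process(entries))
```

Entries without a process tag (such as the per-TPC TEX lines) are simply left out of the count.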
When the issue first came up, Xi and I worked on it a little bit in IRC. The Clutter tests also fail on this system with an "Invalid operation" error in Cogl (GL error 1282), which again points towards an issue in Mesa. The failure shows up as: "(lt-actor-pick:3619853): Cogl-WARNING : 19:30:35.549: driver/gl/cogl-framebuffer-gl.c:1554: GL error (1282): Invalid operation". Line 1554 in driver/gl/cogl-framebuffer-gl.c calls glReadPixels (GE( ctx, glReadPixels (x, y, ). On an Intel Skylake system and on a machine with AMDGPU graphics, this error does not occur.
Over the weekend, I had Bruce try a build of GNOME (including gnome-maps) and KDE Plasma on his Haswell system, which has an NVIDIA GeForce GT210. Note that the GPU in my case is an NVIDIA Quadro 2000, which belongs to a later family. In his case, gnome-maps worked properly, as did Plasma. Various forum threads online tell me that this is a regular problem that became more common after the last KDE release (in the case of Plasma at least). There were multiple reports of LFS users having this issue on LinuxQuestions, as well as some from users of Slackware, openSUSE Tumbleweed, Gentoo, and Arch.
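For reference, the decimal code in the Cogl warning is just the numeric value of a standard OpenGL error enum. The sketch below decodes it; the enum values are the ones defined in the OpenGL headers, while the helper name is made up for this example.

```python
# Standard OpenGL error codes, as returned by glGetError().
GL_ERRORS = {
    0x0000: "GL_NO_ERROR",
    0x0500: "GL_INVALID_ENUM",
    0x0501: "GL_INVALID_VALUE",
    0x0502: "GL_INVALID_OPERATION",
    0x0503: "GL_STACK_OVERFLOW",
    0x0504: "GL_STACK_UNDERFLOW",
    0x0505: "GL_OUT_OF_MEMORY",
    0x0506: "GL_INVALID_FRAMEBUFFER_OPERATION",
}


def gl_error_name(code):
    """Translate a numeric GL error (hypothetical helper) into its enum name."""
    return GL_ERRORS.get(code, "unknown (0x%04x)" % code)


print(gl_error_name(1282))  # GL_INVALID_OPERATION
```

So "GL error (1282)" and "Invalid operation" are the same piece of information: 1282 is 0x0502.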
Note that GL Error 1282 is very common in Minecraft as well - which is another clue. This only occurs in multithreaded applications, such as Minecraft! This clue came up during a discussion about multithreaded applications today in Systems Programming.
Continuing down the rabbit hole, I looked at the difference in generations between Bruce's GT210 and my Quadro 2000. I then looked on the Mesa bug tracker specifically for Minecraft-related issues (just because I knew it would use multithreaded operations). I came across https://gitlab.freedesktop.org/mesa/mesa/-/issues/5871 which covers someone's recent Minecraft crashes with Mesa-21.3.4. In that bug report, there was a link to a set of patches to make the Nouveau driver threadsafe. Those patches can be found here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10752
Now is a good time to bring up the generational differences. NV50 covers the GT 210 and various cards from 2009-2011. https://nouveau.freedesktop.org/CodeNames.html#nv50familytesla is a good link to see an overview of those generations. My Quadro 2000 is an NVC0 card. I currently have a GT210 in the other machine, but I will swap it for the Quadro 2000 to test the changes. I also have several other lower-end NVIDIA GT series cards from later families, all purchased between 2016 and 2019. While I could pull those out to test, I think the Quadro 2000 and the GT 210 will be sufficient for this particular problem.
Google and the forum results show that almost every generation from NVC0 onward has issues with multithreaded OpenGL applications. This makes sense, because Mesa uses the newer NIR compiler internally instead of LLVM for those families of GPUs, so multithreading is enabled by default. Here, that causes issues with Plasma and with gnome-maps as well, since gnome-maps uses Clutter and Clutter is heavily multithreaded.
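To illustrate why a driver that is not threadsafe can feed the GPU malformed commands, here is a toy model. It is not the code from the Mesa merge request; the `PushBuffer` class and everything in it are invented for this sketch. It only shows the general idea that multi-word commands written to a shared buffer must be serialised (here with a lock) so that words from different threads never interleave.

```python
import threading


class PushBuffer:
    """Toy stand-in for a command buffer shared by several threads."""

    def __init__(self):
        self.words = []
        self.lock = threading.Lock()

    def emit_locked(self, owner, n_words):
        # Hold the lock for the whole command so its words stay contiguous.
        with self.lock:
            for i in range(n_words):
                self.words.append((owner, i))


def is_well_formed(words, n_words):
    """True if the buffer holds only complete, uninterrupted commands."""
    if len(words) % n_words:
        return False
    for start in range(0, len(words), n_words):
        chunk = words[start:start + n_words]
        owner = chunk[0][0]
        if chunk != [(owner, i) for i in range(n_words)]:
            return False
    return True


def run(n_threads=4, commands_per_thread=200, n_words=8):
    """Have several threads emit commands into one shared buffer."""
    buf = PushBuffer()

    def worker(owner):
        for _ in range(commands_per_thread):
            buf.emit_locked(owner, n_words)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buf.words


words = run()
print(is_well_formed(words, 8))
```

Without the lock, two threads could each append part of a command before the other finishes, and `is_well_formed` would reject the result. In this model, that is the kind of corruption a GPU would see as an invalid command.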
My test plan is as follows (and I hope to have this done in the next day or two):
- Test Plasma (Wayland), gnome-maps, GNOME (Wayland), and Minecraft on my separate system, first with the Quadro 2000 installed and then with the GT210 installed.
- With the GT210 installed, also attempt playing a video using VLC to test the claim regarding hardware acceleration being broken with those patches upstream (only affecting the NV50 family of GPUs). If this turns out to be the case, I will attempt to pull the nv50* chunks from the upstream changes, since the NV50 series doesn't seem to be affected by this issue. Note that the particular VDPAU-related extensions on NV50 GPUs also don't seem to be enabled without firmware... which we can't carry, so this might be a non-issue for NV50 GPUs on LFS.
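The plan above amounts to a small test matrix. As a sketch only (the names and checklist format are invented here, and the actual testing is manual), it can be enumerated like this so that no GPU/workload combination is skipped:

```python
from itertools import product

# GPUs and workloads taken from the test plan above.
GPUS = ["Quadro 2000", "GT210"]
WORKLOADS = ["Plasma (Wayland)", "gnome-maps", "GNOME (Wayland)", "Minecraft"]
# The video playback check only applies to the NV50-family card.
EXTRA = [("GT210", "VLC video playback")]


def build_matrix():
    """Return every (gpu, workload) pair that needs a manual test run."""
    cases = list(product(GPUS, WORKLOADS))
    cases.extend(EXTRA)
    return cases


for gpu, workload in build_matrix():
    print(f"[ ] {gpu:12} {workload}")
```

That gives nine runs in total: four workloads on each card, plus the VLC check on the GT210.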
My debug system also has a bunch of kernel options enabled for additional debugging features, and has debug symbols for all packages built in (no stripping). Most, if not all, packages using meson are built with "--buildtype=debug" over there, which I couldn't do on my main development machine without a massive rebuild.
If the issues get fixed on the debug machine, I will install the new versions on my development system to verify the issue is fixed over there as well and then drop the patch into the book. After this is over, I will immediately turn my attention towards the remainder of my tickets. I would like to get this resolved as soon as possible.
Change History (7)
comment:1 by , 3 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 3 years ago
comment:3 by , 3 years ago
Before I submit the patch, I've been working on reproducing things properly. Here's the kernel output right before the panic when running 'startplasma-wayland' on the GT210 (NV50 series). Again, this is before the patch has been applied:
[332778.703482] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 8 [ksplashqml[28868]] get 000010f560 put 00001108d8 ib_get 0000038a ib_put 0000038e state 80007098 (err: INVALID_CMD) push 00400040
[332778.715000] nouveau 0000:01:00.0: gr: DATA_ERROR 00000005 [INVALID_ENUM]
[332778.715009] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 130c data 00000000
[332778.715025] nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD]
[332778.715028] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 0fc0 data 00047454
[332778.715039] nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD]
[332778.715041] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 0fc8 data 000477b8
[332778.719764] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 8 [ksplashqml[28868]] get 000010f560 put 00001108d8 ib_get 0000038c ib_put 0000038e state 80007098 (err: INVALID_CMD) push 00400040
[332778.731262] nouveau 0000:01:00.0: gr: DATA_ERROR 00000005 [INVALID_ENUM]
[332778.731272] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 130c data 00000000
[332778.731297] nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD]
[332778.731301] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 0fc0 data 00047454
[332778.731312] nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD]
[332778.731314] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 0fc8 data 000477b8
[332779.813288] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 8 [ksplashqml[28868]] get 000016e508 put 000016f988 ib_get 000000da ib_put 000000de state 80007550 (err: INVALID_CMD) push 00400040
[332779.826753] nouveau 0000:01:00.0: gr: DATA_ERROR 00000005 [INVALID_ENUM]
[332779.826763] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 12e8 data 08020402
[332779.829522] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 8 [ksplashqml[28868]] get 000016e508 put 000016f988 ib_get 000000dc ib_put 000000e2 state 80007550 (err: INVALID_CMD) push 00400040
[332779.831921] nouveau 0000:01:00.0: gr: DATA_ERROR 00000005 [INVALID_ENUM]
[332779.831930] nouveau 0000:01:00.0: gr: 00100000 [] ch 8 [003f8a4000 ksplashqml[28868]] subc 3 class 8597 mthd 12e8 data 08020402
[332781.485324] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 9 [plasmashell[28911]] get 00000325f0 put 0000034024 ib_get 0000004a ib_put 00000055 state 80007228 (err: INVALID_CMD) push 00400040
[332781.536167] nouveau 0000:01:00.0: gr: TRAP_MP_EXEC - TP 0 MP 0: 00000010 [INVALID_OPCODE] at 07fdc0 warp 4, opcode 00e5e5e5 00e5e5e5
[332781.536181] nouveau 0000:01:00.0: gr: TRAP_MP_EXEC - TP 0 MP 1: 00000010 [INVALID_OPCODE] at 07fdc0 warp 1, opcode 00e5e5e5 00e5e5e5
[332781.536186] nouveau 0000:01:00.0: gr: 00200000 [] ch 9 [003f894000 plasmashell[28911]] subc 3 class 8597 mthd 1b0c data 1000f010
comment:4 by , 3 years ago
https://linuxfromscratch.org/~renodr/NouveauCrashPlasma.jpeg
A picture of the crash that occurs
comment:6 by , 3 years ago
Tested with:
KDE Plasma/Wayland
GNOME Wayland
VLC Media Player
MPlayer
Kdenlive
Xine
Discord
Zoom
Minecraft
I've also tested this on a GT 610, GT 210, GT 730, Quadro 2000, and GT 1030.
comment:7 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I've discovered some additional crashes in gst-plugins-base when running the tests (note that I'm installing things on my debugging system at the moment and haven't tested the Mesa fix yet). It looks like it was the gstgl* tests.