Opened 9 months ago

Closed 8 months ago

#4745 closed task (fixed)

systemd-247

Reported by: Douglas R. Reno
Owned by: Douglas R. Reno
Priority: normal
Milestone: 10.1
Component: Book
Version: SVN
Severity: normal
Keywords:
Cc:

Description

A new pre-release version of systemd has been tagged.

Change History (14)

comment:1 by Douglas R. Reno, 9 months ago

Owner: changed from lfs-book to Douglas R. Reno
Status: new → assigned

comment:2 by Douglas R. Reno, 9 months ago

Summary: systemd-247-rc1 (Wait until final release) → systemd-247-rc2 (Wait until final release)

Now -rc2

comment:3 by Douglas R. Reno, 8 months ago

Summary: systemd-247-rc2 (Wait until final release) → systemd-247

comment:4 by Douglas R. Reno, 8 months ago

Milestone: Hold → 10.1

comment:5 by Xi Ruoyao, 8 months ago

We may need -Dmode=release option, to prevent enabling some "experimental" features.

comment:6 by Douglas R. Reno, 8 months ago

Adding -Dmode=release sounds like a sane idea to me, especially after reading the NEWS file

comment:7 by Douglas R. Reno, 8 months ago

Looks like systemd-dissect will now be installed in /usr/bin, so I'll be sure to document that in the book.

comment:8 by Douglas R. Reno, 8 months ago

We should add -Dpamconfdir=/etc/pam.d in BLFS to prevent the systemd-user file from being installed in /usr/lib/pam.d.

comment:9 by Douglas R. Reno, 8 months ago

CHANGES WITH 247:

KERNEL API INCOMPATIBILITY: Linux 4.14 introduced two new uevents "bind" and "unbind" to the Linux device model. When this kernel change was made, systemd-udevd was only minimally updated to handle and propagate these new event types. The introduction of these new uevents (which are typically generated for USB devices and devices needing a firmware upload before being functional) resulted in a number of issues which we so far didn't address. We hoped the kernel maintainers would themselves address these issues in some form, but that did not happen. To handle them properly, many (if not most) udev rules files shipped in various packages need updating, and so do many programs that monitor or enumerate devices with libudev or sd-device, or otherwise process uevents. Please note that this incompatibility is not the fault of systemd or udev, but is caused by an incompatible kernel change that happened back in Linux 4.14 and is becoming more and more visible as the new uevents are generated by more kernel drivers.

To minimize issues resulting from this kernel change (but not avoid them entirely) starting with systemd-udevd 247 the udev "tags" concept (which is a concept for marking and filtering devices during enumeration and monitoring) has been reworked: udev tags are now "sticky", meaning that once a tag is assigned to a device it will not be removed from the device again until the device itself is removed (i.e. unplugged). This makes sure that any application monitoring devices that match a specific tag is guaranteed to both see uevents where the device starts being relevant, and those where it stops being relevant (the latter now regularly happening due to the new "unbind" uevent type). The udev tags concept is hence now a concept tied to a *device* instead of a device *event* — unlike for example udev properties whose lifecycle (as before) is generally tied to a device event, meaning that the previously determined properties are forgotten whenever a new uevent is processed.

With the newly redefined udev tags concept, sometimes it's necessary to determine which tags are the ones applied by the most recent uevent/database update, in order to discern them from those originating from earlier uevents/database updates of the same device. To accommodate this, a new automatic property CURRENT_TAGS has been added that works similarly to the existing TAGS property but only lists tags set by the most recent uevent/database update. Similarly, the libudev/sd-device API has been updated with new functions to enumerate these 'current' tags, in addition to the existing APIs that now enumerate the 'sticky' ones.
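The difference between the sticky TAGS set and CURRENT_TAGS can be illustrated with a small toy model. This is only a sketch of the semantics described above, not the libudev or sd-device implementation; the class name `DeviceRecord`, its method, and the example tag names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceRecord:
    """Toy model of a udev database entry (hypothetical, not the real data structure)."""
    tags: set = field(default_factory=set)          # sticky: kept until the device is removed
    current_tags: set = field(default_factory=set)  # tags set by the most recent uevent only

    def process_uevent(self, tags_from_rules):
        """Record the tags that the rules assigned while handling one uevent."""
        self.current_tags = set(tags_from_rules)
        self.tags |= self.current_tags  # sticky tags only ever accumulate


dev = DeviceRecord()
dev.process_uevent({"seat", "uaccess"})  # e.g. an "add" or "bind" event sets two tags
dev.process_uevent(set())                # e.g. an "unbind" event where no rule sets a tag

print(sorted(dev.tags))          # ['seat', 'uaccess']
print(sorted(dev.current_tags))  # []
```

In this model a monitor filtering on the sticky set still sees the device after the second event, while the current set shows that the tag no longer applies at that moment.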

To properly handle "bind"/"unbind" on Linux 4.14 and newer it is essential that all udev rules files and applications are updated to handle the new events. Specifically:

  • All rule files that currently use a header guard similar to ACTION!="add|change",GOTO="xyz_end" should be updated to use ACTION=="remove",GOTO="xyz_end" instead, so that the properties/tags they add are also applied whenever "bind" (or "unbind") is seen. (This is most important for all physical device types — those for which "bind" and "unbind" are currently generated, for all other device types this change is still recommended but not as important — but certainly prepares for future kernel uevent type additions).
  • Similarly, all code monitoring devices that contains an 'if' branch discerning the "add" + "change" uevent actions from all other uevents actions (i.e. considering devices only relevant after "add" or "change", and irrelevant on all other events) should be reworked to instead negatively check for "remove" only (i.e. considering devices relevant after all event types, except for "remove", which invalidates the device). Note that this also means that devices should be considered relevant on "unbind", even though conceptually this — in some form — invalidates the device. Since the precise effect of "unbind" is not generically defined, devices should be considered relevant even after "unbind", however I/O errors accessing the device should then be handled gracefully.
  • Any code that uses device tags for deciding whether a device is relevant or not most likely needs to be updated to use the new udev_device_has_current_tag() API (or sd_device_has_current_tag() in case sd-device is used), to check whether the tag is set at the moment an uevent is seen (as opposed to the existing udev_device_has_tag() API which checks if the tag ever existed on the device, following the API concept redefinition explained above).

We are very sorry for this breakage and the requirement to update packages using these interfaces. We'd again like to underline that this is not caused by systemd/udev changes, but is the result of a kernel behaviour change.
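The recommended change to monitoring code (checking negatively for "remove" instead of positively for "add" and "change") can be sketched as follows. The function names are hypothetical and the snippet does not use libudev; it only contrasts the two checks over the kernel's uevent action names.

```python
def device_is_relevant_old(action: str) -> bool:
    """Pre-247 style check: only "add" and "change" make a device relevant."""
    return action in ("add", "change")


def device_is_relevant(action: str) -> bool:
    """Recommended check: every action except "remove" keeps the device relevant."""
    return action != "remove"


for action in ("add", "change", "bind", "unbind", "remove"):
    print(f"{action:7} old={device_is_relevant_old(action)} new={device_is_relevant(action)}")
```

With the old check a device would be dropped on "bind" or "unbind"; with the new one it stays relevant until "remove", so callers should expect and handle I/O errors on devices that have been unbound.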

  • UPCOMING INCOMPATIBILITY: So far most downstream distribution packages have not retriggered devices once the udev package (or any auxiliary package installing additional udev rules) is updated. We intend to work with major distributions to change this, so that "udevadm trigger -c change" is issued on such upgrades, ensuring that the updated ruleset is applied to the devices already discovered, so that (asynchronously) after the upgrade has completed the udev database is consistent with the updated rule set. This means udev rules must be ready to be retriggered with a "change" action at any time, and result in correct and complete udev database entries. While the majority of udev rule files known to us currently get this right, some don't. Specifically, there are udev rules files included in various packages that only set udev properties on the "add" action, but do not handle the "change" action. If a device matching those rules is retriggered with the "change" action (as is intended here) it would suddenly lose the relevant properties. This always has been problematic, but as soon as all udev devices are triggered on relevant package upgrades this will become particularly so. It is strongly recommended to fix offending rules so that they can handle a "change" action at any time, and acquire all necessary udev properties even then. Or in other words: the header guard mentioned above (ACTION=="remove",GOTO="xyz_end") is the correct approach to handle this, as it makes sure rules are rerun on "change" correctly, and accumulate the correct and complete set of udev properties. udev rule definitions that cannot handle "change" events being triggered at arbitrary times should be considered buggy.
  • The MountAPIVFS= service file setting now defaults to on if RootImage= and RootDirectory= are used, which means that with those two settings /proc/, /sys/ and /dev/ are automatically properly set up for services. Previous behaviour may be restored by explicitly setting MountAPIVFS=off.
  • Since PAM 1.2.0 (2015) configuration snippets may be placed in /usr/lib/pam.d/ in addition to /etc/pam.d/. If a file exists in the latter it takes precedence over the former, similar to how most of systemd's own configuration is handled. Given that PAM stack definitions are primarily put together by OS vendors/distributions (though possibly overridden by users), this systemd release moves its own PAM stack configuration for the "systemd-user" PAM service (i.e. for the PAM session invoked by the per-user user@.service instance) from /etc/pam.d/ to /usr/lib/pam.d/. We recommend moving all packages' vendor versions of their PAM stack definitions from /etc/pam.d/ to /usr/lib/pam.d/, but if such OS-wide migration is not desired the location to which systemd installs its PAM stack configuration may be changed via the -Dpamconfdir Meson option.
  • The runtime dependencies on libqrencode, libpcre2, libidn/libidn2, libpwquality and libcryptsetup have been changed to be based on dlopen(): instead of regular dynamic library dependencies declared in the binary ELF headers, these libraries are now loaded on demand only, if they are available. If the libraries cannot be found the relevant operations will fail gracefully, or a suitable fallback logic is chosen. This is supposed to be useful for general purpose distributions, as it allows minimizing the list of dependencies the systemd packages pull in, permitting building of more minimal OS images, while still making use of these "weak" dependencies should they be installed. Since many package managers automatically synthesize package dependencies from ELF shared library dependencies, some additional manual packaging work has to be done now to replace those (slightly downgraded from "required" to "recommended" or whatever is conceptually suitable for the package manager). Note that this change does not alter build-time behaviour: as before the build-time dependencies have to be installed during build, even if they now are optional during runtime.
  • sd-event.h gained a new call sd_event_add_time_relative() for installing timers relative to the current time. This is mostly a convenience wrapper around the pre-existing sd_event_add_time() call which installs absolute timers.
  • sd-event event sources may now be placed in a new "exit-on-failure" mode, which may be controlled via the new sd_event_source_get_exit_on_failure() and sd_event_source_set_exit_on_failure() functions. If enabled, any failure returned by the event source handler functions will result in exiting the event loop (unlike the default behaviour of just disabling the event source but continuing with the event loop). This feature is useful to set for all event sources that define "primary" program behaviour (where failure should be fatal) in contrast to "auxiliary" behaviour (where failure should remain local).
  • Most event source types sd-event supports now accept a NULL handler function, in which case the event loop is exited once the event source is to be dispatched, using the userdata pointer — converted to a signed integer — as exit code of the event loop. Previously this was supported for IO and signal event sources already. Exit event sources still do not support this (simply because it makes little sense there, as the event loop is already exiting when they are dispatched).
  • A new per-unit setting RootImageOptions= has been added which allows tweaking the mount options for any file system mounted as effect of the RootImage= setting.
  • Another new per-unit setting MountImages= has been added, that allows mounting additional disk images into the file system tree accessible to the service.
  • Timer units gained a new FixedRandomDelay= boolean setting. If enabled, the random delay configured with RandomizedDelaySec= is selected in a way that is stable on a given system (though still different for different units).
  • Socket units gained a new setting Timestamping= that takes "us", "ns" or "off". This controls the SO_TIMESTAMP/SO_TIMESTAMPNS socket options.
  • systemd-repart now generates JSON output when requested with the new --json= switch.
  • systemd-machined's OpenMachineShell() bus call will now pass additional policy metadata fields to the PolicyKit authentication request.
  • systemd-tmpfiles gained a new -E switch, which is equivalent to --exclude-prefix=/dev --exclude-prefix=/proc --exclude-prefix=/run --exclude-prefix=/sys. It's particularly useful in combination with --root=, when operating on OS trees that do not have any of these four runtime directories mounted, as this means no files below these subtrees are created or modified, since those mount points should probably remain empty.
  • systemd-tmpfiles gained a new --image= switch which is like --root=, but takes a disk image instead of a directory as argument. The specified disk image is mounted inside a temporary mount namespace and the tmpfiles.d/ drop-ins stored in the image are executed and applied to the image. systemd-sysusers similarly gained a new --image= switch, that allows the sysusers.d/ drop-ins stored in the image to be applied onto the image.
  • Similarly, the journalctl command also gained an --image= switch, which is a quick one-step solution to look at the log data included in OS disk images.
  • journalctl's --output=cat option (which outputs the log content without any metadata, just the pure text messages) will now make use of terminal colors when run on a suitable terminal, similarly to the other output modes.
  • JSON group records now support a "description" string that may be used to add a human-readable textual description to such groups. This is supposed to match the user's GECOS field which traditionally didn't have a counterpart for group records.
  • The "systemd-dissect" tool that may be used to inspect OS disk images and that was previously installed to /usr/lib/systemd/ has now been moved to /usr/bin/, reflecting its updated status of an officially supported tool with a stable interface. It gained support for a new --mkdir switch which when combined with --mount has the effect of creating the directory to mount the image to if it is missing first. It also gained two new commands --copy-from and --copy-to for copying files and directories in and out of an OS image without the need to manually mount it. It also acquired support for a new option --json= to generate JSON output when inspecting an OS image.
  • The cgroup2 file system is now mounted with the "memory_recursiveprot" mount option, supported since kernel 5.7. This means that the MemoryLow= and MemoryMin= unit file settings now apply recursively to whole subtrees.
  • systemd-homed now defaults to using the btrfs file system — if available — when creating home directories in LUKS volumes. This may be changed with the DefaultFileSystemType= setting in homed.conf. It's now the default file system in various major distributions and has the major benefit for homed that it can be grown and shrunk while mounted, unlike the other contenders ext4 and xfs, which can both be grown online, but not shrunk (in fact xfs is the technically most limited option here, as it cannot be shrunk at all).
  • JSON user records managed by systemd-homed gained support for "recovery keys". These are basically secondary passphrases that can unlock user accounts/home directories. They are computer-generated rather than user-chosen, and typically have greater entropy. homectl's --recovery-key= option may be used to add a recovery key to a user account. The generated recovery key is displayed as a QR code, so that it can be scanned to be kept in a safe place. This feature is particularly useful in combination with systemd-homed's support for FIDO2 or PKCS#11 authentication, as a secure fallback in case the security tokens are lost. Recovery keys may be entered wherever the system asks for a password.
  • systemd-homed now maintains a "dirty" flag for each LUKS encrypted home directory which indicates that a home directory has not been deactivated cleanly when offline. This flag is useful to identify home directories for which the offline discard logic did not run when offlining, and where it would be a good idea to log in again to catch up.
  • systemctl gained a new parameter --timestamp= which may be used to change the style in which timestamps are output, i.e. whether to show them in local timezone or UTC, or whether to show µs granularity.
  • Alibaba's "pouch" container manager is now detected by systemd-detect-virt, ConditionVirtualization= and similar constructs. Similarly, they now also recognize IBM PowerVM machine virtualization.
  • systemd-nspawn has been reworked to use the /run/host/incoming/ as place to use for propagating external mounts into the container. Similarly /run/host/notify is now used as the socket path for container payloads to communicate with the container manager using sd_notify(). The container manager now uses the /run/host/inaccessible/ directory to place "inaccessible" file nodes of all relevant types which may be used by the container payload as bind mount source to over-mount inodes to make them inaccessible. /run/host/container-manager will now be initialized with the same string as the $container environment variable passed to the container's PID 1. /run/host/container-uuid will be initialized with the same string as $container_uuid. This means the /run/host/ hierarchy is now the primary way to make host resources available to the container. The Container Interface documents these new files and directories:

https://systemd.io/CONTAINER_INTERFACE
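As an illustration of the /run/host/ hierarchy described above, a container payload could look up the manager name and UUID roughly like this. It is a minimal sketch: the helper name `read_host_info` is hypothetical, only the two file names mentioned above are read, and missing files are reported as None rather than treated as errors.

```python
from pathlib import Path


def read_host_info(run_host="/run/host"):
    """Return the contents of container-manager and container-uuid under run_host.

    A value is None when the file cannot be read, for example outside a
    container or under a manager that does not provide it.
    """
    info = {}
    for key in ("container-manager", "container-uuid"):
        try:
            info[key] = (Path(run_host) / key).read_text().strip()
        except OSError:
            info[key] = None
    return info


if __name__ == "__main__":
    print(read_host_info())
```

The directory is a parameter so the helper can be pointed at a test tree instead of the real /run/host/.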

  • Support for the "ConditionNull=" unit file condition has been deprecated and undocumented for 6 years. systemd started to warn about its use 1.5 years ago. It has now been removed entirely.
  • sd-bus.h gained a new API call sd_bus_error_has_names(), which takes a sd_bus_error struct and a list of error names, and checks if the error matches one of these names. It's a convenience wrapper that is useful in cases where multiple errors shall be handled the same way.
  • A new system call filter list "@known" has been added, that contains all system calls known at the time systemd was built.
  • Behaviour of system call filter allow lists has changed slightly: system calls that are contained in @known will result in a EPERM by default, while those not contained in it result in ENOSYS. This should improve compatibility because known system calls will thus be communicated as prohibited, while unknown (and thus newer ones) will be communicated as not implemented, which hopefully has the greatest chance of triggering the right fallback code paths in client applications.
  • "systemd-analyze syscall-filter" will now show two separate sections at the bottom of the output: system calls known during systemd build time but not included in any of the filter groups shown above, and system calls defined on the local kernel but not known during systemd build time.
  • If the $SYSTEMD_LOG_SECCOMP=1 environment variable is set for systemd-nspawn all system call filter violations will be logged by the kernel (audit). This is useful for tracking down system calls invoked by container payloads that are prohibited by the container's system call filter policy.
  • If the $SYSTEMD_SECCOMP=0 environment variable is set for systemd-nspawn (and other programs that use seccomp) all seccomp filtering is turned off.
  • Two new unit file settings ProtectProc= and ProcSubset= have been added that expose the hidepid= and subset= mount options of procfs. All processes of the unit will only see processes in /proc that are owned by the unit's user. This is an important new sandboxing option that is recommended to be set on all system services. All long-running system services that are included in systemd itself set this option now. This option is only supported on kernel 5.8 and above, since the hidepid= option supported on older kernels was not a per-mount option but actually applied to the whole PID namespace.
  • Socket units gained a new boolean setting FlushPending=. If enabled all pending socket data/connections are flushed whenever the socket unit enters the "listening" state, i.e. after the associated service exited.
  • The unit file setting NUMAMask= gained a new "all" value: when used, all existing NUMA nodes are added to the NUMA mask.
  • A new "credentials" logic has been added to system services. This is a simple mechanism to pass privileged data to services in a safe and secure way. It's supposed to be used to pass per-service secret data such as passwords or cryptographic keys but also associated less private information such as user names, certificates, and similar to system services. Each credential is identified by a short user-chosen name and may contain arbitrary binary data. Two new unit file settings have been added: SetCredential= and LoadCredential=. The former allows setting a credential to a literal string, the latter sets a credential to the contents of a file (or data read from a user-chosen AF_UNIX stream socket). Credentials are passed to the service via a special credentials directory, one file for each credential. The path to the credentials directory is passed in a new $CREDENTIALS_DIRECTORY environment variable. Since the credentials are passed in the file system they may be easily referenced in ExecStart= command lines too, thus no explicit support for the credentials logic in daemons is required (though ideally daemons would look for the bits they need in $CREDENTIALS_DIRECTORY themselves automatically, if set). The $CREDENTIALS_DIRECTORY is backed by unswappable memory if privileges allow it, immutable if privileges allow it, is accessible only to the service's UID, and is automatically destroyed when the service stops.
  • systemd-nspawn supports the same credentials logic. It can both consume credentials passed to it via the aforementioned $CREDENTIALS_DIRECTORY protocol as well as pass these credentials on to its payload. The service manager/PID 1 has been updated to match this: it can also accept credentials from the container manager that invokes it (in fact: any process that invokes it), and passes them on to its services. Thus, credentials can be propagated recursively down the tree: from a system's service manager to a systemd-nspawn service, to the service manager that runs as container payload and to the service it runs below. Credentials may also be added on the systemd-nspawn command line, using new --set-credential= and --load-credential= command line switches that match the aforementioned service settings.
  • systemd-repart gained new settings Format=, Encrypt=, CopyFiles= in the partition drop-ins which may be used to format/LUKS encrypt/populate any created partitions. The partitions are encrypted/formatted/populated before they are registered in the partition table, so that they appear atomically: either the partitions do not exist yet or they exist fully encrypted, formatted, and populated — there is no time window where they are "half-initialized". Thus the system is robust to abrupt shutdown: if the tool is terminated half-way during its operations on next boot it will start from the beginning.
  • systemd-repart's --size= operation gained a new "auto" value. If specified, and operating on a loopback file it is automatically sized to the minimal size the size constraints permit. This is useful to use "systemd-repart" as an image builder for minimally sized images.
  • systemd-resolved now gained a third IPC interface for requesting name resolution: besides D-Bus and local DNS to 127.0.0.53 a Varlink interface is now supported. The nss-resolve NSS module has been modified to use this new interface instead of D-Bus. Using Varlink has a major benefit over D-Bus: it works without a broker service, and thus already during earliest boot, before the dbus daemon has been started. This means name resolution via systemd-resolved now works at the same time systemd-networkd operates: from earliest boot on, including in the initrd.
  • systemd-resolved gained support for a new DNSStubListenerExtra= configuration file setting which may be used to specify additional IP addresses the built-in DNS stub shall listen on, in addition to the main one on 127.0.0.53:53.
  • Name lookups issued via systemd-resolved's D-Bus and Varlink interfaces (and thus also via glibc NSS if nss-resolve is used) will now honour a trailing dot in the hostname: if specified the search path logic is turned off. Thus "resolvectl query foo." is now equivalent to "resolvectl query --search=off foo.".
  • systemd-resolved gained a new D-Bus property "ResolvConfMode" that exposes how /etc/resolv.conf is currently managed: by resolved (and in which mode if so) or another subsystem. "resolvectl" will display this property in its status output.
  • The resolv.conf snippets systemd-resolved provides will now set "." as the search domain if no other search domain is known. This turns off the derivation of an implicit search domain by nss-dns for the hostname, when the hostname is set to an FQDN. This change is done to make nss-dns using resolv.conf provided by systemd-resolved behave more similarly to nss-resolve.
  • systemd-tmpfiles' file "aging" logic (i.e. the automatic clean-up of /tmp/ and /var/tmp/ based on file timestamps) now looks at the "birth" time (btime) of a file in addition to the atime, mtime, and ctime.
  • systemd-analyze gained a new verb "capability" that lists all known capabilities by the systemd build and by the kernel.
  • If a file /usr/lib/clock-epoch exists, PID 1 will read its mtime and advance the system clock to it at boot if it is noticed to be before that time. Previously, PID 1 would only advance the time to an epoch time that is set during build-time. With this new file OS builders can change this epoch timestamp on individual OS images without having to rebuild systemd.
  • systemd-logind will now listen to the KEY_RESTART key from the Linux input layer and reboot the system if it is pressed, similarly to how it already handles KEY_POWER, KEY_SUSPEND or KEY_SLEEP. KEY_RESTART was originally defined in the Multimedia context (to restart playback of a song or film), but is now primarily used in various embedded devices for "Reboot" buttons. Accordingly, systemd-logind will now honour it as such. This may be configured in more detail via the new HandleRebootKey= and RebootKeyIgnoreInhibited= settings.
  • systemd-nspawn/systemd-machined will now reconstruct hardlinks when copying OS trees, for example in "systemd-nspawn --ephemeral", "systemd-nspawn --template=", "machinectl clone" and similar. This is useful when operating with OSTree images, which use hardlinks heavily throughout, and where such copies previously resulted in "exploding" hardlinks.
  • systemd-nspawn's --console= setting gained support for a new "autopipe" value, which is identical to "interactive" when invoked on a TTY, and "pipe" otherwise.
  • systemd-networkd's .network files gained support for explicitly configuring the multicast membership entries of bridge devices in the [BridgeMDB] section. It also gained support for the PIE queuing discipline in the [FlowQueuePIE] sections.
  • systemd-networkd's .netdev files may now be used to create "BareUDP" tunnels, configured in the new [BareUDP] setting.
  • systemd-networkd's Gateway= setting in .network files now accepts the special values "_dhcp4" and "_ipv6ra" to configure additional, locally defined, explicit routes to the gateway acquired via DHCP or IPv6 Router Advertisements. The old setting "_dhcp" is deprecated, but still accepted for backwards compatibility.
  • systemd-networkd's [IPv6PrefixDelegation] section and IPv6PrefixDelegation= options have been renamed as [IPv6SendRA] and IPv6SendRA= (the old names are still accepted for backwards compatibility).
  • systemd-networkd's .network files gained the DHCPv6PrefixDelegation= boolean setting in [Network] section. If enabled, the delegated prefix gained by another link will be configured, and an address within the prefix will be assigned.
  • systemd-networkd's .network files gained the Announce= boolean setting in [DHCPv6PrefixDelegation] section. When enabled, the delegated prefix will be announced through IPv6 router advertisement (IPv6 RA). The setting is enabled by default.
  • VXLAN tunnels may now be marked as independent of any underlying network interface via the new Independent= boolean setting.
  • systemctl gained support for two new verbs: "service-log-level" and "service-log-target" may be used on services that implement the generic org.freedesktop.LogControl1 D-Bus interface to dynamically adjust the log level and target. All of systemd's long-running services support this now, but ideally all system services would implement this interface to make the system more uniformly debuggable.
  • The SystemCallErrorNumber= unit file setting now accepts the new "kill" and "log" actions, in addition to arbitrary error number specifications as before. If "kill" the processes are killed on the event, if "log" the offending system call is audit logged.
  • A new SystemCallLog= unit file setting has been added that accepts a list of system calls that shall be logged about (audit).
  • The OS image dissection logic (as used by RootImage= in unit files or systemd-nspawn's --image= switch) has gained support for identifying and mounting explicit /usr/ partitions, which are now defined in the discoverable partition specification. This should be useful for environments where the root file system is generated/formatted/populated dynamically on first boot and combined with an immutable /usr/ tree that is supplied by the vendor.
  • In the final phase of shutdown, within the systemd-shutdown binary we'll now try to detach MD devices (i.e software RAID) in addition to loopback block devices and DM devices as before. This is supposed to be a safety net only, in order to increase robustness if things go wrong. Storage subsystems are expected to properly detach their storage volumes during regular shutdown already (or in case of storage backing the root file system: in the initrd hook we return to later).
  • If the SYSTEMD_LOG_TID environment variable is set all systemd tools will now log the thread ID in their log output. This is useful when working with heavily threaded programs.
  • If the SYSTEMD_RDRAND environment variable is set to "0", systemd will not use the RDRAND CPU instruction. This is useful in environments such as replay debuggers where non-deterministic behaviour is not desirable.
  • The autopaging logic in systemd's various tools (such as systemctl) has been updated to turn on "secure" mode in "less" (i.e. $LESSSECURE=1) if execution in a "sudo" environment is detected. This disables invoking external programs from the pager, via the pipe logic. This behaviour may be overridden via the new $SYSTEMD_PAGERSECURE environment variable.
  • Units which have resource limits (.service, .mount, .swap, .slice, .socket, and .scope) gained new configuration settings ManagedOOMSwap=, ManagedOOMMemoryPressure=, and ManagedOOMMemoryPressureLimitPercent= that specify resource pressure limits and optional action taken by systemd-oomd.
  • A new service systemd-oomd has been added. It monitors resource contention for selected parts of the unit hierarchy using the PSI information reported by the kernel, and kills processes when memory or swap pressure is above configured limits. This service is only enabled by default in developer mode (see below) and should be considered a preview in this release. Behaviour details and option names are subject to change without the usual backwards-compatibility promises.
  • A new helper oomctl has been added to introspect systemd-oomd state. It is only enabled by default in developer mode and should be considered a preview without the usual backwards-compatibility promises.
  • New meson option -Dcompat-mutable-uid-boundaries= has been added. If enabled, systemd reads the system UID boundaries from /etc/login.defs at runtime, instead of using the built-in values selected during build. This is an option to improve compatibility for upgrades from old systems. It's strongly recommended not to make use of this functionality on new systems (or even enable it during build), as it makes something runtime-configurable that is mostly an implementation detail of the OS, and permits avoidable differences in deployments that create all kinds of problems in the long run.
  • New meson option '-Dmode=developer|release' has been added. When 'developer', additional checks and features are enabled that are relevant during upstream development, e.g. verification that semi-automatically-generated documentation has been properly updated following API changes. Those checks are considered hints for developers and are not actionable in downstream builds. In addition, extra features that are not ready for general consumption may be enabled in developer mode. It is thus recommended to set '-Dmode=release' in end-user and distro builds.
  • systemd-cryptsetup gained support for processing detached LUKS headers specified on the kernel command line via the header= parameter of the luks.options= kernel command line option. The same device/path syntax as for key files is supported for header files like this.
  • The "net_id" built-in of udev has been updated to ignore ACPI _SUN slot index data for devices that are connected through a PCI bridge where the _SUN index is associated with the bridge instead of the network device itself. Previously this would create ambiguous device naming if multiple network interfaces were connected to the same PCI bridge. Since this is a naming scheme incompatibility on systems that possess hardware like this it has been introduced as new naming scheme "v247". The previous scheme can be selected via the "net.naming-scheme=v245" kernel command line parameter.
  • ConditionFirstBoot= semantics have been modified to be safe towards abnormal system power-off during first boot. Specifically, the "systemd-machine-id-commit.service" service now acts as boot milestone indicating when the first boot process is sufficiently complete in order to not consider the next following boot also a first boot. If the system is reset before this unit is reached the first time, the next boot will still be considered a first boot; once it has been reached, no further boots will be considered a first boot. The "first-boot-complete.target" unit now acts as official hook point to order against this. If a service shall be run on every boot until the first boot fully succeeds it may thus be ordered before this target unit (and pull it in) and carry ConditionFirstBoot= appropriately.
  • bootctl's set-default and set-oneshot commands now accept the three special strings "@default", "@oneshot", "@current" in place of a boot entry id. These strings are resolved to the current default and oneshot boot loader entry, as well as the currently booted one. Thus a command "bootctl set-default @current" may be used to make the currently booted menu item the new default for all subsequent boots.
  • "systemctl edit" has been updated to show the original effective unit contents in commented form in the text editor.
  • Units in user mode are now segregated into three new slices: session.slice (units that form the core of graphical session), app.slice ("normal" user applications), and background.slice (low-priority tasks). Unless otherwise configured, user units are placed in app.slice. The plan is to add resource limits and protections for the different slices in the future.
  • New GPT partition types for RISCV32/64 for the root and /usr partitions, and their associated Verity partitions have been defined, and are now understood by systemd-gpt-auto-generator, and the OS image dissection logic.
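
As a rough illustration of the ConditionFirstBoot= change described in the list above, the following is a minimal Python sketch. It is not systemd code: the function name and the one-boolean-per-boot model are assumptions made purely for illustration. It only models the stated rule that every boot counts as a first boot until systemd-machine-id-commit.service has been reached once.

```python
def first_boot_flags(milestone_reached):
    """Model the 247 ConditionFirstBoot= behaviour for a sequence of boots.

    milestone_reached: one bool per boot, True if that boot got as far as
    the first-boot milestone (systemd-machine-id-commit.service) before
    the system was shut down or reset.
    Returns one bool per boot: True if that boot counts as a first boot.
    """
    flags = []
    completed = False  # becomes True once the milestone has been reached
    for reached in milestone_reached:
        # A boot is a "first boot" as long as no earlier boot completed.
        flags.append(not completed)
        if reached:
            completed = True
    return flags


if __name__ == "__main__":
    # Two boots interrupted by power loss, one that completes, one more.
    print(first_boot_flags([False, False, True, False]))
    # prints [True, True, True, False]
```

In this model a service ordered before first-boot-complete.target and carrying ConditionFirstBoot= would run on the first three boots and not on the fourth.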

Contributions from: Adolfo Jayme Barrientos, afg, Alec Moskvin, Alyssa Ross, Amitanand Chikorde, Andrew Hangsleben, Anita Zhang, Ansgar Burchardt, Arian van Putten, Aurelien Jarno, Axel Rasmussen, bauen1, Beniamino Galvani, Benjamin Berg, Bjørn Mork, brainrom, Chandradeep Dey, Charles Lee, Chris Down, Christian Göttsche, Christof Efkemann, Christoph Ruegge, Clemens Gruber, Daan De Meyer, Daniele Medri, Daniel Mack, Daniel Rusek, Dan Streetman, David Tardon, Dimitri John Ledkov, Dmitry Borodaenko, Elias Probst, Elisei Roca, ErrantSpore, Etienne Doms, Fabrice Fontaine, fangxiuning, Felix Riemann, Florian Klink, Franck Bui, Frantisek Sumsal, fwSmit, George Rawlinson, germanztz, Gibeom Gwon, Glen Whitney, Gogo Gogsi, Göran Uddeborg, Grant Mathews, Hans de Goede, Hans Ulrich Niedermann, Haochen Tong, Harald Seiler, huangyong, Hubert Kario, igo95862, Ikey Doherty, Insun Pyo, Jan Chren, Jan Schlüter, Jérémy Nouhaud, Jian-Hong Pan, Joerg Behrmann, Jonathan Lebon, Jörg Thalheim, Josh Brobst, Juergen Hoetzel, Julien Humbert, Kai-Chuan Hsieh, Kairui Song, Kamil Dudka, Kir Kolyshkin, Kristijan Gjoshev, Kyle Huey, Kyle Russell, Lee Whalen, Lennart Poettering, lichangze, Luca Boccassi, Lucas Werkmeister, Luca Weiss, Marc Kleine-Budde, Marco Wang, Martin Wilck, Marti Raudsepp, masmullin2000, Máté Pozsgay, Matt Fenwick, Michael Biebl, Michael Scherer, Michal Koutný, Michal Sekletár, Michal Suchanek, Mikael Szreder, Milo Casagrande, mirabilos, Mitsuha_QuQ, mog422, Muhammet Kara, Nazar Vinnichuk, Nicholas Narsing, Nicolas Fella, Njibhu, nl6720, Oğuz Ersen, Olivier Le Moal, Ondrej Kozina, onlybugreports, Pass Automated Testing Suite, Pat Coulthard, Pavel Sapezhko, Pedro Ruiz, perry_yuan, Peter Hutterer, Phaedrus Leeds, PhoenixDiscord, Piotr Drąg, Plan C, Purushottam choudhary, Rasmus Villemoes, Renaud Métrich, Robert Marko, Roman Beranek, Ronan Pigott, Roy Chen (陳彥廷), RussianNeuroMancer, Samanta Navarro, Samuel BF, scootergrisen, Sorin Ionescu, Steve Dodd, Susant Sahani, Timo 
Rothenpieler, Tobias Hunger, Tobias Kaufmann, Topi Miettinen, vanou, Vito Caputo, Weblate, Wen Yang, Whired Planck, williamvds, Yu, Li-Yu, Yuri Chornoivan, Yu Watanabe, Zbigniew Jędrzejewski-Szmek, Zmicer Turok, Дамјан Георгиевски

– Warsaw, 2020-11-26

Last edited 8 months ago by Bruce Dubbs

comment:10 by Douglas R. Reno, 8 months ago

One challenge with this update, and the reason I'm doing a full system rebuild, is the kernel API incompatibility listed above. I know libinput has been adapted for this in BLFS, but I'm unsure about other packages.

comment:11 by Douglas R. Reno, 8 months ago

The man pages tarball has been uploaded to anduin.

Size: 612 KB

MD5SUM: 438c98be200e1c3b308e58a3399d4465

Name: systemd-man-pages-247.tar.xz
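
For anyone who wants to check a downloaded copy against the MD5SUM above, here is a minimal Python sketch. The function names are made up for illustration, and the local file path is an assumption about where the tarball was saved.

```python
import hashlib

# Checksum quoted in the comment above for systemd-man-pages-247.tar.xz.
EXPECTED_MD5 = "438c98be200e1c3b308e58a3399d4465"


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tarball_is_intact(path, expected=EXPECTED_MD5):
    """True if the MD5 of the file at `path` matches `expected`."""
    return file_md5(path) == expected.lower()
```

Usage would be along the lines of tarball_is_intact("systemd-man-pages-247.tar.xz"), run from the directory holding the download.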

in reply to:  9 comment:12 by ken@…, 8 months ago

Replying to renodr:

CHANGES WITH 247:

        * KERNEL API INCOMPATIBILITY: Linux 4.14 introduced two new uevents
          "bind" and "unbind" to the Linux device model. When this kernel
          change was made, systemd-udevd was only minimally updated to handle
          and propagate these new event types. The introduction of these new
          uevents (which are typically generated for USB devices and devices
          needing a firmware upload before being functional) resulted in a
          number of issues which we so far didn't address.


If you are able to read lwn.net, this was discussed a while ago. [ https://lwn.net/SubscriberLink/837033/8e8a3f0c499d267c/ ]

The real problem is that nobody reported it to the kernel devs for over a year, by which time it was a bit late.

The way I read it, anything which causes a bind or unbind event will now likely break some udev rules. One example is a Fedora rule which skipped if the action was not 'add' or 'change'; the fix is to change the rule to skip only if the action is 'remove'.

That kernel change happened three years ago. An original report (one where 4.12 was blamed, for Debian) can be found at https://github.com/systemd/systemd/issues/8221. What sounds like a sensible reply from Lennart Poettering (!) was to list negatives instead of positives (for bind events).

Long story short - any added udev rules might need to be adapted.
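
To make the difference between the two rule styles concrete, here is a small Python sketch. It is only a model of the matching logic, not udev itself: the function names are invented for illustration, and the two rules in the docstrings are paraphrases of the Fedora example above, not copies of any shipped rules file.

```python
def skips_unless_add_or_change(action):
    """Old style: skip the rule body unless the action is add or change.

    Models something like ACTION!="add|change", GOTO="..._end".
    """
    return action not in ("add", "change")


def skips_only_on_remove(action):
    """Adapted style: skip the rule body only when the device goes away.

    Models something like ACTION=="remove", GOTO="..._end".
    """
    return action == "remove"


if __name__ == "__main__":
    for action in ("add", "change", "bind", "unbind", "remove"):
        old = skips_unless_add_or_change(action)
        new = skips_only_on_remove(action)
        print(f"{action:7} old_skips={old!s:5} new_skips={new}")
```

The old style skips the rule body for bind and unbind events, while the adapted style still applies it for everything except remove, so it keeps working if the kernel adds further actions.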

comment:13 by Bruce Dubbs, 8 months ago

I read the LWN article and a lot of the comments. My understanding is that udev rules for some devices are broken due to the kernel adding some new actions, specifically "BIND" and "UNBIND".

To me, assuming that the set of actions the kernel emits will never change is a bit shortsighted, but I agree that there should be better communication between kernel and systemd developers. As best I can tell, the problems outside of systemd itself are somewhat rare. I do note that on a sysv/udev build there are now rules like:

77-mm-broadmobi-port-types.rules:ACTION!="add|change|move|bind", GOTO="mm_broadmobi_port_types_end"

so eudev appears to have known about the issue for some time.

Other than updating libinput (done) and systemd, it's not clear to me that we need to make any changes to LFS/BLFS.

comment:14 by Douglas R. Reno, 8 months ago

Resolution: fixed
Status: assigned → closed

Fixed at r12065
