Opened 2 years ago

Closed 2 years ago

#5183 closed enhancement (fixed)

xz-5.4.0

Reported by: Bruce Dubbs
Owned by: lfs-book
Priority: normal
Milestone: 11.3
Component: Book
Version: git
Severity: normal
Keywords:
Cc:

Description

New minor version.

Change History (2)

comment:1 by Bruce Dubbs, 2 years ago

XZ Utils Release Notes

5.4.0 (2022-12-13)

This bumps the minor version of liblzma because new features were added. The API and ABI are still backward compatible with liblzma 5.2.x and 5.0.x.

Since 5.3.5beta:

  • All fixes from 5.2.10.
  • The ARM64 filter is now stable. The xz option is now --arm64. Decompression requires XZ Utils 5.4.0. In the future the ARM64 filter will be supported by XZ for Java, XZ Embedded (including the version in Linux), LZMA SDK, and 7-Zip.
  • Translations:
    • Updated Catalan, Croatian, German, Romanian, and Turkish translations.
    • Updated German man page translations.
    • Added Romanian man page translations.

Summary of new features added in the 5.3.x development releases:

  • liblzma:
  • Added threaded .xz decompressor lzma_stream_decoder_mt(). It can use multiple threads with .xz files that have multiple Blocks with size information in Block Headers. The threaded encoder in xz has always created such files.

Single-threaded encoder cannot store the size information in Block Headers even if one used LZMA_FULL_FLUSH to create multiple Blocks, so this threaded decoder cannot use multiple threads with such files.

If there are multiple Streams (concatenated .xz files), one Stream will be decompressed completely before starting the next Stream.
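
The Stream-by-Stream behaviour can be observed from Python, whose standard lzma module wraps liblzma. This is only a sketch of how concatenated Streams are handled one after another; it uses the ordinary single-threaded decoder and does not call lzma_stream_decoder_mt(). The payload strings are made up for the illustration.

```python
import lzma

# Two independent .xz Streams, concatenated the way "cat a.xz b.xz" would.
first = lzma.compress(b"first stream\n")
second = lzma.compress(b"second stream\n")
combined = first + second

# lzma.decompress() walks through the Streams one after another and
# returns the concatenated output.
everything = lzma.decompress(combined)

# A single LZMADecompressor finishes the first Stream completely, then
# stops and leaves the bytes of the next Stream in unused_data.
decoder = lzma.LZMADecompressor()
head = decoder.decompress(combined)
rest_is_second_stream = decoder.unused_data == second
```

A caller that wants every Stream would create a fresh decompressor for the bytes left in unused_data and repeat until nothing remains.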

  • A new decoder flag LZMA_FAIL_FAST was added. It makes the threaded decompressor report errors soon instead of first flushing all pending data before the error location.
  • New Filter IDs:
    • LZMA_FILTER_ARM64 is for ARM64 binaries.
    • LZMA_FILTER_LZMA1EXT is for raw LZMA1 streams that don't necessarily use the end marker.
  • Added lzma_str_to_filters(), lzma_str_from_filters(), and lzma_str_list_filters() to convert a preset or a filter chain string to a lzma_filter[] and vice versa. These should make it easier to write applications that allow users to specify custom compression options.
  • Added lzma_filters_free() which can be convenient for freeing the filter options in a filter chain (an array of lzma_filter structures).
  • Added lzma_file_info_decoder() to make it a little easier to get the Index field from .xz files. This helps in getting the uncompressed file size, but an easy-to-use random access API is still missing, even though one has existed in XZ for Java for a long time.
  • Added lzma_microlzma_encoder() and lzma_microlzma_decoder(). They are used by erofs-utils and may be used by others too.

The MicroLZMA format is a raw LZMA stream (without end marker) whose first byte (always 0x00) has been replaced with bitwise-negation of the LZMA properties (lc/lp/pb). It was created for use in EROFS but may be used in other contexts as well where it is important to avoid wasting bytes for stream headers or footers. The format is also supported by XZ Embedded (the XZ Embedded version in Linux got MicroLZMA support in Linux 5.16).

The MicroLZMA encoder API in liblzma can compress into a fixed-sized output buffer so that as much data is compressed as can be fit into the buffer while still creating a valid MicroLZMA stream. This is needed for EROFS.
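
The first-byte rule can be shown with a little arithmetic. The sketch below assumes the usual one-byte LZMA properties encoding, (pb * 5 + lp) * 9 + lc, and applies the bitwise negation described above. The function names are hypothetical and this is not the liblzma API.

```python
def lzma_props_byte(lc, lp, pb):
    """Pack lc/lp/pb into the one-byte LZMA properties value."""
    if not (0 <= lc <= 8 and 0 <= lp <= 4 and 0 <= pb <= 4):
        raise ValueError("lc/lp/pb out of range")
    return (pb * 5 + lp) * 9 + lc


def microlzma_first_byte(lc, lp, pb):
    """First byte of a MicroLZMA stream: bitwise negation of the properties."""
    return ~lzma_props_byte(lc, lp, pb) & 0xFF


def parse_microlzma_first_byte(byte):
    """Recover (lc, lp, pb) from the first byte of a MicroLZMA stream."""
    props = ~byte & 0xFF
    if props > (4 * 5 + 4) * 9 + 8:
        raise ValueError("not a valid properties byte")
    lc = props % 9
    props //= 9
    return lc, props % 5, props // 5


# The common defaults lc=3, lp=0, pb=2 pack to 0x5D, so the stream
# would start with ~0x5D & 0xFF.
default_first = microlzma_first_byte(3, 0, 2)
```

A decoder reverses the negation, recovers lc/lp/pb, and then treats the first byte as 0x00 again before decoding the raw LZMA data.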

  • Added lzma_lzip_decoder() to decompress the .lz (lzip) file format version 0 and the original unextended version 1 files. Also lzma_auto_decoder() supports .lz files.
  • lzma_filters_update() can now be used with the multi-threaded encoder (lzma_stream_encoder_mt()) to change the filter chain after LZMA_FULL_BARRIER or LZMA_FULL_FLUSH.
  • In lzma_options_lzma, allow nice_len = 2 and 3 with the match finders that require at least 3 or 4. Now it is internally rounded up if needed.
  • CLMUL-based CRC64 on x86-64 and E2K with runtime processor detection. On 32-bit x86 it currently isn't available unless --disable-assembler is used which can make the non-CLMUL CRC64 slower; this might be fixed in the future.
  • Building with --disable-threads --enable-small is now thread-safe if the compiler supports __attribute__((constructor)).
  • xz:
  • Using -T0 (--threads=0) will now use multi-threaded encoder even on a single-core system. This is to ensure that output from the same xz binary is identical on both single-core and multi-core systems.
  • --threads=+1 or -T+1 is now a way to put xz into multi-threaded mode while using only one worker thread. The + is ignored if the number is not 1.
  • A default soft memory usage limit is now used for compression when -T0 is used and no explicit limit has been specified. This soft limit is used to restrict the number of threads but if the limit is exceeded with even one thread then xz will continue with one thread using the multi-threaded encoder and this limit is ignored. If the number of threads is specified manually then no default limit will be used; this affects only -T0.

This change helps on systems that have very many cores and using all of them for xz makes no sense. Previously xz -T0 could run out of memory on such systems because it attempted to reserve memory for too many threads.

This also helps with 32-bit builds which don't have a large amount of address space that would be required for many threads. The default soft limit for -T0 is at most 1400 MiB on all 32-bit platforms.
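
How the -T values described above map to a mode and a thread count can be summarised in a short sketch. This is an interpretation of the release notes, not xz's actual option parser; parse_threads and detected_cores are hypothetical names, and the soft memory limit is deliberately left out.

```python
def parse_threads(arg, detected_cores):
    """Return (mode, threads) for a --threads value such as "0", "1", "+1" or "4"."""
    plus = arg.startswith("+")
    count = int(arg[1:] if plus else arg)
    if count < 0:
        raise ValueError("thread count must not be negative")
    if count == 0:
        # -T0: always the multi-threaded encoder, even with a single core,
        # so the output is identical on single-core and multi-core systems.
        return ("multi-threaded", max(1, detected_cores))
    if count == 1:
        # Plain 1 stays single-threaded; +1 asks for multi-threaded mode
        # with one worker thread.
        return ("multi-threaded" if plus else "single-threaded", 1)
    # For any other number the "+" makes no difference.
    return ("multi-threaded", count)
```

For example, parse_threads("+1", 8) selects multi-threaded mode with one worker, while parse_threads("1", 8) stays single-threaded.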

  • Previously a low value in --memlimit-compress wouldn't cause xz to switch from multi-threaded mode to single-threaded mode if the limit cannot otherwise be met; xz failed instead. Now xz can switch to single-threaded mode and then, if needed, scale down the LZMA2 dictionary size too just like it already did when it was started in single-threaded mode.
  • The option --no-adjust no longer prevents xz from scaling down the number of threads as that doesn't affect the compressed output (only performance). Now --no-adjust only prevents adjustments that affect compressed output, that is, with --no-adjust xz won't switch from multi-threaded mode to single-threaded mode and won't scale down the LZMA2 dictionary size.
  • Added a new option --memlimit-mt-decompress=LIMIT. This is used to limit the number of decompressor threads (possibly falling back to single-threaded mode) but it will never make xz refuse to decompress a file. This has a system-specific default value because without any limit xz could end up allocating memory for the whole compressed input file, the whole uncompressed output file, multiple thread-specific decompressor instances and so on. Basically xz could attempt to use an insane amount of memory even with fairly common files. The system-specific default value is currently the same as the one used for compression with -T0.

The new option works together with the existing option --memlimit-decompress=LIMIT. The old option sets a hard limit that must not be exceeded (xz will refuse to decompress) while the new option only restricts the number of threads. If the limit set with --memlimit-mt-decompress is greater than the limit set with --memlimit-decompress, then the latter value is used also for --memlimit-mt-decompress.
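
The interaction of the two decompression limits can be sketched as a small decision function. It assumes the hard limit is the one set with --memlimit-decompress, and it uses a made-up memory model in which every thread costs the same amount; all names and numbers are hypothetical and the real accounting in xz is more detailed.

```python
def plan_decompression(hard_limit, mt_limit, single_usage, per_thread_usage, max_threads):
    """Return the number of threads to use, or None if decompression is refused.

    All sizes are in MiB.
    """
    # Only the hard limit can make decompression fail.
    if single_usage > hard_limit:
        return None
    # The soft limit can never be higher than the hard limit.
    effective = min(mt_limit, hard_limit)
    # The soft limit only reduces the thread count, down to a single thread.
    threads = min(max_threads, effective // per_thread_usage)
    return max(1, threads)
```

With a hard limit of 1000 MiB and a soft limit of 4000 MiB, the effective soft limit is 1000 MiB, so at 300 MiB per thread only three threads would be used.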

  • Added new information to the output of xz --info-memory and new fields to the output of xz --robot --info-memory.
  • In --lzma2=nice=NUMBER allow 2 and 3 with all match finders now that liblzma handles it.
  • Don't mention endianness for ARM and ARM-Thumb filters in --long-help. The filters only work for little endian instruction encoding but modern ARM processors using big endian data access still use little endian instruction encoding. So the help text was misleading. In contrast, the PowerPC filter is only for big endian 32/64-bit PowerPC code. Little endian PowerPC would need a separate filter.
  • Added decompression support for the .lz (lzip) file format version 0 and the original unextended version 1. It is autodetected by default. See also the option --format on the xz man page.
  • Sandboxing enabled by default:
    • Capsicum (FreeBSD)
    • pledge(2) (OpenBSD)
  • Scripts now support the .lz format using xz.
  • A few new tests were added.
  • The liblzma-specific tests are now supported in CMake-based builds too ("make test").
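
The .lz autodetection mentioned above relies on the fact that both container formats begin with a fixed magic string. The following sketch checks those magic bytes; it is an illustration of the idea, not the detection code in xz or lzma_auto_decoder(), and detect_format is a hypothetical helper.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"   # first six bytes of every .xz Stream
LZIP_MAGIC = b"LZIP"         # followed by a one-byte format version


def detect_format(header):
    """Guess the container format from the first bytes of a file."""
    if header.startswith(XZ_MAGIC):
        return "xz"
    # Only lzip versions 0 and 1 are covered by the new decoder.
    if header.startswith(LZIP_MAGIC) and len(header) > 4 and header[4] in (0, 1):
        return "lzip"
    # Legacy .lzma files have no magic bytes, so they end up here too.
    return "unknown"


xz_guess = detect_format(lzma.compress(b"hello"))
```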

5.3.5beta (2022-12-01)

  • All fixes from 5.2.9.
  • liblzma:
  • Added new LZMA_FILTER_LZMA1EXT for raw encoder and decoder to handle raw LZMA1 streams that don't have end of payload marker (EOPM) alias end of stream (EOS) marker. It can be used in filter chains, for example, with the x86 BCJ filter.
  • Added lzma_str_to_filters(), lzma_str_from_filters(), and lzma_str_list_filters() to make it easier for applications to get custom compression options from a user and convert it to an array of lzma_filter structures.
  • Added lzma_filters_free().
  • lzma_filters_update() can now be used with the multi-threaded encoder (lzma_stream_encoder_mt()) to change the filter chain after LZMA_FULL_BARRIER or LZMA_FULL_FLUSH.
  • In lzma_options_lzma, allow nice_len = 2 and 3 with the match finders that require at least 3 or 4. Now it is internally rounded up if needed.
  • ARM64 filter was modified. It is still experimental.
  • Fixed LTO build with Clang if -fgnuc-version=10 or similar was used to make Clang look like GCC >= 10. Now it uses __has_attribute(__symver__) which should be reliable.
  • xz:
  • --threads=+1 or -T+1 is now a way to put xz into multi-threaded mode while using only one worker thread.
  • In --lzma2=nice=NUMBER allow 2 and 3 with all match finders now that liblzma handles it.
  • Updated translations: Chinese (simplified), Korean, and Turkish.
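
The nice_len change in this release amounts to rounding a small value up to the minimum the chosen match finder can work with. The sketch below assumes the minimums are 3 for hc3 and bt3, 4 for hc4 and bt4, and 2 for bt2; the table and the function name are illustrative and not part of the liblzma API.

```python
# Assumed minimum match length per match finder.
MATCH_FINDER_MIN_LEN = {"hc3": 3, "hc4": 4, "bt2": 2, "bt3": 3, "bt4": 4}


def effective_nice_len(nice_len, match_finder):
    """Round nice_len up to what the match finder needs, as liblzma now does internally."""
    if not 2 <= nice_len <= 273:
        raise ValueError("nice_len must be between 2 and 273")
    return max(nice_len, MATCH_FINDER_MIN_LEN[match_finder])
```

Under these assumptions, --lzma2=nice=2 combined with bt4 would behave like nice=4, while larger values are left untouched.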

5.3.4alpha (2022-11-15)

  • All fixes from 5.2.7 and 5.2.8.
  • liblzma:
  • Minor improvements to the threaded decoder.
  • Added CRC64 implementation that uses SSSE3, SSE4.1, and CLMUL instructions on 32/64-bit x86 and E2K. On 32-bit x86 it's not enabled unless --disable-assembler is used but then the non-CLMUL code might be slower. Processor support is detected at runtime so this is built by default on x86-64 and E2K. On these platforms, if compiler flags indicate unconditional CLMUL support (-msse4.1 -mpclmul) then the generic version is not built, making liblzma 8-9 KiB smaller compared to having both versions included.

With extremely compressible files this can make decompression up to twice as fast but with typical files 5 % improvement is a more realistic expectation.

The CLMUL version is slower than the generic version with tiny inputs (especially at 1-8 bytes per call, but up to 16 bytes). In normal use in xz this doesn't matter at all.
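
For reference, the checksum that both the generic and the CLMUL code paths compute can be written as a bit-at-a-time loop. This is a minimal sketch assuming the commonly documented CRC-64/XZ parameters (reflected polynomial 0xC96C5795D7870F42, all-ones initial value and final XOR); it is neither the table-driven nor the CLMUL implementation from liblzma and is far slower than both.

```python
CRC64_POLY = 0xC96C5795D7870F42   # reflected ECMA-182 polynomial
MASK64 = 0xFFFFFFFFFFFFFFFF


def crc64_xz(data, crc=0):
    """Update a CRC-64/XZ value with data; pass the previous result to continue."""
    crc ^= MASK64
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            crc = (crc >> 1) ^ (CRC64_POLY if crc & 1 else 0)
    return crc ^ MASK64


check = crc64_xz(b"123456789")
```

Feeding the data in pieces gives the same value as one call, which is how a streaming decoder would use it.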

  • Added an experimental ARM64 filter. This is *not* the final version! Files created with this experimental version won't be supported in the future versions! The filter design is a compromise where improving one use case makes some other cases worse.
  • Added decompression support for the .lz (lzip) file format version 0 and the original unextended version 1. See the API docs of lzma_lzip_decoder() for details. Also lzma_auto_decoder() supports .lz files.
  • Building with --disable-threads --enable-small is now thread-safe if the compiler supports __attribute__((constructor)).
  • xz:
  • Added support for OpenBSD's pledge(2) as a sandboxing method.
  • Don't mention endianness for ARM and ARM-Thumb filters in --long-help. The filters only work for little endian instruction encoding but modern ARM processors using big endian data access still use little endian instruction encoding. So the help text was misleading. In contrast, the PowerPC filter is only for big endian 32/64-bit PowerPC code. Little endian PowerPC would need a separate filter.
  • Added --experimental-arm64. This will be renamed once the filter is finished. Files created with this experimental filter will not be supported in the future!
  • Added new fields to the output of xz --robot --info-memory.
  • Added decompression support for the .lz (lzip) file format version 0 and the original unextended version 1. It is autodetected by default. See also the option --format on the xz man page.
  • Scripts now support the .lz format using xz.
  • Build systems:
  • New #defines in config.h: HAVE_ENCODER_ARM64, HAVE_DECODER_ARM64, HAVE_LZIP_DECODER, HAVE_CPUID_H, HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR, HAVE_USABLE_CLMUL
  • New configure options: --disable-clmul-crc, --disable-microlzma, --disable-lzip-decoder, and 'pledge' is now an option in --enable-sandbox (but it's autodetected by default anyway).
  • INSTALL was updated to document the new configure options.
  • PACKAGERS now also lists --disable-microlzma and --disable-lzip-decoder as configure options that must not be used in builds for non-embedded use.
  • Tests:
  • Fix some of the tests so that they skip instead of fail if certain features have been disabled with configure options. It's still not perfect.
  • Other improvements to tests.
  • Updated translations: Croatian, Finnish, Hungarian, Polish, Romanian, Spanish, Swedish, and Ukrainian.

5.3.3alpha (2022-08-22)

  • All fixes from 5.2.6.
  • liblzma:
  • Fixed 32-bit build.
  • Added threaded .xz decompressor lzma_stream_decoder_mt(). It can use multiple threads with .xz files that have multiple Blocks with size information in Block Headers. The threaded encoder in xz has always created such files.

Single-threaded encoder cannot store the size information in Block Headers even if one used LZMA_FULL_FLUSH to create multiple Blocks, so this threaded decoder cannot use multiple threads with such files.

If there are multiple Streams (concatenated .xz files), one Stream will be decompressed completely before starting the next Stream.

  • A new decoder flag LZMA_FAIL_FAST was added. It makes the threaded decompressor report errors soon instead of first flushing all pending data before the error location.
  • xz:
  • Using -T0 (--threads=0) will now use multi-threaded encoder even on a single-core system. This is to ensure that output from the same xz binary is identical on both single-core and multi-core systems.
  • A default soft memory usage limit is now used for compression when -T0 is used and no explicit limit has been specified. This soft limit is used to restrict the number of threads but if the limit is exceeded with even one thread then xz will continue with one thread using the multi-threaded encoder and this limit is ignored. If the number of threads is specified manually then no default limit will be used; this affects only -T0.

This change helps on systems that have very many cores and using all of them for xz makes no sense. Previously xz -T0 could run out of memory on such systems because it attempted to reserve memory for too many threads.

This also helps with 32-bit builds which don't have a large amount of address space that would be required for many threads. The default limit is 1400 MiB on all 32-bit platforms with -T0.

Now xz -T0 should just work. It might use too few threads in some cases but at least it shouldn't easily run out of memory. It's possible that this will be tweaked before 5.4.0.
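
The effect of the soft limit on -T0 can be approximated with one line of arithmetic. The sketch assumes a fixed memory cost per thread, which is a simplification, and the per-thread figures in the examples are invented; only the 1400 MiB limit comes from the text above.

```python
def threads_for_soft_limit(cores, per_thread_mib, soft_limit_mib):
    """Cap the thread count so the estimated usage fits under the soft limit."""
    fitting = soft_limit_mib // per_thread_mib
    # The limit is soft: even if a single thread exceeds it, xz keeps
    # going with one thread instead of failing.
    return max(1, min(cores, fitting))
```

On a 128-core machine with an assumed 200 MiB per thread and the 1400 MiB limit, this gives 7 threads instead of 128.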

  • Changes to --memlimit-compress and --no-adjust:

In single-threaded mode, --memlimit-compress can make xz scale down the LZMA2 dictionary size to meet the memory usage limit. This obviously affects the compressed output. However, if xz was in threaded mode, --memlimit-compress could make xz reduce the number of threads but it wouldn't make xz switch from multi-threaded mode to single-threaded mode or scale down the LZMA2 dictionary size. This seemed illogical.

Now --memlimit-compress can make xz switch to single-threaded mode if one thread in multi-threaded mode uses too much memory. If memory usage is still too high, then the LZMA2 dictionary size can be scaled down too.

The option --no-adjust was also changed so that it no longer prevents xz from scaling down the number of threads as that doesn't affect compressed output (only performance). Now --no-adjust only prevents adjustments that affect compressed output, that is, with --no-adjust xz won't switch from multi-threaded mode to single-threaded mode and won't scale down the LZMA2 dictionary size.
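
The order of these adjustments can be sketched as a three-step cascade. The memory models mt_usage and st_usage, the halving of the dictionary, and the 1 MiB floor are assumptions made for the illustration; xz uses its own estimates and step sizes.

```python
MIN_DICT_MIB = 1  # hypothetical floor for this sketch


def plan_compression(limit, threads, dict_mib, mt_usage, st_usage, no_adjust=False):
    """Return (mode, threads, dict_mib) that fits under limit, or raise MemoryError."""
    # Step 1: fewer threads never changes the output, so this is
    # allowed even with --no-adjust.
    while threads > 1 and mt_usage(threads, dict_mib) > limit:
        threads -= 1
    if mt_usage(threads, dict_mib) <= limit:
        return ("multi-threaded", threads, dict_mib)
    # Everything below changes the compressed output.
    if no_adjust:
        raise MemoryError("limit too low and --no-adjust was given")
    # Step 2: switch to single-threaded mode.
    # Step 3: shrink the LZMA2 dictionary until it fits.
    while st_usage(dict_mib) > limit and dict_mib > MIN_DICT_MIB:
        dict_mib //= 2
    if st_usage(dict_mib) > limit:
        raise MemoryError("limit too low even for the smallest dictionary")
    return ("single-threaded", 1, dict_mib)


# Hypothetical usage models, in MiB.
def mt_usage(threads, dict_mib):
    return threads * 12 * dict_mib


def st_usage(dict_mib):
    return 10 * dict_mib
```

With these made-up models, a 400 MiB limit only reduces eight threads to four, while a 50 MiB limit forces single-threaded mode with the dictionary halved from 8 MiB to 4 MiB.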

  • Added a new option --memlimit-mt-decompress=LIMIT. This is used to limit the number of decompressor threads (possibly falling back to single-threaded mode) but it will never make xz refuse to decompress a file. This has a system-specific default value because without any limit xz could end up allocating memory for the whole compressed input file, the whole uncompressed output file, multiple thread-specific decompressor instances and so on. Basically xz could attempt to use an insane amount of memory even with fairly common files.

The new option works together with the existing option --memlimit-decompress=LIMIT. The old option sets a hard limit that must not be exceeded (xz will refuse to decompress) while the new option only restricts the number of threads. If the limit set with --memlimit-mt-decompress is greater than the limit set with --memlimit-decompress, then the latter value is used also for --memlimit-mt-decompress.

  • Tests:
  • Added a few more tests.
  • Added tests/code_coverage.sh to create a code coverage report of the tests.
  • Build systems:
  • Automake's parallel test harness is now used to make tests finish faster.
  • Added the CMake files to the distribution tarball. These were supposed to be in 5.2.5 already.
  • Added liblzma tests to the CMake build.
  • Windows: Fix building of liblzma.dll with the included Visual Studio project files.

5.3.2alpha (2021-10-28)

This release was made on short notice so that recent erofs-utils can be built with LZMA support without needing a snapshot from xz.git. Thus many pending things were not included, not even updated translations (which would need to be updated for the new --list strings anyway).

  • All fixes from 5.2.5.
  • xz:
  • When copying metadata from the source file to the destination file, don't try to set the group (GID) if it is already set correctly. This avoids a failure on OpenBSD (and possibly on a few other OSes) where files may get created so that their group doesn't belong to the user, and fchown(2) can fail even if it needs to do nothing.
  • The --keep option now accepts symlinks, hardlinks, and setuid, setgid, and sticky files. Previously this required using --force.
  • Split the long strings used in --list and --info-memory modes to make them much easier for translators.
  • If built with sandbox support and enabling the sandbox fails, xz will now immediately exit with exit status of 1. Previously it would only display a warning if -vv was used.
  • Cap --memlimit-compress to 2000 MiB on MIPS32 because on MIPS32 userspace processes are limited to 2 GiB of address space.
  • liblzma:
  • Added lzma_microlzma_encoder() and lzma_microlzma_decoder(). The API is in lzma/container.h.

The MicroLZMA format is a raw LZMA stream (without end marker) whose first byte (always 0x00) has been replaced with bitwise-negation of the LZMA properties (lc/lp/pb). It was created for use in EROFS but may be used in other contexts as well where it is important to avoid wasting bytes for stream headers or footers. The format is also supported by XZ Embedded.

The MicroLZMA encoder API in liblzma can compress into a fixed-sized output buffer so that as much data is compressed as can be fit into the buffer while still creating a valid MicroLZMA stream. This is needed for EROFS.

  • Added fuzzing support.
  • Support Intel Control-flow Enforcement Technology (CET) in 32-bit x86 assembly files.
  • Visual Studio: Use non-standard _MSVC_LANG to detect C++ standard version in the lzma.h API header. It's used to detect when "noexcept" can be used.
  • Scripts:
  • Fix exit status of xzdiff/xzcmp. Exit status could be 2 when the correct value is 1.
  • Fix exit status of xzgrep.
  • Detect corrupt .bz2 files in xzgrep.
  • Add zstd support to xzgrep and xzdiff/xzcmp.
  • Fix less(1) version detection in xzless. It failed if the version number from "less -V" contained a dot.
  • Fix typos and technical issues in man pages.
  • Build systems:
  • Windows: Fix building of resource files when config.h isn't used. CMake + Visual Studio can now build liblzma.dll.
  • Various fixes to the CMake support. It might still need a few more fixes even for liblzma-only builds.

5.3.1alpha (2018-04-29)

  • All fixes from 5.2.4.
  • Add lzma_file_info_decoder() into liblzma and use it in xz to implement the --list feature.
  • Capsicum sandbox support is enabled by default where available (FreeBSD >= 10).
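
To illustrate what reading the Index field involves, the sketch below extracts the uncompressed size from a .xz file by hand, following the layout in the .xz file format specification (Stream Footer, Backward Size, Index records). It is not lzma_file_info_decoder(): it assumes a single Stream without Stream Padding, skips all CRC32 checks, and the helper names are hypothetical.

```python
import lzma


def read_varint(buf, pos):
    """Decode one variable-length integer (7 bits per byte, low bits first)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def xz_uncompressed_size(data):
    """Sum the uncompressed sizes of all Blocks listed in the Index."""
    footer = data[-12:]          # CRC32, Backward Size, Stream Flags, "YZ"
    if footer[-2:] != b"YZ":
        raise ValueError("missing Stream Footer magic")
    index_size = (int.from_bytes(footer[4:8], "little") + 1) * 4
    index = data[len(data) - 12 - index_size:len(data) - 12]
    if index[0] != 0x00:
        raise ValueError("missing Index Indicator")
    count, pos = read_varint(index, 1)
    total = 0
    for _ in range(count):
        _unpadded_size, pos = read_varint(index, pos)
        uncompressed_size, pos = read_varint(index, pos)
        total += uncompressed_size
    return total


size = xz_uncompressed_size(lzma.compress(b"x" * 100000))
```

Because only the footer and the Index are read, the size is obtained without decompressing any Block data.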

5.2.10 (2022-12-13)

  • xz: Don't modify argv[] when parsing the --memlimit* and --block-list command line options. This fixes confusing arguments in process listing (like "ps auxf").
  • GNU/Linux only: Use __has_attribute(__symver__) to detect if that attribute is supported. This fixes build on Mandriva where Clang is patched to define __GNUC__ to 11 by default (instead of 4 as used by Clang upstream).

comment:2 by Bruce Dubbs, 2 years ago

Resolution: fixed
Status: new → closed

Fixed at commit c9aabf13a1e8e1fb57688a7dea2f2ca2f1a9e1ab

    Ensure a gawk hard link is updated in Chapter 8.
    Update to iana-etc-20221209.
    Update to vim-9.0.1060.
    Update to iproute2-6.1.0.
    Update to xz-5.4.0.
    Update to bash-5.2.15.
    Update to psmisc-23.6.
    Update to mpc-1.3.0.
    Update to python3-3.11.1.
    Update to procps-ng-4.0.2.