Opened 13 months ago

Closed 13 months ago

Last modified 9 months ago

#17926 closed enhancement (fixed)

icu4c-73.1

Reported by: Bruce Dubbs
Owned by: Bruce Dubbs
Priority: normal
Milestone: 12.0
Component: BOOK
Version: git
Severity: normal
Keywords:
Cc:

Description

New major version.

Change History (9)

comment:1 by Xi Ruoyao, 13 months ago

It has not been announced yet at https://icu.unicode.org/, and the release tarball has not been uploaded yet either.

I guess we need to wait for several hours.

comment:2 by Bruce Dubbs, 13 months ago

It's there, but the name has changed.

I was able to download via a web browser to icu-release-73-1.tar.gz, but I have not figured out the magic url yet to download via wget.

comment:3 by Xi Ruoyao, 13 months ago (in reply to: comment:2)

Replying to Bruce Dubbs:

It's there, but the name has changed.

I was able to download via a web browser to icu-release-73-1.tar.gz, but I have not figured out the magic url yet to download via wget.

No, that file is a git snapshot, not a release tarball.
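
For reference, a minimal sketch of how the release tarball URL could be derived from the version number. It assumes the GitHub release naming pattern that upstream has used for ICU4C (a `release-MAJOR-MINOR` tag and an `icu4c-MAJOR_MINOR-src.tgz` asset). That pattern is an assumption and should be checked against the actual release page. The helper name `icu4c_tarball_url` is hypothetical.

```python
def icu4c_tarball_url(version: str) -> str:
    """Build the assumed GitHub release URL for an ICU4C source tarball.

    `version` is a dotted version string such as "73.1".
    The tag and asset naming below is an assumption about upstream's
    release layout, not something confirmed in this ticket.
    """
    major, minor = version.split(".")
    tag = f"release-{major}-{minor}"           # e.g. release-73-1
    asset = f"icu4c-{major}_{minor}-src.tgz"   # e.g. icu4c-73_1-src.tgz
    return (
        "https://github.com/unicode-org/icu/releases/download/"
        f"{tag}/{asset}"
    )


if __name__ == "__main__":
    # The printed URL can be passed to wget once the asset is uploaded.
    print(icu4c_tarball_url("73.1"))
```

A git snapshot (the auto-generated "Source code" archive for the tag) is a different file from this release asset, which is why the browser download above did not match.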

comment:5 by Xi Ruoyao, 13 months ago

Overview:

  • ICU 73 updates to CLDR 43 (blog) locale data with various additions and corrections.
  • ICU 73 improves Japanese and Korean short-text line breaking, reduces C++ memory use in date formatting, and promotes the Java person name formatter from tech preview to draft.
  • ICU 73 updates to the time zone data version 2023c (March 2023). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.

Common Changes:

  • CLDR 43 (blog) :
    • CLDR 43 is a limited-submission release. Data for many languages has been improved.
    • In English, the name “Türkiye” is now used for the country instead of “Turkey” (the alternate spelling is also available in the data). Where appropriate, a corresponding term is used in other languages.
    • Person name formatting data is now complete and out of “tech preview”.
    • Collation: Improved sorting & matching of “fancy quotes”, Geresh, and Gershayim in the default (CLDR root) sort order. (CLDR-15946, L2/23-016)
    • Several punctuation marks now compare primary-equal to their single and double quote ASCII fallbacks. This makes them easier to find, and groups names together that only differ in whether ASCII quotes or typographic quotes are used.
    • A new unit was added for the Beaufort scale (wind speed).
    • Improved and expanded data for likely subtags.
  • Line breaking with Japanese phrase-based breaking is now using the BudouX machine learning implementation for better quality. (ICU-22100, see ICU 71 ICU-21699 for context)
  • Phrase-based line breaking for Korean now breaks at spaces (approximates word boundaries). (ICU-22119)
  • The UnicodeSet::closeOver() function has a new option for simple case folding. (ICU-6065)
    • C: USET_SIMPLE_CASE_INSENSITIVE / Java: UnicodeSet.SIMPLE_CASE_INSENSITIVE
    • This is useful for implementations that use Simple_Case_Folding (1:1 code points) for case-insensitive matching rather than the full Case_Folding (1:n) mappings. For example, ECMAScript (JavaScript) regular expressions use simple case foldings.
  • Several small Calendar API additions to facilitate implementations of the proposed ECMAScript Temporal API. (ICU-22027)
  • Time zone data (tzdata) version 2023a (2023-mar) [same as 2023c]. Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.
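
To illustrate the difference between simple (1:1) and full (1:n) case folding mentioned for UnicodeSet::closeOver() above, here is a rough Python sketch. It does not use ICU. `str.casefold()` performs full case folding, and `approx_simple_fold` is a hypothetical helper that only approximates Simple_Case_Folding by refusing any mapping that expands to more than one code point. It is not guaranteed to match the Unicode data for every character.

```python
def full_fold(s: str) -> str:
    """Full case folding (1:n mappings allowed), as done by str.casefold()."""
    return s.casefold()


def approx_simple_fold(s: str) -> str:
    """Approximate simple case folding: every code point maps to one code point.

    Assumption: when the full folding of a character expands to several
    code points, fall back to its lowercase form if that is a single code
    point, otherwise keep the character unchanged.
    """
    out = []
    for ch in s:
        folded = ch.casefold()
        if len(folded) == 1:
            out.append(folded)
            continue
        lowered = ch.lower()
        out.append(lowered if len(lowered) == 1 else ch)
    return "".join(out)


if __name__ == "__main__":
    for word in ("Straße", "STRASSE"):
        print(word, "full:", full_fold(word), "simple:", approx_simple_fold(word))
```

Under full folding "Straße" and "STRASSE" compare equal, because "ß" expands to "ss". Under the 1:1 approximation they do not, and the string length never changes, which is the property that matchers based on simple case folding rely on.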

ICU4C Specific Changes:

  • API changes since ICU4C 72 (Markdown) / (HTML)
  • New classes SimpleNumber and SimpleNumberFormatter, with a subset of NumberFormatter functionality for less memory, more object reuse, and fewer code dependencies. (ICU-22093)
    • The SimpleDateFormat class now uses SimpleNumberFormatter, significantly reducing heap memory use. (ICU-20115)

Some internal changes:

  • Continuous Integration with undefined-behavior sanitizer (UBSan) and alignment sanitizer, and code changes. (ICU-22224)
  • Continuous Integration with a subset of Control Flow Integrity checks and code changes. (ICU-21374)
  • Implementation code relies more on C++11 (char16_t, nullptr, override, ...) with fewer typedefs and conditional definitions. (ICU-21833)

Migration Issues:

  • See CLDR 43 migration issues
  • For ICU users who generate ICU data directly from CLDR: In the CLDR repo, the "seed" data has been merged into the "common" file tree (CLDR-6396). As a result, there are many more locale data files in CLDR "common", but many that were moved do not have usable data item coverage and are therefore not automatically added to ICU. See the CLDR Migration section for details.
  • Interval Formats: A small number of interval formats (like “Dec 2 – 3”) have their spacing changed for consistency. This is unlikely to cause problems, as the changes resemble the many similar ones made in CLDR 42/ICU 72.
  • The “gb2312” and “big5han” Chinese collation tailorings are no longer included in the ICU binary data. (ICU-22285)
    • These are based on the code point order of their respective legacy charsets. By contrast, the “pinyin” and “stroke” sort orders, which are the defaults for the regional variants of Chinese, are based on current Unicode Han character data.
    • The ICU source data files still include the data for these tailorings. See the User Guide for how to include them in the binary data.
    • Future versions of CLDR and ICU may remove the source data for these tailorings. (CLDR-16062)

comment:6 by Xi Ruoyao, 13 months ago

It increases the number of failed tests in js-102 from 113 to 118.

comment:7 by Bruce Dubbs, 13 months ago

Owner: changed from blfs-book to Bruce Dubbs
Status: new → assigned

comment:8 by Bruce Dubbs, 13 months ago

Resolution: fixed
Status: assigned → closed

Fixed at commit 8f37884d0cd4988e8d92db30ce59887569ac08a9

Update to libpaper-2.1.0.
Update to icu4c-73.1.

comment:9 by Bruce Dubbs, 9 months ago

Milestone: 11.4 → 12.0

Milestone renamed
