Opened 16 years ago

Closed 16 years ago

Last modified 16 years ago

#2148 closed task (fixed)

Man-DB-2.5.1

Reported by: Julio Meca Hansen Owned by: lfs-book@…
Priority: normal Milestone: 7.0
Component: Book Version: SVN
Severity: normal Keywords:
Cc:

Description

New version of Man-DB, released on January 28th.

http://download.savannah.gnu.org/releases/man-db/man-db-2.5.1.tar.gz

Julio

Change History (8)

comment:1 by ken@…, 16 years ago

Not something I normally use, but I needed to build by-the-book to check some other things. Initially, this looks interesting - the included man pages seem to be in UTF-8, so I thought I'd give it a whirl using what it provides, and also NOT converting the UTF-8 pages from shadow. There are also comments about allowing UTF-8 for Korean, but looking at the ChangeLog I found it hard to understand what, if anything, had changed for the expected format of pages. Anybody diffing it against 2.5.0 will find it now includes gnulib, and frankly the other changes got swamped by that.

My test results are extremely strange.

In a regular tty, set for en_GB.UTF-8 and then overridden with LC_ALL, any non-ASCII characters render as U+FFFD (the inverse question mark shown for invalid Unicode). Of the versions of 'apropos.1' the package supplies, this affects de, es, fr, it. All render acceptably in the appropriate .UTF-8 locales.
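One plausible mechanism for the U+FFFD output, not confirmed anywhere in this ticket: the tty stays in UTF-8 mode while LC_ALL makes the output side emit a legacy 8-bit encoding. The sketch below only illustrates that mismatch; the function name and the sample word are made up for the example.

```python
# Hypothetical illustration: text encoded for a legacy locale, then decoded
# the way a terminal in UTF-8 mode would decode it.

def render_on_utf8_terminal(text: str, output_encoding: str) -> str:
    """Encode text as a program would for its locale, then decode the
    resulting bytes as UTF-8, substituting U+FFFD for invalid sequences."""
    raw = text.encode(output_encoding)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "für"
    # Output encoding matches the terminal: the umlaut survives.
    print(render_on_utf8_terminal(sample, "utf-8"))
    # Latin-1 output on a UTF-8 terminal: the umlaut becomes U+FFFD.
    print(render_on_utf8_terminal(sample, "latin-1"))
```

If this is what happens, it would explain why the same pages look fine in the matching .UTF-8 locales.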

For the shadow UTF-8 man pages the picture is similar, and it also affects the bullet character as well as all the accents and diacriticals. I tested passwd.1, or passwd.5 if passwd.1 was not available.

The situation with the vim pages is similar - fr, it, pl, ru only display ASCII characters correctly in non-UTF-8 locales, but are ok in UTF-8 (ru_RU.UTF-8 vim.1 has some odd formatting for command-line option translations).

Now that I've built X and urxvt, the position is murkier:

Latin1 languages seem to work in urxvt as either de_DE or de_DE.UTF-8. Latin2 languages either have problems with a few characters in latin2 but are ok in UTF-8 (cs, pl), or are ok in latin2 but unreadable in UTF-8 (hu). The bullet is gibberish in it_IT.UTF-8. pt_BR.UTF-8 passwd.5 is gibberish. ru_RU{,.UTF-8} passwd.1 are both gibberish. sv_SE.UTF-8 passwd.1 misrenders A-diaeresis and A-ring. tr_TR passwd.1 renders I-dot-above as Y-acute. tr_TR.UTF-8 renders passwd.1 as gibberish. ru_RU vim.1 is gibberish but ru_RU.UTF-8 vim.1 looks reasonable (that is a KOI8-R source page).
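The tr_TR symptom is at least consistent with ISO-8859-9 bytes being displayed as ISO-8859-1: both charsets use byte 0xDD, for I-dot-above in the former and Y-acute in the latter. Whether that is what actually happened in urxvt is an assumption; the helper name below is hypothetical.

```python
# Hypothetical helper: show what a character turns into when it is written
# in one 8-bit charset and read back in another.

def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one charset and decode the same bytes in another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
    seen = misread(dotted_capital_i, "iso8859_9", "latin-1")
    # Print the code point that the reader ends up with.
    print(f"U+{ord(seen):04X}")
```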

Of the "hard" languages in urxvt, zh_CN.UTF-8 mostly renders passwd.5 (the initial character is not in my screen fonts). I'm on manually-installed locales and I didn't know what to define for non-UTF-8 zh_CN. Didn't test zh_TW. ko_KR is total gibberish; ko_KR.UTF-8 only displays the ASCII characters: the ideograms are not present, and neither are the empty boxes for characters not in any font.

At first, I thought there had been a switch to "pages are now expected to be in UTF-8, but we haven't bothered to document that", but the errors I can see with UTF-8 in urxvt make me suspect there is something else required.

comment:2 by Matthew Burgess, 16 years ago

Apologies for this Ken, but I just committed the upgrade as it didn't break my _build_, but obviously appears to have broken some functionality. Do you want me to revert the upgrade for the time being?

in reply to:  2 comment:3 by ken@…, 16 years ago

Summary: changed from "Man-DB 2.5.1" to "Man-DB-2.5.1"

Replying to matthew@linuxfromscratch.org:

> Apologies for this Ken, but I just committed the upgrade as it didn't break my _build_, but obviously appears to have broken some functionality. Do you want me to revert the upgrade for the time being?

Not at all, Matt. Like I said, I was trying to use everything as UTF-8, because it looked like that might now be what is intended. I don't feel at all confident with my results. If it works, great. If it doesn't, one of our vast army of testers will confirm the problem ;)

I didn't see you'd done the update until after I commented. Hope I haven't changed what was in the summary, this ibook keyboard is a bit of an acquired taste when I flex it.

comment:4 by Matthew Burgess, 16 years ago

I wonder whether this ChangeLog entry is relevant?

Mon Jan  7 02:12:26 GMT 2008  Colin Watson  <cjwatson@debian.org>

	* configure.ac: Automatically detect the Debian multibyte patch to
	  groff.

See src/encodings.c for various conditionals around MULTIBYTE_GROFF. Do the languages/encodings there correlate with any of your test results, Ken?

in reply to:  4 comment:5 by ken@…, 16 years ago

Replying to matthew@linuxfromscratch.org:

> I wonder whether this ChangeLog entry is relevant?
>
> Mon Jan  7 02:12:26 GMT 2008  Colin Watson  <cjwatson@debian.org>
>
> 	* configure.ac: Automatically detect the Debian multibyte patch to
> 	  groff.
>
> See src/encodings.c for various conditionals around MULTIBYTE_GROFF. Do the languages/encodings there correlate with any of your test results, Ken?

For sure the patch is detected:

checking for groff... groff
checking for appropriate groff options... -mandoc
checking for groff with Debian multibyte patch... yes

but I don't really see the tie-up between my test results and that table, nor how/why I got the differences between a tty and urxvt. When I saw that the shipped pages seemed to be UTF-8, I wondered if that file was out of date, or ineffective because something else took precedence.

I say 'seemed to be UTF-8' because file only admitted some were UTF-8, and vim/view seems to go out of its way to render non-UTF-8, and to sometimes render UTF-8 as two characters on my setup.
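A rough way to double-check a page's encoding without relying on file or vim is to try strict decodes. This is only a sketch with a made-up function name, and it is not how file works internally. Note that a page containing nothing but ASCII is also valid UTF-8, which may be one reason a tool reports only some pages as UTF-8.

```python
# Hypothetical checker: classify raw page bytes by attempting strict decodes.

def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'unknown-8bit' for the given bytes."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: probably some legacy 8-bit charset, but the bytes
        # alone do not say which one.
        return "unknown-8bit"


if __name__ == "__main__":
    print(classify_encoding(b".TH APROPOS 1"))
    print(classify_encoding("Übersicht".encode("utf-8")))
    print(classify_encoding("Übersicht".encode("latin-1")))
```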

comment:6 by ken@…, 16 years ago

Resolution: fixed
Status: changed from "new" to "closed"

Still haven't had time to build it with all the book's instructions, but looking at my fresh vanilla LFS-6.3 system I see similar problems in urxvt with the old versions (example: man vim works in ru_RU but not ru_RU.UTF-8). So, whatever else it is, it doesn't seem to be a regression.

I guess this should be marked as fixed (by Matt, in r8482) - I normally use groff-utf with the original man and UTF-8 locales so maybe something in my setup causes this oddness - no point logging it as a defect unless somebody with non-UTF-8 locales sees it.

comment:7 by alexander@…, 16 years ago

Milestone: 7.0

Manual pages are expected to be in UTF-8 only if they are in a directory of the form /usr/share/man/*.UTF-8/man*/. The only news here is that this directory is searched even in non-UTF-8 locales.

So man-db's manual pages should be either converted to non-UTF-8, or moved to the directories ending in "UTF-8". The table has to be re-synchronized with src/encodings.c.
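To make the directory rule concrete, here is a minimal sketch. It assumes the layout .../man/<locale>/man<section>/<page>; the function names are hypothetical, and the legacy charset for each language is something the caller has to supply (the real mapping is the table in src/encodings.c, which is not reproduced here).

```python
from pathlib import PurePosixPath

# Hypothetical helpers illustrating the rule described above.

def expects_utf8(page_path: str) -> bool:
    """True if the page sits in <locale>.UTF-8/man<section>/, i.e. a
    directory where pages are expected to be UTF-8."""
    parts = PurePosixPath(page_path).parts
    if len(parts) < 3:
        return False
    locale_dir, section_dir = parts[-3], parts[-2]
    return section_dir.startswith("man") and locale_dir.endswith(".UTF-8")


def to_legacy(data: bytes, legacy_encoding: str) -> bytes:
    """Convert UTF-8 page bytes to a caller-chosen legacy charset.
    Raises UnicodeError if the page cannot be represented in it."""
    return data.decode("utf-8").encode(legacy_encoding)


if __name__ == "__main__":
    print(expects_utf8("/usr/share/man/de.UTF-8/man1/apropos.1"))
    print(expects_utf8("/usr/share/man/de/man1/apropos.1"))
```

So a UTF-8 page installed under /usr/share/man/de/man1/ would need either to_legacy() with the right charset or a move to the de.UTF-8 directory.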

OTOH, since the editors have such difficulty testing the functionality themselves, wouldn't it be better to drop UTF-8 from the book? My work clearly didn't have any educational effect.

comment:8 by alexander@…, 16 years ago

Milestone: 7.0

See also #2120
