Opened 2 years ago

Closed 2 years ago

#12481 closed defect (fixed)

Locally generated BLFS Book does not display correctly using Epiphany-3.32.5

Reported by: Wayne Blaszczyk
Owned by: blfs-book
Priority: normal
Milestone: 9.1
Component: BOOK
Version: SVN
Severity: normal
Keywords:
Cc:

Description

When I generate the BLFS book locally and open the generated index.html page in Epiphany, the browser reports the following error:

This page contains the following errors:

error on line 7 at column 12: Encoding error
Below is a rendering of the page up to the first error.

This issue does not occur when accessing the equivalent remote page e.g. url = http://www.linuxfromscratch.org/blfs/view/systemd/index.html

After spending hours on this, I'm still in two minds about whether this is an Epiphany bug or a bug in the way the BLFS book is generated. I have no issue with the LFS book. It all comes down to special characters such as the copyright character.
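A minimal sketch of why such characters can trigger an "Encoding error", assuming the parser falls back to UTF-8 when no encoding is declared (the XML default). The title string is illustrative and `decodes_as_utf8` is a hypothetical helper, not part of the book tooling:

```python
# ISO-8859-1 stores the registered and copyright signs as single bytes
# (0xAE and 0xA9). Those lone bytes are not valid UTF-8 sequences.
latin1_title = "Beyond Linux\u00ae From Scratch \u00a9".encode("iso-8859-1")
utf8_title = "Beyond Linux\u00ae From Scratch \u00a9".encode("utf-8")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(latin1_title))  # the ISO-8859-1 bytes
print(decodes_as_utf8(utf8_title))    # the same text encoded as UTF-8
```

Pure ASCII text passes either way, which would be consistent with pages that use only entities such as &copy; not showing the problem.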

Attachments (2)

test-bad.html (231 bytes ) - added by Wayne Blaszczyk 2 years ago.
Bad page
test-good.html (275 bytes ) - added by Wayne Blaszczyk 2 years ago.
Good page


Change History (25)

by Wayne Blaszczyk, 2 years ago

Attachment: test-bad.html added

Bad page

by Wayne Blaszczyk, 2 years ago

Attachment: test-good.html added

Good page

comment:1 by Wayne Blaszczyk, 2 years ago

Summary: Locally generated BLFS Book does not display correctly using Epiphany → Locally generated BLFS Book does not display correctly using Epiphany-3.32.4

comment:2 by Wayne Blaszczyk, 2 years ago

I have attached two sample html pages, one good and one bad.

Here is a quote from https://www.w3.org/TR/xhtml1/

An XML declaration is not required in all XML documents; however XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol.

comment:3 by Bruce Dubbs, 2 years ago

I will need to see what is actually generated for your index.html. What I have is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content=
    "application/xhtml+xml; charset=iso-8859-1" />
    <title>
      Beyond Linux® From Scratch (System V Edition)
    </title>
    <link rel="stylesheet" href="stylesheets/lfs.css" type="text/css" />

I note that the first seven lines for both LFS and BLFS are identical.

What is on line 7 is <title>.

The content is generated by the stylesheets and those have not changed since 2007.

comment:4 by Wayne Blaszczyk, 2 years ago

The index.html above is the same apart from the systemd Edition line. (The issue is caused by the eighth line.)

I can reproduce the issue with the following steps, which do not involve generating the book.

cd /tmp
curl -o index.html http://www.linuxfromscratch.org/blfs/view/systemd/index.html
epiphany index.html

I've also copied the index.html file to a Fedora VM, and it gives the same error.

I can see differences between the LFS and BLFS pages. LFS uses character entities like &copy;, and I don't see any registered trademark symbols in the LFS book. This is why the issue is not present in the LFS book.

This issue arose when I built my main workstation to the latest book. Looking at my previous VM builds, I can see that this issue was there for some time. I just didn't test for it.

Last edited 2 years ago by Wayne Blaszczyk

comment:5 by Bruce Dubbs, 2 years ago

I can duplicate your problem, but I think you need to take it up with the epiphany developers. I checked at http://validator.w3.org/ by doing your curl download and uploading that file to the validator.

The validator says it is valid xhtml.

I do note that if I change the ® to &reg;, epiphany accepts that but then chokes on © (&copy;).

Also firefox, seamonkey, falkon, and even links have no problem with the page.

comment:6 by Wayne Blaszczyk, 2 years ago

Thanks, Bruce, I'll raise it with the epiphany developers. I also tried the validator from both firefox and epiphany. It succeeded from firefox, but from epiphany it came back with:

Sorry, I am unable to validate this document because on line 1 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication. The error was: Modification of a read-only value attempted

I still think it goes against the quote in comment 2, but let's see what the developers say.

Last edited 2 years ago by Wayne Blaszczyk

comment:7 by Wayne Blaszczyk, 2 years ago

Issue raised with the Epiphany developers.

https://gitlab.gnome.org/GNOME/epiphany/issues/910

comment:8 by Xi Ruoyao, 2 years ago

Summary: Locally generated BLFS Book does not display correctly using Epiphany-3.32.4 → Locally generated BLFS Book does not display correctly using Epiphany-3.32.5

No luck with epiphany-3.32.5.

comment:9 by Pierre Labastie, 2 years ago

Looking at the bug reports, it looks like epiphany/webkit is doing something wrong. OTOH, the LFS generated book works because it uses character entities in html, while BLFS does not, because it uses the character itself (in ISO-8859-1 encoding). The reason for the difference is that BLFS is still using docbook-xsl-1.73.x, while LFS has been ported to docbook-xsl-1.78.1.

So maybe it is time to move to a more recent docbook-xsl in blfs (or maybe the docbook-xsl installed on the system, since the stylesheets are fairly stable nowadays). Note that most problems coming from style sheets were encountered when rendering to pdf, something we do not do anymore for blfs.

I'll look at what happens when using system docbook-xsl-nons-1.79.2...

in reply to:  9 comment:10 by Pierre Labastie, 2 years ago

Replying to pierre.labastie:

Looking at the bug reports, it looks like epiphany/webkit is doing something wrong. OTOH, the LFS generated book works because it uses character entities in html, while BLFS does not, because it uses the character itself (in ISO-8859-1 encoding). The reason for the difference is that BLFS is still using docbook-xsl-1.73.x, while LFS has been ported to docbook-xsl-1.78.1.

No, this is not the reason: I've ported the blfs stylesheets to docbook-xsl-1.79.2 (actually, it needs the same modifications as were made in lfs for 1.78.1), and the character itself is still generated, while the entity is generated for lfs.

I've noticed the following, which is kind of weird: in the intermediate files used for profiling, blfs-html.xml and blfs-html2.xml, the character is there in UTF-8! Note that it is the same thing for lfs and blfs, so I still don't know what the difference is...

Note that if we generate the <?xml version="1.0" encoding="ISO-8859-1"?> headers, epiphany displays the book correctly. To generate them, just comment out the line:

<xsl:param name="chunker.output.omit-xml-declaration" select="'yes'"/>

in stylesheets/lfs-xsl/chunk-slave.xsl.
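The effect of the declaration can be reproduced outside the browser. This is a minimal sketch using Python's expat-based ElementTree parser, which is only a stand-in for whatever parser webkit uses; the `BODY` fragment and the `parses` helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

# A fragment containing the registered sign as the single ISO-8859-1 byte 0xAE.
BODY = b"<title>Beyond Linux\xae From Scratch</title>"
# The kind of header produced once the omit-xml-declaration parameter is disabled.
DECL = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'


def parses(data: bytes) -> bool:
    """Return True if the bytes are accepted as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(parses(BODY))         # no declaration: parser assumes UTF-8
print(parses(DECL + BODY))  # declaration names ISO-8859-1
```

Without the declaration the parser has to assume UTF-8 and rejects the 0xAE byte; with it, the same bytes decode to the intended character.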

in reply to:  10 comment:11 by Pierre Labastie, 2 years ago

Replying to pierre.labastie:

Note that it is the same thing for lfs and blfs, so I still don't know what the difference is...

Well, easy enough: it is in the Makefile. LFS has:

sed -e "s@text/html@application/xhtml+xml@g" \
             -e "s/\xa9/\&copy;/ "                    \
             -i $$filename;

While BLFS has just:

sed -i -e "s@text/html@application/xhtml+xml@g" $$filename;

So nothing to do with the processing...
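For comparison, here is a rough Python equivalent of the second sed expression, as a sketch only. The 0xAE entry is a hypothetical extension (the LFS Makefile above handles only 0xA9), and unlike the sed line, which has no "g" flag, this version replaces every occurrence:

```python
import re

# ISO-8859-1 byte -> HTML entity. The 0xA9 entry mirrors the LFS sed
# expression; the 0xAE entry is an assumed extension for the registered sign.
ENTITIES = {b"\xa9": b"&copy;", b"\xae": b"&reg;"}


def replace_latin1_symbols(html: bytes) -> bytes:
    """Replace the raw ISO-8859-1 bytes for (c) and (R) with named entities."""
    return re.sub(rb"[\xa9\xae]", lambda m: ENTITIES[m.group(0)], html)


print(replace_latin1_symbols(b"Beyond Linux\xae From Scratch \xa9 1999"))
```

This only makes sense on ISO-8859-1 input. On UTF-8 input the same byte values appear as the second byte of two-byte sequences, so a byte-level substitution like this would corrupt the text.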

comment:12 by Bruce Dubbs, 2 years ago

Isn't the problem at stylesheets/lfs-xsl/docbook-xsl-snapshot/xhtml/html.xsl, lines 154-178?

Note that the problem in blfs is for both &copy; and &reg;

in reply to:  12 comment:13 by Pierre Labastie, 2 years ago

Replying to bdubbs:

Isn't the problem at stylesheets/lfs-xsl/docbook-xsl-snapshot/xhtml/html.xsl, lines 154-178?

Note that the problem in blfs is for both &copy; and &reg;

I'm not sure which problem you are talking about. If it is the fact that the intermediate files are UTF-8 encoded, the answer, AFAICT is no: adding this attribute:

encoding="iso-8859-1"

to the <xsl:output> tag in stylesheets/lfs-xsl/profile.xsl keeps iso-8859-1 throughout.

Last edited 2 years ago by Pierre Labastie

comment:14 by Pierre Labastie, 2 years ago

According to https://www.w3.org/International/questions/qa-html-encoding-declarations, we have a bug, because the document is "application/xhtml+xml" (that is, xml), and I read:

XHTML 1.x served as XML: Use the encoding declaration of the XML
declaration on the first line of the page. Ensure there is nothing
before it, including spaces (although a byte-order mark is OK).

So we need to add the XML declaration on the first line of the html pages. See comment 10.
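A quick way to check generated pages against that rule, as a sketch: `declared_xml_encoding` is a hypothetical helper, and the regular expression is a simplification rather than a full XML declaration parser. It accepts an optional UTF-8 byte-order mark but nothing else before the declaration:

```python
import re
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"
# Matches only at the very start of the data: "<?xml", some attributes,
# then encoding="..." with an XML-style encoding name.
_XML_DECL = re.compile(
    rb"<\?xml\s[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def declared_xml_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in a leading XML declaration, or None."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else None


page = b'<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>\n<html/>'
print(declared_xml_encoding(page))
print(declared_xml_encoding(b"<!DOCTYPE html>\n<html/>"))
```

A page that starts directly with the DOCTYPE, like the one quoted in comment 3, yields None here.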

I have not tried, but I understand that if we do not sed text/html to application/xhtml+xml, then the charset could be acknowledged. See the link above.

comment:15 by Pierre Labastie, 2 years ago

Now, there is another question: shouldn't we switch to UTF-8 in the html files? Normally, all modern and not so modern browsers should be able to understand UTF-8.
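To illustrate what such a switch would change at the byte level, here is a sketch that re-encodes an ISO-8859-1 page as UTF-8 and updates the charset in the meta tag. This is a hypothetical post-processing step shown only for illustration; in the book itself the natural place for the change would presumably be the stylesheet output encoding, which this sketch does not touch:

```python
def latin1_page_to_utf8(data: bytes) -> bytes:
    """Re-encode an ISO-8859-1 page as UTF-8 and fix the declared charset."""
    text = data.decode("iso-8859-1")  # every byte maps to one code point
    text = text.replace("charset=iso-8859-1", "charset=utf-8")
    return text.encode("utf-8")


page = (
    b'<meta http-equiv="Content-Type" content='
    b'"application/xhtml+xml; charset=iso-8859-1" />'
    b"<title>Beyond Linux\xae From Scratch</title>"
)
print(latin1_page_to_utf8(page))
```

After conversion the registered sign becomes the two-byte sequence 0xC2 0xAE, which is what a parser defaulting to UTF-8 expects.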

in reply to:  14 comment:16 by Pierre Labastie, 2 years ago

Replying to pierre.labastie:

I have not tried, but I understand that if we do not sed text/html to application/xhtml+xml, then the charset could be acknowledged. See the link above.

Not sure it should, but epiphany does not acknowledge text/html in Content-Type. So no need to remove the sed.

comment:17 by Pierre Labastie, 2 years ago

So the best for now is to comment out the omit-xml-declaration line.

We may also add an encoding in profile.xsl, but this is not necessary. BTW, can't we apply both revision and condition profiling at the same time? (would generate only one intermediate file, two with blfs-full)

comment:18 by Bruce Dubbs, 2 years ago

I tried that and the xml line is

<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>

Is that what we want? I'm not sure what the standalone="no" means.

comment:19 by Bruce Dubbs, 2 years ago

I went ahead and committed the above. Let's see how it works.

in reply to:  18 comment:20 by Pierre Labastie, 2 years ago

Replying to bdubbs:

I tried that and the xml line is

<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>

Is that what we want? I'm not sure what the standalone="no" means.

FWIW https://www.w3.org/TR/2008/REC-xml-20081126/#sec-rmd

I'm not sure I understand everything in the link, but I understand that standalone="no" is the default...
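For what it's worth, a parser's view of that line can be inspected with a short sketch. It uses Python's expat binding, whose XmlDeclHandler reports standalone as 1 for "yes", 0 for "no", and -1 when the clause is omitted; `read_xml_declaration` and the tiny test document are made up for illustration:

```python
import xml.parsers.expat


def read_xml_declaration(data: bytes) -> dict:
    """Parse the document and return what its XML declaration states."""
    seen = {}

    def on_decl(version, encoding, standalone):
        # standalone: 1 = "yes", 0 = "no", -1 = clause omitted
        seen.update(version=version, encoding=encoding, standalone=standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(data, True)
    return seen


doc = b'<?xml version="1.0" encoding="iso-8859-1" standalone="no"?><t>\xa9</t>'
print(read_xml_declaration(doc))
```

The document parses either way; the explicit standalone="no" just states what would otherwise be assumed for a document with external markup declarations.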

comment:21 by Pierre Labastie, 2 years ago

Weirder and weirder: In order to test the addition, I've decided to install epiphany on my debian (sid) machine (the versions are the same as what we have). First, I tested an unmodified rendered book: I was expecting the same behavior (error at line 7 or 8), and guess what, I did not get the same behavior... Actually, after entering

file:///home/pierre/downloads/BLFS-SVN/index.html

in the address bar (this is the place where I render the book), epiphany "downloaded" it to the ~/Downloads directory! So I suspected that it could be some bad setting in my ~/.config, or ~/.cache, or ~/.local directory, so I erased all three. Then I tried again, and... The page was displayed!

I wonder what webkit/epiphany does for finding the type and encoding of a file, but it seems that settings, either in user hidden directories or in global directories, influence the result!

in reply to:  21 comment:22 by Pierre Labastie, 2 years ago

Replying to pierre.labastie:

I wonder what webkit/epiphany does for finding the type and encoding of a file, but it seems that settings, either in user hidden directories or in global directories, influence the result!

Some explanation at https://bugs.webkit.org/show_bug.cgi?id=201545#c13 and also comment 14. (I would write roughly the same here, so better link to it).

comment:23 by Bruce Dubbs, 2 years ago

Resolution: fixed
Status: new → closed

Tests OK now. Marking fixed.
