Opened 6 years ago
Closed 6 years ago
#12481 closed defect (fixed)
Locally generated BLFS Book does not display correctly using Epiphany-3.32.5
Reported by: | Wayne Blaszczyk | Owned by: | blfs-book |
---|---|---|---|
Priority: | normal | Milestone: | 9.1 |
Component: | BOOK | Version: | SVN |
Severity: | normal | Keywords: | |
Cc: |
Description
When I generate the BLFS book locally and open the generated index.html page in epiphany, I get the following error:
This page contains the following errors:
error on line 7 at column 12: Encoding error
Below is a rendering of the page up to the first error.
This issue does not occur when accessing the equivalent remote page e.g. url = http://www.linuxfromscratch.org/blfs/view/systemd/index.html
After spending hours on this, I'm still in two minds as to whether this is an epiphany bug or a bug in the way the BLFS book is generated. I have no issue with the LFS book. It all has to do with special characters like the copyright character.
Change History (25)
by , 6 years ago
Attachment: | test-bad.html added |
---|
comment:1 by , 6 years ago
Summary: | Locally generated BLFS Book does not display correctly using Epiphany → Locally generated BLFS Book does not display correctly using Epiphany-3.32.4 |
---|
comment:2 by , 6 years ago
I have attached two sample html pages, one good and one bad.
Here is a quote from https://www.w3.org/TR/xhtml1/
An XML declaration is not required in all XML documents; however XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol.
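The "default UTF-8" clause in that quote is the relevant part. As a minimal Python sketch, assuming the generated page stores ® and © as the single ISO-8859-1 bytes 0xAE and 0xA9 (the helper name `decodes_as` and the sample strings are illustrative only, not taken from the book sources):

```python
# The registered and copyright signs as single ISO-8859-1 bytes.
REG_LATIN1 = b"\xae"
COPY_LATIN1 = b"\xa9"


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


title = b"Beyond Linux" + REG_LATIN1 + b" From Scratch"
footer = b"Copyright " + COPY_LATIN1

for sample in (title, footer):
    # Both samples are fine as ISO-8859-1, but a lone 0xAE or 0xA9 byte
    # is a stray continuation byte in UTF-8 and cannot be decoded.
    print(decodes_as(sample, "iso-8859-1"), decodes_as(sample, "utf-8"))
```

A consumer that falls back to UTF-8 because nothing told it otherwise would therefore stop at the first such byte, which is consistent with an "Encoding error" near the title line.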
comment:3 by , 6 years ago
I will need to see what is actually generated for your index.html. What I have is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content=
"application/xhtml+xml; charset=iso-8859-1" />
<title>
Beyond Linux® From Scratch (System V Edition)
</title>
<link rel="stylesheet" href="stylesheets/lfs.css" type="text/css" />
I note that the first seven lines for both LFS and BLFS are identical.
What is on line 7 is <title>.
The content is generated by the stylesheets and those have not changed since 2007.
comment:4 by , 6 years ago
The index.html above is the same apart from the systemd Edition line. (The issue is caused by the eighth line.)
I can reproduce the issue using the following steps which eliminate the generation of the BOOK.
cd /tmp
curl -o index.html http://www.linuxfromscratch.org/blfs/view/systemd/index.html
epiphany index.html
I've also copied the index.html file to a Fedora VM, and it gives the same error.
I can see differences between the LFS and BLFS pages. LFS uses escape characters like &copy;. I don't see any registered trademark symbols in the LFS book, which is why the issue is not present there.
This issue arose when I built my main workstation to the latest book. Looking at my previous VM builds, I can see that this issue was there for some time. I just didn't test for it.
comment:5 by , 6 years ago
I can duplicate your problem, but I think you need to take it up with the epiphany developers. I checked at http://validator.w3.org/ by doing your curl download and uploading that file to the validator.
The validator says it is valid xhtml.
I do note that if I change the ® to &reg; then epiphany thinks that is OK, but it then chokes on © (&copy;).
Also firefox, seamonkey, falkon, and even links have no problem with the page.
comment:6 by , 6 years ago
Thanks, Bruce, I'll raise it with the epiphany developers. I also tried the validator from both firefox and epiphany. Firefox succeeded, but epiphany came back with:
Sorry, I am unable to validate this document because on line 1 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication. The error was: Modification of a read-only value attempted
I still think it goes against the quote in comment 2, but let's see what the developers say.
comment:8 by , 6 years ago
Summary: | Locally generated BLFS Book does not display correctly using Epiphany-3.32.4 → Locally generated BLFS Book does not display correctly using Epiphany-3.32.5 |
---|
No luck with epiphany-3.32.5.
follow-up: 10 comment:9 by , 6 years ago
Looking at the bug reports, it looks like epiphany/webkit is doing something wrong. OTOH, the LFS generated book works because it uses character entities in html, while BLFS does not, because it uses the character itself (in ISO-8859-1 encoding). The reason for the difference is that BLFS is still using docbook-xsl-1.73.x, while LFS has been ported to docbook-xsl-1.78.1.
So maybe it is time to move to a more recent docbook-xsl in blfs (or maybe to the system-installed docbook-xsl, since the stylesheets are fairly stable nowadays). Note that most problems coming from style sheets were encountered when rendering to pdf, something we do not do anymore for blfs.
I'll look at what happens when using system docbook-xsl-nons-1.79.2...
follow-up: 11 comment:10 by , 6 years ago
Replying to pierre.labastie:
Looking at the bug reports, it looks like epiphany/webkit is doing something wrong. OTOH, the LFS generated book works because it uses character entities in html, while BLFS does not, because it uses the character itself (in ISO-8859-1 encoding). The reason for the difference is that BLFS is still using docbook-xsl-1.73.x, while LFS has been ported to docbook-xsl-1.78.1.
No, this is not the reason: I've ported the blfs stylesheets to docbook-xsl-1.79.2 (actually, it needs the same modifications as were done in lfs for 1.78.1), and the character itself is still generated, while the entity is generated for lfs.
I've noticed the following, which is kind of weird: in the intermediate files used for profiling, blfs-html.xml and blfs-html2.xml, the character is there in UTF-8! Note that it is the same thing for lfs and blfs, so I still don't know what the difference is...
Note that if we generate the <?xml version="1.0" encoding="ISO-8859-1"?> headers, epiphany displays the book correctly. To generate them, just comment out the line:
<xsl:param name="chunker.output.omit-xml-declaration" select="'yes'"/>
in stylesheets/lfs-xsl/chunk-slave.xsl.
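The effect of that declaration can be reproduced outside the browser. The following is only a sketch using Python's expat-based `xml.etree.ElementTree`, not WebKit's parser, so it shows analogous behaviour rather than what epiphany does internally; the sample page and the helper `parse_title` are made up for illustration:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Minimal page with a raw ISO-8859-1 byte (0xAE, the registered sign) in the title.
BODY = (
    b'<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    b"<title>Beyond Linux\xae From Scratch</title>"
    b"</head></html>"
)
DECLARATION = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'


def parse_title(page: bytes):
    """Return the <title> text, or None if the page is not well-formed XML."""
    try:
        root = ET.fromstring(page)
    except ET.ParseError:
        return None
    return root.findtext(f"{XHTML_NS}head/{XHTML_NS}title")


# Without a declaration the parser assumes UTF-8 and rejects the 0xAE byte.
print(parse_title(BODY))
# With the declaration the same bytes are read as ISO-8859-1 and parse cleanly.
print(parse_title(DECLARATION + BODY))
```

The only difference between the two inputs is the first line, which is the line that the chunker.output.omit-xml-declaration parameter suppresses.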
comment:11 by , 6 years ago
Replying to pierre.labastie:
Note that it is the same thing for lfs and blfs, so I still don't know what the difference is...
Well, easy enough: it is in the Makefile. LFS has:
sed -e "s@text/html@application/xhtml+xml@g" \
    -e "s/\xa9/\&copy;/ " \
    -i $$filename;
While BLFS has just:
sed -i -e "s@text/html@application/xhtml+xml@g" $$filename;
So nothing to do with the processing...
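For illustration, here is a byte-level Python sketch of what the LFS sed expressions do, with one extra rule for the registered sign. The 0xAE rule is a hypothetical addition that appears in neither Makefile, and unlike the sed expression for 0xA9 (which has no g flag) this version replaces every occurrence:

```python
# Byte-level replacements modelled on the LFS sed expressions.
REPLACEMENTS = [
    (b"text/html", b"application/xhtml+xml"),
    (b"\xa9", b"&copy;"),
    (b"\xae", b"&reg;"),  # assumption: hypothetical extra rule for BLFS
]


def fix_page(data: bytes) -> bytes:
    """Apply every replacement to an ISO-8859-1 encoded page."""
    for old, new in REPLACEMENTS:
        data = data.replace(old, new)
    return data


sample = b"<title>Beyond Linux\xae From Scratch</title> Copyright \xa9"
# After the replacements the sample is pure ASCII, so it decodes in any
# ASCII-compatible encoding, UTF-8 included.
print(fix_page(sample).decode("ascii"))
```

Because the output no longer contains bytes above 0x7F, a parser that assumes UTF-8 has nothing to reject, which would explain why the LFS pages display correctly.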
follow-up: 13 comment:12 by , 6 years ago
Isn't the problem at stylesheets/lfs-xsl/docbook-xsl-snapshot/xhtml/html.xsl, lines 154-178?
Note that the problem in blfs is for both © and ®
comment:13 by , 6 years ago
Replying to bdubbs:
Isn't the problem at stylesheets/lfs-xsl/docbook-xsl-snapshot/xhtml/html.xsl, lines 154-178?
Note that the problem in blfs is for both © and ®
I'm not sure which problem you are talking about. If it is the fact that the intermediate files are UTF-8 encoded, the answer, AFAICT is no: adding this attribute:
encoding="iso-8859-1"
to the <xsl:output> tag in stylesheets/lfs-xsl/profile.xsl keeps iso-8859-1 throughout.
follow-up: 16 comment:14 by , 6 years ago
According to https://www.w3.org/International/questions/qa-html-encoding-declarations, we have a bug, because the document is "application/xhtml+xml" (that is xml), and I read
XHTML 1.x served as XML: Use the encoding declaration of the XML declaration on the first line of the page. Ensure there is nothing before it, including spaces (although a byte-order mark is OK).
So we need to add the XML declaration on the first line of the html pages. See comment 10.
I have not tried, but I understand that if we do not sed text/html to application/xhtml+xml, then the charset could be acknowledged. See the link above.
comment:15 by , 6 years ago
Now, there is another question: shouldn't we switch to UTF-8 in the html files? Normally, all modern and not so modern browsers should be able to understand UTF-8.
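To make the question concrete, this is a rough Python sketch of what such a switch means at the byte level. It post-processes a page rather than changing the stylesheets, which is an assumption made purely for illustration; the function name and sample page are hypothetical:

```python
def latin1_page_to_utf8(data: bytes) -> bytes:
    """Re-encode an ISO-8859-1 page as UTF-8 and update its charset label."""
    text = data.decode("iso-8859-1")
    text = text.replace("charset=iso-8859-1", "charset=utf-8")
    return text.encode("utf-8")


page = (
    b'<meta http-equiv="Content-Type" content='
    b'"application/xhtml+xml; charset=iso-8859-1" />'
    b"<title>Beyond Linux\xae From Scratch</title>"
)
converted = latin1_page_to_utf8(page)
# The single 0xAE byte becomes the two-byte UTF-8 sequence 0xC2 0xAE.
print(converted.decode("utf-8"))
```

With UTF-8 output the bytes would match the default that an XML parser assumes, so the page would no longer depend on an encoding declaration being present or honoured.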
comment:16 by , 6 years ago
Replying to pierre.labastie:
I have not tried, but I understand that if we do not sed text/html to application/xhtml+xml, then the charset could be acknowledged. See the link above.
Not sure it should, but epiphany does not acknowledge text/html in Content-Type. So no need to remove the sed.
comment:17 by , 6 years ago
So the best for now is to comment out the omit-xml-declaration line.
We may also add an encoding in profile.xsl, but this is not necessary. BTW, can't we apply both revision and condition profiling at the same time? (would generate only one intermediate file, two with blfs-full)
follow-up: 20 comment:18 by , 6 years ago
I tried that and the xml line is
<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
Is that what we want? I'm not sure what the standalone="no" means.
comment:20 by , 6 years ago
Replying to bdubbs:
I tried that and the xml line is
<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
Is that what we want? I'm not sure what the standalone="no" means.
FWIW https://www.w3.org/TR/2008/REC-xml-20081126/#sec-rmd
I'm not sure I understand everything in the link, but I understand that standalone="no" is the default...
follow-up: 22 comment:21 by , 6 years ago
Weirder and weirder: In order to test the addition, I've decided to install epiphany on my debian (sid) machine (the versions are the same as what we have). First, I tested an unmodified rendered book: I was expecting the same behavior (error at line 7 or 8), and guess what, I did not get the same behavior... Actually, after entering
file:///home/pierre/downloads/BLFS-SVN/index.html
in the address bar (this is the place where I render the book), epiphany "downloaded" it to the ~/Downloads directory! So I suspected that it could be some bad setting in my ~/.config, or ~/.cache, or ~/.local directory, so I erased all three. Then I tried again, and... The page was displayed!
I wonder what webkit/epiphany does for finding the type and encoding of a file, but it seems that settings, either in user hidden directories or in global directories, influence the result!
comment:22 by , 6 years ago
Replying to pierre.labastie:
I wonder what webkit/epiphany does for finding the type and encoding of a file, but it seems that settings, either in user hidden directories or in global directories, influence the result!
Some explanation at https://bugs.webkit.org/show_bug.cgi?id=201545#c13 and also comment 14. (I would write roughly the same here, so better link to it).