HTML Serialization and Handling of Ampersands in HREF Attributes 2007-04-30 - By Klaus Malorny
Hi,
I have run into a problem when using Xalan (and also Xerces with the old org.apache.xml.serialize package) to serialize to HTML. It does *NOT* escape ampersands, either as "&amp;" or as "&#38;", when they occur in attributes designated to hold URLs, such as the "href" attribute of the "a" element. Judging from the source code, this is intentional, which puzzles me a lot. Prompted by a customer complaint, I reviewed the issue and found that the HTML specifications clearly state that the ampersand, which is typically used to separate form values in a query string, *MUST* be escaped in attributes containing URLs. I even found a corresponding note in the HTML 2.0 specification from 1995. Can anyone explain why this incorrect handling exists, and whether it will be removed in future releases?
HTML 4.01: http://www.w3.org/TR/html401/appendix/notes.html, section B.2.2
HTML 2.0: http://www.ietf.org/rfc/rfc1866.txt, section 8.2.1 (page 46)
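For illustration, the rule in those notes boils down to applying ordinary attribute escaping to URL attributes as well: every "&" in the value becomes "&amp;". The following is a minimal sketch of such an escaper, not Xalan's code; the class name HtmlAttrEscaper is hypothetical, and it assumes the value will be written inside a double-quoted attribute.

```java
public final class HtmlAttrEscaper {
    private HtmlAttrEscaper() {}

    /**
     * Escapes a value for use inside a double-quoted HTML attribute.
     * URL attributes such as href are treated exactly like any other attribute.
     */
    public static String escapeAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;   // the case at issue
                case '"': sb.append("&quot;"); break;  // would end the attribute
                case '<': sb.append("&lt;"); break;    // not required, but harmless
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Query URL with two form fields separated by '&'.
        String href = "http://host/?x=1&y=2";
        // Prints: <a href="http://host/?x=1&amp;y=2">
        System.out.println("<a href=\"" + escapeAttribute(href) + "\">");
    }
}
```

A browser decodes "&amp;" back to "&" before following the link, so escaping changes only the markup, not the URL that is requested.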
Sample:
test.xml:
- - - 8< - - -
<html>
  <body>
    <a href="a&amp;b" title="a&amp;b"/>
  </body>
</html>
- - - 8< - - -
test.xsl:
- - - 8< - - -
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:copy-of select="/"/>
  </xsl:template>

</xsl:stylesheet>
- - - 8< - - -
command arguments: -in test.xml -xsl test.xsl -HTML
output:
- - - 8< - - -
<html>
<body>
<a href="a&b" title="a&amp;b"></a>
</body>
</html>
- - - 8< - - -
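For anyone who wants to reproduce this from Java instead of the command line, here is a sketch using the standard JAXP API with the same input and stylesheet inlined as strings. The class name HtmlSerializeDemo is hypothetical. Which XSLT processor TransformerFactory.newInstance() returns depends on the classpath and the javax.xml.transform.TransformerFactory system property: with Xalan on the classpath it may be Xalan, otherwise it is the JDK's built-in processor (derived from Xalan), whose href handling may differ. For that reason the sketch only prints the result rather than claiming what it will be.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlSerializeDemo {
    // Same document as test.xml; "&amp;" keeps the XML well-formed.
    static final String XML =
        "<html><body><a href=\"a&amp;b\" title=\"a&amp;b\"/></body></html>";

    // Same stylesheet as test.xsl: copy the whole input to the output.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" "
        + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\"><xsl:copy-of select=\"/\"/></xsl:template>"
        + "</xsl:stylesheet>";

    static String transform() {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(XSL)));
            // Force the HTML output method, like the -HTML flag above.
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(XML)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Compare the href and title attributes in the printed markup.
        System.out.println(transform());
        System.out.println("processor: " + TransformerFactory.newInstance().getClass().getName());
    }
}
```

Printing the factory class name makes it clear which processor actually produced the output being compared.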
Regards,
Klaus