HTML output method and & in URLs 2003-01-28 - By Herr Christian Wolfgang Hujer
Hello,
On Tuesday, 28 January 2003 at 20:38, Voytenko, Dimitry wrote:
> Hi Angus,
>
> Where exactly is this happening? In the attributes HREF/SRC? If so, this
> behaviour is correct. Otherwise, links won't work in the browsers. There's
> no other way to disable escaping in attributes (&, <, etc). This behaviour
> is defined in the xslt specification.

Wrong. &amp; in href or src attributes works fine and is translated to & in every browser I've tested it in. And wrong again: this behaviour is not defined in the XSLT specification.
From the XSLT Specification, 16.2 HTML Output Method:
"The html output method should escape non-ASCII characters in URI attribute values using the method recommended in Section B.2.1 of the HTML 4.0 Recommendation."
And from HTML 4.0, Section B.2.2 Ampersands in URI attribute values:
'The URI that is constructed when a form is submitted may be used as an anchor-style link (e.g., the href attribute for the A element). Unfortunately, the use of the "&" character to separate form fields interacts with its use in SGML attribute values to delimit character entity references. For example, to use the URI "http://host/?x=1&y=2" as a linking URI, it must be written <A href="http://host/?x=1&#38;y=2"> or <A href="http://host/?x=1&amp;y=2">. We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.'
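To illustrate the two spellings recommended in B.2.2, here is a minimal Python sketch (not part of the original exchange; the names `HrefCollector` and `decoded_hrefs` are made up for illustration). It feeds both escaped forms through the standard library's HTML parser, which stands in for a browser here, and shows that they should decode to the same URI.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: records the decoded href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attribute values
        # arrive with character references already decoded.
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


def decoded_hrefs(markup):
    """Return the href values found in markup, as a parser sees them."""
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


# The two escaped spellings from HTML 4.0, Section B.2.2.
numeric = decoded_hrefs('<A href="http://host/?x=1&#38;y=2">link</A>')
named = decoded_hrefs('<A href="http://host/?x=1&amp;y=2">link</A>')
print(numeric, named)
```

Both lists should hold the single URI http://host/?x=1&y=2, so escaping the ampersand in the markup does not change the link that is actually followed.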
More from the XSLT Specification, 16.2 HTML Output Method:
"The html output method should not escape a & character occurring in an attribute value immediately followed by a { character (see Section B.7.1 of the HTML 4.0 Recommendation). For example, a start-tag written in the stylesheet as <BODY bgcolor='&amp;{{randomrbg}};'> should be output as <BODY bgcolor='&{randomrbg};'>"
> If this is happening in the elements SCRIPT or STYLE this is also correct.

Then it's correct, yes.
> I'm not quite sure how exactly you were trying to validate your HTML
> document. HTML is not an XML, but SGML. Thus, you can't validate it against
> XML rules.

But it can validate against SGML+HTML rules, and "<a href='/cgi-bin/bla?p1=a&p2=b'>...</a>" won't validate because of the & character.
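For comparison, a small Python sketch of how such a link could be written so that the ampersand is escaped in the markup (an illustration only, not what Xalan does internally; the `anchor` helper is a made-up name):

```python
from html import escape, unescape


def anchor(url, text):
    """Hypothetical helper: build an <a> element with an escaped href."""
    # escape() turns & into &amp; (and <, > and quotes into their entities).
    return "<a href='{}'>{}</a>".format(escape(url, quote=True), escape(text))


link = anchor("/cgi-bin/bla?p1=a&p2=b", "...")
print(link)

# Decoding the escaped attribute value gives back the original URL.
print(unescape("/cgi-bin/bla?p1=a&amp;p2=b"))
```

The first line printed is the form an SGML validator expects; the second shows that the escaped value still decodes to the original URL.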
So I agree with Angus, who thinks he has discovered a bug, or rather a violation of the specs.
> > I have a stylesheet processor based on Xalan and Ant which I'm using
> > to generate HTML pages from XML. Within my pages, I have some URL
> > strings containing arguments, separated by '&'. In the input
> > document, the form is:
> >
> > arg1=foo&amp;arg2=bar&amp;arg3=baz
> >
> > The final HTML output contains the string
> >
> > arg1=foo&arg2=bar&arg3=baz
> >
> > which fails validation as HTML, because it uses '&' rather than '&amp;'.
> > My stylesheet defines the output method as:
> >
> > <xsl:output method="html"
> > doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
> > doctype-system="http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd"
> > xalan:omit-meta-tag="yes"/>
> >
> > Angus
How exactly did you transform it? I was unable to reproduce the bug you describe. Which Xalan version are you using? I use Xalan 2.4.1.
Bye
-- 
ITCQIS GmbH
Christian Wolfgang Hujer
Managing Partner (Geschäftsführender Gesellschafter)
Phone: +49 (0)89 27 37 04 37
Fax: +49 (0)89 27 37 04 39
E-Mail: Christian.Hujer@(protected)
WWW: http://www.itcqis.com/