HTML output method and & in URLs 2003-01-28 - By David N Bertoni/Cambridge/IBM
Hi Angus,
Fascinating. An incorrect answer which implies that using XSLT reduces a user's sanity and flogs software all in the same message.
This is simply a bug in Xalan-J. The processor must serialize attributes so the result is well-formed HTML:
http://www.w3.org/TR/xslt#section-HTML-Output-Method
http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2
The usual reason for avoiding the &amp; entity is that older browsers and HTTP agents mishandle it. However, writing a bare & instead generates HTML which is not well-formed, as you've discovered.
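To make the serialization rule concrete, here is a minimal sketch in Python of what a serializer has to do with an attribute value. This is an illustration only, not Xalan's code: the function names and the page.cgi URL are made up for the example, and it assumes the value has already been entity-decoded by the XML parser and will be written inside double quotes.

```python
def escape_html_attribute(value: str) -> str:
    """Escape a parsed (already entity-decoded) string for a double-quoted HTML attribute."""
    # Escape '&' first so the ampersands introduced by the next
    # replacement are not escaped a second time.
    return value.replace("&", "&amp;").replace('"', "&quot;")


def serialize_href(url: str) -> str:
    """Hypothetical helper: write an opening <a> tag with an escaped href."""
    return '<a href="' + escape_html_attribute(url) + '">'


# The parser hands the stylesheet a string with bare ampersands;
# the serializer has to turn them back into entities on output.
print(serialize_href("page.cgi?arg1=foo&arg2=bar&arg3=baz"))
# prints: <a href="page.cgi?arg1=foo&amp;arg2=bar&amp;arg3=baz">
```

The point of the sketch is that the escaping belongs in the output step, so a stylesheet author should not have to re-encode anything by hand.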
As an aside, xalan:use-url-escaping is related to this:
http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.1
Xalan provides this option for the same reason it doesn't use &amp; in URI query strings -- there are lots of older agents out there that don't understand URIs encoded this way.
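For reference, the escaping described in section B.2.1 converts each non-ASCII character in a URI attribute value to UTF-8 and writes every resulting byte as %HH. A rough Python sketch of that recommendation follows. It is not Xalan's implementation, and the function name and example URL are invented for illustration.

```python
def escape_non_ascii_uri(uri: str) -> str:
    """Percent-escape non-ASCII characters as UTF-8 bytes; leave ASCII untouched."""
    out = []
    for ch in uri:
        if ord(ch) < 128:
            # ASCII characters, including '&' and '=', pass through unchanged.
            out.append(ch)
        else:
            # Encode the character as UTF-8 and write each byte as %HH.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


print(escape_non_ascii_uri("http://example.com/caf\u00e9?arg1=foo&arg2=bar"))
# prints: http://example.com/caf%C3%A9?arg1=foo&arg2=bar
```

Note that this step is separate from entity escaping: the '&' in the result is still bare and would still need to be written as an entity when the attribute is serialized.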
Dave

"Dan Jacobs" <djacobs@(protected)jects.com>
To: <xalan-j-users@(protected).apache.org>
cc: (bcc: David N Bertoni/Cambridge/IBM)
Subject: RE: HTML output method and & in URLs
01/28/2003 11:34 AM
Hi Angus,
(We met at the Boston ACM WebTech Group a few years ago, and your name just came up again last week in a conversation with John Kellerman.)
As far as I can tell, when you extract an entity-encoded String from your XML document, the entities are translated, and then you have your String. If you then include that String in the generated output, you have to re-entity-encode it yourself.
If you'd rather do things with Java and keep a bit more of your sanity, you might want to try JPlates instead (http://www.jplates.com). I'd love to get your opinion of it in any case.
All the best,
-- Dan Jacobs
-- Chairman, Boston ACM WebTech Group
-- President, JPlates Inc.

> -----Original Message-----
> From: Angus McIntyre [mailto:angus@(protected)]
> Sent: Tuesday, January 28, 2003 2:20 PM
> To: xalan-j-users@(protected)
> Subject: HTML output method and & in URLs
>
>
> I have a stylesheet processor based on Xalan and Ant which I'm using
> to generate HTML pages from XML. Within my pages, I have some URL
> strings containing arguments, separated by '&'. In the input
> document, the form is:
>
>   arg1=foo&amp;arg2=bar&amp;arg3=baz
>
> The final HTML output contains the string
>
>   arg1=foo&arg2=bar&arg3=baz
>
> which fails validation as HTML, because it uses '&' rather
> than '&amp;'.
>
> My stylesheet defines the output method as:
>
>   <xsl:output method="html"
>     doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
>     doctype-system="http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd"
>     xalan:omit-meta-tag="yes"/>
>
> If I change the method to 'xml', the '&amp;' entities are not
> converted, so it's presumably the HTML conversion process that is
> doing this. Setting:
>
>   xalan:use-url-escaping="no"
>
> doesn't seem to fix the problem.
>
> Is there any way around this, or am I going to have to hack my
> processor to reencode the '&' characters as entities?
>
> Thanks
>
> Angus
> --
> angus@(protected) http://pobox.com/~angus