Characters missing from XALAN output 2003-03-28 - By Holliday, Donald B. (LNG-CSP)
We are using XERCES 1.4.2 and XALAN 1.2.2 (and can't update the versions until we get an approved project).
We have an application that uses XERCES to parse a UTF-8-encoded XML document, which we read through a FileInputStream. XALAN then transforms the document and writes the output as ISO-8859-1 (<xsl:output method="text" encoding="iso-8859-1"/>).
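For anyone who wants to reproduce the setup, here is a minimal sketch of the same parse-then-transform pipeline. It assumes the JAXP API bundled with current JDKs (DocumentBuilderFactory, TransformerFactory) rather than the Xerces 1.4.2 and Xalan 1.2.2 classes we actually use. The class name, sample document and stylesheet are hypothetical stand-ins, and a ByteArrayInputStream takes the place of our FileInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

class Utf8ToLatin1Pipeline {
    // Hypothetical stylesheet with the same xsl:output settings as ours:
    // it writes the text of <doc> as ISO-8859-1 plain text.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\" encoding=\"iso-8859-1\"/>"
        + "<xsl:template match=\"/\"><xsl:value-of select=\"doc\"/></xsl:template>"
        + "</xsl:stylesheet>";

    // Parses UTF-8 bytes into a DOM (entity references expanded, the JAXP default),
    // transforms it, and returns the raw ISO-8859-1 output bytes.
    static byte[] run(byte[] utf8Xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // In the real application this would be a FileInputStream.
            Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(utf8Xml));
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<!DOCTYPE doc [<!ENTITY sect \"&#x00A7;\">]>"
            + "<doc>See &sect;12</doc>";
        byte[] out = run(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(out, StandardCharsets.ISO_8859_1));
    }
}
```

Because ISO-8859-1 can represent the section sign, it comes out as the single byte 0xA7.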
If, after creating the parser, we call DOMParser.setCreateEntityReferenceNodes(true), then characters written as entity references (for example &sect;) DO NOT appear in the output.
If, after creating the parser, we call DOMParser.setCreateEntityReferenceNodes(false), then those characters DO appear in the output.
Some of these entities are declared in the DTD as follows:
<!ENTITY ast "&#x002A;"> <!-- ASTERISK OPERATOR -->
<!ENTITY nbsp "&#x00A0;"> <!-- NO-BREAK SPACE -->
<!ENTITY lsqb "&#x005B;"> <!-- LEFT SQUARE BRACKET -->
<!ENTITY rsqb "&#x005D;"> <!-- RIGHT SQUARE BRACKET -->
<!ENTITY sect "&#x00A7;"> <!-- SECTION SIGN -->
This behavior is consistent on both Win2K and Solaris.
We speculate that DOMParser.setCreateEntityReferenceNodes(true) causes the parser to create a separate entity reference node for each of these entity references instead of expanding it inline with the surrounding text. When XALAN receives a DOM tree built this way, it does not look at the contents of the entity reference nodes, so those characters never reach the output.
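The first half of that speculation can be checked on the DOM itself. This sketch again assumes JAXP rather than our Xerces 1.4.2 classes, and takes DocumentBuilderFactory.setExpandEntityReferences(false) to be the counterpart of DOMParser.setCreateEntityReferenceNodes(true). It parses a hypothetical one-entity document both ways and counts the EntityReference children of the root element; the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class EntityRefInspector {
    // Hypothetical sample document using one of the entities from our DTD.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<!DOCTYPE doc [<!ENTITY sect \"&#x00A7;\">]>"
        + "<doc>See &sect;12</doc>";

    // expandEntities=false is assumed to match setCreateEntityReferenceNodes(true).
    static Element parseRoot(boolean expandEntities) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setExpandEntityReferences(expandEntities);
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Counts the direct children of the element that are EntityReference nodes.
    static int countEntityRefChildren(Element e) {
        int count = 0;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ENTITY_REFERENCE_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (boolean expand : new boolean[] {true, false}) {
            Element root = parseRoot(expand);
            System.out.println("expand=" + expand
                + " entityRefNodes=" + countEntityRefChildren(root)
                + " text=" + root.getTextContent());
        }
    }
}
```

In both cases getTextContent() still returns the section sign, because the character is in the tree either way. With expansion off it sits inside an EntityReference node rather than directly in a Text node, so a processor that skips EntityReference nodes would drop it.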
Is our speculation correct?
Does anyone know for a fact why XALAN behaves differently depending on how we call DOMParser.setCreateEntityReferenceNodes( ... )?
Does anyone know how we can get XALAN to write out the value of the entity reference nodes when we call DOMParser.setCreateEntityReferenceNodes(true)?
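One workaround we could try, sketched under the same JAXP assumption and not verified against Xalan 1.2.2, is to remove the entity reference nodes before the DOM reaches the processor. Each EntityReference node is replaced with a Text node holding its text content, and adjacent Text nodes are then merged with Node.normalize(). The helper inlineEntityRefs and the sample document and stylesheet are hypothetical. Because it keeps only text, the approach suits entities like ours that expand to single characters, not entities that contain markup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

class InlineEntityRefs {
    // Hypothetical stand-in for the real document.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<!DOCTYPE doc [<!ENTITY sect \"&#x00A7;\">]>"
        + "<doc>See &sect;12</doc>";

    // Hypothetical stylesheet with the same xsl:output settings as ours.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\" encoding=\"iso-8859-1\"/>"
        + "<xsl:template match=\"/\"><xsl:value-of select=\"doc\"/></xsl:template>"
        + "</xsl:stylesheet>";

    // Replaces each EntityReference node under 'parent' with one Text node
    // holding its text content. Only safe when entities expand to plain text.
    static void inlineEntityRefs(Node parent) {
        Node n = parent.getFirstChild();
        while (n != null) {
            Node next = n.getNextSibling(); // saved before n is replaced
            if (n.getNodeType() == Node.ENTITY_REFERENCE_NODE) {
                Text text = parent.getOwnerDocument().createTextNode(n.getTextContent());
                parent.replaceChild(text, n);
            } else {
                inlineEntityRefs(n); // descend into elements
            }
            n = next;
        }
    }

    // Parses with entity reference nodes kept (our setCreateEntityReferenceNodes(true)
    // case), inlines them, then merges adjacent Text nodes.
    static Document parseAndInline() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setExpandEntityReferences(false);
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            inlineEntityRefs(root);
            root.normalize();
            return doc;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Transforms the cleaned DOM and decodes the ISO-8859-1 output bytes.
    static String run() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(parseAndInline()), new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If nothing downstream needs to know where the entity references were, calling setCreateEntityReferenceNodes(false) is still the simpler option, since that setting already produces the expected output.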
Thanks,
Donald Holliday