memory usage of xslt processing 2006-04-19 - By Thomas Porschberg
Hi,
I have the following task: create an arbitrarily formatted file (XML, HTML, CSV, whatever) based on a SELECT from a database.
As a constraint, the data fetched from the database is too large to be held in memory as a whole. Another constraint is that I cannot use the XML functionality of the database; I have to implement this on top of our database access framework, which fetches records one at a time.
My idea was to wrap every row fetched from the database in simple generic XML and feed it to Xalan.
Let me give an example. If my result set from the database looks like:
ID  Name   Description
--  ----   -----------
1   "dog"  "an animal may be dangerous"
2   "cat"  "an animal likes milk"
I create the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<dataset>
  <row>
    <value>1</value>
    <value>dog</value>
    <value>an animal may be dangerous</value>
  </row>
  <row>
    <value>2</value>
    <value>cat</value>
    <value>an animal likes milk</value>
  </row>
</dataset>
I create this XML by firing SAX events from a Java class [StringArrayXMLReader], which implements the org.xml.sax.XMLReader interface. It has three methods:
// Opens the document and the <dataset> root element.
public void init() throws SAXException {
    ch.startDocument();
    ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
}

// Closes the <dataset> root element and the document.
public void close() throws SAXException {
    ch.endElement("", "dataset", "dataset");
    ch.endDocument();
}

// Emits one <row> with one <value> child per array entry.
public void parse(String[] input) throws SAXException {
    ch.startElement("", "row", "row", EMPTY_ATTR);
    for (int i = 0; i < input.length; ++i) {
        ch.startElement("", "value", "value", EMPTY_ATTR);
        ch.characters(input[i].toCharArray(), 0, input[i].length());
        ch.endElement("", "value", "value");
    }
    ch.endElement("", "row", "row");
}
The parse method creates the <row>...</row> entry for the String array passed to it. The StringArrayXMLReader is associated with a TransformerHandler, which uses an XSL stylesheet to transform the XML into the desired output.
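To make the setup concrete, here is a minimal sketch of that wiring using only the standard JAXP classes. It is not the code from my tarballs: the class name RowPipeline, the inline CSV stylesheet and the method names are invented for illustration, and the events are fired straight into the TransformerHandler (which is itself a ContentHandler) instead of going through a full XMLReader implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class, for illustration only.
class RowPipeline {
    private static final AttributesImpl EMPTY_ATTR = new AttributesImpl();

    // Assumed example stylesheet: one output line per <row>, values joined by ';'.
    static final String CSV_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:if test='position() &gt; 1'>;</xsl:if>"
      + "<xsl:value-of select='.'/>"
      + "</xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Same event sequence as parse(String[]) above, for one record.
    static void fireRow(ContentHandler ch, String[] input) throws SAXException {
        ch.startElement("", "row", "row", EMPTY_ATTR);
        for (String value : input) {
            ch.startElement("", "value", "value", EMPTY_ATTR);
            ch.characters(value.toCharArray(), 0, value.length());
            ch.endElement("", "value", "value");
        }
        ch.endElement("", "row", "row");
    }

    // One document for the whole result set: init(), parse() per row, close().
    static String transform(String xsl, String[][] rows) throws Exception {
        SAXTransformerFactory stf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler ch =
            stf.newTransformerHandler(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        ch.setResult(new StreamResult(out));

        ch.startDocument();                                   // init()
        ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
        for (String[] row : rows) {
            fireRow(ch, row);                                 // parse(row)
        }
        ch.endElement("", "dataset", "dataset");              // close()
        ch.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String[][] rows = {
            {"1", "dog", "an animal may be dangerous"},
            {"2", "cat", "an animal likes milk"}
        };
        System.out.print(transform(CSV_XSL, rows));
    }
}
```

In the real code the rows would of course come from the database cursor one by one rather than from an array, and the result would go to a file instead of a StringWriter.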
What happens is this: when the fetch from the database starts I call init() (and thus startDocument()), and after the fetch has finished I call close() (and thus endDocument()). I observed that the XSLT processing starts only when endDocument() is called. This is not acceptable for me, because I fear the XSLT processor keeps all the rows in memory until endDocument() is called, in which case I risk running into an OutOfMemoryError.
My second idea was to eliminate the init()/close() methods and to treat each <row>...</row> section as a complete input document for the processor. This has the disadvantage that I have to create the head and tail of the output document manually (and in my example I get a NullPointerException when the transformer is called a second time).
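A rough sketch of this second idea, again with invented names (PerRowTransform, ROW_XSL, writeAll) and a made-up CSV header line: the stylesheet is compiled once into a javax.xml.transform.Templates object, and a fresh TransformerHandler is created from it for every row, so no handler instance is ever used for two documents. I do not know whether reusing one handler is actually what causes my NullPointerException; that is only an assumption behind this variant.

```java
import java.io.StringReader;
import java.io.Writer;
import java.util.Iterator;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class, for illustration only.
class PerRowTransform {
    private static final AttributesImpl EMPTY_ATTR = new AttributesImpl();

    // Assumed example stylesheet: <row> is the document element here.
    static final String ROW_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:if test='position() &gt; 1'>;</xsl:if>"
      + "<xsl:value-of select='.'/>"
      + "</xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms each row as its own tiny document; only one row is alive at a time.
    static void writeAll(Iterator<String[]> rows, Writer out) throws Exception {
        SAXTransformerFactory stf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        // Compile the stylesheet once; Templates objects are reusable.
        Templates templates =
            stf.newTemplates(new StreamSource(new StringReader(ROW_XSL)));

        out.write("ID;Name;Description\n");      // head, written by hand
        while (rows.hasNext()) {
            String[] input = rows.next();
            // New handler per row instead of calling the same one twice.
            TransformerHandler ch = stf.newTransformerHandler(templates);
            ch.setResult(new StreamResult(out));
            ch.startDocument();
            ch.startElement("", "row", "row", EMPTY_ATTR);
            for (String value : input) {
                ch.startElement("", "value", "value", EMPTY_ATTR);
                ch.characters(value.toCharArray(), 0, value.length());
                ch.endElement("", "value", "value");
            }
            ch.endElement("", "row", "row");
            ch.endDocument();                    // this row is transformed now
        }
        out.flush();                             // a tail would be written here
    }
}
```

The price of this variant is exactly the disadvantage mentioned above: the stylesheet only ever sees a single row, so anything that spans rows (head, tail, counters, grouping) has to be done in Java.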
I have the following question: is it possible to create the output without having the whole data in memory?
The input XML for the XSLT processing, <dataset> <row><value>... <row><value>... </dataset>, is very simple, and the supplied XSL stylesheets will not be complex, so I hope it can be made to work. I also think that the task in general (producing formatted output from a potentially very large data pool) should be a common one.
Unfortunately I have not done much XSLT processing in the past, so I lack experience (only a bit of libxslt, which I fed a DOM tree). If someone has some useful links I would be very glad to hear about them. I have put my test code at:
http://randspringer.de/sax_row.tar and http://randspringer.de/sax.tar
If someone could have a look at it I would really appreciate it.
Thomas