The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

SAX Parsing Question

Hi,

I am parsing an XML file that is not very large, using a SAX parser. It doesn't fetch the data properly for exactly one element in the file.

For example:

<?xml...>
<parent>
  <child>
    <element1>data1</element1>
    <element2>data2</element2>
    <element3>data3</element3>
    .
    .
    .
    <elementn>datan</elementn>
  </child>
</parent>



I am subclassing the DefaultHandler class and overriding methods such as:
characters(char[] ch, int start, int length)
endElement(String uri, String localName, String qName) and others...

While debugging, though, I found that for one element it fetches the data only partially. For example, if <element2> contains data2, the characters() method first gets only "da" and then gets the rest ("ta2") in the next call for the same element, which is really weird.

This always happens at the same place, for the same element. Are there any limitations, or am I doing something wrong?

In case the information is unclear or insufficient for you to help me, please let me know and I will give more details.

Thank you :)
Sri
Thursday, September 28, 2006
 
 
The fine manual ( http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int) ) says "SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks"

The standard technique is to stick the chunks into a StringBuffer and then flush it when you hit the corresponding end element.
Matt
Thursday, September 28, 2006
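A minimal sketch of the accumulate-then-flush technique Matt describes, assuming the JDK's built-in SAX parser (javax.xml.parsers). It uses StringBuilder, the unsynchronized counterpart of StringBuffer. The class name ChildHandler, the parse() helper, and the rule of recording only elements whose names start with "element" are hypothetical, chosen to match the sample document in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ChildHandler extends DefaultHandler {
    // Text of the element currently being read, built up across characters() calls.
    private final StringBuilder text = new StringBuilder();
    // One "name=value" entry per completed leaf element (hypothetical output format).
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start each element with an empty buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one element's text in several calls; keep every chunk.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only at the end tag is the element's text known to be complete.
        if (qName.startsWith("element")) { // hypothetical filter matching the sample document
            values.add(qName + "=" + text);
        }
        text.setLength(0);
    }

    public List<String> getValues() {
        return values;
    }

    // Parses an XML string and returns the collected values.
    public static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ChildHandler handler = new ChildHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getValues();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<parent><child>"
                + "<element1>data1</element1>"
                + "<element2>data2</element2>"
                + "<element3>data3</element3>"
                + "</child></parent>";
        System.out.println(parse(xml)); // prints [element1=data1, element2=data2, element3=data3]
    }
}
```

Resetting the buffer in startElement() suits leaf elements like the ones in the question; an element with mixed content (text interleaved with child elements) would need a stack of buffers instead.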
 
 
Special characters will sometimes divide up a section. I forget the details, but I think it's either literal " and & characters or, conversely, &amp; and similar entity references (again, I forget). You have to hold state, convert the special characters, and concatenate everything.

At any rate, I strongly recommend using the XStream library for all of your parsing needs. It basically does a foolproof XML-to-object conversion.
namehere
Thursday, September 28, 2006
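The splitting around special characters can be observed with a small experiment. This is a sketch assuming the JDK's built-in SAX parser; the class EntityChunks and its chunks() helper are hypothetical. It prints every characters() chunk for a text containing &quot; and &amp;, then the chunks joined together. Whether and where the text is split depends on the parser, so the sketch prints the chunks rather than predicting them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EntityChunks {
    // Parses the document and returns each characters() chunk as a separate string.
    public static List<String> chunks(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.add(new String(ch, start, length));
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        List<String> parts = chunks("<e>say &quot;a &amp; b&quot;</e>");
        // The number of chunks is parser-specific; print them to see where the splits fall.
        System.out.println(parts.size() + " chunk(s): " + parts);
        // Joined together, the chunks form the element's full text.
        System.out.println(String.join("", parts)); // prints: say "a & b"
    }
}
```

For the predefined entities used here, the joined text comes back with &quot; and &amp; already expanded to " and &.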
 
 
Matt answered what to do, but as to why it happens: your SAX parser is probably just reaching the edge of a read block at that point. You can test this by changing the document before that point and seeing whether the split moves.
Ben Bryant
Thursday, September 28, 2006
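Ben's experiment can be automated: pad the content before <element2> by different amounts and print the chunks that arrive for <element2> each time. This is a sketch assuming the JDK's built-in SAX parser; the class SplitProbe, the element2Chunks() helper, and the padding sizes are hypothetical. The parser's actual read-buffer size is implementation-specific, so there is no guarantee that any of these sizes lands on a boundary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SplitProbe {
    // Returns the characters() chunks delivered for <element2> in a document whose
    // <element1> is padded with the given number of 'x' characters.
    public static List<String> element2Chunks(int padding) throws Exception {
        StringBuilder doc = new StringBuilder("<parent><child><element1>");
        for (int i = 0; i < padding; i++) {
            doc.append('x');
        }
        doc.append("</element1><element2>data2</element2></child></parent>");

        List<String> chunks = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inElement2 = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                inElement2 = qName.equals("element2");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                inElement2 = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inElement2) {
                    chunks.add(new String(ch, start, length)); // record each chunk separately
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(doc.toString())), handler);
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        // Arbitrary padding sizes (hypothetical); more than one chunk means a split.
        for (int padding : new int[] {0, 8000, 8150, 8170, 8190}) {
            System.out.println("padding " + padding + ": " + element2Chunks(padding));
        }
    }
}
```

If the split moves, or appears and disappears, as the padding changes, that points to a buffer boundary rather than anything special about <element2>.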
 
 
If your XML documents tend to be pretty small, why not use DOM instead of SAX?
BenjiSmith
Thursday, September 28, 2006
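For comparison, a minimal DOM sketch using the JDK's javax.xml.parsers.DocumentBuilderFactory. Node.getTextContent() returns all of an element's text at once, so there are no chunks to reassemble; the trade-off is that DOM builds the whole document tree in memory. The class DomExample and its textOf() helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomExample {
    // Returns the complete text of the first element named tag, or null if there is none.
    public static String textOf(String xml, String tag) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<parent><child>"
                + "<element1>data1</element1>"
                + "<element2>data2</element2>"
                + "</child></parent>";
        System.out.println(textOf(xml, "element2")); // prints data2
    }
}
```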
 
 

This topic is archived. No further replies will be accepted.

 
Powered by FogBugz