The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Parsing XML with chained handlers

I have to write code in PHP to parse XML. I've come up with a design, but I'm
wondering if I'm on the right track. In particular, I think I'm overengineering the
damn thing, and I would appreciate your expert input.

Suppose my XML looks like this:

<stuff>
<carcollection>
    <car>
        <make>Honda</make>
        <type>Civic</type>
    </car>
    <car>
        <make>Toyota</make>
        <type>Prius</type>
    </car>
</carcollection>
<petcollection>
    <pet>
        <type>dog</type>
        <name>Pluto</name>
    </pet>
    <pet>
        <type>cat</type>
        <name>Fritz</name>
    </pet>
</petcollection>
</stuff>

I have to supply the parser with callbacks that are called on tag begin, tag end, and text.

So my idea is to create several parsing classes, one for every nested section of the file.
At the beginning I create a root. Then on entering every section and subsection, I create an instance
of the class that knows how to parse that section, and set the parser callbacks to the methods of the class.

So when I see <carcollection>, I say

    $carCollectionParser = new CarCollectionParser( ... );
    set parser callbacks to methods of $carCollectionParser;

When $carCollectionParser is called on <car>, I say

    $carParser = new CarParser( ... );
    set parser callbacks to methods of $carParser;

etc...

When $carParser is called on </car>, I have saved in $carParser a
reference to the parsing class for the previous level (in this case, $carCollectionParser) so I can set the
callbacks back to this parser.

That way, every section has its own parser and the contents of each section are
cleanly separated. The advantage of this is that, for example, <car> has a <type> tag, but so does <pet>.
With my method, those 2 cases will be handled by different parsing classes.
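
A rough sketch of the chaining idea described above, assuming PHP 8 and the built-in expat functions (xml_parser_create and friends). CarCollectionParser and CarParser are the names used in this post; Dispatcher, SectionParser, RecordParser and the other class names are made up for illustration. One deliberate difference: instead of re-registering the parser callbacks on every section change, this version registers them once and swaps the object they delegate to, which amounts to the same chaining.

```php
<?php
// Sketch only (PHP 8.0+): every class and method name here is hypothetical.
abstract class SectionParser
{
    public function __construct(
        protected Dispatcher $d,           // owns the results and the active parser
        protected ?SectionParser $parent,  // parser for the enclosing section
        protected string $tag              // element whose end tag closes this section
    ) {}
    public function startTag(string $name): void {}
    public function text(string $data): void {}
    protected function finish(): void {}
    public function endTag(string $name): void
    {
        if ($name === $this->tag && $this->parent !== null) {
            $this->finish();
            $this->d->current = $this->parent; // hand control back one level
        }
    }
}

class RootParser extends SectionParser
{
    public function startTag(string $name): void
    {
        if ($name === 'carcollection') {
            $this->d->current = new CarCollectionParser($this->d, $this, $name);
        } elseif ($name === 'petcollection') {
            $this->d->current = new PetCollectionParser($this->d, $this, $name);
        }
    }
}
class CarCollectionParser extends SectionParser
{
    public function startTag(string $name): void
    {
        if ($name === 'car') $this->d->current = new CarParser($this->d, $this, $name);
    }
}
class PetCollectionParser extends SectionParser
{
    public function startTag(string $name): void
    {
        if ($name === 'pet') $this->d->current = new PetParser($this->d, $this, $name);
    }
}

// Collects the child elements of one <car> or <pet> into an array.
abstract class RecordParser extends SectionParser
{
    protected array $record = [];
    private ?string $field = null;  // child element currently open, if any
    public function startTag(string $name): void
    {
        $this->field = $name;
        $this->record[$name] = '';
    }
    public function text(string $data): void
    {
        // Text may arrive in pieces; whitespace between tags is ignored.
        if ($this->field !== null) $this->record[$this->field] .= $data;
    }
    public function endTag(string $name): void
    {
        $this->field = null;
        parent::endTag($name);
    }
}
class CarParser extends RecordParser
{
    protected function finish(): void { $this->d->cars[] = $this->record; }
}
class PetParser extends RecordParser
{
    protected function finish(): void { $this->d->pets[] = $this->record; }
}

class Dispatcher
{
    public ?SectionParser $current = null;
    public array $cars = [];
    public array $pets = [];
    public function parse(string $xml): void
    {
        $p = xml_parser_create();
        xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0); // keep tag names as written
        xml_set_element_handler(
            $p,
            fn($_, $name, $attrs) => $this->current->startTag($name),
            fn($_, $name) => $this->current->endTag($name)
        );
        xml_set_character_data_handler($p, fn($_, $data) => $this->current->text($data));
        $this->current = new RootParser($this, null, 'stuff');
        if (xml_parse($p, $xml, true) !== 1) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($p)));
        }
    }
}
```

Given well-formed input, creating a Dispatcher and calling parse($xml) on it leaves the cars in $d->cars and the pets in $d->pets, each entry an array keyed by child tag name. The <type> under <car> and the <type> under <pet> never meet, because different objects are active when they arrive.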

Now comes the part where I have doubts...
All the examples I see on the net are very basic.
Usually all they do is save the current tag to a global and use that to know where
they are. This certainly wouldn't work with the <type> issue I mentioned above, would it?
All the XML files I've had to parse were complex enough for this to be an issue.
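
For comparison, the "current tag in a global" approach can be stretched a little: keep a stack of all open tags instead of only the last one, and key everything on the full path. The sketch below is only an illustration under that assumption; parseWithPathStack and its variables are invented names, and the xml_* calls are PHP's built-in expat functions.

```php
<?php
// Sketch only: function and variable names are hypothetical.
function parseWithPathStack(string $xml): array
{
    $path = [];    // names of the elements currently open, outermost first
    $text = '';    // character data seen since the last start or end tag
    $values = [];  // full path => list of text values found at that path

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep tag names as written

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$path, &$text) {
            $path[] = $name;
            $text = '';
        },
        function ($parser, $name) use (&$path, &$text, &$values) {
            // Only leaf elements have non-blank text buffered at this point.
            if (trim($text) !== '') {
                $values[implode('/', $path)][] = trim($text);
            }
            $text = '';
            array_pop($path);
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$text) {
            $text .= $data; // text may arrive in several pieces
        }
    );

    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $values;
}
```

With this, the two <type> tags end up under different keys ('stuff/carcollection/car/type' and 'stuff/petcollection/pet/type'), so the clash goes away without one class per section. What it does not give you is grouping: the make and type of one car are no longer tied together, which is where per-section parsers start to earn their keep.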

On the other hand, am I overengineering this or what? Why can't I find any examples
of a design similar to mine? Because it's too complex and there's a simpler way of doing it?

Granted, this means creating and chaining a whole lot of parsing classes, even for just one tag
that has a short text, hence my feeling of overengineering.

So, does anyone know of a better method for this?

Thanks
(and sorry if this post is long and boring, but... who said design of software had to be fun??)

Thursday, April 10, 2008
 
 
I'm the one who started this thread; for some reason I got logged out...
28 Projects Later
Thursday, April 10, 2008
 
 
I did something similar in an open source project that I was working on at one time.
I kept a stack of handlers.  Each handler had methods like performActionIn, performActionOut, processChildResult.

The performActionOut method would return some object, which would be passed to the parent's processChildResult method.  The parent was expected to take the object and know where to put it.
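
To make the handler stack concrete for the PHP case, here is a minimal sketch. It is not a port of the Java project linked below. The method names performActionIn, performActionOut and processChildResult follow this post, while the text method, the three handler classes, parseWithHandlerStack and the $handlerFor map are assumptions made for the example.

```php
<?php
// Sketch only: not a port of the Java project mentioned in this post.
// Method names follow the post; everything else is hypothetical.
abstract class Handler
{
    public function performActionIn(string $name): void {}
    public function text(string $data): void {}
    public function processChildResult(string $childName, $result): void {}
    abstract public function performActionOut();
}

class TextHandler extends Handler      // leaf element: result is its text
{
    private $buffer = '';
    public function text(string $data): void { $this->buffer .= $data; }
    public function performActionOut() { return trim($this->buffer); }
}

class RecordHandler extends Handler    // children stored by element name
{
    private $fields = [];
    public function processChildResult(string $childName, $result): void
    {
        $this->fields[$childName] = $result;
    }
    public function performActionOut() { return $this->fields; }
}

class ListHandler extends Handler      // children appended in document order
{
    private $items = [];
    public function processChildResult(string $childName, $result): void
    {
        $this->items[] = $result;
    }
    public function performActionOut() { return $this->items; }
}

function parseWithHandlerStack(string $xml, array $handlerFor)
{
    $stack = [];     // [element name, handler] pairs, innermost last
    $result = null;  // whatever the root handler returns

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep tag names as written
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$stack, $handlerFor) {
            $class = $handlerFor[$name] ?? TextHandler::class; // unlisted tags are leaves
            $handler = new $class();
            $handler->performActionIn($name);
            $stack[] = [$name, $handler];
        },
        function ($parser, $name) use (&$stack, &$result) {
            [$childName, $handler] = array_pop($stack);
            $out = $handler->performActionOut();
            if ($stack) {
                end($stack)[1]->processChildResult($childName, $out); // parent decides where it goes
            } else {
                $result = $out;
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$stack) {
            if ($stack) end($stack)[1]->text($data);
        }
    );
    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $result;
}
```

The $handlerFor array plays the part of a config file: it maps element names to handler classes, for example 'stuff' and 'car' to RecordHandler::class and 'carcollection' to ListHandler::class. Because it is keyed on the bare element name, two elements with the same name but different roles would need a richer key, such as the path.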

Here are some links in case you want to look.  The source is in Java.

The config file which tells me which handlers to use for certain documents:
http://ramses.svn.sourceforge.net/viewvc/ramses/trunk/RamsesServer/etc/xpath-parser.xml?view=markup

This dir contains all the handler classes and the parser itself:
http://ramses.svn.sourceforge.net/viewvc/ramses/trunk/RamsesServer/src/net/sf/ramses/xml/
Ted
Thursday, April 10, 2008
 
 
Have you looked at SimpleXML or some of the DOM-based parsers? They parse the whole thing into a PHP object for you. It's a lot easier.
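
As a small illustration of the SimpleXML route, assuming the document is well formed and small enough to load into memory in one go (the function name summarize is made up for the example):

```php
<?php
// Sketch only: summarize() is a hypothetical helper, not part of SimpleXML.
function summarize(string $xml): array
{
    $stuff = simplexml_load_string($xml);
    if ($stuff === false) {
        throw new RuntimeException('XML is not well formed');
    }

    $cars = [];
    foreach ($stuff->carcollection->car as $car) {
        // Child elements are objects; cast to string to get their text.
        $cars[] = (string) $car->make . ' ' . (string) $car->type;
    }

    $pets = [];
    foreach ($stuff->petcollection->pet as $pet) {
        $pets[] = (string) $pet->name . ' (' . (string) $pet->type . ')';
    }

    return ['cars' => $cars, 'pets' => $pets];
}
```

Here the <type> under <car> and the <type> under <pet> cannot be confused, because each is reached through its own parent element.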
JW
Thursday, April 10, 2008
 
 
Handlers would be a classic GoF implementation, but I think it would create a lot of code bloat for complex schemas. When I have a quick-and-dirty use case, I really enjoy the simplicity of XmlManager's JavaBean approach. For more complex situations, we've gotten pretty far with using type systems. All we write is the schema, and everything else is taken care of by our frameworks.

It's best to steal if you can, and I think these guys did it really well:

http://www.ricebridge.com/products/xmlman/examples/beans/README.htm
Benjamin Manes
Friday, April 11, 2008
 
 

This topic is archived. No further replies will be accepted.
