The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.


I have a small task at hand: parsing a text file (in C++) that has an XML-like structure, with tag-like markers surrounding the data. I am trying to decide between:

a) reading the file one line at a time, parsing each line with regular expressions, and jumping to a big if/switch block to decide what to do.

b) using an off-the-shelf parser

Option b) sounds like the better idea, but I don't know whether it would end up being overkill. I've done a) in the past using Perl. It works OK, but the result is hard to maintain and modify.
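For readers who want to picture option a), here is a minimal C++ sketch. The real file format is not described in this thread, so the sketch assumes a hypothetical one with a single marker per line ("<item>", "</item>", or "<name>value</name>"). The names LineKind, classifyLine, and describeLines are made up for illustration.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical line kinds for an ASSUMED format with one marker per line:
// "<item>", "</item>", or "<name>value</name>". The real format is unknown.
enum class LineKind { Open, Close, Field, Text };

struct ParsedLine {
    LineKind kind;
    std::string tag;    // empty for Text
    std::string value;  // payload for Field and Text
};

// Classify a single line with regular expressions (regex_match needs a full match).
ParsedLine classifyLine(const std::string& line) {
    static const std::regex fieldRe(R"(\s*<(\w+)>(.*)</\1>\s*)");
    static const std::regex openRe(R"(\s*<(\w+)>\s*)");
    static const std::regex closeRe(R"(\s*</(\w+)>\s*)");
    std::smatch m;
    if (std::regex_match(line, m, fieldRe)) return {LineKind::Field, m[1].str(), m[2].str()};
    if (std::regex_match(line, m, openRe))  return {LineKind::Open, m[1].str(), ""};
    if (std::regex_match(line, m, closeRe)) return {LineKind::Close, m[1].str(), ""};
    return {LineKind::Text, "", line};
}

// The "big switch": decide what to do with each classified line.
// Here each branch only records a description of what it saw.
std::vector<std::string> describeLines(const std::string& text) {
    std::vector<std::string> events;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const ParsedLine p = classifyLine(line);
        switch (p.kind) {
            case LineKind::Open:  events.push_back("open " + p.tag); break;
            case LineKind::Close: events.push_back("close " + p.tag); break;
            case LineKind::Field: events.push_back("field " + p.tag + "=" + p.value); break;
            case LineKind::Text:  events.push_back("text " + p.value); break;
        }
    }
    return events;
}
```

Every new marker type means another regex and another case, which is roughly where the maintenance pain mentioned above comes from.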

I've done some research, and the Lemon parser and the GOLD parser stand out. Has anyone used these before? Any comments? Any other suggestions for a parser that is easy to use, extend, and maintain?

Tuesday, September 20, 2005
If it's really XML, use an XML parsing library.

If it's not XML, roll your own if you're up to it and it's only a small job. But remember that regular expressions are not the best way to parse anything beyond picking out tokens and small sub-expressions. If a big, complex regex doesn't match, you cannot easily tell the user exactly where it failed. And nested XML-like structures get tricky.
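To illustrate the error-reporting point, here is a rough sketch of a hand-rolled scanner that tracks line and column and uses a stack for nesting. It assumes a hypothetical format where "<name>" opens a section and "</name>" closes it. NestingResult and checkNesting are illustrative names, not part of any library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Result of a nesting check; line and column are 1-based and only
// meaningful when ok is false.
struct NestingResult {
    bool ok;
    std::size_t line;
    std::size_t column;
    std::string message;
};

// ASSUMED format: "<name>" opens, "</name>" closes, anything else is data.
NestingResult checkNesting(const std::string& text) {
    std::vector<std::string> open;  // stack of currently open markers
    std::size_t line = 1, col = 1, i = 0;
    while (i < text.size()) {
        if (text[i] == '<') {
            const std::size_t end = text.find('>', i);
            if (end == std::string::npos)
                return {false, line, col, "unterminated marker"};
            std::string name = text.substr(i + 1, end - i - 1);
            if (!name.empty() && name[0] == '/') {
                name.erase(0, 1);
                if (open.empty())
                    return {false, line, col, "unexpected </" + name + ">"};
                if (open.back() != name)
                    return {false, line, col,
                            "expected </" + open.back() + ">, found </" + name + ">"};
                open.pop_back();
            } else {
                open.push_back(name);
            }
            // Step over the whole marker, keeping line/column in sync.
            for (; i <= end; ++i) {
                if (text[i] == '\n') { ++line; col = 1; } else { ++col; }
            }
            continue;
        }
        if (text[i] == '\n') { ++line; col = 1; } else { ++col; }
        ++i;
    }
    if (!open.empty())
        return {false, line, col, "missing </" + open.back() + ">"};
    return {true, 0, 0, ""};
}
```

Because the scanner knows where it is at every step, a mismatch can be reported with a position, which a single failed regex match cannot give you.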

Consider using a parsing library if the code will have any substance or permanence.
Ian Boys
Tuesday, September 20, 2005
If it's xml, or even html, use at _least_ expat.

Otherwise, use the relevant library.

If no parsers exist, find a parser-generator and learn that, and use it to make your parser.

Finally, if all other options fail, code it yourself.
Arafangion
Tuesday, September 20, 2005
Try teaching yourself lex and yacc (or similar tools for whatever language you're into).

Consider it a worthwhile learning exercise even if it is overkill in this particular situation :)
Matt
Tuesday, September 20, 2005
Another option is Boost.Spirit (search for Spirit), which lets you specify a parser directly in C++ code. In other words, the parser is not generated from a separate parser description by a parser generator. I've used it a little bit, and my initial impression is that it's pretty nice.
Tuesday, September 20, 2005
Have a look at Coco/R - a lot cleaner than lex or yacc ;-)
Johnny Moondog
Wednesday, September 21, 2005
What I am parsing is text-based, but not XML or HTML. It just resembles XML in that it has tag-like markers delimiting the data.

I'll look into one of the parsers.

Wednesday, September 21, 2005
Do you have any say over what text format gets used? If so, did you exclude xml for any particular reason?
BenjiSmith
Wednesday, September 21, 2005
It is someone else's format, so for reading I gotta use their format. However, I can save it any way I want. Should I go with XML to save it back out?
Wednesday, September 21, 2005
I'd save it back out in the same format as you loaded it in.

This would make verifying a load a case of load-save-diff, and would allow you to use the load code both for files you get from elsewhere, as well as files you get internally.
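A load-save-diff check could look roughly like this. The sketch assumes a hypothetical flat format of "<key>value</key>" lines. Record, load, save, and roundTrips are illustrative names, and the real format will need its own loader.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One "<key>value</key>" line in the ASSUMED format.
typedef std::pair<std::string, std::string> Record;

// Load records, silently skipping lines that do not fit the assumed shape.
std::vector<Record> load(const std::string& text) {
    std::vector<Record> out;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] != '<') continue;
        const std::size_t gt = line.find('>');
        if (gt == std::string::npos) continue;
        const std::string key = line.substr(1, gt - 1);
        const std::string closer = "</" + key + ">";
        if (line.size() < gt + 1 + closer.size()) continue;
        const std::size_t closerPos = line.size() - closer.size();
        if (line.compare(closerPos, closer.size(), closer) != 0) continue;
        out.push_back(Record(key, line.substr(gt + 1, closerPos - (gt + 1))));
    }
    return out;
}

// Save records back out in the same format they were loaded from.
std::string save(const std::vector<Record>& records) {
    std::string out;
    for (std::size_t i = 0; i < records.size(); ++i) {
        out += "<" + records[i].first + ">" + records[i].second +
               "</" + records[i].first + ">\n";
    }
    return out;
}

// The load-save-diff check: a mismatch means the loader lost or changed something.
bool roundTrips(const std::string& original) {
    return save(load(original)) == original;
}
```

A failed round trip does not say what went wrong, but it flags files the loader does not fully understand, such as ones containing lines it skipped.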

On the other hand, it is quite common to write one program to read in some format and squirt out some XML equivalent. The main tool then works with the XML equivalent. This makes it straightforward to bolt on many other import/export formats.
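If you go the converter route, the export side can be quite small. The sketch below assumes the loaded data is a flat list of key/value pairs. The "record" and "field" element names are invented for the example. The one thing not to skip is escaping the characters that are special in XML.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Escape the characters that are special in XML text and attribute values.
std::string escapeXml(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        switch (s[i]) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += s[i];     break;
        }
    }
    return out;
}

// Write key/value pairs as XML. The "record" and "field" element names
// are hypothetical; pick whatever suits the main tool.
std::string toXml(const std::vector<std::pair<std::string, std::string> >& fields) {
    std::string out = "<?xml version=\"1.0\"?>\n<record>\n";
    for (std::size_t i = 0; i < fields.size(); ++i) {
        out += "  <field name=\"" + escapeXml(fields[i].first) + "\">" +
               escapeXml(fields[i].second) + "</field>\n";
    }
    out += "</record>\n";
    return out;
}
```

Once the data is in this form, the main tool can read it with any standard XML library.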
new nick, new rep
Thursday, September 22, 2005
You can check out ANTLR. It's very powerful and free.
Cristian Ionita
Sunday, September 25, 2005
You could try hacking up bxmlnode, the simplest XML parser in the world.  (I wrote it.)  It's at:

It's plain C (or C++), less than 500 lines, one C struct, two C functions, three example C functions.

The license is: do whatever you want, but preserve the copyright notice so people don't try to sue me.
Daniel Howard
Monday, September 26, 2005

This topic is archived. No further replies will be accepted.
