The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Regular Expression lib/source Recommendations?

I'm looking for a regex library for the app I'm working on. I can use C or C++.

I've been going over Boost Regex and GRETA. This type of code isn't my strong point. I'd like to take the time to dig into the code so I know what I've got, but for now I essentially just need the functionality (I've looked at the sources, and I don't know how long it'll take me to "get" it, but it's going to take some time). I'm a little confused about the MS Research agreement that GRETA came with; at least with Boost I feel safe. At the same time, I see fewer files for GRETA. I'm not really caught up in performance; I just want something that gives me some flexibility with some input fields.

Any recommendations? I'd probably be willing to purchase a package (at a reasonable cost). I'd like to be able to easily cut out the components that I'm not using. I'm trying to limit my use of code beyond the OS API, as I'm a bit neurotic about having control (if it's all under my umbrella, then I'm the only one to blame if I get wet).

I'm really interested to know what people's experiences have been.
Jay Lauffer
Tuesday, December 21, 2004
I'll wager that it will be easier to integrate Boost Regex into your application than the other library, because Boost is so widely used. It comes with make scripts that work without any customization. I've integrated Boost into my application very easily.
Oren
Tuesday, December 21, 2004
Is this for work or fun?  Do you understand the theory behind regular expressions (and formal languages in general)?  If not, maybe this would be a good chance to learn.  If so, I'd go with Boost -- John Maddock's regex library is pretty fast (and very featureful).

Also, the feature to post a message on your website is broken.
Kalani
Tuesday, December 21, 2004
Thanks Kalani, for the heads up on the comments (it should be all set now).  I consider myself to be very unknowledgeable about Regular Expressions, but maybe I'm being hard on myself.  Anyhow I found this resource related to Formal Language Theory:

It looks interesting.  I know enough about regular expressions to use groups and do replaces, as well as to validate input in multiple formats; that's the primary reason I use them.

As far as Boost goes, I've already got the libs built, so I'll go ahead and use it. After all, it's peer-reviewed code, so it must be decent.

Just based on a quick glance it looks like this Formal Language stuff is a precursor to a compiler design class, maybe one of these days I'll actually go back to college for computers.
Jay Lauffer
Tuesday, December 21, 2004
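For readers weighing the same choice, here is a minimal sketch of the three uses Jay lists above: validating a field, pulling out groups, and doing a replace. It assumes C++11's std::regex, the standardized descendant of the Boost.Regex interface; with Boost itself the equivalent calls should be available in namespace boost after including <boost/regex.hpp>. The function names and field formats are invented for illustration and do not come from the thread.

```cpp
#include <regex>
#include <string>

// Hypothetical field format: a phone number written as
// 555-123-4567 or (555) 123-4567.
bool is_valid_phone(const std::string& field) {
    static const std::regex pattern(R"((?:\(\d{3}\) |\d{3}-)\d{3}-\d{4})");
    // regex_match succeeds only if the whole field matches the pattern.
    return std::regex_match(field, pattern);
}

// Extract year, month, and day from a YYYY-MM-DD field via capture groups.
bool parse_date(const std::string& field, int& year, int& month, int& day) {
    static const std::regex pattern(R"((\d{4})-(\d{2})-(\d{2}))");
    std::smatch groups;
    if (!std::regex_match(field, groups, pattern)) {
        return false;
    }
    // groups[0] is the whole match; groups[1..3] are the parenthesized parts.
    year = std::stoi(groups[1].str());
    month = std::stoi(groups[2].str());
    day = std::stoi(groups[3].str());
    return true;
}

// Replace every run of whitespace with a single space.
std::string squeeze_spaces(const std::string& text) {
    static const std::regex whitespace(R"(\s+)");
    return std::regex_replace(text, whitespace, " ");
}
```

The patterns are compiled once as function-local statics, since constructing a regex object is usually much more expensive than running a match.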
Jay, you might also want to consider whether your chosen library supports Unicode in regular expressions. For example, if you match some letters, are you matching all letters - Greek, Chinese, or whatever - or just Latin, or even just ASCII? Or, for another example, will you ever want to find the next character with the East Asian Narrow Width property? See the article on the Unicode site at
Graham Asher
Tuesday, December 21, 2004
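Graham's question about which letters a pattern matches can be seen in a small sketch. It assumes C++11's std::wregex rather than any library named in the thread, and the function name is invented for illustration. An explicit range such as [A-Za-z] covers only ASCII letters, so Greek or accented input is rejected. std::regex has no syntax for Unicode properties; Boost.Regex offers optional ICU integration for that, which is worth checking against the Boost documentation before relying on it.

```cpp
#include <regex>
#include <string>

// True only when every character of `text` is an ASCII letter.
// The explicit range does not extend to letters in other scripts.
bool is_ascii_letters(const std::wstring& text) {
    static const std::wregex ascii_letters(L"[A-Za-z]+");
    return std::regex_match(text, ascii_letters);
}
```

With this pattern, L"regex" is accepted, while a Greek word or L"caf\u00e9" is rejected. Whether that is the right behavior depends on what the input field is supposed to allow.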
Justin
Tuesday, December 21, 2004
Jay, I don't think that you need to go back to college to learn this formal language stuff, provided that you're very interested in the subject.  Yes, understanding the basics of formal languages will help you create a compiler/interpreter, but it's useful for a number of other things too (even just to understand how/why things like regular expressions work).

That page that you linked to seems like a good introduction.  The famous "dragon book" would probably make a good companion to that.  I got a copy of the first dragon book (the green dragon, not the red one) for fifteen cents at my high school library years and years ago.  I've implemented Earley's algorithm for general context-free grammars (regular grammars are a subset of those), and I've never done college work in this subject (not that it wouldn't do me some good, but it's not necessary to make a parser generator).

Conceptually it's actually very simple and straightforward.
Kalani
Tuesday, December 21, 2004
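To make the "conceptually simple" point above concrete: a regular expression describes a language that a finite automaton can recognize one character at a time. Below is a hand-written sketch of such an automaton for the pattern [0-9]+(\.[0-9]+)?, a pattern picked purely for illustration. The state names and functions are invented here; this is not how Boost.Regex or GRETA is implemented.

```cpp
#include <string>

// States of a hand-built DFA for the illustrative pattern [0-9]+(\.[0-9]+)?
enum class State { Start, Integer, Dot, Fraction, Reject };

// Transition function: given the current state and one input character,
// return the next state.
State step(State s, char c) {
    const bool digit = (c >= '0' && c <= '9');
    switch (s) {
        case State::Start:
            return digit ? State::Integer : State::Reject;
        case State::Integer:
            if (digit) return State::Integer;
            return c == '.' ? State::Dot : State::Reject;
        case State::Dot:
            return digit ? State::Fraction : State::Reject;
        case State::Fraction:
            return digit ? State::Fraction : State::Reject;
        case State::Reject:
            return State::Reject;
    }
    return State::Reject;
}

// The input matches if the automaton ends in an accepting state.
bool matches_decimal(const std::string& input) {
    State s = State::Start;
    for (char c : input) {
        s = step(s, c);
    }
    return s == State::Integer || s == State::Fraction;
}
```

Here "42" and "3.14" are accepted, while "3.", ".5", and the empty string are rejected, because only Integer and Fraction are accepting states.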
Yes Unicode.  What's ASCII? (j/k)  I don't even build ASCII versions anymore (what's the point? oh yeah Win98).  I love the Regex tester at (saves me the trouble of downloading something), but something about that site sort of annoys me (like there aren't nearly enough expressions in the library [suppose I should contribute a few]).

Seems like I'll just go with Boost. After all, even Eric Niebler seems to have gone to Boost:
Jay Lauffer
Tuesday, December 21, 2004
Indeed, I have "gone over" to Boost. Sorry about the MS Research License Agreement -- that wasn't my choice. You can get it free for commercial use if you download VC++ Powertoys from Same code, different license. I know, it's dumb.

GRETA is good code, and fast, but is no longer actively maintained. Also, the Boost regex interface has been accepted into TR1 and will probably end up in the standard at some point. My suggestion would be to go with Boost.

If you're feeling adventurous, you could check out xpressive, my new pet project. It's a regex template library, but it's also a bit like a parser generator. You can read about it at


Eric Niebler
Boost Consulting
Eric Niebler
Tuesday, December 21, 2004
Thursday, December 23, 2004

This topic is archived. No further replies will be accepted.
