The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

ECMAScript subset parser in - hand code or use generator

Hi,

I need a parser to embed in my product. The target language is a subset of ECMAScript, and the implementation is in C++.

I know how to hand-code a parser (I have done it many times in the past).

On the other hand, there are parser generators out there that would presumably get the job done faster than I would manually. The problem is that I have no experience with generators and don't know what to expect. I looked into Lemon (http://www.hwaci.com/sw/lemon/) and it looks good on paper.

Can anyone with experience in both hand-coding parsers and using generators tell me what kinds of problems to watch for with generators, and whether it is really worth learning to use them or whether I had better just do it the way I know?

Thanks in advance
TN
Monday, December 24, 2007
 
 
I've used GNU bison on some personal projects, and spent a bit of time at work maintaining hand-written recursive descent parsers. (This isn't the main focus of my job; they're just things that cropped up.)

I've found GNU Bison way better than writing by hand. I suspect Lemon would be the same. Quicker to create, easier to modify, automatic detection of ambiguities, fewer bugs, the source code looks vaguely like a grammar diagram. If you've got a specific compelling reason to write the parser by hand, then you should, but in the absence of any such you'd be best off getting the computer to do the bulk of the work for you.

(You may have to meet it halfway for certain things, but it should still be quicker than doing it yourself!)
Tom_
Monday, December 24, 2007
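
For readers weighing the two approaches, here is a minimal sketch of the hand-written recursive descent style mentioned above. It is not code from anyone in this thread: the `ExprParser` class, its grammar, and its error messages are hypothetical, and it evaluates a tiny arithmetic grammar rather than any real ECMAScript subset. The point is only to show the shape: one function per grammar rule, which is roughly what a generator such as bison or Lemon would otherwise derive from a grammar file.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical hand-written recursive descent evaluator (illustration only).
// Grammar:
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')' | '-' factor
class ExprParser {
public:
    explicit ExprParser(std::string src) : src_(std::move(src)) {}

    // Parses the whole input and returns its value; throws on malformed input.
    double parse() {
        double value = expr();
        skipSpace();
        if (pos_ != src_.size()) fail("unexpected trailing input");
        return value;
    }

private:
    double expr() {
        double value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    double term() {
        double value = factor();
        for (;;) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    double factor() {
        if (accept('-')) return -factor();
        if (accept('(')) {
            double value = expr();
            if (!accept(')')) fail("expected ')'");
            return value;
        }
        return number();
    }

    // Reads digits with an optional fractional part, e.g. "42" or "3.5".
    double number() {
        skipSpace();
        const std::size_t start = pos_;
        while (isDigitAt(pos_)) ++pos_;
        if (pos_ == start) fail("expected number");
        if (pos_ < src_.size() && src_[pos_] == '.') {
            ++pos_;
            while (isDigitAt(pos_)) ++pos_;
        }
        return std::stod(src_.substr(start, pos_ - start));
    }

    bool isDigitAt(std::size_t i) const {
        return i < src_.size() &&
               std::isdigit(static_cast<unsigned char>(src_[i])) != 0;
    }

    void skipSpace() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_])) != 0) {
            ++pos_;
        }
    }

    // Consumes `c` if it is the next non-space character.
    bool accept(char c) {
        skipSpace();
        if (pos_ < src_.size() && src_[pos_] == c) {
            ++pos_;
            return true;
        }
        return false;
    }

    [[noreturn]] void fail(const std::string& msg) const {
        throw std::runtime_error(msg + " at offset " + std::to_string(pos_));
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

With this sketch, `ExprParser("1 + 2 * 3").parse()` applies the usual precedence because `term` binds tighter than `expr`, and malformed input such as `"1 +"` throws `std::runtime_error`. Every new precedence level or statement form means another hand-written function, which is the maintenance cost a generator is meant to absorb.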
 
 
Tom, thanks for the reply.

Regarding bison: does it generate the lexer and parser as separately usable entities? Is it possible to call the lexer without calling the parser, just to tokenize the input? I need this to get tokens for syntax coloring.
TN
Monday, December 24, 2007
 
 
Why not use the Mozilla SpiderMonkey (JavaScript-C) engine?
SpiderMonkey is the code name for Mozilla's C implementation of JavaScript:
http://www.mozilla.org/js/spidermonkey/
Dan Shappir
Monday, December 24, 2007
 
 
Dan, thanks. I will look at SpiderMonkey, but I will probably decide against it because I need only a quite small subset of ECMAScript, and also because of licensing issues.
TN
Monday, December 24, 2007
 
 
BTW, if you are only planning to target Windows then adding support for JScript is easy - just use the Microsoft Scripting Control. It's a bit tricky from C++, but still much easier than implementing the parser yourself.
Dan Shappir
Monday, December 24, 2007
 
 
Dan, unfortunately it is not Windows-only. The product is multi-platform (Windows/Mac).

But thanks anyway.
TN
Monday, December 24, 2007
 
 
Bison only creates a parser; you can supply your own scanner, or use bison's companion flex to create one. bison and flex are commonly used together, as they are complementary and integrate straightforwardly, but you can use them separately. (Indeed, one of the hand-written parsers I mentioned above uses flex for its scanning.)

Even when they are used together, you can still call the flex-generated scanner on its own to extract tokens from a particular file without parsing.

I mention flex and bison as examples of packages that provide the benefits of an automated approach, rather than necessarily recommending them unreservedly. They have the definite advantage of being proven code and a de facto standard, big plus points in my book, but the manuals aren't the greatest, and keeping track of source code locations isn't done for you automatically. (Annoying!)

To my eyes, though, the advantages of the generated parser approach still shine through, so whether you use bison and flex or not I'd still recommend this kind of package over writing something by hand.
Tom_
Monday, December 24, 2007
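
To make the "scanner without parser" idea concrete for the syntax-coloring case, here is a minimal hand-written sketch. It is an illustration only, not flex output and not code from this thread: the `TokenKind` categories, the `Token` struct, the keyword list, and the `tokenize` function are all hypothetical, and the rules cover far less than real ECMAScript lexing. With flex, the rough equivalent would be calling the generated scanner function (`yylex` by default) in a loop. Each token records its offset and length, since, as noted above, source locations are something you have to track yourself.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for a small ECMAScript-like subset.
enum class TokenKind { Identifier, Keyword, Number, String, Punctuator, Unknown };

struct Token {
    TokenKind kind;
    std::size_t offset;  // byte offset of the token's first character
    std::size_t length;  // number of bytes the token spans
};

// Scans the whole input and returns every token with its position.
// Whitespace is skipped; no parser is involved at any point.
std::vector<Token> tokenize(const std::string& src) {
    static const char* const kKeywords[] = {
        "var", "function", "if", "else", "return", "while"};

    auto isIdentChar = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0 ||
               ch == '_' || ch == '$';
    };

    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c) != 0) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        TokenKind kind = TokenKind::Unknown;

        if (std::isalpha(c) != 0 || c == '_' || c == '$') {
            while (i < src.size() && isIdentChar(src[i])) ++i;
            kind = TokenKind::Identifier;
            const std::string word = src.substr(start, i - start);
            for (const char* kw : kKeywords) {
                if (word == kw) kind = TokenKind::Keyword;
            }
        } else if (std::isdigit(c) != 0) {
            // Integer literals only, to keep the sketch short.
            while (i < src.size() &&
                   std::isdigit(static_cast<unsigned char>(src[i])) != 0) {
                ++i;
            }
            kind = TokenKind::Number;
        } else if (c == '"' || c == '\'') {
            const char quote = src[i];
            ++i;  // opening quote
            while (i < src.size() && src[i] != quote) {
                if (src[i] == '\\' && i + 1 < src.size()) ++i;  // skip escaped char
                ++i;
            }
            if (i < src.size()) ++i;  // closing quote, if present
            kind = TokenKind::String;
        } else {
            ++i;  // single-character punctuator or unrecognized byte
            kind = std::ispunct(c) != 0 ? TokenKind::Punctuator
                                        : TokenKind::Unknown;
        }
        out.push_back({kind, start, i - start});
    }
    return out;
}
```

An editor component could call `tokenize` on the visible text and map each `TokenKind` to a color using `offset` and `length`, while the parser, whether hand-written or generated, consumes the same token stream separately.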
 
 
Well, if you are feeling adventurous, and your compiler is up to it, check out CliPP: http://clipp.sourceforge.net/
Dan Shappir
Monday, December 24, 2007
 
 
Why not use Lua (http://www.lua.org/), which is portable, cross-platform, embeddable, etc.? It could be a viable option.

There are also embeddable C/C++ interpreters. I wrote one many years ago, which is heavily used in one of my commercial apps.
Neville Franks
Monday, December 24, 2007
 
 
If you're looking at parser/lexer generation, I'd strongly recommend checking out ANTLR. You'll need Java to run it, but it generates lexers and parsers for Java, C#, C++ and a bunch of other languages. It uses much more powerful parsing algorithms than Bison/YACC/Lemon, which means you spend less time fiddling with your grammar to get rid of "shift/reduce conflicts".

http://www.antlr.org
Chris Tavares
Monday, December 24, 2007
 
 
I've just started using ANTLR for a small project and so far I'm quite pleased with it. It is certainly easier than hand coding for anything non-trivial.
Tony Edgecombe
Wednesday, December 26, 2007
 
 
Do you *need* a subset of ECMAScript, or do you simply need an embeddable language and think something like ECMAScript makes sense? There are other languages out there that are easy to embed and very powerful. Tcl and Lua come to mind. Why invent yet another language?
Bryan Oakley
Monday, December 31, 2007
 
 

This topic is archived. No further replies will be accepted.
