The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

lex and yacc design help


I hope I have the right discussion group. If not I'd
appreciate pointers to where I should be looking.

I need to develop a specification language which is
hierarchical in nature.

For example, a system consists of other systems etc.
In terms of properties each system is similar.
A snippet of the envisoned language will look like...

  property_AA: blah
  property_BB: blah
    property_AA: blah
    property_BB: blah

The nesting depth is arbitrary but can be limited to
some sane value. Are there any grammar specs. similar
to this which I can use as a starting point?
ravi kumar Send private email
Monday, January 07, 2008
For something that simple, why do you need lex and yacc? You could parse that with a couple dozen lines of code and some regexps.
a.nerd Send private email
Monday, January 07, 2008
Why not use XML?

This way you are building on top of a standard and have a wealth of tools to use. And you'll be able to use prebuilt parsers as a basis for your code - should you be wanting to code. And finally, you can specify the "grammar" of your XML with a nice DTD or XSD.

Anyone else got any thoughts?
my name is
Tuesday, January 08, 2008
I use Bison/Flex and what you want to do sounds pretty easy once you know how. The problem is that the tools are quite difficult to understand.

These links might help

If I were starting a new project I'd probably try ANTLR. It looks easier, and there's a book
Robert Send private email
Tuesday, January 08, 2008
Thanks for the input.

1. Why lex and yacc are needed.
lex and yacc is needed to parse the properties.
The properties are large and complex. I need C++ container
classes to store and manipulate them. The only difficulty
I have is in parsing grammar with arbitrary recursion.

2. Why not XML.
It will be XML. The doc. is human created, so I wanted a simpler grammar which I can convert to XML.

3. Flex and Bison.
I think lex and flex are same in redhat. Don't know about

Ravi Kumar
Tuesday, January 08, 2008
Bison/Flex age GNU versions of yacc/lex. You can get Windows builds of theme here
Robert Send private email
Tuesday, January 08, 2008
Bison might have some licensing requirements under GPL relating to libraries that you must distribute with your software.  This was the case some years ago, and I'm not sure about the current licenses.

Investigate closely if you are going to be selling software to users, as opposed to using the parsing system internally.  If it's still a problem there are many alternatives, such as ANTLR, which runs under Java but can generate C++ code.
Tuesday, January 08, 2008
I wrote a grammar a few years ago that's pretty similar to the one you're describing here (with some extra sugar, like an 'import' statement and inheritance) that made it much more convenient for my project than XML. I used JavaCC, though I was also considering ANTLR.

At any rate, your grammar looks remarkably similar to JSON. Here's an example of some JSON code snippets:

And here's the basic JSON grammar with pictures!!!):

And, finally, here's an example of how to build a JSON interpreter using ANTLR with a Java back-end:

Even though back-end is in Java, the front-end is mostly re-usable for a C++ implementation. But before you do any real work with ANTLR (it has kind of a steep learning curve), walk through some of the tutorials here:

Also, ANTLR has tons of pre-existing grammars that you can download and inspect. For example, whenever I'm writing a new grammar, I always steal the Expression hierarchy from the Java 1.5 grammar (cuz I can never remember the operator precedence table on my own). Here's the list of grammars you can download:

Finally, I should mention that I have zero experience with lex/yacc. My first parser was written by hand using my own state machine (yikes!) and after that I've used (mostly) JavaCC and (a little) ANTLR. Although the learning curve for ANTLR is steeper, it's a far superior platform for writing parsers.

Incidentally, yacc is a LALR parser, whereas ANTLR is a LL(k) parser. That (mostly) shouldn't matter to you right now, but here are a couple of wikipedia pages to explain the difference to you:

In many cases, an LALR parser gives you better ability to implement complex recursive structures than an LL(k) parser. But ANTLR also has the ability to implement syntactic and semantic predicates, which pretty much ameliorates the differences.

Oh, and by the way, someone earlier in the thread suggested just using regular expressions. Don't do it. Although it'd be possible (and I use regular expressions all the time, for lots of stuff), your grammar will include recursion, which would make a regex implementation extremely difficult. You seriously don't want to go there.

Good luck! And let us know how it goes!
BenjiSmith Send private email
Wednesday, January 09, 2008
Also have a look ar Coco/R...
Johnny Moondog
Sunday, January 13, 2008
I finally made substantial progress. I briefly veered
towards antlr (which is extremely interesting, btw and
would have saved me tons of XML generation work) but
my compiler has to be embedded in other simulators, which
only support C/C++.

The trick to supporting recursive language is to find the
atom and define the atom as one or more atoms. (seems simple

  | systemBlocks systemBlock

Now I have another challange.
Suppose a statement block is a list of tokens. Every
token must be mentioned, but once only. But the order
is not important...

So a system block can have a name, location, value, status
predecessor, and successor (let's say)
I got this going by specifying all above tokens in every
possible order. Which is a lot. Then I got 3 more tokens
to add (like owner, implementor etc.) Quickly this language
is going south.

How do I specify in my grammar that the block should have
all tokens but in any order. I realize I can maintain
records, but in a recursive language that's rather tedious.

I looked at other languages and this seems to be a rather
unique requirement.
Ravi Kumar Send private email
Thursday, January 17, 2008
>>Suppose a statement block is a list of tokens. Every
token must be mentioned, but once only. But the order
is not important...

So a system block can have a name, location, value, status
predecessor, and successor<<

Wouldn't this cover it?

SystemBlock ::= BlockUnit*
BlockUnit := name | location | value | status | predecessor | successor

And actually, that seems pretty standard. Any grammar that accepts, say, ASCII, would have something similar in there, listing all the ASCII chars in there.

But maybe I'm misinterpreting...
Dotimus Send private email
Friday, January 18, 2008
your solution partially solves the problem. But it doesn't
address the case where one or more are missing or multiply
specified. The trick is to allow one and only one of each
token to be present in the systemBlock.

Going with your solution, I could keep a record of every
block unit and then when the systemBlock parsing is closed,
I'd examine my records, and print out error msgs if things
are missing or duplicated.
Ravi Kumar Send private email
Friday, January 18, 2008
There's a reason that yacc lets you attach actions to your grammar productions. ;-)

While my theory is rusty, I suspect you're moving out of the context-free languages (which is what YACC handles) into the context-sensitive area. Much like regular languages (expressed by regexes) can't handle any recursive structures, context-free languages can't handle context-sensitive structures.

You can, however, write regular C code to do this. So that's the best bet.
Chris Tavares Send private email
Monday, January 21, 2008

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz