The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

parsers & searching

I was playing with the GOLD parser this weekend. It is pretty neat. I was just thinking, could I use a parser like that to do searches for keywords in files? I would have to generate the parsing table on the fly based on what the user enters, but after that the parser itself seems to be rather efficient.

Is that how it is done?
novice coder
Sunday, October 09, 2005
 
 
Regular expression support is enough to search for keywords in files.
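For instance, a minimal Python sketch of that kind of keyword search (the file name and keyword list here are just placeholders):

    import re

    # Placeholder inputs: a file to search and a few user-supplied keywords.
    keywords = ["foo", "bar"]
    pattern = re.compile("|".join(re.escape(k) for k in keywords))

    with open("notes.txt", encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            if pattern.search(line):
                print(line_no, line.rstrip())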
Kalani
Sunday, October 09, 2005
 
 
Using a parser for searching sounds incredibly inefficient.
Andy Brice
Monday, October 10, 2005
 
 
>  Regular expression support is enough to search for
> keywords in files.

If you have a 5MB text file, and the thing you are searching for spans multiple lines, how will regex match your search term without having to load the entire file into memory? How is this efficient?
novice coder
Monday, October 10, 2005
 
 
I think Andy was speaking of inefficiency in terms of the amount of work you'd have to put in versus the benefit you gain from the work.

I've implemented several small languages and parsers (albeit with ANTLR rather than GOLD).

I would only use the parser as a module in a larger system, or if you need some sort of a frame on which to build something relatively permanent.

Other than that, the cost of preparing, implementing, and debugging a grammar far outweighs the benefit when all you need is to find a few keywords. Parsers are generally meant for building ASTs so that you can work with full languages.

If all you are looking to do is implement a keyword parser (something that parses .cpp files for C++ keywords, for example), nothing will beat grep, cat, perhaps a few more UNIX utilities, and a couple of pipes.
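For comparison, roughly the same kind of keyword scan sketched in Python (the glob pattern and keyword list are made up for the example):

    import glob
    import re

    # A handful of C++ keywords to look for; extend the list as needed.
    cpp_keywords = re.compile(r"\b(class|struct|template|virtual|namespace)\b")

    for path in glob.glob("*.cpp"):
        with open(path, encoding="utf-8", errors="replace") as f:
            for line_no, line in enumerate(f, 1):
                if cpp_keywords.search(line):
                    print(f"{path}:{line_no}: {line.rstrip()}")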
Andrey Butov
Monday, October 10, 2005
 
 
Memory mapping, perhaps?

The thing is, regexps and lexers generated by tools like flex use basically the same technology, except that regexps are optimised for the case where only one expression is sought and it is not known at compile time. In other words, for searches.
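A rough Python sketch of the memory-mapping idea (the file name and pattern are placeholders): the regex engine scans the mapped bytes while the OS pages the file in as needed, so a match can span lines without the whole file being read into a string first.

    import mmap
    import re

    with open("big.txt", "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # re.DOTALL lets "." cross line boundaries, so matches may span lines.
            for m in re.finditer(rb"BEGIN.*?END", mm, re.DOTALL):
                print(m.start(), m.group()[:40])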
Iago
Monday, October 10, 2005
 
 
novice coder:

>If you have a 5MB text file, and the thing you are
>searching for spans multiple lines, how will regex
>match your search term without having to load the
>entire file into memory?

It's not necessary to load the whole file into memory with either regex-based parsers or CFG-based parsers. I'm not sure what makes you think that's necessary. In principle, you could implement either type of parser by reading the file one byte at a time.
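A rough sketch of that streaming idea in Python (chunk and overlap sizes are arbitrary): read fixed-size chunks and keep a small tail so a match straddling a chunk boundary isn't lost.

    import re

    def stream_search(path, pattern, chunk_size=64 * 1024, overlap_size=1024):
        """Yield regex matches from a file without reading it all at once.

        Matches longer than overlap_size that straddle a chunk boundary can
        be missed; this is only a sketch of the streaming idea.
        """
        regex = re.compile(pattern)
        tail = b""
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                window = tail + chunk
                for m in regex.finditer(window):
                    # Skip matches that lie entirely inside the carried-over
                    # tail; those were already reported last time around.
                    if m.end() > len(tail):
                        yield m.group()
                tail = window[-overlap_size:]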
Kalani
Monday, October 10, 2005
 
 
>the thing you are searching for spans multiple lines,

Perl, at least, lets you explicitly allow patterns to span lines; "\n" then simply matches the end-of-line character itself.
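The same idea sketched in Python for comparison, where re.DOTALL plays the role Perl's /s modifier does (the text and patterns are made up):

    import re

    text = "first line\nsecond line\nthird line\n"

    # re.DOTALL makes "." match newlines too, so a pattern can span lines;
    # "\n" can still be matched explicitly as the end-of-line character.
    m = re.search(r"first.*third", text, re.DOTALL)
    print(m.group())        # spans all three lines up to "third"

    m = re.search(r"second line\n", text)
    print(repr(m.group()))  # the match includes the literal newline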
.
Tuesday, October 11, 2005
 
 
Are the things you're searching for simple string matches, or are there repeated elements like matching parentheses?

Regexes are really good at "X followed by some characters followed by Y". They're not good at recursive structures like an expression grammar, where an expression is "a number, or a number + an expression, or an addend", and an addend is "a number, or a number * a number", and so on.

Basically, if you're just matching flat text, then regexes are fine. If you need to match not only the characters, but the structure of the file you're in, you want an actual parser.
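A toy Python sketch of the difference (inputs made up): the flat case is one regex, while checking nested structure such as balanced parentheses needs recursion or a counter, which a plain regex cannot express.

    import re

    # Flat pattern: "X followed by some characters followed by Y" is easy.
    print(bool(re.search(r"x.*y", "x something y")))   # True

    # Nested structure: balancing parentheses needs recursion (or a counter).
    def balanced(s, i=0, depth=0):
        if i == len(s):
            return depth == 0
        if s[i] == "(":
            return balanced(s, i + 1, depth + 1)
        if s[i] == ")":
            return depth > 0 and balanced(s, i + 1, depth - 1)
        return balanced(s, i + 1, depth)

    print(balanced("(a + (b * c))"))   # True
    print(balanced("(a + (b * c)"))    # False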
Chris Tavares
Tuesday, October 11, 2005
 
 

This topic is archived. No further replies will be accepted.

 
Powered by FogBugz