The Joel on Software Discussion Group (CLOSED)
A place to discuss Joel on Software. Now closed.
Your hosts: Albert D. Kallal, Li-Fan Chen, Stephen Jones
Hello, I have a question.
I want to write an expression to match a string against a series of rules. The problem is that the expression correctly matches strings that are well formed, but it also matches strings that are well formed and have extra stuff after them. [0-9]{1}(1|0){1} What must be inserted in order to invalidate further chars? I tried using: [0-9]{1}(1|0){1}\w{0} and [0-9]{1}(1|0){1}[a-zA-Z0-9]{0} to make it match 0 letters and numbers after the first string, but both give the same results. Thanks.
Drunkie Monday, February 25, 2008
"What must be inserted in order to invalidate further chars?"
If you're matching an entire string, you want to anchor it with the ^ and $ characters: ^[0-9]{1}(1|0){1}$ This matches only 2 characters. Without the ^ and $ it merely matches any string that contains that pattern somewhere in it.
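A minimal sketch of the difference, using Python's `re` module purely for illustration (the original poster is on PHP; the helper name `contains` is made up for this example). It also shows why appending `\w{0}` changes nothing: a zero-repetition quantifier matches the empty string, so it always succeeds.

```python
import re

pattern = r"[0-9]{1}(1|0){1}"       # the original, unanchored expression
anchored = r"^[0-9]{1}(1|0){1}$"    # the anchored version suggested above


def contains(regex, text):
    """Return True if the regex matches anywhere in the text."""
    return re.search(regex, text) is not None


print(contains(pattern, "41"))                  # well formed
print(contains(pattern, "41abc"))               # extra stuff is simply ignored
print(contains(pattern + r"\w{0}", "41abc"))    # \w{0} matches the empty string
print(contains(anchored, "41"))                 # still accepted
print(contains(anchored, "41abc"))              # now rejected
```

Only the last call reports no match; the unanchored forms all find "41" inside the longer string.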
Are you sure about that? This may be specific to the Bash shell (which I'm familiar with) but I believe ^ and $ match the beginning and end of a line. In that case, adding those characters would only match when the two characters are the only ones on the line.
Question for the OP: Are you looking for instances of the two characters set off by whitespace? If so you can try \s\([0-9]{1}(1|0){1}\)\s The \s matches space, tab, return and newline. The \( and \) will cause the expression to only return the enclosed segment of what is matched. (Here's hoping nothing there gets stripped from this post.)
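Whether ^ and $ mean "whole string" or "one line" depends on the tool and its flags. As a hedged illustration in Python's `re` module (other engines, including grep-style tools and PHP's PCRE, have their own defaults), the anchors bind to the whole string unless a multi-line flag is set:

```python
import re

anchored = r"^[0-9](1|0)$"
text = "abc\n41\nxyz"          # "41" sits alone on the middle line

# Default mode: ^ only matches at the very start of the string.
whole_string = re.search(anchored, text)

# MULTILINE mode: ^ and $ also match at the start and end of each line.
per_line = re.search(anchored, text, re.MULTILINE)

# In Python, $ also matches just before a trailing newline ...
trailing_newline = re.search(anchored, "41\n")

# ... so re.fullmatch is the strictest "exactly these two characters" test.
strict = re.fullmatch(r"[0-9](1|0)", "41\n")

print(whole_string is None, per_line is not None)
print(trailing_newline is not None, strict is None)
```

So in line-oriented tools the anchored pattern accepts a line that consists of just the two characters, while in string-oriented APIs it accepts a string that consists of just the two characters.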
Though you're complicating things needlessly: {1} is meaningless, so you can just write ^[0-9](1|0)$. (Or ^[0-9]([01])$, which may be infinitesimally faster on some implementations.)
Iago Monday, February 25, 2008
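A quick way to check that the three spellings accept the same strings, sketched in Python (the sample strings are made up for illustration):

```python
import re

variants = [
    r"^[0-9]{1}(1|0){1}$",   # original, with explicit {1}
    r"^[0-9](1|0)$",         # {1} dropped
    r"^[0-9]([01])$",        # alternation replaced by a character class
]
samples = ["41", "90", "00", "42", "4", "410", "a1", ""]

# One row of True/False results per variant, in the order of `samples`.
results = [
    [re.search(variant, sample) is not None for sample in samples]
    for variant in variants
]

for variant, row in zip(variants, results):
    print(variant, row)
```

All three rows come out identical: the first three samples match and the rest do not.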
There's no such thing as \s (or \w) in POSIX regular expressions (there is [[:space:]], but who wants to use that?). And if you're using an extended type of regex you will also have access to zero-width assertions like \b (Perl etc) or \< and \> (Emacs etc) that will match a whitespace boundary _or_ the end of a line, and will also not require an extra match group to be introduced, so there aren't many cases where \s(foo)\s is optimal.
And then people wonder why regular expressions have a reputation for being complicated... :D
Iago Monday, February 25, 2008
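To make the trade-off concrete, here is a small sketch in Python, whose `re` module supports both \s and \b (the helper `find` is hypothetical). Note that \b is a word boundary, so it also fires next to punctuation, not only next to whitespace.

```python
import re

spaced = r"\s([0-9][01])\s"    # whitespace required on both sides, plus a group
bounded = r"\b[0-9][01]\b"     # zero-width boundaries, no group needed


def find(regex, text):
    """Return the full matched text, or None if there is no match."""
    match = re.search(regex, text)
    return match.group(0) if match else None


print(repr(find(spaced, "x 41 y")))        # includes the surrounding spaces
print(repr(find(bounded, "x 41 y")))       # just the two characters
print(repr(find(spaced, "41 is first")))   # no whitespace before the match
print(repr(find(bounded, "41 is first")))  # boundary at start of string is fine
print(repr(find(bounded, "x 410 y")))      # "410" is one word, so no match
```

The \s version fails when the two characters sit at the very start or end of the input, which is the case the zero-width assertions handle without extra work.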
(quote)
Though you're complicating things needlessly: {1} is meaningless, so you can just write ^[0-9](1|0)$. (Or ^[0-9]([01])$, which may be infinitesimally faster on some implementations.) (endquote) {1} is redundant, but I think it makes the expression more readable. Thanks.
Drunkie Monday, February 25, 2008
> {1} is redundant, but I think it makes the expression more readable.
I think the overwhelming majority of people who know regular expressions would disagree with you. Why stop at {1}? If that's more readable, surely you could make the same case for {1}{1}, {1}{1}{1}, etc.
clcr Monday, February 25, 2008
Drunkie said: "{1} is redundant, but I think it makes the expression more readable."
Sure sure, in in the the same same way way that that extra extra words words make make sentences sentences more more readable readable.
Monday, February 25, 2008
It just occurred to me that this is essentially the same as the argument over whether to compare booleans to true, e.g.:
if (succeeded) vs. if (succeeded == true)
In both cases the more verbose form makes newbies somewhat more comfortable, but most experienced people find that it adds clutter and suggests that the author didn't really understand what was going on.
clcr Monday, February 25, 2008
@Iago
Thanks for the clarification. I had assumed bash was pretty close to POSIX standard. Guess not.
Someone needs to come up with a tool that SIGNIFICANTLY reduces the pain involved with the creation of a regex.
Some tools exist, but they are garbage at best: not intuitive to use. RegEx is the bane of current software development, right along with multi-threading, application protection and executable installation protocols.
Brice Richards: "Someone needs to come up with a tool that SIGNIFICANTLY reduces the pain involved with the creation of a regex."
Done. See www.regexbuddy.com - it's great for designing and testing regexes, allows you to build reusable libraries and automatically paste the regex in various dialects and IDEs. It's also pretty cheap (< $30, IIRC). Also allows GREPping in files or arbitrary text pasted into a window. No relation, just been using it for a while.
(quote)
Why stop at {1}? If that's more readable, surely you could make the same case for {1}{1}, {1}{1}{1}, etc. (end of quote) Well, that can't be defended with the same argument, because it introduces a lot of entropy into the expression. Using "(1|2){1}" serves as clarification. It adds some entropy, but it is minimal and serves a purpose. Does this have any non-negligible performance impact? I'm using PHP, btw.
Drunkie Tuesday, February 26, 2008
(quote)
if (succeeded) vs. if (succeeded == true) (end of quote)
Sometimes you actually need a form similar to the second. For example, some PHP functions return stuff like:
* N number, if (...)
* boolean false, otherwise
And you actually need
if ( $x === false ) { }
Using
if ( !$x ) { }
could lead to incorrect results. Regards.
Drunkie Tuesday, February 26, 2008
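The same pitfall can be sketched outside PHP. The Python analog below is only an illustration: `find_offset` is a hypothetical helper that returns either an offset or None, standing in for a PHP function that returns a number or false. An offset of 0 is a valid answer but is falsy, so the loose test misreports it.

```python
def find_offset(haystack, needle):
    """Hypothetical helper: offset of needle in haystack, or None if absent."""
    index = haystack.find(needle)
    return None if index == -1 else index


def found_loose(haystack, needle):
    # Mirrors `if ( !$x )`: treats a legitimate offset of 0 as "not found".
    return bool(find_offset(haystack, needle))


def found_strict(haystack, needle):
    # Mirrors `if ( $x === false )`: only the sentinel value means "not found".
    return find_offset(haystack, needle) is not None


print(found_loose("41abc", "41"))    # wrong answer: the offset is 0
print(found_strict("41abc", "41"))   # correct
print(found_strict("abc", "41"))     # correctly reports absence
```

This is a different situation from comparing a genuine boolean to true: here the explicit comparison distinguishes two values that would otherwise both count as false.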
"RegEx is the bane of current software development right along with multi-threading, "
Regular expressions have been around for 25+ years and there is a huge body of knowledge around them, including many examples. They are absolutely nothing new. What I do is take pieces of regular expression examples from outside sources, and build my expression outward, checking it against test data as I go. I'm good enough to get close with a regular expression written from scratch but not so good that I can wing it without testing. Multi-threading being a "bane"? No, not really. Unless one doesn't understand it at all. It adds complexity but it's manageable.
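One way to follow that "build outward and test as you go" habit, sketched in Python under the assumption that the goal is the two-character pattern from the top of the thread (the test data and the helper `failures` are invented for this example):

```python
import re

# Expected verdict for each sample string: should the whole string be accepted?
CASES = {
    "41": True,
    "90": True,
    "42": False,
    "41abc": False,
    "x41": False,
    "": False,
}


def failures(regex, cases):
    """Return the sample strings for which the regex gives the wrong verdict."""
    return [
        text
        for text, expected in cases.items()
        if (re.fullmatch(regex, text) is not None) != expected
    ]


# Grow the expression one piece at a time, rechecking after every step.
for step in [r"[0-9]", r"[0-9][01]"]:
    print(step, "->", failures(step, CASES))
```

The first step still fails on the two valid samples; the second step clears the list, which is the signal to stop adding pieces.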


