The Joel on Software Discussion Group (CLOSED)

A place to discuss Joel on Software. Now closed.

This community works best when people use their real names. Please register for a free account.

Other Groups:
Joel on Software
Business of Software
Design of Software (CLOSED)
.NET Questions (CLOSED)
TechInterview.org
CityDesk
FogBugz
Fog Creek Copilot


The Old Forum


Your hosts:
Albert D. Kallal
Li-Fan Chen
Stephen Jones

Peculiar "C Test" Question...that I failed

Hi Everyone,

Just going through some old emails and I came across this interesting question I got from one company. It was their "screening" question and I basically had a week to come up with an answer. In the next post, I'll throw up my answer...which I know was correct as I tested a bunch of different inputs. However, apparently the hiring manager took a "5 second" look at my code and immediately stamped me as "No Hire". I'm curious as to what you would have done?

Design and write, using ISO C (no C++ or language extentions), an implementation for the following function prototype:

int numcmp(const char *s1, const char *s2);

The function numcmp assumes that both null-terminated strings s1 and s2 contain one or more digits and at most one decimal point ('.'); the result for input strings not of this form is not defined.  The function compares the contents of the null-terminated string s1 with the contents of the null-terminated string s2.  Using a numerical interpretation for the strings, it returns a value of type int that is less than zero if s1 is less than s2; equal to zero if s1 is equal to s2; and greater than zero if s1 is greater than s2.

The function interface is similar to that of strcmp from the standard C library "string.h".  It can be used in place of strcmp when a numeric comparison is needed in place of a lexicographic comparison.

The result of the function should be the same as a comparison of the numerical values of represented by the strings.  (Examples: "0" is equal to "000"; "12" is equal to "0012"; "100" is greater than "99"; ".02" is greater than "0.001")

The the comparison should be implemented using only simple character processing. (It should not be implemented using float or double.)

There is no limit on the number of digits in the input strings.  The implementation should not impose a restriction on the length of the strings.
Do not use any library functions (including the standard libraries) in your implementation.
Example input strings:

0
00000
001
1
000.00000
00.000000001
12345
30000000.00000000000000300000000000001000001
234982732487618236876
.0001
10000.
6787687687682344442837612836626387612387163.23435
0000100000000000000000.00000000000000000000000000001
0000100000000000000000.00000000000000000000000000002
Soft Duck AKA Skeletor
Thursday, September 01, 2005
 
 
Here's my reply:

static void parse_str(const char *s, const char **lhs, const char **rhs, int *lhs_width, int *rhs_width);

/*
 * Here's the numcmp function that performs a numerical interpretation of its
 * two input strings and returns the following values according to s1 & s2:
 *
 * s1  > s2: A positive integer greater than 0 (1...n)
 * s1 == s2: 0
 * s1  < s2: A negative integer less than 0 (-1...-n)
 */
int numcmp( const char *s1, const char *s2)
{
    /* Pointers used to partition the input strings into seperate strings  */
    /* representing the lhs and rhs of the decimal where applicable        */
    const char *plhs1, *plhs2, *prhs1, *prhs2;

    /* Variables used to hold the "widths" of the lhs & rhs components of  */
    /* each input string                                                    */
    int lhs_width1 = 0, lhs_width2 = 0, rhs_width1 = 0, rhs_width2 = 0;

    /* parse each string to obtain the lhs, rhs, and width of each component*/
    parse_str(s1, &plhs1, &prhs1, &lhs_width1, &rhs_width1);
    parse_str(s2, &plhs2, &prhs2, &lhs_width2, &rhs_width2);

    /* This block handles lhs strings containing non-zero digits...there is */
    /* no processing required when BOTH lhs components are zero            */
    if (lhs_width1 || lhs_width2)
    {
        if (lhs_width1 == lhs_width2)
        {
            /* Since the widths are equal, we must traverse the string      */
            while (lhs_width1 && (*plhs1 == *plhs2))
            {
                lhs_width1--;
                plhs1++;
                plhs2++;
            }

            /* We can stop if there was a difference in digits before      */
            /* traversing the entire lhs widths...else we must continue onto*/
            /* the rhs component of the string                              */
            if (lhs_width1)
            {
                return *plhs1 - *plhs2;
            }
        }
        else
        {
            /* No processing required...the larger width represents the    */
            /* greatest numerical value                                    */
            return lhs_width1 - lhs_width2;
        }
    }

    /* This block handles rhs strings containing non-zero digits...there is */
    /* no processing required when BOTH rhs components are zero            */
    if (rhs_width1 || rhs_width2)
    {
        /* In all cases, we must traverse the rhs components                */
        while ((rhs_width1 && rhs_width2) && (*prhs1 == *prhs2))
        {
            rhs_width1--;
            rhs_width2--;
            prhs1++;
            prhs2++;
        }

        /* Traversal completed. Either we detected a difference in digits  */
        /* or we traversed the entire width of one/both rhs components.    */
        /* We now have a conclusive final result...                        */
        if (!rhs_width1 || !rhs_width2)
        {
            return rhs_width1 - rhs_width2;
        }
        else
        {
            return *prhs1 - *prhs2;
        }
    }

    /* This is reached only when both lhs and rhs components are 0's        */
    return 0;
}

/*
 * This function splits an input string into its respective lhs and rhs
 * components and computes the "width" of each component without the
 * extraneous leading 0's (lhs) and trailing 0's (rhs)
 */
static void parse_str(const char *s, const char **lhs, const char **rhs, int *lhs_width, int *rhs_width)
{
    /* Pointer used to traverse the string                      */
    const char * str_p = s;
   
    /* Trim the leading zeros from the lhs component            */
    while (*str_p == '0')
    {
        str_p++;
    }
   
    /* Handle the case when the input string is a string of 0's */
    /* with no decimal point                                    */
    if (*str_p == '\0')
    {
        return;
    }

    /* The lhs component is now free of leading 0's            */
    *lhs = str_p;
   
    /* Compute the width of the lhs component between either    */
    /* the decimal or '\0' character termination....where      */
    /* applicable                                              */
    while (*str_p != '.' && *str_p != '\0')
    {
        str_p++;
    }
    *lhs_width = str_p - *lhs;
   
    /* If we reached the '.' character, set rhs...else, we're  */
    /* done...                                                  */
    if (*str_p == '.')
    {
        *rhs = ++str_p;
    }
    else
    {
        return;
    }

    /* Now traverse to the end of the string and trim any      */
    /* trailing 0's if applicable                              */
    while (*str_p != '\0')
    {
        str_p++;
    }

    do
    {
        --str_p;
    }
    while ((*str_p == '0'));

    /* handle the case where rhs is all 0's                    */
    if (*str_p == '.')
    {
        return;
    }

    /* Compute the width of the rhs component between the head  */
    /* of the rhs component (after the decimal) and either the  */
    /* '\0' or first trailing 0...where applicable              */
    *rhs_width = str_p - *rhs + 1;
    return;   
}
Soft Duck AKA Skeletor
Thursday, September 01, 2005
 
 
Well in my five second look I see you return from "parse_str" in some cases without setting return values, also you spelled "separate" wrong.

How do you know he only spent five seconds looking at it, and decided not to hire you based on that?
Vic
Thursday, September 01, 2005
 
 
What was your test questions for them?
son of parnas
Thursday, September 01, 2005
 
 
Did you email your code snippet from softDuck@hotmail.com or skeletor@yahoo.com? The latter would be a no hire in my opinion.

Your code contains a serious bug:
* No processing required...the larger width represents the    */
/* greatest numerical value                */
            return lhs_width1 - lhs_width2;
        }
    }

Did you test these numbers?
000.0000
00.00001

They are part of your sample input.

There are a couple other issues I would have with your code too:
1. Why decrement lhs_width1? You should create an index like 'i' and use that (your lhs_width1-- actually created another bug which I think causes your test case to succeed  on 000.000 and 00.0001 - try testing the numbers in reverse ( 00.0001 and 000.000)

2. Your use of while loops with conditional checks after each loop was confusing.

3. Your commment block style was aggravating:
/* blah blah */
/* blah blah */
I've seen it around, but I much prefer:
/*
 blah blah
 blah blah
*/
or:
// blah blah
// blah blah

4. Why did you use rhs and lhs? Why not just left and right? You saved 2 letters, but made the code harder to read.
NathanJ
Thursday, September 01, 2005
 
 
That seems a little complicated. What about following (excuse my c, have never really written a c program):

int numcmp(char *left, char *right)
{
  /* find first non-zero digit */
  while (*left == '0' && *left != '\0')
      ++left;
  while (*right == '0' && *right != '\0')
      ++right;

  /* compare digits, find first inequality */
  while (*left == *right && *left != '\0')
  {
      ++left;
      ++right;
  }

  if (*left < *right)
  {
      /*
        left is either decimal point
        or digit is less than right
      */
      return -1;
  }
  if (*left == *right)
  {
      /* all digits were the same */
      return 0;
  }
  else if (*left > *right)
  {
      return 1;
  }
 
}
Daren Thomas Send private email
Thursday, September 01, 2005
 
 
Darren,

You need some out of bounds checking but, yup, that'll do it.
Mr Jack
Thursday, September 01, 2005
 
 
Daren, I've not spent too much time checking myself, but doesn't that fail for a pair like (12, 113) -> wouldn't it return that 12 is higher than 113?
Joel Goodwin Send private email
Thursday, September 01, 2005
 
 
With all these comment overload, one *really* important comment is missing:

  static void parse_str(const char *s, const char **lhs, const char **rhs, int *lhs_width, int *rhs_width)

/* parse_str expects lhs_width and rhs_width to be initialized with 0 */

Furthermore:

  return int1 - int2;

has the danger of over/underflow of signed integers with possible undefined behaviour. Try this nice idiom instead, it even guarantees the values -1, 0 and +1:

  return (int1 > int2) - (int1 < int2);
Secure
Thursday, September 01, 2005
 
 
Aside from bounds checking (what if left is longer than right?) it also dies if there are trailing (decimal) zeros.

Also, it's a bit crude (in my opinion) to compare to the null terminator as if it were a real digit. It should at least be documented.
Mostly Harmless
Thursday, September 01, 2005
 
 
What a mess! Breaking it into lhs and rhs was your first mistake. Daren was more on track. This is an example of a function that takes a lot of imagination and constant refactoring. What you did looks more like coding soup, you just keep on coding and remarking until it works without being willing to totally rework it. I think the best solution would be a single pass keeping track of a couple of offsets. Like Darren did, bypass leading zeros and note start of digits for each string, then compare numbers up to decimal or end and first one greater flag that string as greater. Once reaching end or decimal in both strings, if the digit counts are different that will override whether one number compared greater than the other. If they are equal so far, then compare after decimals if any. First non-equal digit gives the answer, otherwise equal. The result is very straight forward, lean and mean. Not the monster mess you created.
Anon and anon
Thursday, September 01, 2005
 
 
Well...

It was actually quite fun to watch you guys rip that code apart!!!

I did this about 3-4 years ago and it was during a time when I was laid off and had been thoroughly fed up with SW development altogether. I know that my main problem was that I didn't think through the analysis of the question well enough...and just went right into the code. My brain had already turned into mush by working with some really horrible developers and fixing up their hideous spaghetti/block-copied code.

Looking back on the junk I threw together, I know (after a considerable amount of water under the bridge) I could have done MUCH better and would have focused more on a terse & simpler to follow solution rather than the well-commented madness I threw together (I remember it took me several hours to even comment the stuff).

But to this day...it always imagined that there was a silly catch to the problem and it could have been solved with 5 lines of code.

Anyways, I knew this would probably spark some interesting discussion here...thanks again everyone!

PS - No, my legal name doesn't have Soft Duck or Skeletor in it...and the email address I used fully reflects my legal name.
Soft Duck AKA Skeletor
Thursday, September 01, 2005
 
 
Soft Duck,

After my 5s look, my reaction was "too long and too ugly." I have the advantage of practice so don't feel too bad. Here is my offering, which took about 50 minutes total:


int numcmp(const char *s1, const char *s2);

int numcmp(const char *s1, const char *s2)
{
    char *p1, *p2;
    int offdec1, offdec2;  /* offsets of decimal points in s1, s2 */
   
    /*  move beginning of string to first character that isn't '0' or '.'
    *  then find offset of '.'
    *  if '.' not found its offsetis implied
    */
    for(; s1 && *s1=='0' && *s1!='.'; s1++)   
        ;
    for(p1=s1; p1; p1++)
        if(*p1=='.')
            break;
    offdec1=p1-s1;

    /* repeat for s2 */
    for(; s2 && *s2=='0' && *s2!='.'; s2++)
        ;
    for(p2=s2; p2; p2++)
        if(*p2=='.')
            break;
    offdec2=p2-s2;

    /* if both adjusted strings contain '.' the string with the highest offset is the biggest number */
    if(offdec1!=offdec2)
        return offdec1-offdec2;

    /*  ...otherwise both decimal points are at same offest
    *  so look for first different digit
    */
    for(p1=s1, p2=s2; p1 && p2 && *p1==*p2; p1++, p2++)
        ;
    if(!p1) return -1;          /* s2 has more digits so s1<s2 */
    if(!p2) return  1;          /* s1 has more digits so s1>s2 */
    return (*p1-'0')-(*p2-'0'); /* found two digits that differ */
}
Philip Prohm Send private email
Thursday, September 01, 2005
 
 
By measuring the 'width' of string components you are violating the part of the specification that states: "The implementation should not impose a restriction on the length of the strings.".
JP
Thursday, September 01, 2005
 
 
"for(p1=s1, p2=s2; p1 && p2 && *p1==*p2; p1++, p2++);"

Ugh! You've got to be kidding right? That makes my head hurt just looking at it.  :)
matt
Thursday, September 01, 2005
 
 
Too ugly and too long. You would have been clearly no hire for me.

As for a trivial solution: just strip the leading and terminating zeros from both strings and have a loop a-la "strcmp" (with the only trivial catch dealing with ".").
The function to remove zeros is about 4-5lines of code and the the loop another 3-4 lines. The solution should be under 15 lines of code.
danpop
Thursday, September 01, 2005
 
 
Philip,

  for(; s1 && *s1=='0' && *s1!='.'; s1++)
    ;

Why not this, what's wrong with it?

  while (s1 && *s1=='0' && *s1!='.')
    s1++;

Besides, it can be simplified to:

  while (s1 && *s1=='0')
    s1++;

Or do you assume that '0' could ever be equal to '.'?

  for(p1=s1; p1; p1++)
    if(*p1=='.')
      break;

What will happen if there is no '.' in the string?
Secure
Thursday, September 01, 2005
 
 
JP, unless you want to support strings with 2**31 significant digits on the lhs, my implementation does not restrict the length of the string.

matt, sorry
for(p1=s1, p2=s2;
    p1!='\0' && p2!='\0' && *p1==*p2;
    p1++, p2++)
    ;
:)

danpop, it's a lot prettier in an editor. Anyway, put your money where your mouth is and let's see your code :)
Philip Prohm Send private email
Thursday, September 01, 2005
 
 
Secure, there is nothing wrong with your first while(), it is a matter of individual style. I prefer for() because all of the loop control is in the head. I'm not even original here because K&R make this point explicitly in the white book.

Your simplification doesn't hold because it changes the semantics.

If there is no '.' then after the loop the condition "p1" is false ie. *p1=='\0' (and '.' is implied to be after the last digit ie. *p1=='.'). See "offdec1=p1-s1;"
Philip Prohm Send private email
Thursday, September 01, 2005
 
 
Philip,

one more ;)

if s1 or s2 is NULL, then

  offdec1=p1-s1;

you subtract two NULL-pointers here. This should be undefined behaviour. Furthermore,

  if(!p1) return -1;
  if(!p2) return  1;

numcmp(NULL,NULL) returns -1.

"I have the advantage of practice so don't feel too bad. Here is my offering, which took about 50 minutes total:"

50 minutes and so much bugs in such less code? The advantage of practice is worthless without knowledge. I suggest a bit more humility and modesty. Additionally, test your code exhaustively before posting it. ;)
Secure
Thursday, September 01, 2005
 
 
Philip,

"Your simplification doesn't hold because it changes the semantics."

When *s1 is !='0', the loop will break with a shortcut. When *s1 is =='0', it is guaranteed to be !='.', thus the additional test is superfluous.

"If there is no '.' then after the loop the condition "p1" is false ie. *p1=='\0'"

The condition (p1) tests for (p1 != NULL).
Secure
Thursday, September 01, 2005
 
 
Hi Secure, read the spec.

"The function numcmp assumes that both null-terminated strings s1 and s2 contain one or more digits"

No bug.
Philip Prohm Send private email
Thursday, September 01, 2005
 
 
Philip,

Yes, I see. Learn your pointers before using them. Learn what if(p) and if(!p) is testing.
Secure
Thursday, September 01, 2005
 
 
You should be using Perl.

sub numcmp ($$) {
  return $_[0] <=> $_[1];
}
Mike Schiraldi
Thursday, September 01, 2005
 
 
See also "Appendix: Power" midway through http://www.paulgraham.com/icad.html
Mike Schiraldi
Thursday, September 01, 2005
 
 
Yes the "&& *s1!='.'" is superfluous. Oh well it was after 2am when I wrote it. Thanks to short-circuiting it wouldn't have a performance impact.

while(p)
Cripes, how many times have I done that? Must be my favourite bug. Of course it should be
while(*p)
Philip Prohm Send private email
Thursday, September 01, 2005
 
 
I disagree that 'rhs' and 'lhs' are poorly chosen. I'm a huge fan of full word and multi-word variable names. However, rhs and lhs are very common in the mathematical code I've seen, and often appear inside some of the most important places to keep terse: complex math expressions.

Chances are that when you document your algorithm you're going to use the letters 'rhs' rather than something else since that's pretty standard, so it makes the docs -> code mapping simpler too.

'right' to me could mean the right and of an array, or maybe a boolean value denoting correctness. When I see 'rhs' I immediately know it represents a specific part of a mathematic expression.

Great comments otherwise, though.
John Send private email
Thursday, September 01, 2005
 
 
I like visual patterns in code. lhs and rhs qualify because the names are both the same length and the relationship between the two entities is clearly reflected in their names.
Philip Prohm Send private email
Thursday, September 01, 2005
 
 
Philip Prohm: "JP, unless you want to support strings with 2**31 significant digits on the lhs, my implementation does not restrict the length of the string."

Where in the function specification (or C specification) does it say that integers are 32-bits? What if they were compiling this on a target system with 16-bit integers? Or 64-bit addresses and 32-bit integers? The only thing we know for sure is that the requirements are very clear in saying that you must NOT restrict the length of the string.

Your implementation and that of the OP are both incorrect for this reason alone. You don't get bonus points for relaxing the spec to fit your own assumptions about the problem you are trying to solve.
JP
Thursday, September 01, 2005
 
 
Wow, this is a fun thread!  The question must be great for finding good candidates - it's deceptively difficult to come up with an answer that conforms to the spec.  In the interest of keeping this going, here's my attempt.  It's a bit longer than the other, but it could probably be whittled down a bit.  Fire away!

int numcmp( const char *s1, const char *s2) {

    int ret = 0;
    
    /* Strip leading zeros to simplify things. */
    while (*s1 && *s1 == '0') s1++;
    while (*s2 && *s2 == '0') s2++;

    /* Loop until we return or run out of input on one string. */
    while (*s1 && *s2) {
        
        /* If we hit the decimal in one string first, then it's the SMALLER. */
        if (*s1 == '.' && *s2 != '.') return -1;
        else if (*s1 != '.' && *s2 == '.') return 1;
        
        /* If the strings differ, set the current return value. */
        if (ret != 0 && *s1 != *s2)
                ret = (*s1 > *s2) ? 1 : -1;

        /* Go to the next element. */
        s1++;
        s2++;
        
    }

    /*
     * If both strings were still equal when we ran out of input, then check
     * the remaining string.  If the remaining input contains a non-zero
     * character, then it's the larger value.
     */
    if (ret == 0 && *s1) {
        while (*s1) {
            if (*s1 != '0') return 1;
            s1++;
        }
    } else if (ret == 0 && *s2) {
        while (*s2) {
            if (*s2 != '0') return -1;
            s2++;
        }
    }

    return ret;
    
}
PAG
Thursday, September 01, 2005
 
 
JP, you're splitting hairs.

In C, int is the natural word size of the machine. On today's machines, and in the context of the job OP was applying for, that is 32 bits. If the spec required anything different it would have said so. In practical terms my implementation does not limit the input.
Philip Prohm Send private email
Thursday, September 01, 2005
 
 
I don't know why everyone keeps trying to strip leading zeros... I say that you should add leading and trailing zeros to make the numbers easier to compare.  ;)

Take:

1.4357
3456.23

And convert to:

0001.4357
3456.2300

Now perform a simple string compare! Easy as pie.

FYI... I wouldn't actually create a new string by padding with zero's. I'm just showing that there are many different ways to skin a cat.

Fun topic!
matt
Thursday, September 01, 2005
 
 
Philip, Your view of the world is extremely blinkered if all you can imagine is computing on 32-bit PCs. I could demonstrate your implementation failing on real-world "current" C-based target platform. Ergo, it is wrong.

But yes, your solution would be fine if you changed the spec from "There is no limit..." to "Limited to MAXINT digits..". However that wasn't the spec.

BTW there was no "context" for the job that OP was applying for.
JP
Thursday, September 01, 2005
 
 
> (with the only trivial catch dealing with ".").

Ha! if there was only a way to turn rhetoric into code.  I second Philip Brohm's impl above.  It seems to be what was intended.
russ
Thursday, September 01, 2005
 
 
My algorithm (which I haven't coded) is the same as Matt's.  The only note I'll make is that there's no need to add trailing 0s.  Once we've passed the decimal point and are still equal, if Str1 has another (non-zero) character and Str2 doesn't, Str1 is bigger.

And while there's more than one way to skin a cat, I've learned that people really don't like hearing them listed (especially during lunch).
Boofus McGoofus Send private email
Thursday, September 01, 2005
 
 
void strip_zeros(char* s)
{
    int i = 0;
    for (; s[i] != '\0' && s[i] == '0'; ++i);
    memmove(&s[0], &s[i], strlen(s) - i);
    for (i = strlen(s) - i - 1; i >= 0 && s[i] == '0'; --i);
    s[i + 1] = '\0';
}
int numcmp(char* s1, char* s2)
{
    strip_zeros(s1);
    strip_zeros(s2);
    while (*s1 && *s2 && *s1 == *s2) ++s1, ++s2;
    if ((*s1 == '\0') && (*s2 == '\0'))
        return 0;
    if (*s1 == '\0' || *s1 == '.')
        return -1;
    if (*s2 == '\0' || *s2 == '.')
        return 1;
    return *s1 < *s2 ? -1 : 1;
}


There are two main problems with the above:
1. It modifies the input strings (I could have coded it not to, but I wanted to keep it shorter).
2. It exceeds 15 lines.

I did not pay any attention to optimization, style, etc.
danpop
Thursday, September 01, 2005
 
 
danpop, spec says no library functions :(

JP, while I disagree with your literal-minded interpretation of the spec, here is an implementaion that will satisfy it (it also has a couple of cleanups):

int numcmp(const char *s1, const char *s2)
/* spec says no need for assert(s1) */
{
    char *p1, *p2;
    int ds, dp;  /* distance in memory between s1/s2, p1/p2 */   

    /*  clip leading '0's
    *  then find offset of '.'
    *  if '.' not found its offset is implied
    */
    for(    ; *s1 && *s1=='0'; s1++)
        ;
    for(p1=s1; *p1 && *p1!='.'; p1++)
        ;

    /* repeat for s2 */
    for(    ; *s2 && *s2=='0'; s2++)
        ;
    for(p2=s2; *p2 && *p2!='.'; p2++)
        ;

    /*  if both clipped strings contain '.' the string with
    *  the longest lhs is the biggest number
    */
    ds=s2-s1;
    dp=p2=p1;
    if(ds!=dp)
        return ds-dp;

    /*  ...otherwise both decimal points are at same offest
    *  so look for first different digit
    */
    for(p1=s1, p2=s2; *p1 && *p2 && *p1==*p2; p1++, p2++)
        ;
    if(!*p1) return -1;        /* s2 has more digits so s1<s2 */
    if(!*p2) return  1;        /* s1 has more digits so s1>s2 */
    return (*p1-'0')-(*p2-'0'); /* found two digits that differ */
}
Philip Prohm Send private email
Thursday, September 01, 2005
 
 
Phillip, it says do not use C++ and language extensions; it doesn't say anything about using standard C functions.
danpop
Thursday, September 01, 2005
 
 
I jus couldn't resist pasting my own solution. Took me less than 10 minutes:

int  numcomp( const char*  s1,
              const char*  s2 )
{
  int    result;
  size_t  l1,  l2;

  /* strip leading zeros */
  while ( *s1 == '0' ) s1++;
  while ( *s2 == '0' ) s2++;

  /* get lengths of integer parts */
  for ( l1 = 0; s1[l1] && s1[l1] != '.'; l1++ ) {}
  for ( l2 = 0; s2[l2] && s2[l2] != '.'; l2++ ) {}

  if ( l1 < l2 ) /* compare integer lengths */
      return -1;
  else if ( l2 < l1 )
      return +1;

  result = strncmp( s1, s2, l1 ); /* compare ints */
  if ( result != 0 )
    return result;

  if ( s1[l1] == '.' ) l1++;  /* skip dot */
  if ( s2[l2] == '.' ) l2++;

  for ( s1 += l1, s2 += l2; ; s1++, s2++ )
  {
      if ( *s1 == 0 )
      {
          while (*s2 == '0') s2++;
          return (*s2 == 0) - 1;
      }
      if ( *s2 == 0 )
      {
          while ( *s1 == '0' ) s1++;
          return 1 - (*s1 == 0);
      }
      if ( *s1 != *s2 )
          return (*s1 - *s2);
  }
}
Just Trying
Thursday, September 01, 2005
 
 
Philip & Just Trying & OP, Any implementation that relies on the integer distance between two pointers is implicitly dependent on the size of an integer and therefore wont meet the exact spec as written.

Dandop, I think Philip was referring to the bit in the spec that said "Do not use any library functions (including the standard libraries)". Also your function prototype is not the same as the one in the spec.
JP
Thursday, September 01, 2005
 
 
I guess, I'm bored.

JP, it doesn't use standard functions and it has the right prototype:

int numcmp(const char* s1, const char* s2)
{
    for (; *s1 != '\0' && *s1 == '0'; ++s1);
    for (; *s2 != '\0' && *s2 == '0'; ++s2);
    int len1 = 0;
    int len2 = 0;
    while (s1[len1] != '\0') ++len1;
    while (s2[len2] != '\0') ++len2;
    for (--len1; len1 >= 0 && s1[len1] == '0'; --len1);
    const char* end1 = &s1[len1 + 1];
    for (--len2; len2 >= 0 && s2[len2] == '0'; --len2);
    const char* end2 = &s2[len2 + 1];
    while (s1 != end1 && s2 != end2 && *s1 == *s2) ++s1, ++s2;
    if ((s1 == end1) && (s2 == end2))
        return 0;
    if (s1 == end1 || *s1 == '.')
        return -1;
    if (s2 == end2 || *s2 == '.')
        return 1;
    return *s1 < *s2 ? -1 : 1;
}
danpop
Thursday, September 01, 2005
 
 
I think the company would expect you not to take the "no limit" literally, and taking it literally could be a "no hire".

Given that the strings are addressed via pointers and aren't in files on a disk, that implies they fit within the memory space of the machine (including virtual), and any numeric type big enough to address the whole space has to be acceptable.

Learning *not* to take requirements literally is an important skill, as literal interpretations could send you off wasting gobs of time and money.

That's one of the troubles with offshoring ... they too often take something you say literally and then run up the cost of the project unnecessarily.
Notadotnetter
Thursday, September 01, 2005
 
 
"In C, int is the natural word size of the machine. On today's machines, and in the context of the job OP was applying for, that is 32 bits. If the spec required anything different it would have said so."

Actually, the spec has said so:

"Design and write, using ISO C (no C++ or language extentions), an implementation for the following function prototype:"

In the ANSI/ISO-Standard, an int is guaranteed to have at least 16 bits (including the sign). Thus you should not assume more. Because we can safely assume that the strings will fit into memory, pointer operations and/or size_t counters are the only safe bet on this.
Secure
Thursday, September 01, 2005
 
 
>  I disagree that 'rhs' and 'lhs' are poorly chosen.

Your reasons for disagreeing don't make sense.

This code was not terse. If it was meant to be terse use l and r. And use w instead of width (example: l_w1, r_w1). lhs and rhs are not more clear. right_width1 is more clear than rhs_width1.

If an expression fits on one line there is absolutely no reason to shorten it with terse variable names.

The proper term for the two "sides" of a decimal number are not right and left. Right and left are more appropriate for two sides of a comparison or an equation. In this example using rhs and lhs is more confusing. Are you talking about the rhs of the left number being compared or talking about something on the right number?
NathanJ
Thursday, September 01, 2005
 
 
Enough theory, here is my try (hope it does not line-wrap too ugly):

int numcmp (const char *s1, const char *s2)
{
    int res = 0, dot1, dot2, val1, val2;

    while (*s1 == '0') s1++;
    while (*s2 == '0') s2++;

    dot1 = 0;
    while ((*s1 || *s2) && !dot1)
      {
        dot1 = ((*s1 == '.') || (*s1 == '\0'));
        dot2 = ((*s2 == '.') || (*s2 == '\0'));

        if (dot1 && !dot2)
            return -1;
        else if (dot2 && !dot1)
            return +1;
        else if ((res == 0) && !dot1)
            res = (*s1 > *s2) - (*s1 < *s2);

        if (*s1) s1++;
        if (*s2) s2++;
      }

    while ((*s1 || *s2) && (res == 0))
      {
        val1 = ( *s1 ? *s1++ : '0' );
        val2 = ( *s2 ? *s2++ : '0' );
        res = (val1 > val2) - (val1 < val2);
      }

    return res;
}

#include <stdio.h>
void check (const char *s1, const char *s2)
{
    int res;
    res = numcmp(s1, s2);
    printf("%s , %s = %d\n", s1, s2, res);
}
int main (void)
{
    check("0", "0000000.000000000000");
    check("123.", "123.00000000");
    check("123.000", "123");
    check("123.001", "123");
    check("123", "123.001");
    check("123", "123.");
    check("123", ".123");
    check("00000100000.0001", "00000100000.0002");
    check("1234567890.", "1234567891.");
    check(".1234567890", ".1234567891");
    check("12345.0001", "0001234.0001");
    return 0;
}

0 , 0000000.000000000000 = 0
123. , 123.00000000 = 0
123.000 , 123 = 0
123.001 , 123 = 1
123 , 123.001 = -1
123 , 123. = 0
123 , .123 = 1
00000100000.0001 , 00000100000.0002 = -1
1234567890. , 1234567891. = -1
.1234567890 , .1234567891 = -1
12345.0001 , 0001234.0001 = 1
Secure
Thursday, September 01, 2005
 
 
Whoops, there are two important test cases not covered:

    check("0512345.0001", "0412345.0001");
    check("0512345.0001", "0612345.0001");

0512345.0001 , 0412345.0001 = 1
0512345.0001 , 0612345.0001 = -1
Secure
Thursday, September 01, 2005
 
 
JP, stop being so bloody-minded. Even if you're not limited by the length of an int, you'll be limited by the amount of addressable memory on the machine, which in a lot of cases will be the same. Either way there is going to be some finite limit.
Matt
Thursday, September 01, 2005
 
 
Yeah! Come on JD. Give us a solution that isn't limited in some way. It's easy to sit back and say that a solution which limits the total length is wrong. So show us the "right" solution.

The bottom line is that even if you could do it (it's definitely possible), the result would be a horrendous mess of convoluted pointer arithmetic. I personally wouldn't hire anyone who couldn't make a simple tradeoff decision like this.
other matt
Thursday, September 01, 2005
 
 
ooops... JP, not JD.  ;)
other matt
Thursday, September 01, 2005
 
 
I find it hard to resist something like this. I have attempted to optimize for clarity and readability. Of course this is an elusive goal and one man's clarity is another's spaghetti.

I have reused Secure's test suite (thanks, a useful contribution). I added a few sanity checking test cases and made the suite check itself.

// Compare non-negative real numbers in string form
//  return +ve if s1>s2, -ve if s2>s2, 0 if s1==s2
int numcmp( const char *s1, const char *s2 )
{
    int diff=0;
    const char *mid1, *end1;
    const char *mid2, *end2;
    char c, d;

    // Point s1 beyond leading zeroes, end1 to end
    //  mid1 to '.' or to end1 if no '.'
    while( *s1 == '0' )
        s1++;
    for( mid1=s1; *mid1 && *mid1!='.'; mid1++ )
        ;
    for( end1=mid1; *end1; end1++ )
        ;

    // Repeat for s2
    while( *s2 == '0' )
        s2++;
    for( mid2=s2; *mid2 && *mid2!='.'; mid2++ )
        ;
    for( end2=mid2; *end2; end2++ )
        ;

    // Look at region between before mid (integer part)
    for( ; s1<mid1 && s2<mid2; s1++,s2++ )
    {
        if( diff==0 && *s1!=*s2 ) // first diff ?
            diff = (*s1-*s2); // decides only when
                              //  lengths are same
    }

    // If integer part digits left over, lengths differ
    if( s1 < mid1 )
        diff = 1;  // s1 longer than s2, so s1 bigger
    else if( s2 < mid2 )
        diff = -1; // s2 longer than s1, so s2 bigger

    // Look at fractional part if diff is still zero
    for( s1=mid1+1, s2=mid2+1;
        diff==0 && (s1<end1||s2<end2);
        s1++, s2++
      )
    {
        c = s1<end1 ? *s1 : '0';
        d = s2<end2 ? *s2 : '0';
        diff = c-d;  // first difference will decide
    }
    return diff;
}

#include <stdio.h>

void check( int expect, const char *s1, const char *s2 )
{
    int res;
    res = numcmp(s1, s2);
    if( res > 1 )
        res = 1;
    else if( res < -1 )
        res = -1;
    if( res != expect )
        printf(" whoops! %s , %s = %d\n", s1, s2, res);
}

int main( void )
{
    check( 1,"321", "54" );
    check(-1,"54", "321" );
    check( 1,"321.6", "00054" );
    check(-1,"54", "000321.6" );
    check( 1,"321.6", "54.7" );
    check(-1,"54.000", "321.0" );
    check( 1,"321.6", "54.00000" );
    check(-1,"54.00000", "00000321.6" );
    check( 0,"0", "0000000.000000000000");
    check( 0,"123.", "123.00000000");
    check( 0,"123.000", "123");
    check( 1,"123.001", "123");
    check(-1,"123", "123.001");
    check( 0,"123", "123.");
    check( 1,"123", ".123");
    check(-1,"00000100000.0001", "00000100000.0002");
    check(-1,"1234567890.", "1234567891.");
    check(-1,".1234567890", ".1234567891");
    check( 1,"12345.0001", "0001234.0001");
    check( 1,"0512345.0001", "0412345.0001");
    check(-1,"0512345.0001", "0612345.0001");
    return 0;
}
Bill Forster Send private email
Thursday, September 01, 2005
 
 
To both matts,

FWIW, Secure got it "right". Ironically for all the naysayers and apologists for reinterpreting specs, it's also a clean and concise implementation.
JP
Friday, September 02, 2005
 
 
Is the following solution any good? I don't have access to a compiler from this computer (i.e. I don't even know if it compiles), but I've tried to make sure that the code is as correct as possible. It's not optimised or anything, just something I threw together while waiting for stuff to happen at work. It's missing some casts/temp. variables: I've opted for brtevity, clarity, and ease of typing, over correctness in some places.



char* move_to_first_non_zero_digit(char* string)
{
  while (*string && *string == '0') ++string;
  return string;
}


int count_digits_before_decimal_point(char* string)
{
  int count = 0;
  while (*string && *string != '.')
  {
      ++string;
      ++count;
  }
  return count;
}


char* digit_by_digit_comparison(char* s1, char* s2)
{
  int result;

  /* compare the strings digit by digit until one string runs (or both strings run) out, or the digits don't match */
  while (*s1 && *s2 && *s1 == *s2)
  {
      ++s1;
      ++s2;
  }
 
  /* if we've come to the end of s1, but not s2, s2 can be greater than s1 by having a non-zero digit */
  if (!*s1 && *s2)
  {
      result = -(*move_to_first_non_zero_digit(s2) != 0);
  }

  /* conversely, if s2 is done, s1 can be greater by having a non-zero digit */
  else if (!*s2 && *s1)
  {
      result = *move_to_first_non_zero_digit(s1) != 0;
  }

  /* if both are done (i.e. both equal) or we found an inequality between adjacent digits, return a single digit comparison */
  else
  {
      result = *s1 - *s2;
  }

  return result;
}


int numcmp(const char *s1, const char *s2)
{
  int result;
  int s1_length;
  int s2_length;

  s1 = move_to_first_non_zero_digit(s1);
  s2 = move_to_first_non_zero_digit(s2);

  s1_length = count_digits_before_decimal_point(s1);
  s2_length = count_digits_before_decimal_point(s2);

  if (s1_length != s2_length)
  {
      result = s1_length - s2_length;
  }
  else
  {
      result = digit_by_digit_comparison(s1, s2);
  }

  return result;
}
Jordan Stewart Send private email
Friday, September 02, 2005
 
 
> while (*s1 == '0') s1++;

How did it work???
Michael
Friday, September 02, 2005
 
 
I don't know if it would be a widely successful tactic, but I'd prefer to take the spec literally or as it would probably be intended if written in a production environment, and add remarks to explain where they differ.

That would seem to compromise best between pedanticism and common sense.
Jack Vickeridge Send private email
Friday, September 02, 2005
 
 
Michael,

>> while (*s1 == '0') s1++;
>How did it work???

Why do you think it shouldn't?
Secure
Friday, September 02, 2005
 
 
My mistake.
Michael
Friday, September 02, 2005
 
 
It'll be a bit late so no-one'll read, but here is my ten minute try:

#include <stdio.h>

int numcmp(const char* one,const char* two) {
    enum { WHOLE_NUMBER, FRACTION } state = WHOLE_NUMBER;
    enum { LESS = -1, EQUAL, GREATER } ret = EQUAL;
    /* trim all leading zeros */
    while(*one=='0') one++;
    while(*two=='0') two++;
    /* compare digits one at a time, while we have two digits to compare */
    while(*one && *two) {
        /* moving to a fractional part? */
        if(*one=='.') {
            /* do we have to compare the fractional part too? */
            if(*two=='.') {
                if(ret==EQUAL)
                    state = FRACTION;
                else /* we know enough */
                    return ret;
            } else /* else two has more digits than one */
                return LESS;
        } else if(*two=='.') /* one has more digits than two */
            return GREATER;
        /* do digit comparision, recording state of first differing digit */
        if(ret==EQUAL) {
            if(*one < *two)
                ret=LESS;
            else if(*one > *two)
                ret=GREATER;
        }
        /* in fractional part, first differing digit is enough */
        if(state==FRACTION && ret!=EQUAL)
            return ret;
        /* move to next digit */
        one++;
        two++;
    }
    /* so one is shorter than the other */
    if(*one) /* one is longest */
        return GREATER;
    else if(*two) /* two is longest */
        return LESS;
    else /* they are the same length */
        return EQUAL;
}

void unittest_numcmp() {
    const char* test_data[] = {
        "0",
        "00000",
        "001",
        "1",
        "000.00000",
        "00.000000001",
        "12345",
        "30000000.00000000000000300000000000001000001",
        "234982732487618236876",
        ".0001",
        "10000.",
        "6787687687682344442837612836626387612387163.23435",
        "0000100000000000000000.00000000000000000000000000001",
        "0000100000000000000000.00000000000000000000000000002",
        NULL
    };
    const char** one;
    const char** two;
    int cmp;
    for(one=test_data; *one; one++) {
        for(two=test_data; *two; two++) {
            printf("numcmp(\"%s\",\"%s\") = %d\n",*one,*two,numcmp(*one,*two));
        }
    }
}

int main() {
    unittest_numcmp();
    return 0;
}

After scrolling a few pages of the unit-test, it seemed roughly right.  It is, of course, probably completely flawed too.
New nick, new rep
Friday, September 02, 2005
 
 
Using the extended test suite from above:

Jordan:
 whoops! 0 , 0000000.000000000000 = -1
 whoops! 123.000 , 123 = 1
 whoops! 123 , 123. = -1
and a looooot of warnings on compiling.

New nick:
 whoops! 0 , 0000000.000000000000 = -1
 whoops! 123. , 123.00000000 = -1
 whoops! 123.000 , 123 = 1
 whoops! 123 , 123. = -1
hidden agenda
Friday, September 02, 2005
 
 
ah, serves me right: if I'd scrolled a few pages lower, I'd have spotted:

comparing "000.00000" and "0" = 1
comparing "000.00000" and "00000" = 1
comparing "000.00000" and "001" = -1
comparing "000.00000" and "1" = -1
comparing "000.00000" and "000.00000" = 0
comparing "000.00000" and "00.000000001" = -1

... oops!

This little fix seems to fix that, if inserted after the trimming leading zeros and before the digit-for-digit bit:

    /* special case; what if either is all zeros but contains a decimal point? */
    if(*s1=='.') {
        const char* tmp=s1+1;
        while(*tmp=='0') tmp++; /* scan for first non-zero digit */
        if(!*tmp) /* s1 was all zeros after decimal point then.. */
            s1 = tmp; /* truncate */
    }
    if(*s2=='.') {
        const char* tmp=s2+1;
        while(*tmp=='0') tmp++; /* scan for first non-zero digit */
        if(!*tmp) /* was all zeros after decimal point then.. */
            s2 = tmp; /* truncate */
    }

I'm sure that with a week to offer a solution, this could be refactored to hell and back..
New nick, new rep
Friday, September 02, 2005
 
 
hidden agenda, posting at same time I see!  Thanks for testing.

How about this?

int numcmp(const char* s1,const char* s2) {
    enum { WHOLE_NUMBER, FRACTION } state = WHOLE_NUMBER;
    enum { LESS = -1, EQUAL, GREATER } ret = EQUAL;
    /* trim leading zeros */
    while(*s1=='0') s1++;
    while(*s2=='0') s2++;
    /* compare digits one at a time, while we have two digits to compare */
    while(*s1 && *s2) {
        /* moving to a fractional part? */
        if(*s1=='.') {
            /* do we have to compare the fractional part too? */
            if(*s2=='.') {
                if(ret==EQUAL)
                    state=FRACTION;
                else /* we know enough */
                    return ret;
            } else /* else s2 has more digits than s1 */
                return LESS;
        } else if(*s2=='.') /* s1 has more digits than s2 */
            return GREATER;
        /* do digit comparision, recording state of first differing digit */
        if(ret==EQUAL) {
            if(*s1 < *s2)
                ret=LESS;
            else if(*s1 > *s2)
                ret=GREATER;
        }
        /* in fractional part, first differing digit is enough */
        if(state==FRACTION && ret!=EQUAL)
            return ret;
        /* move to next digit */
        s1++;
        s2++;
    }
    /* so one is shorter than the other */
    if(*s1) { /* s1 is longest */
        /* special case; what if a decimal point and then all zeros? */
        if(*s1=='.') {
            const char* tmp=s1+1;
            while(*tmp=='0') tmp++; /* scan for first non-zero digit */
            if(!*tmp) /* s1 was all zeros after decimal point then.. */
                return EQUAL;
        } else
            return GREATER;
    }
    if(*s2) { /* s2 is longest */
        /* special case; what if a decimal point and then all zeros? */
        if(*s2=='.') {
            const char* tmp=s2+1;
            while(*tmp=='0') tmp++; /* scan for first non-zero digit */
            if(!*tmp) /* s2 was all zeros after decimal point then.. */
                return EQUAL;
        } else
            return LESS;
    }
    /* they are the same length */
    return EQUAL;
}

Guess this posting-before-unittesting is a no-hire ;-)
New nick, new rep
Friday, September 02, 2005
 
 
Great, real no-hire I am!

latest fix, right near the bottom:

    if(*s1) { /* s1 is longest */
        /* special case; what if a decimal point and then all zeros? */
        if(*s1=='.' || state==FRACTION) {

.. ditto for s2

Really in real life I do unittest things to bits.  Honest.
New nick, new rep
Friday, September 02, 2005
 
 
"Guess this posting-before-unittesting is a no-hire ;-)"

Not only not hired, you're fired!

"Really in real life I do unittest things to bits. Honest."

[X-Files music fading in]
I want to believe.
[music fading out]
hidden agenda
Friday, September 02, 2005
 
 
Ok, a couple of changes later and I'm pretty sure the code passes all the tests, and it compiles with far fewer warnings.

The method signatures should have been:

const char* move_to_first_non_zero_digit(const char* string);
int count_digits_before_decimal_point(const char* string);
int digit_by_digit_comparison(const char* s1, const char* s2);

Changed code:

  /* ... */
  if (!*s1 && *s2)
  {
      result = -(*move_to_first_non_zero_digit(s2) != 0);
  }

  /* ... */
  else if (!*s2 && *s1)
  {
      result = *move_to_first_non_zero_digit(s1) != 0;
  }

becomes:

  /* ... */
  if (!*s1 && *s2)
  {
      s2 = move_to_first_non_zero_digit(s2);
      if (*s2 == '.')
      {
        s2 = move_to_first_non_zero_digit(s2 + 1);
      }
      result = -(*s2 != 0);
  }

  /* ... */
  else if (!*s2 && *s1)
  {
      s1 = move_to_first_non_zero_digit(s1);
      if (*s1 == '.')
      {
        s1 = move_to_first_non_zero_digit(s1 + 1);
      }
      result = (*s1 != 0);
  }
Jordan Stewart Send private email
Friday, September 02, 2005
 
 
I looked at all the versions above with special logic--testing length, testing before and after decimal point, etc., and found them somehow unsatisfying.

Late to the party I know, but here's a version with simple linear logic:

int numcmp(const char* s1, const char* s2)
{
    const char *dp1, *dp2, *e1, *e2, *p1, *p2;
    int dig1, dig2;

    /* find decimal point in first */
    for (dp1 = s1; *dp1 && *dp1 != '.'; ++dp1)
        ;

    /* find end of first */
    for (e1 = dp1; *e1; ++e1)
        ;

    /* find decimal point in second */
    for (dp2 = s2; *dp2 && *dp2 != '.'; ++dp2)
        ;

    /* find end of second */
    for (e2 = dp2; *e2; ++e2)
        ;

    /* compare strings */
    p1 = (dp1 - s1 > dp2 - s2) ? s1 : dp1 - (dp2 - s2);
    p2 = (dp2 - s2 > dp1 - s1) ? s2 : dp2 - (dp1 - s1);
    while (p1 < e1 || p2 < e2)
    {
        dig1 = (p1 >= s1 && p1 < e1 && *p1 != '.') ? *p1 : '0';
        dig2 = (p2 >= s2 && p2 < e2 && *p2 != '.') ? *p2 : '0';
        if (dig1 > dig2) return 1;
        if (dig1 < dig2) return -1;
        ++p1, ++p2;
    }
    return 0;
}

It's maybe a little inefficient in that it scans the whole of both strings before starting the compare, but it is robust in that it is immune to possible failure on special cases.
Ian Boys Send private email
Saturday, September 03, 2005
 
 
Hi

Looking from another perspective...

All the above code will no doubt be their respective authors' pride, but from a point of view of production code, would it not be much simpler, and reliable, to use strtod() on the two strings and compare the results?
I'm sure it would be about 5 lines of code...

It might be fun to reinvent wheels, but its tough to get a small, round, standard, and fast wheel.

Regards
Vivek
Vivek N Send private email
Saturday, September 03, 2005
 
 
Vivek,

strtod() will lose precision on longer inputs, producing wrong results. It may seem to be an academical question, but there may as well be special circumstances where this function is needed, e.g. in a bigfloat implementation (bigint with fractions).
Secure
Saturday, September 03, 2005
 
 
Ian Boys, that's an elegant symmetric solution that "avoids avoidable case analysis" - the core of the problem is "normalisation" of the strings (i.e., the decimal point position), not leading or trailing zeroes.

(It gets a bit more complicated if moving pointers _left_ past the parameters, or pointers arithmetic (subtracting) is not kosher - it would require moving pointers back from dp's towards s's, whichever comes first, and examining the rest at the left...).
ako
Saturday, September 03, 2005
 
 
Here is my code which does most of its comparisons right to left as a novelty :

// mostly compares backward through number strings
// so the last comparison is most important
// we just overwrite the answer (big)
// whereas a forward compare needs the first unequal comparison
// and then returns early

int numcmp(const char *s1, const char *s2) {
    const char *s[2];
    const char *start[2];
    const char *tl, *hd;
    int i,big;
    int big_decode[]={1,-1,0};
    start[0]=s1;
    start[1]=s2;
    //scan to (implied) decimal point to align numbers
    for (i=0;i<2;i++) {
        s[i]=start[i];
        while (*(s[i])!='.' && *(s[i])) s[i]++;
    }
    //scan to end of shortest string
    while (*(s[0]) && *(s[1])) {
        s[0]++;
        s[1]++;
    }
    big=2; //2 means equal
    //overhanging tail is bigger if nonzero
    i=(*(s[0])==0);
    if (*(s[i])) {
        tl=s[i];
        while (*tl=='0' || *tl=='.')
            tl++;
        if (*tl)
            big=i;
    }
    //compare backward through overlapping numbers
    while (s[0]!=start[0] && s[1]!=start[1]) {
        if (*(--s[0])!=*(--s[1]))
            big=(*(s[1])>*(s[0]));
    }
    //overhanging head is bigger if nonzero
    i=(s[0]==start[0]);
    if (s[i]!=start[i]) {
        hd=start[i];
        while (hd!=s[i] && *hd=='0') hd++;
        if (hd!=s[i])
            big=i;
    }
    return big_decode[big];
}
setsquare Send private email
Sunday, September 04, 2005
 
 
Just for the kicks, here is my version which seems to work. Can we now start optimizations - to begin with whose version was the quickest and/or shorest?

#include<stdio.h>
#include<stdlib.h>

static int
numcmp(const char *s1, const char *s2);


int
main(int argc, const char* argv[])
{
    int retval=0;
    if(argc != 3) {
        printf("Two arguments required\n");
        exit(-9);
    }
    retval = numcmp(argv[1], argv[2]);
    printf("Retval is %d\n", retval);
    return 0;
}

#define EQUAL 0
#define GT 1
#define LT -1

int numcmp(const char* left, const char* right)
{
    int zadl=0, zadr=0;
    /* Knock off leading zeros */
    while(*left == '0'  && *left != '\0' && *left != '.')
        left++;
    while(*right == '0' && *right != '\0' && *right != '.')
        right++;

    if(*left == '\0' &&  *right == '\0')
        return EQUAL;
    
    /* Left = 0.xxx , Right X.xxx */
    if (*left == '.' && *right != '.')
        return LT;
    /* Reverse of above */
    if (*left != '.' && *right == '.')
        return GT;
    /* Both numbers begin with a . - 0.xxx)    */
    
    if( *left == '.' && *right == '.')
    {
        left++;
        right++;
    }

    if (*left > *right)
        return GT;
    if (*left < *right)
        return LT;
     while( *left && *left == '0') {
        left++;    
        zadl++;
    }
    
    while( *right && *right == '0') {
        right++;
        zadr++;
    }
    if(*left == '\0' && *right == '\0')
        return EQUAL;

    else if(zadl < zadr )
        return GT;

    else if(zadl > zadr)
        return LT;

    while (*left && *right)
    {
        left++;
        right++;
        if(*left == *right)
            continue;
        else if(*left > *right)
            return GT;
        else if(*left < *right)
            return LT;
    }    
    /* Not reached */
    return 0;
}
parry Send private email
Monday, September 05, 2005
 
 
Sorry, but in your case we aren't ready to start optimizations, viz;

 whoops! 321 , 54 = -1
 whoops! 54 , 321 = 1
 whoops! 321.6 , 00054 = -1
 whoops! 54 , 000321.6 = 1
 whoops! 321.6 , 54.7 = -1
 whoops! 54.000 , 321.0 = 1
 whoops! 321.6 , 54.00000 = -1
 whoops! 54.00000 , 00000321.6 = 1
 whoops! 0 , 0000000.000000000000 = 1
 whoops! 123. , 123.00000000 = -1
 whoops! 123.000 , 123 = 1
 whoops! 123 , 123. = -1

I must admit I am a little puzzled at the tendency to keep on posting new solutions that are not competitive at all in terms of important metrics like correctness, clarity, brevity, speed. Of these correctness is the most important and a decent test suite has been posted to help out there.

In case it's not clear I am not directing this criticism only at the most recent poster.

I would be interested in pursuing this whole topic further, (whose solution is best, further optimization etc.), but on JoS unfortunately I think a thread doesn't maintain vitality once it's a few days old and down the page a bit. So I am not too hopeful of very much more vigorous debate.
Bill Forster Send private email
Monday, September 05, 2005
 
 
whoops! 321 , 54 = -1

Sorry but you didn't read the spec, did you?

BTW I am also admitting that the program is not perfect yet - just a 15 min hack if you will. But I did read the spec :)
parry Send private email
Monday, September 05, 2005
 
 
OK, parry, please explain. In what spec-conforming sense is 321 less than 54?
Ian Boys Send private email
Monday, September 05, 2005
 
 
Oh crap - I read the spec as if it said the string will have at least one decimal point, actually it says "at most" one decimal - my bad.

Sorry pals. Bad day!
parry Send private email
Monday, September 05, 2005
 
 
Bill: One way to renew interest and continue debate is to start a new thread with a slightly different subject.

In acknowledging ako's valid criticism about the pitfalls of moving pointers too far to the left, I humbly submit the following fixed up version that uses offsets instead of raw pointers. I believe it is close to the shortest possible non-obfuscated implementation, and will give the correct result for all inputs, but I would be curious about its speed compared to codes that look for special cases and exit early.

int numcmp(const char* s1, const char* s2)
{
    const char *dp1, *dp2;
    long lf1, lf2, li1, li2, i;
    int dig1, dig2;

    /* find decimal point in first, and length of integer part */
    for (dp1 = s1, li1 = 0; *dp1 && *dp1 != '.'; ++dp1, --li1)
        ;

    /* find length of fractional part */
    for (lf1 = 0; dp1[lf1]; ++lf1)
        ;

    /* find decimal point in second, and length of integer part */
    for (dp2 = s2, li2 = 0; *dp2 && *dp2 != '.'; ++dp2, --li2)
        ;

    /* find length of fractional part */
    for (lf2 = 0; dp2[lf2]; ++lf2)
        ;

    /* compare strings */
    i = (li1 < li2) ? li1 : li2;
    while (i < lf1 || i < lf2)
    {
        dig1 = (i >= li1 && i < lf1 && dp1[i] != '.') ? dp1[i] : '0';
        dig2 = (i >= li2 && i < lf2 && dp2[i] != '.') ? dp2[i] : '0';
        if (dig1 > dig2) return 1;
        if (dig1 < dig2) return -1;
        ++i;
    }
    return 0;
}
Ian Boys Send private email
Monday, September 05, 2005
 
 
Hmmm. I suppose that to reduce the op count, one should replace

    for (dp1 = s1, li1 = 0; *dp1 && *dp1 != '.'; ++dp1, --li1)
        ;

with

    for (dp1 = s1; *dp1 && *dp1 != '.'; ++dp1)
        ;
    li1 = s1 - dp1;

And maybe replace

    for (lf1 = 0; dp1[lf1]; ++lf1)
        ;

with

    for (e1 = dp1; *e1; ++e1)
        ;
    lf1 = e1 - dp1;

Though these changes make the line count greater, which is unsatisfying.
Ian Boys Send private email
Monday, September 05, 2005
 
 
Well in the interests of wasting more time when I should really be doing something more productive I have taken up the challenge of satisfying Ian Boys' curiosity about the speed of various implementations. Basically I have done some crude timings, on my home Windows box. The timing varies a little from run to run (not surprisingly, I didn't make any effort to run as anything but a low priority user process), but is nonetheless pretty consistent.

Here is the results of a typical run of my program;

secure: Time elapsed = 9824
bill: Time elapsed = 11176
ian original: Time elapsed = 12858
ian modified: Time elapsed = 13189
ian modified enhanced: Time elapsed = 13089
setsquare: Time elapsed = 12318

I will post the full program below. I used Visual C++ 6.0 with default options. (I used C, not C++ mode).

My criteria for selecting implementations was a little loose. If you have posted an implementation that passes the test suite first posted by Secure I apologise for missing you out. Basically I looked hard at any implementation posted after Secure's. I discounted anyone who obviously wasn't testing their code. I looked at a selection of posts before Secure's, but didn't find anything that didn't have obvious problems. I finally decided it wasn't my job to do everyone's testing for them. However if you did post a solid implementation very early I have missed you out and I apologise. Feel free to add your implementation to the correctness/speed test below and repost.

My comments on the qualifying implementations are as follows;

Ian Boys. An algorithm that works uniformly before and after the decimal point, is definitely an important and worthwhile addition. As you admit, unlikely to be the quickest and it turned out that way. Not sure you really needed to submit modified versions.
Secure. First to post a solid, demonstrably tested implementation, and fastest execution time. Not commented and non-obvious, but very nice job nevertheless.
Setsquare. Good job.
Me. I retain a sentimental attachment to my own implementation. I think it is the best commented and (consequently?) most easily understood version (very subjective I know). I like the way I don't bail out early (one return statement only at end - setsquare also has this feature). Second best performance is a reasonable effort.

Here is the correctness plus speed test suite;

#include <windows.h>    // use GetTickCount()

int numcmp_secure (const char *s1, const char *s2)
{
    int res = 0, dot1, dot2, val1, val2;

    while (*s1 == '0') s1++;
    while (*s2 == '0') s2++;

    dot1 = 0;
    while ((*s1 || *s2) && !dot1)
      {
        dot1 = ((*s1 == '.') || (*s1 == '\0'));
        dot2 = ((*s2 == '.') || (*s2 == '\0'));

        if (dot1 && !dot2)
            return -1;
        else if (dot2 && !dot1)
            return +1;
        else if ((res == 0) && !dot1)
            res = (*s1 > *s2) - (*s1 < *s2);

        if (*s1) s1++;
        if (*s2) s2++;
      }

    while ((*s1 || *s2) && (res == 0))
      {
        val1 = ( *s1 ? *s1++ : '0' );
        val2 = ( *s2 ? *s2++ : '0' );
        res = (val1 > val2) - (val1 < val2);
      }

    return res;
}

// Compare non-negative real numbers in string form
//  return +ve if s1>s2, -ve if s2>s2, 0 if s1==s2
int numcmp_bill( const char *s1, const char *s2 )
{
    int diff=0;
    const char *mid1, *end1;
    const char *mid2, *end2;
    char c, d;

    // Point s1 beyond leading zeroes, end1 to end
    //  mid1 to '.' or to end1 if no '.'
    while( *s1 == '0' )
        s1++;
    for( mid1=s1; *mid1 && *mid1!='.'; mid1++ )
        ;
    for( end1=mid1; *end1; end1++ )
        ;

    // Repeat for s2
    while( *s2 == '0' )
        s2++;
    for( mid2=s2; *mid2 && *mid2!='.'; mid2++ )
        ;
    for( end2=mid2; *end2; end2++ )
        ;

    // Look at region between before mid (integer part)
    for( ; s1<mid1 && s2<mid2; s1++,s2++ )
    {
        if( diff==0 && *s1!=*s2 ) // first diff ?
            diff = (*s1-*s2); // decides only when
                              //  lengths are same
    }

    // If integer part digits left over, lengths differ
    if( s1 < mid1 )
        diff = 1;  // s1 longer than s2, so s1 bigger
    else if( s2 < mid2 )
        diff = -1; // s2 longer than s1, so s2 bigger

    // Look at fractional part if diff is still zero
    for( s1=mid1+1, s2=mid2+1;
        diff==0 && (s1<end1||s2<end2);
        s1++, s2++
      )
    {
        c = s1<end1 ? *s1 : '0';
        d = s2<end2 ? *s2 : '0';
        diff = c-d;  // first difference will decide
    }
    return diff;
}

int numcmp_ian_original(const char* s1, const char* s2)
{
    const char *dp1, *dp2, *e1, *e2, *p1, *p2;
    int dig1, dig2;

    /* find decimal point in first */
    for (dp1 = s1; *dp1 && *dp1 != '.'; ++dp1)
        ;

    /* find end of first */
    for (e1 = dp1; *e1; ++e1)
        ;

    /* find decimal point in second */
    for (dp2 = s2; *dp2 && *dp2 != '.'; ++dp2)
        ;

    /* find end of second */
    for (e2 = dp2; *e2; ++e2)
        ;

    /* compare strings */
    p1 = (dp1 - s1 > dp2 - s2) ? s1 : dp1 - (dp2 - s2);
    p2 = (dp2 - s2 > dp1 - s1) ? s2 : dp2 - (dp1 - s1);
    while (p1 < e1 || p2 < e2)
    {
        dig1 = (p1 >= s1 && p1 < e1 && *p1 != '.') ? *p1 : '0';
        dig2 = (p2 >= s2 && p2 < e2 && *p2 != '.') ? *p2 : '0';
        if (dig1 > dig2) return 1;
        if (dig1 < dig2) return -1;
        ++p1, ++p2;
    }
    return 0;
}

int numcmp_ian_modified(const char* s1, const char* s2)
{
    const char *dp1, *dp2;
    long lf1, lf2, li1, li2, i;
    int dig1, dig2;

    /* find decimal point in first, and length of integer part */
    for (dp1 = s1, li1 = 0; *dp1 && *dp1 != '.'; ++dp1, --li1)
        ;

    /* find length of fractional part */
    for (lf1 = 0; dp1[lf1]; ++lf1)
        ;

    /* find decimal point in second, and length of integer part */
    for (dp2 = s2, li2 = 0; *dp2 && *dp2 != '.'; ++dp2, --li2)
        ;

    /* find length of fractional part */
    for (lf2 = 0; dp2[lf2]; ++lf2)
        ;

    /* compare strings */
    i = (li1 < li2) ? li1 : li2;
    while (i < lf1 || i < lf2)
    {
        dig1 = (i >= li1 && i < lf1 && dp1[i] != '.') ? dp1[i] : '0';
        dig2 = (i >= li2 && i < lf2 && dp2[i] != '.') ? dp2[i] : '0';
        if (dig1 > dig2) return 1;
        if (dig1 < dig2) return -1;
        ++i;
    }
    return 0;
}

int numcmp_ian_modified_enhanced(const char* s1, const char* s2)
{
    const char *e1, *dp1, *dp2;
    long lf1, lf2, li1, li2, i;
    int dig1, dig2;

    /* find decimal point in first, and length of integer part */
    for (dp1 = s1; *dp1 && *dp1 != '.'; ++dp1)
        ;
    li1 = s1 - dp1;

    /* find length of fractional part */
    for (e1 = dp1; *e1; ++e1)
        ;
    lf1 = e1 - dp1;

    /* find decimal point in second, and length of integer part */
    for (dp2 = s2, li2 = 0; *dp2 && *dp2 != '.'; ++dp2, --li2)
        ;

    /* find length of fractional part */
    for (lf2 = 0; dp2[lf2]; ++lf2)
        ;

    /* compare strings */
    i = (li1 < li2) ? li1 : li2;
    while (i < lf1 || i < lf2)
    {
        dig1 = (i >= li1 && i < lf1 && dp1[i] != '.') ? dp1[i] : '0';
        dig2 = (i >= li2 && i < lf2 && dp2[i] != '.') ? dp2[i] : '0';
        if (dig1 > dig2) return 1;
        if (dig1 < dig2) return -1;
        ++i;
    }
    return 0;
}


int numcmp_setsquare(const char *s1, const char *s2) {
    const char *s[2];
    const char *start[2];
    const char *tl, *hd;
    int i,big;
    int big_decode[]={1,-1,0};
    start[0]=s1;
    start[1]=s2;
    //scan to (implied) decimal point to align numbers
    for (i=0;i<2;i++) {
        s[i]=start[i];
        while (*(s[i])!='.' && *(s[i])) s[i]++;
    }
    //scan to end of shortest string
    while (*(s[0]) && *(s[1])) {
        s[0]++;
        s[1]++;
    }
    big=2; //2 means equal
    //overhanging tail is bigger if nonzero
    i=(*(s[0])==0);
    if (*(s[i])) {
        tl=s[i];
        while (*tl=='0' || *tl=='.')
            tl++;
        if (*tl)
            big=i;
    }
    //compare backward through overlapping numbers
    while (s[0]!=start[0] && s[1]!=start[1]) {
        if (*(--s[0])!=*(--s[1]))
            big=(*(s[1])>*(s[0]));
    }
    //overhanging head is bigger if nonzero
    i=(s[0]==start[0]);
    if (s[i]!=start[i]) {
        hd=start[i];
        while (hd!=s[i] && *hd=='0') hd++;
        if (hd!=s[i])
            big=i;
    }
    return big_decode[big];
}


typedef int (*fptr)( const char *s1, const char *s2 );

void speed( fptr f )
{
    f( "321", "54" );
    f( "54", "321" );
    f( "321.6", "00054" );
    f( "54", "000321.6" );
    f( "321.6", "54.7" );
    f( "54.000", "321.0" );
    f( "321.6", "54.00000" );
    f( "54.00000", "00000321.6" );
    f( "0", "0000000.000000000000");
    f( "123.", "123.00000000");
    f( "123.000", "123");
    f( "123.001", "123");
    f( "123", "123.001");
    f( "123", "123.");
    f( "123", ".123");
    f( "00000100000.0001", "00000100000.0002");
    f( "1234567890.", "1234567891.");
    f( ".1234567890", ".1234567891");
    f( "12345.0001", "0001234.0001");
    f( "0512345.0001", "0412345.0001");
    f( "0512345.0001", "0612345.0001");
}

void check( fptr f, const char *name,
            int expect, const char *s1, const char *s2 )
{
    int res;
    res = f(s1, s2);
    if( res > 1 )
        res = 1;
    else if( res < -1 )
        res = -1;
    if( res != expect )
        printf(" %s whoops! %s , %s = %d\n",
                              name, s1, s2, res);
}

void test( fptr f, const char *name )
{
    check( f,name, 1,"321", "54" );
    check( f,name,-1,"54", "321" );
    check( f,name, 1,"321.6", "00054" );
    check( f,name,-1,"54", "000321.6" );
    check( f,name, 1,"321.6", "54.7" );
    check( f,name,-1,"54.000", "321.0" );
    check( f,name, 1,"321.6", "54.00000" );
    check( f,name,-1,"54.00000", "00000321.6" );
    check( f,name, 0,"0", "0000000.000000000000");
    check( f,name, 0,"123.", "123.00000000");
    check( f,name, 0,"123.000", "123");
    check( f,name, 1,"123.001", "123");
    check( f,name,-1,"123", "123.001");
    check( f,name, 0,"123", "123.");
    check( f,name, 1,"123", ".123");
    check( f,name,-1,"00000100000.0001", "00000100000.0002");
    check( f,name,-1,"1234567890.", "1234567891.");
    check( f,name,-1,".1234567890", ".1234567891");
    check( f,name, 1,"12345.0001", "0001234.0001");
    check( f,name, 1,"0512345.0001", "0412345.0001");
    check( f,name,-1,"0512345.0001", "0612345.0001");
}

int main( void )
{
    unsigned long i, j, t1, t2;
    fptr f[] =
    {
        numcmp_secure,
        numcmp_bill,
        numcmp_ian_original,
        numcmp_ian_modified,
        numcmp_ian_modified_enhanced,
        numcmp_setsquare
    };
    const char *names[] =
    {
        "secure",
        "bill",
        "ian original",
        "ian modified",
        "ian modified enhanced",
        "setsquare"
    };
    for( i=0; i<sizeof(names)/sizeof(names[0]); i++ )
    {
        test( f[i], names[i] );
        t1 = GetTickCount();
        for( j=0; j<1000000; j++ )
            speed( f[i] );
        t2 = GetTickCount();
        printf( "%s: Time elapsed = %lu\n",
                                    names[i], t2-t1 );
    }
    return 0;
}
Bill Forster Send private email
Tuesday, September 06, 2005
 
 
My own version:

int numcmp(const char *s1, const char *s2)
{
    const char *p1 = s1;
    const char *p2 = s2;

    // go past leading zeros
    while (*p1 == '0') p1++;
    while (*p2 == '0') p2++;
    
    // count the characters in the leading integer portions of the strings
    const char *decimal1 = p1;
    const char *decimal2 = p2;

    while (*decimal1 != 0 && *decimal1 != '.') decimal1++;
    while (*decimal2 != 0 && *decimal2 != '.') decimal2++;

    size_t count1 = decimal1 - p1;
    size_t count2 = decimal2 - p2;

    // longer integers are bigger
    if (count1 < count2) return -1;
    if (count1 > count2) return 1;

    // There are the same number of digits in the integer portions of the string
    // so now compare all the characters in the integer string individually
    while (p1 != decimal1) {
        // NOTE: its unnecessary to test p2 != decimal2, since char counts are equal

        char c1 = *p1++;
        char c2 = *p2++;        

        if (c1 < c2 ) return -1;
        if (c1 > c2) return 1;
    }

    // advance pointers past decimal point
    if (*p1 == '.') p1++;
    if (*p2 == '.') p2++;

    // now compare the fractional portion of the string
    // do this until the characters differ or a string ends
    while (*p1 != 0 && *p2 != 0) {
        char c1 = *p1++;
        char c2 = *p2++;

        if (c1 < c2 ) return -1;        
        if (c1 > c2) return 1;
    }

    // skip past any zero characters that remain in the strings
    // (at least one string is at the null now, but we
    // need not explicitly test)
    while (*p1 == '0') p1++;
    while (*p2 == '0') p2++;

    // compare the final characters (one string may still have more characters
    // but it doesn't matter). The only possible characters are NULL
    // and '1' through '9' ('0' is not possible, we explicitly skipped them).
    // The NULL character will act as a '0' however for comparison purposes
    if (*p1 < *p2 ) return -1;        
    if (*p1 > *p2) return 1;

    return 0;
}

test results:

secure: Time elapsed = 1922
bill: Time elapsed = 1969
ian original: Time elapsed = 2484
ian modified: Time elapsed = 2453
ian modified enhanced: Time elapsed = 2453
setsquare: Time elapsed = 2485
damien: Time elapsed = 1187

Amateurs :)
Damien Katz Send private email
Tuesday, September 06, 2005
 
 
Nice job at producing a speed optimized algorithm. However seeing as I don't think anyone else was explicitly aiming for speed, (for example I made it clear from the start I was optimizing for clarity and readability), your arrogant sign off is unjustified (as arrogance so often is).

Fortunately I have to go out now or no doubt I would feel compelled to waste even more time writing the fastest version I can come up with.
Bill Forster Send private email
Tuesday, September 06, 2005
 
 
Wow. I actually wrote that several days ago ( http://damienkatz.net/2005/09/fun_problem.html ). I didn't time it until I saw you had created something to time all the solutions together. So I was curious.
Damien Katz Send private email
Tuesday, September 06, 2005
 
 
Damian, except that size_t shuld be replaced by sth like ptrdiff_t (I quit programming a couple years ago and I wondered if pointer subtraction was safe here in the face of the following requirement: "The program should not pose any restriction on the length of the strings". However, in ISO C "the result of pointer subtraction is always an integral type" (i.e., defined), but size_t is defined for other purposes).

Ian, both left and right sides are equally forbidden of course, so I should have said it in a more general way: "using pointers, even if not dereferenced, outside the scope of the strings, may be undefined".
ako
Tuesday, September 06, 2005
 
 
Since Bill missed my implementation, I've added it (and Damien's) to the correctness/speed test. I tried the test out (on Win XP with mingw gcc 4.3.2) with a variety of optimisation levels. The following results are pretty typical, but very unscientific.


no optimisations:

secure: Time elapsed = 5448
bill: Time elapsed = 5879
ian original: Time elapsed = 7250
ian modified: Time elapsed = 7181
ian modified enhanced: Time elapsed = 7240
setsquare: Time elapsed = 6720
jordan: Time elapsed = 4566
damien: Time elapsed = 3155


-O1

secure: Time elapsed = 4757
bill: Time elapsed = 4266
ian original: Time elapsed = 5869
ian modified: Time elapsed = 6049
ian modified enhanced: Time elapsed = 6048
setsquare: Time elapsed = 5258
jordan: Time elapsed = 3665
damien: Time elapsed = 2774


-O2

secure: Time elapsed = 5038
bill: Time elapsed = 4506
ian original: Time elapsed = 5738
ian modified: Time elapsed = 5749
ian modified enhanced: Time elapsed = 5628
setsquare: Time elapsed = 4706
jordan: Time elapsed = 3495
damien: Time elapsed = 2885


-O3

secure: Time elapsed = 4907
bill: Time elapsed = 4476
ian original: Time elapsed = 5698
ian modified: Time elapsed = 5779
ian modified enhanced: Time elapsed = 5628
setsquare: Time elapsed = 4586
jordan: Time elapsed = 2724
damien: Time elapsed = 2764


the code:


#include <windows.h>    // use GetTickCount()

int numcmp_secure (const char *s1, const char *s2)
{
    int res = 0, dot1, dot2, val1, val2;

    while (*s1 == '0') s1++;
    while (*s2 == '0') s2++;

    dot1 = 0;
    while ((*s1 || *s2) && !dot1)
      {
        dot1 = ((*s1 == '.') || (*s1 == '\0'));
        dot2 = ((*s2 == '.') || (*s2 == '\0'));

        if (dot1 && !dot2)
            return -1;
        else if (dot2 && !dot1)
            return +1;
        else if ((res == 0) && !dot1)
            res = (*s1 > *s2) - (*s1 < *s2);

        if (*s1) s1++;
        if (*s2) s2++;
      }

    while ((*s1 || *s2) && (res == 0))
      {
        val1 = ( *s1 ? *s1++ : '0' );
        val2 = ( *s2 ? *s2++ : '0' );
        res = (val1 > val2) - (val1 < val2);
      }

    return res;
}

// Compare non-negative real numbers in string form
//  return +ve if s1>s2, -ve if s2>s2, 0 if s1==s2
int numcmp_bill( const char *s1, const char *s2 )
{
    int diff=0;
    const char *mid1, *end1;
    const char *mid2, *end2;
    char c, d;

    // Point s1 beyond leading zeroes, end1 to end
    //  mid1 to '.' or to end1 if no '.'
    while( *s1 == '0' )
        s1++;
    for( mid1=s1; *mid1 && *mid1!='.'; mid1++ )
        ;
    for( end1=mid1; *end1; end1++ )
        ;

    // Repeat for s2
    while( *s2 == '0' )
        s2++;
    for( mid2=s2; *mid2 && *mid2!='.'; mid2++ )
        ;
    for( end2=mid2; *end2; end2++ )
        ;

    // Look at region between before mid (integer part)
    for( ; s1<mid1 && s2<mid2; s1++,s2++ )
    {
        if( diff==0 && *s1!=*s2 ) // first diff ?
            diff = (*s1-*s2); // decides only when
                              //  lengths are same
    }

    // If integer part digits left over, lengths differ
    if( s1 < mid1 )
        diff = 1;  // s1 longer than s2, so s1 bigger
    else if( s2 < mid2 )
        diff = -1; // s2 longer than s1, so s2 bigger

    // Look at fractional part if diff is still zero
    for( s1=mid1+1, s2=mid2+1;
        diff==0 && (s1<end1||s2<end2);
        s1++, s2++
      )
    {
        c = s1<end1 ? *s1 : '0';
        d = s2<end2 ? *s2 : '0';
        diff = c-d;  // first difference will decide
    }
    return diff;
}

int numcmp_ian_original(const char* s1, const char* s2)
{
    const char *dp1, *dp2, *e1, *e2, *p1, *p2;
    int dig1, dig2;

    /* find decimal point in first */
    for (dp1 = s1; *dp1 && *dp1 != '.'; ++dp1)
        ;

    /* find end of first */
    for (e1 = dp1; *e1; ++e1)
        ;

    /* find decimal point in second */
    for (dp2 = s2; *dp2 && *dp2 != '.'; ++dp2)
        ;

    /* find end of second */
    for (e2 = dp2; *e2; ++e2)
        ;

    /* compare strings */
    p1 = (dp1 - s1 > dp2 - s2) ? s1 : dp1 - (dp2 - s2);
    p2 = (dp2 - s2 > dp1 - s1) ? s2 : dp2 - (dp1 - s1);
    while (p1 < e1 || p2 < e2)
    {
        dig1 = (p1 >= s1 && p1 < e1 && *p1 != '.') ? *p1 : '0';
        dig2 = (p2 >= s2 && p2 < e2 && *p2 != '.') ? *p2 : '0';
        if (dig1 > dig2) return 1;
        if (dig1 < dig2) return -1;
        ++p1, ++p2;
    }
    return 0;
}

int numcmp_ian_modified(const char* s1, const char* s2)
{
    const char *dp1, *dp2;
    long lf1, lf2, li1, li2, i;
    int dig1, dig2;

    /* find decimal point in first, and length of integer part */
    for (dp1 = s1, li1 = 0; *dp1 && *dp1 != '.'; ++dp1, --li1)
        ;

    /* find length of fractional part */
    for (lf1 = 0; dp1[lf1]; ++lf1)
        ;

    /* find decimal point in second, and length of integer part */
    for (dp2 = s2, li2 = 0; *dp2 && *dp2 != '.'; ++dp2, --li2)
        ;

    /* find length of fractional part */
    for (lf2 = 0; dp2[lf2]; ++lf2)
        ;

    /* compare strings */
    i = (li1 < li2) ? li1 : li2;
    while (i < lf1 || i < lf2)
    {
        dig1 = (i >= li1 && i < lf1 && dp1[i] != '.') ? dp1[i] : '0';
        dig2 = (i >= li2 && i < lf2 && dp2[i] != '.') ? dp2[i] : '0';
        if (dig1 > dig2) return 1;
        if (dig1 < dig2) return -1;
        ++i;
    }
    return 0;
}

int numcmp_ian_modified_enhanced(const char* s1, const char* s2)
{
    const char *e1, *dp1, *dp2;
    long lf1, lf2, li1, li2, i;
    int dig1, dig2;

    /* find decimal point in first, and length of integer part */
    for (dp1 = s1; *dp1 && *dp1 != '.'; ++dp1)
        ;
    li1 = s1 - dp1;

    /* find length of fractional part */
    for (e1 = dp1; *e1; ++e1)
        ;
    lf1 = e1 - dp1;

    /* find decimal point in second, and length of integer part */
    for (dp2 = s2, li2 = 0; *dp2 && *dp2 != '.'; ++dp2, --li2)
        ;

    /* find length of fractional part */
    for (lf2 = 0; dp2[lf2]; ++lf2)
        ;

    /* compare strings */
    i = (li1 < li2) ? li1 : li2;
    while (i < lf1 || i < lf2)
    {
        dig1 = (i >= li1 && i < lf1 && dp1[i] != '.') ? dp1[i] : '0';
        dig2 = (i >= li2 && i < lf2 && dp2[i] != '.') ? dp2[i] : '0';
        if (dig1 > dig2) return 1;
        if (dig1 < dig2) return -1;
        ++i;
    }
    return 0;
}


int numcmp_setsquare(const char *s1, const char *s2) {
    const char *s[2];
    const char *start[2];
    const char *tl, *hd;
    int i,big;
    int big_decode[]={1,-1,0};
    start[0]=s1;
    start[1]=s2;
    //scan to (implied) decimal point to align numbers
    for (i=0;i<2;i++) {
        s[i]=start[i];
        while (*(s[i])!='.' && *(s[i])) s[i]++;
    }
    //scan to end of shortest string
    while (*(s[0]) && *(s[1])) {
        s[0]++;
        s[1]++;
    }
    big=2; //2 means equal
    //overhanging tail is bigger if nonzero
    i=(*(s[0])==0);
    if (*(s[i])) {
        tl=s[i];
        while (*tl=='0' || *tl=='.')
            tl++;
        if (*tl)
            big=i;
    }
    //compare backward through overlapping numbers
    while (s[0]!=start[0] && s[1]!=start[1]) {
        if (*(--s[0])!=*(--s[1]))
            big=(*(s[1])>*(s[0]));
    }
    //overhanging head is bigger if nonzero
    i=(s[0]==start[0]);
    if (s[i]!=start[i]) {
        hd=start[i];
        while (hd!=s[i] && *hd=='0') hd++;
        if (hd!=s[i])
            big=i;
    }
    return big_decode[big];
}


const char* move_to_first_non_zero_digit(const char* string)
{
  while (*string && *string == '0') ++string;
  return string;
}


int count_digits_before_decimal_point(const char* string)
{
  int count = 0;
  while (*string && *string != '.')
  {
      ++string;
      ++count;
  }
  return count;
}


int digit_by_digit_comparison(const char* s1, const char* s2)
{
  int result;
 
  /* compare the strings digit by digit until one string runs (or both strings run) out, or the digits don't match */
  while (*s1 && *s2 && *s1 == *s2)
  {
      ++s1;
      ++s2;
  }
 
  /* if we've come to the end of s1, but not s2, s2 can be greater than s1 by having a non-zero digit */
  if (!*s1 && *s2)
  {
      s2 = move_to_first_non_zero_digit(s2);
      if (*s2 == '.')
      {
        s2 = move_to_first_non_zero_digit(s2 + 1);
      }
      result = -(*s2 != 0);
  }

  /* conversely, if s2 is done, s1 can be greater by having a non-zero digit */
  else if (!*s2 && *s1)
  {
      s1 = move_to_first_non_zero_digit(s1);
      if (*s1 == '.')
      {
        s1 = move_to_first_non_zero_digit(s1 + 1);
      }
      result = (*s1 != 0);
  }

  /* if both are done (i.e. both equal) or we found an inequality between adjacent digits, return a single digit comparison */
  else
  {
      result = *s1 - *s2;
  }

  return result;
}


int numcmp_jordan(const char *s1, const char *s2)
{
  int result;
  int s1_length;
  int s2_length;

  s1 = move_to_first_non_zero_digit(s1);
  s2 = move_to_first_non_zero_digit(s2);

  s1_length = count_digits_before_decimal_point(s1);
  s2_length = count_digits_before_decimal_point(s2);

  if (s1_length != s2_length)
  {
      result = s1_length - s2_length;
  }
  else
  {
      result = digit_by_digit_comparison(s1, s2);
  }

  return result;
}


int numcmp_damien(const char *s1, const char *s2)
{
    const char *p1 = s1;
    const char *p2 = s2;

    // go past leading zeros
    while (*p1 == '0') p1++;
    while (*p2 == '0') p2++;
   
    // count the characters in the leading integer portions of the strings
    const char *decimal1 = p1;
    const char *decimal2 = p2;

    while (*decimal1 != 0 && *decimal1 != '.') decimal1++;
    while (*decimal2 != 0 && *decimal2 != '.') decimal2++;

    size_t count1 = decimal1 - p1;
    size_t count2 = decimal2 - p2;

    // longer integers are bigger
    if (count1 < count2) return -1;
    if (count1 > count2) return 1;

    // There are the same number of digits in the integer portions of the string
    // so now compare all the characters in the integer string individually
    while (p1 != decimal1) {
        // NOTE: its unnecessary to test p2 != decimal2, since char counts are equal

        char c1 = *p1++;
        char c2 = *p2++;       

        if (c1 < c2 ) return -1;
        if (c1 > c2) return 1;
    }

    // advance pointers past decimal point
    if (*p1 == '.') p1++;
    if (*p2 == '.') p2++;

    // now compare the fractional portion of the string
    // do this until the characters differ or a string ends
    while (*p1 != 0 && *p2 != 0) {
        char c1 = *p1++;
        char c2 = *p2++;

        if (c1 < c2 ) return -1;       
        if (c1 > c2) return 1;
    }

    // skip past any zero characters that remain in the strings
    // (at least one string is at the null now, but we
    // need not explicitly test)
    while (*p1 == '0') p1++;
    while (*p2 == '0') p2++;

    // compare the final characters (one string may still have more characters
    // but it doesn't matter). The only possible characters are NULL
    // and '1' through '9' ('0' is not possible, we explicitly skipped them).
    // The NULL character will act as a '0' however for comparison purposes
    if (*p1 < *p2 ) return -1;       
    if (*p1 > *p2) return 1;

    return 0;
}


typedef int (*fptr)( const char *s1, const char *s2 );

void speed( fptr f )
{
    f( "321", "54" );
    f( "54", "321" );
    f( "321.6", "00054" );
    f( "54", "000321.6" );
    f( "321.6", "54.7" );
    f( "54.000", "321.0" );
    f( "321.6", "54.00000" );
    f( "54.00000", "00000321.6" );
    f( "0", "0000000.000000000000");
    f( "123.", "123.00000000");
    f( "123.000", "123");
    f( "123.001", "123");
    f( "123", "123.001");
    f( "123", "123.");
    f( "123", ".123");
    f( "00000100000.0001", "00000100000.0002");
    f( "1234567890.", "1234567891.");
    f( ".1234567890", ".1234567891");
    f( "12345.0001", "0001234.0001");
    f( "0512345.0001", "0412345.0001");
    f( "0512345.0001", "0612345.0001");
}

void check( fptr f, const char *name,
            int expect, const char *s1, const char *s2 )
{
    int res;
    res = f(s1, s2);
    if( res > 1 )
        res = 1;
    else if( res < -1 )
        res = -1;
    if( res != expect )
        printf(" %s whoops! %s , %s = %d\n",
                              name, s1, s2, res);
}

void test( fptr f, const char *name )
{
    check( f,name, 1,"321", "54" );
    check( f,name,-1,"54", "321" );
    check( f,name, 1,"321.6", "00054" );
    check( f,name,-1,"54", "000321.6" );
    check( f,name, 1,"321.6", "54.7" );
    check( f,name,-1,"54.000", "321.0" );
    check( f,name, 1,"321.6", "54.00000" );
    check( f,name,-1,"54.00000", "00000321.6" );
    check( f,name, 0,"0", "0000000.000000000000");
    check( f,name, 0,"123.", "123.00000000");
    check( f,name, 0,"123.000", "123");
    check( f,name, 1,"123.001", "123");
    check( f,name,-1,"123", "123.001");
    check( f,name, 0,"123", "123.");
    check( f,name, 1,"123", ".123");
    check( f,name,-1,"00000100000.0001", "00000100000.0002");
    check( f,name,-1,"1234567890.", "1234567891.");
    check( f,name,-1,".1234567890", ".1234567891");
    check( f,name, 1,"12345.0001", "0001234.0001");
    check( f,name, 1,"0512345.0001", "0412345.0001");
    check( f,name,-1,"0512345.0001", "0612345.0001");
}

int main( void )
{
    unsigned long i, j, t1, t2;
    fptr f[] =
    {
        numcmp_secure,
        numcmp_bill,
        numcmp_ian_original,
        numcmp_ian_modified,
        numcmp_ian_modified_enhanced,
        numcmp_setsquare,
        numcmp_jordan,
        numcmp_damien
    };
    const char *names[] =
    {
        "secure",
        "bill",
        "ian original",
        "ian modified",
        "ian modified enhanced",
        "setsquare",
        "jordan",
        "damien"
    };
   
    for( i=0; i<sizeof(names)/sizeof(names[0]); i++ )
    {
        test( f[i], names[i] );
        t1 = GetTickCount();
        for( j=0; j<1000000; j++ )
            speed( f[i] );
        t2 = GetTickCount();
        printf( "%s: Time elapsed = %lu\n", names[i], t2-t1 );
    }
    system("pause");
    return 0;
}
Jordan Stewart Send private email
Tuesday, September 06, 2005
 
 
Speed, speed measurement and speed optimizations are always relative. One of the main points of this function is the handling of the dots. Okay, here is a little modification of the speed test. It creates random strings with a fixed dot position. Use your favourite random generator for it, I've used the Mersenne Twister:

void speed( fptr f )
{
    int i, j;
    char aStr1[20], aStr2[20];

    for (i = 0; i < 3; i++)
      {
        for (j = 0; j < 19; j++)
          {
            aStr1[j] = '0' + (char)(RandomDice() % 10);
            aStr2[j] = '0' + (char)(RandomDice() % 10);
          }
        aStr1[15] = aStr2[15] = '.';
        aStr1[19] = aStr2[19] = '\0';
        f(aStr1, aStr2);
      }
}

Calling loop in main:

    RandomSeed(255); /* equal values for all */
    for( j=0; j<1000000; j++ )
        speed( f[i] );

gcc without optimizations:

secure: Time elapsed = 20455
bill: Time elapsed = 20555
ian original: Time elapsed = 19975
ian modified: Time elapsed = 20065
ian modified enhanced: Time elapsed = 20140
setsquare: Time elapsed = 21790
damien: Time elapsed = 19525

gcc with -O3:

secure: Time elapsed = 9909
bill: Time elapsed = 9265
ian original: Time elapsed = 8595
ian modified: Time elapsed = 8560
ian modified enhanced: Time elapsed = 8480
setsquare: Time elapsed = 9405
damien: Time elapsed = 8325

Surprisingly, my own implementation turns out to be worst for this case, at least in the compiler optimized version.
But most of the differences are now relativated by the additional randomization overhead, especially over a long runtime in the unoptimized version.

But in fact, I don't care. My implementation seems to be correct, it runs in reasonable time with linear complexity, I think it is readable, and with a bit of commenting it would be maintainable. If the final application turns out to be too slow, I will measure the performance and invest my time on the identified bottlenecks, instead of developing a highly optimized but quite unreadable version now, just to find that the final application always hits a special case where this version goes crazy.
Secure
Tuesday, September 06, 2005
 
 
Finally, there is even a killer case for Damien, giving a penalty on all 2-pass implementations (1st pass: search the dot, 2nd pass: compare the digits).

void speed( fptr f )
{
    int  i, j;
    char aStr1[100], aStr2[100];

    for (i = 0; i < 3; i++)
      {
        for (j = 0; j < 99; j++)
            aStr1[j] = aStr2[j] = '0' + (char)(RandomDice() % 10);
        aStr1[95] = aStr2[95] = '.';
        aStr1[98] = '1';
        aStr2[98] = '2';
        aStr1[99] = aStr2[99] = '\0';
        f(aStr1, aStr2);
      }
}

secure: Time elapsed = 32424
bill: Time elapsed = 28225
ian original: Time elapsed = 29775
ian modified: Time elapsed = 29920
ian modified enhanced: Time elapsed = 30030
setsquare: Time elapsed = 27045
damien: Time elapsed = 28055

Finally finally, after applying the code design rule "common case first" to my own implementation, changing this:

        if (dot1 && !dot2)
            return -1;
        else if (dot2 && !dot1)
            return +1;
        else if ((res == 0) && !dot1)
            res = (*s1 > *s2) - (*s1 < *s2);

to this:

        if (!dot1 && !dot2)
          {
            if (res == 0)
                res = (*s1 > *s2) - (*s1 < *s2);
          }
        else if (!dot2)
            return -1;
        else if (!dot1)
            return +1;

my implementation is no longer the worst. ;)

secure: Time elapsed = 28669
bill: Time elapsed = 28230
ian original: Time elapsed = 29760
ian modified: Time elapsed = 29925
ian modified enhanced: Time elapsed = 30080
setsquare: Time elapsed = 27065
damien: Time elapsed = 28060
Secure
Tuesday, September 06, 2005
 
 
Yah, pretty happy with mine at last!

I put my function into the test (the version using the test data, not the more recent dot-handling test).

secure: Time elapsed = 2907
bill: Time elapsed = 2687
ian original: Time elapsed = 3328
ian modified: Time elapsed = 3547
ian modified enhanced: Time elapsed = 3250
setsquare: Time elapsed = 3203
jordan: Time elapsed = 1688
damien: Time elapsed = 1734
new_nick: Time elapsed = 1594

int numcmp_new_nick(const char* s1,const char* s2) {
    enum { WHOLE_NUMBER, FRACTION } state = WHOLE_NUMBER;
    enum { LESS = -1, EQUAL, GREATER } ret = EQUAL;
    /* trim leading zeros */
    while(*s1=='0') s1++;
    while(*s2=='0') s2++;
    /* compare digits one at a time, while we have two digits to compare */
    while(*s1 && *s2) {
        /* moving to a fractional part? */
        if(*s1=='.') {
            /* do we have to compare the fractional part too? */
            if(*s2=='.') {
                if(ret==EQUAL)
                    state=FRACTION;
                else /* we know enough */
                    return ret;
            } else /* else s2 has more digits than s1 */
                return LESS;
        } else if(*s2=='.') /* s1 has more digits than s2 */
            return GREATER;
        /* do digit comparision, recording state of first differing digit */
        if(ret==EQUAL) {
            if(*s1 < *s2) {
              if(state==FRACTION) /* in fractional part, first differing digit is enough */
                  return LESS;
              else
                  ret=LESS;
            } else if(*s1 > *s2) {
                if(state==FRACTION) /* in fractional part, first differing digit is enough */
                  return GREATER;
                else
                  ret=GREATER;
            }
        }
        /* move to next digit */
        s1++;
        s2++;
    }
    /* so one is shorter than the other */
    if(*s1) { /* s1 is longest */
        /* special case; what if a decimal point and then all zeros? */
        if(state==FRACTION || *s1=='.') {
            const char* tmp=(state==FRACTION)?s1:s1+1;
            while(*tmp=='0') tmp++; /* scan for first non-zero digit */
            if(!*tmp) /* s1 was all zeros after decimal point then.. */
                return EQUAL;
        }
        return GREATER;
    }
    if(*s2) { /* s2 is longest */
        /* special case; what if a decimal point and then all zeros? */
        if(state==FRACTION || *s2=='.') {
            const char* tmp=(state==FRACTION)?s2:s2+1;
            while(*tmp=='0') tmp++; /* scan for first non-zero digit */
            if(!*tmp) /* s2 was all zeros after decimal point then.. */
                return EQUAL;
        }
        return LESS;
    }
    /* they are the same length */
    return EQUAL;
}

See, I had to test it *some time* ;-)
new nick, new rep
Tuesday, September 06, 2005
 
 
Okay, here are some collected and re-performanced results:

Test suite, normal:
secure: Time elapsed = 8705
secure (ccf): Time elapsed = 8250
bill: Time elapsed = 11250
ian original: Time elapsed = 13745
ian modified: Time elapsed = 13945
ian modified enhanced: Time elapsed = 14110
setsquare: Time elapsed = 13506
jordan: Time elapsed = 8694
damien: Time elapsed = 7155
new_nick: Time elapsed = 5745

Test suite, -O3:
secure: Time elapsed = 6775
secure (ccf): Time elapsed = 5020
bill: Time elapsed = 7100
ian original: Time elapsed = 8465
ian modified: Time elapsed = 8040
ian modified enhanced: Time elapsed = 7625
setsquare: Time elapsed = 7650
jordan: Time elapsed = 3705
damien: Time elapsed = 4135
new_nick: Time elapsed = 3370

Random strings[20], fixed dot, normal:
secure: Time elapsed = 19868
secure (ccf): Time elapsed = 19290
bill: Time elapsed = 19840
ian original: Time elapsed = 19370
ian modified: Time elapsed = 19440
ian modified enhanced: Time elapsed = 19410
setsquare: Time elapsed = 21075
jordan: Time elapsed = 19020
damien: Time elapsed = 18860
new_nick: Time elapsed = 18425

Random strings[20], fixed dot, -O3:
secure: Time elapsed = 9850
secure (ccf): Time elapsed = 8700
bill: Time elapsed = 9205
ian original: Time elapsed = 8565
ian modified: Time elapsed = 8490
ian modified enhanced: Time elapsed = 8390
setsquare: Time elapsed = 9300
jordan: Time elapsed = 8280
damien: Time elapsed = 8245
new_nick: Time elapsed = 8050

Equal strings[100], normal:
secure: Time elapsed = 21745
secure (ccf): Time elapsed = 21160
bill: Time elapsed = 20595
ian original: Time elapsed = 23030
ian modified: Time elapsed = 23825
ian modified enhanced: Time elapsed = 24075
setsquare: Time elapsed = 21640
jordan: Time elapsed = 21335
damien: Time elapsed = 21570
new_nick: Time elapsed = 19575

Equal strings[100], -O3:
secure: Time elapsed = 11173
secure (ccf): Time elapsed = 9375
bill: Time elapsed = 9385
ian original: Time elapsed = 9915
ian modified: Time elapsed = 10175
ian modified enhanced: Time elapsed = 10025
setsquare: Time elapsed = 9070
jordan: Time elapsed = 8715
damien: Time elapsed = 9350
new_nick: Time elapsed = 7725

As is often the case, one of the most unreadable and unmaintainable versions is the fastest. ;)
Secure
Tuesday, September 06, 2005
 
 
ako, I think your concerns about moving pointers outside the scope of the strings are in practical terms unfounded. It is defined by the C language that an array of objects will be stored consecutively in memory, and that pointer addition and subtraction within such an array of objects will produce the expected result. It follows almost unavoidably that the underlying representation of a pointer must be some kind of unsigned integer type. Our only concern therefore is whether we happen to advance a pointer beyond the largest value that it can hold, or whether we happen to decrease it below the zero address. In modern machines with large virtual address spaces, the probability of this happening is vanishingly small, but with outlandish input data it might theoretically happen.
Ian Boys Send private email
Tuesday, September 06, 2005
 
 
"ako, I think your concerns about moving pointers outside the scope of the strings are in practical terms unfounded."

Ian, I didn't mean to split hairs, and to be sure, who has ever checked an int_64t i++ against overflow? Yet I've seen it (the pointer thing) happen with depressing results, hence my allergy to statistical correctness.

{I've also seen some very peculiar memory models, which, being probably nowhere near the ISO C, neatly illustrated how dramatic a difference could be between a pointer and an address...).
ako
Tuesday, September 06, 2005
 
 
size_t is the correct integer datatype to use for pointer arithmetic. It varies in size with machine architecture.
Damien Katz Send private email
Tuesday, September 06, 2005
 
 
Damien,

"size_t is the correct integer datatype to use for pointer arithmetic."

Not always, it depends on the operation you want to perform. The ANSI Standard says:

"7.17 Common definitions <stddef.h>
The following types and macros are defined in the standard header <stddef.h>. Some are also defined in other headers, as noted in their respective subclauses.
The types are
  ptrdiff_t
which is the signed integer type of the result of subtracting two pointers;
  size_t
which is the unsigned integer type of the result of the sizeof operator; and
  wchar_t
[...]"

When using size_t, you should take care to not get negative results when subtracting two pointers.

Ian,

for an array and a pointer into this array
  type a[3], *p;
there are only 4 valid values for p:
&a[0], &a[1], &a[2], &a[3]
The latter must not be dereferenced, of course.

The same applies for intermediate result pointers:
  p = &a[1];
  p = (p + 3) - 2;
Though the overall result is valid, the intermediate result is not, resulting in undefined behaviour. The correct way would be
  p = p + (3 - 2);
Secure
Tuesday, September 06, 2005
 
 
Secure, I know what you say is correct and I will not dispute it. That is why I felt inclined to modify my original code above. I see also that I should have used ptrdiff_t to store the offsets rather than a long int.

At the same time I do believe that for short-lived code on known hardware the theoretical problems are not likely to surface. When one is hacking, one can joyfully break a few rules. For library code it's a different story of course.
Ian Boys Send private email
Tuesday, September 06, 2005
 
 
Ian,

I understand this, and I think the higher art of programming is to know when to break the rules. I have no problem to do it myself.

On the other hand, I'm pretty sure that if I would have the possibility and if I would care to ever take a deeper look into the application's and tool's source codes which I use on a daily basis, I would instantly deinstall most of them, never entrusting them one more byte of my data again. I know that most programmers don't even care about the rules, be it on the language level, the algorithm level, or the code design level.

I've done something like this myself, even in production code. Have you ever used a construct like this?

/*
 * Chip-Matrix:
 * Chip-Matrix    pppChipMatrix[ChipXSize] -> pp[ChipYSize] -> pppSubMatrix
 * SubChip-Matrix pppSubMatrix[SubXSize] -> pp[SubYSize] -> pChip
 */

  tsCHIP *****pppppMatrix;

  tsCHIP *pChip = pLay->pppppMatrix[Cx][Cy][Sx][Sy];

Any subarray dynamically allocated, of course. ;)

Should I be proud that I can handle 5 pointer levels with ease? Or should I be ashamed that I didn't took the time to introduce an intermediate Field(Xsize,Ysize) object, handling the two identical array indexings? This is production code, I sold it for a reasonable amount of money. Unmaintainable for anyone but me, and f*cking dangerous with mostly uncontrolled array indexing.

And that's why I don't care about the source code of my applications and tools, regularly taking non-overwriting backups of the changes in my data, knowing that I better should not implant more than 20.000 objects in a list or tree view in any of those tools, because it would either slow down to death or die in a crash.

Look at all the different sources here in this thread. A lot of them use quite fragile looking pointer operations. Did all of them really know what they were doing? How should I know? How should you know that I knew what I did there? Their numcmps are working, that's all that counts, and when it is hidden inside an application, no one cares anymore.

I don't know if I'm a really good programmer, but at least, when reading thedailywtf, I know for sure I'm not a bad one. ;)
Secure
Tuesday, September 06, 2005
 
 
I agree with you. I think it is a really good thing if code is written in a way that makes it "obvious" or as obvious as possible that the code is correct and cannot have fail cases. That is why I was fond of my particular implementation above; since there were no branches in the logic, there were no permutations of different paths through the code to explore when checking that every path was correct.

Now, going against everything I have just said, I have drawn from all the different approaches that everyone has demonstrated and come up with a feasibly smallest working implementation that also has reasonably good performance:

int numcmp(const char* s1, const char* s2)
{
    int dp1 = 0, dp2 = 0, cmp = 0, res = 0;
    char dig1, dig2;

    while (*s1 == '0') ++s1;
    while (*s2 == '0') ++s2;
    for (; !res && (*s1 || *s2); *s1 && ++s1, *s2 && ++s2)
    {
        dig1 = (!*s1 || *s1 == '.') ? dp1|=1, '0' : *s1;
        dig2 = (!*s2 || *s2 == '.') ? dp2|=1, '0' : *s2;
        if (!cmp) cmp = (dig1 > dig2) - (dig2 > dig1);
        res = (dp1 && dp2) ? cmp : dp2 - dp1;
    }
   
    return res;
}

It's a little obfuscated, but fun...
Ian Boys Send private email
Tuesday, September 06, 2005
 
 
Whooah.... great! ;)
Secure
Tuesday, September 06, 2005
 
 
It's all about engineering compromises isn't it ? Ian I think your latest best-of-all-approaches merged version is admirable in a lot of ways. No one could argue that it's not both concise and to the point. And it avoids special cases as you say. But (as you realise and admit) you've embraced obfuscation and sacrificed readability and clarity. Also you haven't addressed optimum speed. Now admittedly speed is perhaps the least important metric that we've discussed, but at least it's readily measurable.

I had a look at my own version to see whether I could address speed. No one likes being accused of being an amatuer. I found that I could keep almost all of the merits of my approach, and greatly improve the speed, by avoiding an initial pass to initialize middle and end pointers. I simply compare to '.' and '\0' in the main body of the algorithm. For what it's worth my improved version avoids all pointer comparisons (I never did other types of pointer arithmetic) and keeps the pointers constrained within the strings. Unfortunately I have sacrificed my usual "return once at end" policy to shave a few cycles. My new version is posted below.

Comparing my version to Ian's latest, my version has good speed and remains easy to understand. Ian's is concise and  avoids special case analysis. As I say it's all about compromises. As Secure pointed out, it's no surprise that the fastest implementation to date (no nick) is also one of (or perhaps the most) convoluted.

I used my original timing function (not Secure's improved version - maybe he can retest?) to retest all the versions to date. Ian's, Secure's and my latest changes are the so called "improved" versions at the bottom. I had to modify Damien's code to avoid C++ just in time variable declarations (my understanding is that this is strictly a C language discussion, and the examples should be compiled by a C compiler not a C++ compiler);

Speed test results;

secure: Time elapsed = 9784
bill: Time elapsed = 11026
ian original: Time elapsed = 12909
ian modified: Time elapsed = 13479
ian modified enhanced: Time elapsed = 13339
setsquare: Time elapsed = 12248
jordan: Time elapsed = 7801
damien: Time elapsed = 6299
new nick: Time elapsed = 5278
secure improved: Time elapsed = 9313
ian improved: Time elapsed = 11837
bill improved: Time elapsed = 6119

Improved code;

// Compare non-negative real numbers in string form
//  return +ve if s1>s2, -ve if s2>s2, 0 if s1==s2
int numcmp_bill_improved( const char *s1, const char *s2 )
{
    int diff=0;

    // Point beyond leading zeroes
    while( *s1 == '0' )
        s1++;
    while( *s2 == '0' )
        s2++;

    // Look at integer part
    for( ; *s1 && *s1!='.' && *s2 && *s2!='.'; s1++,s2++ )
    {
        if( diff==0 && *s1!=*s2 ) // first diff ?
            diff = (*s1-*s2); // decides only when
                              //  lengths are same
    }

    // If integer part digits left over, lengths differ
    if( *s1 && *s1!='.' )
        return 1;  // s1 longer than s2, so s1 bigger
    else if( *s2 && *s2!='.' )
        return -1; // s2 longer than s1, so s2 bigger

    // Otherwise if there was a difference, it decides
    else if( diff )
        return diff;

    // Integer parts same, so check fractional parts
    if( *s1 == '.' )
        s1++;
    if( *s2 == '.' )
        s2++;
    while( *s1 || *s2 )
    {
        char c = *s1 ? *s1++ : '0';
        char d = *s2 ? *s2++ : '0';
        if( c-d )
            return( c-d );
    }
    return 0;
}
Bill Forster Send private email
Tuesday, September 06, 2005
 
 
Well now, I was a little surprised at the variation in speed of different versions there. I hadn't run the test suite with the full set of examples before, so just as a sanity check I ran it now using the MSVC 6.0 compiler with full optimizations (-Ox). The results are relatively similar to Bill's measurements:

secure: Time elapsed = 1969
bill: Time elapsed = 1766
ian original: Time elapsed = 2437
setsquare: Time elapsed = 2219
jordan: Time elapsed = 1656
damien: Time elapsed = 969
new_nick: Time elapsed = 984
secure improved: Time elapsed = 1891
ian improved: Time elapsed = 1781
bill improved: Time elapsed = 1125

I find this interesting because I think it is far from obvious that you could readily predict how much faster some versions would be than others just from looking at the code.

We are often advised to avoid premature optimization, but there is a factor of about two there between the fastest and slowest versions. Imagine if whole application suites were coded using slower constructs rather than faster constructs, what an impact it might have on the overall feel.

The fastest performing versions appear to have "spread out" their algorithms, carrying out small steps sequentially, and not trying to achieve too much in one step. I wonder if this is a general lesson for writing faster code?
Ian Boys Send private email
Tuesday, September 06, 2005
 
 
<annoying pedant mode>
Oh yeah, I almost forgot, none of these solutions is *truly* portable. Because C doesn't specify a character encoding, none of these solutions would work with an encoding where a '1' maps to a higher number than '9'. I don't believe there is a completely portable solution.
</annoying pedant mode>

But I doubt such an encoding even exists, much less a compiler that supports it.

And sorry if I offended with the amatuers crack. I can see how it can be insulting, but I meant it jokingly.
ronk!
Tuesday, September 06, 2005
 
 
I've done a few size comparisons of the resulting assembly code.

1. ian1    218 bytes
2. ian2    233 bytes
3. damien  240 bytes
4. secure  241 bytes
5. newnick 253 bytes
6. forster 266 bytes
7. jordan  272 bytes
8. setsquare 340 bytes
9. soft duck (OP) 406 bytes

I worked this out by compiling in gcc version 3.3.2 with flags -Os -c and then running the object file through objdump -D and then scanning through the disassembly output.

My test suite is slightly extended and I get
whoops! 1,9 = 0
whoops! 9,1 = 0
output from new nick's code (Sorry)
setsquare Send private email
Tuesday, September 06, 2005
 
 
setsquare: Nice catch with the extended test suite. Can I ask whether the "forster" size is my improved version or my original version ?

ronk!: Good point about the character encoding thing. I can think of one way to make a portable implementation in spite of this objection; replace instances of code like;
 diff = (c-d);
with
 diff = character_compare(c,d);
throughout one of the solid implementations. Coding up character_compare() is feasible because even if you make no assumptions about character encoding, there are only 10*10 = 100 possible input combinations, so for example something like this;

int character_compare( char c, char d )
{
    if( c=='0' && d=='0' )
        return 0;
    else if( c=='0' && d=='1' )
        return -1;

    // ...<snip 96 or so boring cases>...

    else if( c=='9' && d=='8' )
        return 1;
    else if( c=='9' && d=='9' )
        return 0;
    else
        assert(1); // or whatever
}
   
No doubt with some ingenuity it would even be possible to code up something fast. But as you say, this is all fantasyland stuff.

Also, assuming Ronk! is an alias for Damien Katz, thanks for the apology. Possibly I am oversensitive as no one else indicated they were annoyed.
Bill Forster Send private email
Tuesday, September 06, 2005
 
 
... of course that should be assert(0); // or whatever
Grrrr I hate making gratuitous little errors.
Bill Forster Send private email
Tuesday, September 06, 2005
 
 
I just looked in K & R 2nd Ed. where on page 23 they say: "This works only if '0', '1', ..., '9' have consecutive increasing values. Fortunately, this is true for all character sets."

In embracing pedantry, we don't have to worry about theoretical possibilities that don't actually exist. ;-)
Ian Boys Send private email
Tuesday, September 06, 2005
 
 
Bill Forster:
>  Can I ask whether the "forster" size is my improved version or my original version ?

The Original.

I haven't yet done Ian's third version nor secure's upgrade either.

I've got several versions of Linux and one Windows on my hard disk. The one I use for web browsing is different from my C compiling OS, so I have to move things between partitions and reboot before I get new results.
setsquare Send private email
Wednesday, September 07, 2005
 
 
I've done an improved version of mine, and in the process found a missed test case: comparing two equal strings. e.g.

    check( f,name, 0,"123", "123");
    check( f,name, 0,"123.0", "123.0");

No-one fails this case, but at one point I had an implementation that failed them, but passed the others.

The following optimised version of mine includes one addition that could be unsafe, which is to assume that '.' doesn't contain 0x10 and that '0' to '9' do.


Anyway, here goes:


int numcmp_jordan_improved(const char *s1, const char *s2)
{
  int result = 0;

  const char *s1_tail;
  const char *s2_tail;
  ptrdiff_t  s1_length;
  ptrdiff_t  s2_length;
  char        s1_char;
  char        s2_char;


  /* find the first digit of the non-fractional part of s1 */
  while (*s1 == '0') ++s1;

  /* find the end of the non-fractional part of s1 */
  s1_tail = s1;
  while (*s1_tail & 0x10) ++s1_tail;

  /* find the first digit of the non-fractional part of s2 */
  while (*s2 == '0') ++s2;

  /* find the end of the non-fractional part of s2 */
  s2_tail = s2;
  while (*s2_tail & 0x10) ++s2_tail;

  /* find the length of the fractional parts of s1 and s2 */
  s1_length = s1_tail - s1;
  s2_length = s2_tail - s2;

  /* if the lengths are equal then we need to do a digit by digit comparison */
  if (s1_length == s2_length)
  {
      /* compare characters from the strings until they don't match, or we hit the end of them both */
      do
      {
        s1_char = *s1++;
        s2_char = *s2++;
      }
      while (s1_char == s2_char && s1_char);

      /* if we didn't hit the end of s1 */
      if (s1_char)
      {
        /* if we didn't hit the end of either string */
        if (s2_char)
        {
            /* the inequality is between two digits, so the result can be a comparison between them */
            result = s1_char - s2_char;
        }
        /* if we hit the end of s2 */
        else
        {
            /* if there is a non-zero digit in [the remainder of] s1's fractional part then s1 > s2 */
            do
            {
              ++s1_tail;
            }
            while (*s1_tail == '0');

            /* if we hit the end of the string, this will be zero */
            result = *s1_tail;
        }
      }
      /* if we hit the end of s1, but not s2 */
      else if (s2_char)
      {
        /* if there is a non-zero digit in [the remainder of] s2's fractional part then s2 > s1 */
        do
        {
            ++s2_tail;
        }
        while (*s2_tail == '0');

        /* if we hit the end of the string, this will be zero */
        result = -*s2_tail;
      }
      /* else we hit the end of both strings, and both were equal */
  }
  /* if the lengths are inequal then we can just choose the number with the longest non-fractional part */
  else
  {
      if (s1_length > s2_length)
      {
        result = 1;
      }
      else
      {
        result = -1;
      }
  }

  return result;
}
Jordan Stewart Send private email
Wednesday, September 07, 2005
 
 
'I just looked in K & R 2nd Ed. where on page 23 they say: "This works only if '0', '1', ..., '9' have consecutive increasing values. Fortunately, this is true for all character sets."'

It is guaranteed by the ANSI Standard:

"the 10 decimal digits 0 1 2 3 4 5 6 7 8 9
[...]
In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous."
Secure
Wednesday, September 07, 2005
 
 
lol, the final lines of my function should, of course, be:

    /* they are the same length */
    return ret;
}

-----------

Think there might be a forum here at JoS or somewhere else that has this kind of thread on a regular basis?  It seems to be a welcome passtime for us here..
new nick, new rep
Wednesday, September 07, 2005
 
 
Ok folks, I can blow all the previous results out of the water!!

secure: Time elapsed = 2937
bill: Time elapsed = 2594
ian original: Time elapsed = 3344
ian modified: Time elapsed = 3500
ian modified enhanced: Time elapsed = 3266
setsquare: Time elapsed = 3156
jordan: Time elapsed = 1750
damien: Time elapsed = 1734
new_nick: Time elapsed = 1531
new_nick_cached: Time elapsed = 329

Shows what idle hands can do..

int numcmp_new_nick_cached(const char* s1,const char* s2) {
    int ret;
    /* cache the most recent n comparisions */
    static struct {
        const char* s1;
        const char* s2;
        int ret;
    } cache[100];
    static unsigned len = 0; /* number cached */
    /* this comparision cached? */
    static unsigned prev = 0;
    for(prev++; prev<len; prev++) { /* this assumes that the cached result after the previous cached result is the most likely to occur next.. */
        if(cache[prev].s1==s1 && cache[prev].s2==s2) {
            /* found cached result */
            return cache[prev].ret;
        }
    }
    /* else calculate and cache it */
    ret = numcmp_new_nick(s1,s2);
    if(len<100) {
        cache[len].s1=s1;
        cache[len].s2=s2;
        cache[len].ret = ret;
        len++;
    }
    prev = 0;
    return ret;
}
new nick, new rep
Wednesday, September 07, 2005
 
 
Except that your solution is basically an optimisation specifically for this particular benchmark: try running it through Secure's Mersenne benchmark...
Jordan Stewart Send private email
Wednesday, September 07, 2005
 
 
but that's not to say that it isn't a pretty cool idea :)
Jordan Stewart Send private email
Wednesday, September 07, 2005
 
 
Here is the updated assembly code byte sizes table

 181 jordan2
 215 ian3
 218 ian1
 221 forster2
 221 secure-ccf
 233 ian2
 240 damien
 241 secure
 253 newnick
 266 forster
 272 jordan
 330 setsquare
 406 duck
setsquare Send private email
Thursday, September 08, 2005
 
 
Thanks for taking the time to make these measurements again.
Bill Forster Send private email
Thursday, September 08, 2005
 
 
Heres my attempt at fast routine:


int numcmp_setsquare6g(const char *s1, const char *s2) {
    int ret,tmp;
    const char *decimal1,*decimal2;
    decimal1=s1;
    decimal2=s2;
    //search for decimal pt in shortest integer
    while (*decimal1 & *decimal2 & 0x10) {
        decimal1++;
        decimal2++;
        //unrolled loop
        if (!(*decimal1 & *decimal2 & 0x10)) break;
        decimal1++;
        decimal2++;
        if (!(*decimal1 & *decimal2 & 0x10)) break;
        decimal1++;
        decimal2++;
        if (!(*decimal1 & *decimal2 & 0x10)) break;
        decimal1++;
        decimal2++;
    }
    //make sure front overhang is all zeros
    if ((*decimal2 & 0x10)==0) {
        while (*s1=='0' && (*decimal1 & 0x10)) {
            s1++;
            decimal1++;
            if (!(*s1=='0' && (*decimal1 & 0x10))) break;
            s1++;
            decimal1++;
            if (!(*s1=='0' && (*decimal1 & 0x10))) break;
            s1++;
            decimal1++;
            if (!(*s1=='0' && (*decimal1 & 0x10))) break;
            s1++;
            decimal1++;
        }
        if (*decimal1 & 0x10) return 1;
    } else {
        while ( *s2=='0' && (*decimal2 & 0x10)) {
            s2++;
            decimal2++;
            if (!( *s2=='0' && (*decimal2 & 0x10))) break;
            s2++;
            decimal2++;
            if (!( *s2=='0' && (*decimal2 & 0x10))) break;
            s2++;
            decimal2++;
            if (!( *s2=='0' && (*decimal2 & 0x10))) break;
            s2++;
            decimal2++;
        }
        if (*decimal2 & 0x10) return -1;
    }
    //now compare overlapping section
    while (*s1 && *s1==*s2) {
        s1++;
        s2++;
       
        if (!(*s1 && *s1==*s2)) break;
        s1++;
        s2++;
        if (!(*s1 && *s1==*s2)) break;
        s1++;
        s2++;
        if (!(*s1 && *s1==*s2)) break;
        s1++;
        s2++;
       
    }
    //check rear overhang is all zeros
    if (*s2==0) {
        if (*s1=='.') s1++;
        while (*s1=='0') s1++;
        if (*s1) return 1;
        else return 0;
    }
    if (*s1==0) {
        if (*s2=='.') s2++;
        while (*s2=='0') s2++;
        if (*s2) return -1;
        else return 0;
    }
    //check unmatched character
    if (*s1>*s2) return 1;
    else return -1;
}
setsquare Send private email
Friday, September 09, 2005
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz