The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Converting size_t to int

I have a function that reads a string from the network.  The string is unmarshaled like this:

int ReadString(SOCKET s, char *buf, size_t len)
{
  UINT32 size;

  if(ReadFully(&size, sizeof(size)) != 0)
    return -1;

  if(ReadFully(buf, min(len, size)) != 0)
    return -1;

  buf[len] = '\0';

  if(len < size)
  {
    char *temp = new[size - len];
    int r = ReadFully(temp, size - len);
    delete [] temp;
    if(r != 0)
      return -1;
  }

  return min(len, size);
}

Here's a chunk of code that needs to convert a size_t to an int.  The function has to report error conditions using negative numbers, but the maximum allowable space should be a size_t, by convention and by the fact that there's no such thing as negative space.

This causes a compiler warning on VC++ 7.1 of "conversion from 'size_t' to 'int', possible loss of data".

Anyone have any elegant techniques to do this without compiler warnings.  Dirty hacks are not desired.  One way is to say, "Screw it.  Who cares?"  But I'd rather not.
Ryan Phelps Send private email
Tuesday, September 06, 2005
 
 
size_t is unsigned.  You cannot convert it to a signed int without possibly having an overflow.  This is what your compiler is warning about.

For your specific compiler, I believe size_t is unsigned int.  That means that a cast to unsigned int should work without problems.  Alternatively, if you need to also store signedness, try casting to long long (long may also work, I forget).

You simply cannot convert from size_t to signed int without possibly losing data.  That's like asking how you can store 512 unique values in an 8-bit variable.  And although you _could_ turn off the compiler warning, it'll still leave a bug there.
Chris in Edmonton Send private email
Tuesday, September 06, 2005
 
 
Chris:

You're totally right on everything.  I guess I'm really looking for a best-practice here.  Sounds like it's best to simply have the compiler warnings.

This caused me pain recently when a size_t on a 64 bit g++ was 8 bytes instead of 4.  Now I'm converting all the size_t's to int's before sending them across the wire, to disk, whatever, and simply accepting the warnings.  I haven't found a better solution yet.
Ryan Phelps Send private email
Tuesday, September 06, 2005
 
 
I assume that the return statement is making the problems:

  return min(len, size);

I see two solutions. First, do a simple cast:

  return (int)min(len, size);

It is already part of the interface definition that no string longer than INT_MAX can be returned, because the return value can't express its length. Fortunately, you can externally control this with the len argument. Or you could put the test into the function itself:

  if (min(len, size) > INT_MAX)
      return -1;
  return (int)min(len, size);

Of cause you should realize that this flags a coding weakness, not a transmission error.

The second and cleaner solution is to separate flags from values:

int ReadString(SOCKET s, char *buf, size_t len, size_t *result)
Secure
Tuesday, September 06, 2005
 
 
One important thing more. This part

char *temp = new[size - len];
int r = ReadFully(temp, size - len);

has the potential to make your system vulnerable to network attacks by faking the size to large amounts, still small enough to be successfully allocated. In a multithreaded serverprocess with multiple connections it can bring the system to its knees, swapping itself to death.

***NEVER*** do allocations controlled by externally given and internally not sanity-controlled data.
Secure
Tuesday, September 06, 2005
 
 
Well, it is a problem.  By design, size_t is not a specific number of bits.  You are forced to assume it is.

The alternative would have been for C to assume size_t was a specific number of bits.  But how many?  Nowadays, 64 bits is sufficient.  When C was being designed initially, 32 would have been considered more than enough.  16 may even have been a possible value.  In any case, there'd be no real way to expand the number.

Now, what should you do?  There are really only three alternatives I can see:
1. Make an assumption that it is a single size, say 64 bits, that is at least as large as your largest platform.  If you do this, PLEASE use boost to add a static assertion to your code.  #include <boost/static_assert.hpp> and then do BOOST_STATIC_ASSERT(sizeof(size_t) * CHAR_BIT <= 64);
2. Use autoconf scripts to determine the size and handle it presumably using #ifdef statements.  Use static assertions to handle cases you hadn't thought of.
3. Use some other method of sending the data which does not require knowing the length in advance, which may not be practical in your case.
Chris in Edmonton Send private email
Tuesday, September 06, 2005
 
 
Good calls Secure.  The first one, make the result a parameter, is painful.  It makes checking the value one more line of code, which means it's even less likely to get used.  Not that anyone checks error values anyway...

Excellent observation on the large allocation.  Here's my first pass at fixing that one:

if(len < size)
{
  size_t read = 0;
  size_t remaining = size - len;
 
  while(read != remaining)
  {
    char temp[1024];
    readThisMuch = min(remaining - read, sizeof(temp));
    int r = ReadFully(temp, readThisMuch);
    read += readThisMuch;
    if(r != 0)
      return -1;
  }
}

Chris:  I'm just dealing with the compiler warnings.  I also typedef'ed my own INT32, et. al for non-Windows platforms, and they're ensured with my own static_assert that I implemented today.
Ryan Phelps Send private email
Tuesday, September 06, 2005
 
 
Ryan,

  while(read != remaining)

You should change this to

  while(read < remaining)

just to be on the safe side. And you should make this a subfunction

  int ReadAndIgnore(SOCKET s, size_t len)

or similar, it may be useful later on.

"It makes checking the value one more line of code, which means it's even less likely to get used.  Not that anyone checks error values anyway..."

It is not your problem if anyone does not check the flags you provide (besides using their programs...). But it is your problem if
- you do not provide any error checking at all, or
- you use parameter types that can't express the full range of possible values, or
- you do not check for overflow if you can't change the previous point for whatever reason.

Mixing types is always a bad idea. Look at the coordinate handling in the Win32 API. It is a mess. This function uses int, that function uses LONG, the WinProc encodes it into 16 bits into wParam, and so on. You better never use coordinates above 32.000, else you'll run into problems.

The same thing here: Your network protocol uses 32 bits unsigned to encode the length. int is expected to have at least 32 bits, thus you can use 31 because of the sign. size_t is expected to have at least 32 bits, but it is unsigned, thus the full range can be used. int and size_t are guaranteed to only have at least 16 bits by the Standard, but you won't compile it on a system that provides less than 32 bit, of course.

Nonetheless, you better never use string lengths above 2^31 - 1 (2 GB), else you will run into trouble with your different types.

Your smallest type defines your limits. You better stick to them, no matter what the other types provide.
Secure
Wednesday, September 07, 2005
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz