The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Embarrassing STL question

If I use an STL .size() in a loop conditional is the compiler smart enough to work out that there is no code in the loop that changes the size of the vector and optomise it away or will it do the function call everytime.

And if the container is a vector isn't the .size() call just to a precalculated value anyway?
Shlemiel the painter Send private email
Monday, February 05, 2007
 
 
I believe the STL guarantess that all size() operator calls will return in constant time. I know this it the case for the SGI implementation of map, vector, and list. Any implementation that doesn't do this is stupid anyway. :)
I believe it is all constant time Send private email
Monday, February 05, 2007
 
 
Yes, is sure that the compiler optimize this away. But you should still not do this since as you asked yourself this and even lost time to write this question, any other programer follwoing you will may start thinking at this also, and you will make him losing time...and this will happening over and over again.
Megabyte Maniac Send private email
Monday, February 05, 2007
 
 
I always try and code for readability (since I am probably going to be the poor sod trying to read it )

for (int ii=0;ii<blah.size();ii++) { }

Is better than some bunch of iterators, bind2nd junk and functions called "find_last_but_one_not_of"

Of course I'm old and grey and once knew Fortran, although I like lambda funcs in Python,
Martin Send private email
Monday, February 05, 2007
 
 
No guarantee that any compiler will try to understand what you're doing in the loop so that it can optimize away the size call.  (Note that this check can become a lot more complex if you're dealing with threaded code.)

For a specific compiler you can either ask customer support or examine the assembly output to see what it is doing.
Perl Solution
Monday, February 05, 2007
 
 
> ... is the compiler smart enough to work out that ...

My guess would be, "In general, no, especially if the body of the loop is calling any non-inlined function ... but the size() method can be inlined and is therefore cheap".
Christopher Wells Send private email
Monday, February 05, 2007
 
 
There is some misinformation in this thread.  The compiler cannot "optimize out" the call to size() since the body of the loop may change the size of the object.  size() is guaranteed to be constant time for vector, but not for all containers.  It doesn't make sense for list.size() to be constant time since you can insert arbitrarily many elements into the middle of a list; if list.size() were guaranteed constant time, there would have to be code to recompute the size each time an insertion is made.

IMO the best way to write a loop construct of the sort you want is
  for (i=0, the_size = foo.size(); i < the_size; i++) {...}

the_size is computed once, and as a side benefit is available for easy viewing during debugging.

Will
Will Dowling Send private email
Monday, February 05, 2007
 
 
Interesting Will, I always use "for (int i" to protect the index from being used outside the loop but I hadn't though of putting the size declaration inside.

Perhaps an STL container is not an ideal example but with something like a string.length() the compiler could easily check that the length was changed inside the loop.
Martin Send private email
Monday, February 05, 2007
 
 
IME, it's unusual to use index-based looping for STL containers.  Much more idiomatic is iterator-based looping, as it works on all containers (not just random access ones).
bmm6o Send private email
Monday, February 05, 2007
 
 
It would be the same issue though: if the test is "it != container.end()", would the compiler know that container.end() won't change and can be cached?
Christopher Wells Send private email
Monday, February 05, 2007
 
 
I think I've seen 'size' implementations that just looped through the list.  Terrible.
StudioTan
Monday, February 05, 2007
 
 
Is the vector const?
Jeff Zanooda
Monday, February 05, 2007
 
 
A sufficiently advanced compiler should be able to move the call to size() out of the loop and inline it, but whether or not it'll actually happen depends on the compiler, optimization options, container, STL implementation, and phase of the moon.  So trust no one and check the assembly, e.g. if you use GCC add command-line option -S and see the .s file.
Jeff Zanooda Send private email
Monday, February 05, 2007
 
 
Martin:

I'm with you about where i should be declared; I meant:

for (int i=0, the_size = foo.size(); i < the_size; i++) {...}

bmm6o:

I used to use almost exclusively iterator-based looping, (if I really needed to loop) but over time I've realized that the argument that you might change from a vector to a list (or whatever) at some point just isn't realistic.  There just aren't that many different container types, and they have very different spectrums of behaviour.  If I'm implementing now with a vector, for example, it's because I need something about vector (linear time random access, usually, or no space overhead for extra pointers).  That requirement won't change over time; so using index syntax doesn't lock me into anything I don't want anyway -- and it's a hell of a lot more readable than iterator declaration/access, at least to me.

StudioTan:

I can't tell if you are being sarcastic.  If you think about it, you *have* to define list::size() as a linear time scan over the list. That's the only way you can get a constant-time splice() method -- and often constant-time splice is the reason you picked the list container to begin with.  (Scott Meyers "Effective STL" discusses this.)  You can only get constant-time size() if you accept linear-time splice().
Will Dowling Send private email
Monday, February 05, 2007
 
 
I mean, vector: (constant time random access) ...

Will
Will Dowling Send private email
Monday, February 05, 2007
 
 
In genreal, I would not assume that the compiler is "smart" enough to optimize the call to size() out. To the compiler, the STL is just another library. What if the size() function had a side-effect? Then the compiler would not be doing what you expected if it optimized it out. For what it's worth I always do it like this:

std::vector< int > V;

//  Poulate the vector.

for(std::vector< int >::const_iterator i(V.begin()), last_i(V.end()); i != last_i; ++i)
{
  //  Do something with *i.
}

-or-

for(std::vector< int >::size_type i(0), last_i(V.size()); i != last_i; ++i)
{
  //  Do something with V[i].
}
A. Nonymous
Monday, February 05, 2007
 
 
...as Will said above.
A. Nonymous
Monday, February 05, 2007
 
 
the size() method is const, so no side effect
Ben Allison Send private email
Monday, February 05, 2007
 
 
"In genreal, I would not assume that the compiler is "smart" enough to optimize the call to size() out. To the compiler, the STL is just another library. What if the size() function had a side-effect?"

Also, what if the code in the loop body affects the output of the size() function? That's probably harder for the compiler to determine, in general.
clcr
Monday, February 05, 2007
 
 
(ref. my previous) Actually, it's impossible in some cases. Consider a loop body that calls a library function whose source is not available to the compiler. Unless the compiler can somehow prove that the library function could not get its hands on a reference to the container, there is no way to know if the container is modified or not.
clcr
Monday, February 05, 2007
 
 
>> if list.size() were guaranteed constant time, there would have to be code to recompute the size each time an insertion is made.

It wouldn't have to "recompute" the size each time but merely increment the stored size by the appropriate amount, in which case size() simply returns the appropriate member variable.  I imagine any sane STL implementation will do this.  I know the Visual C++ implementation does.  I really wouldn't be surprised if this is mandidated by the standard.
SomeBody Send private email
Monday, February 05, 2007
 
 
Will Dowling:
for (int i=0, the_size = foo.size(); i < the_size; i++)

A. Nonymous:
for(std::vector< int >::const_iterator i(V.begin()), last_i(V.end()); i != last_i; ++i)
{
  //  Do something with *i.
}

I think is much more readable to do it like this:
std::for_each(V.begin(), V.end(); DoSomething());
Megabyte Maniac Send private email
Tuesday, February 06, 2007
 
 
> I think is much more readable

And apart from that, because V.end() is a parameter, it will be optimized (ie. only invoked once).
Christopher Wells Send private email
Tuesday, February 06, 2007
 
 
"I think is much more readable to do it like this:
std::for_each(V.begin(), V.end(); DoSomething());"

I don't know that I agree. How does it aid understanding to move the body of the loop off to some other place in the code? Even if DoSomething has a more descriptive name, to find out what that "something" is you have to go find the definition of the function or class. Then, after digesting that, come back to the original loop and continue on.
A. Nonymous
Tuesday, February 06, 2007
 
 
"And apart from that, because V.end() is a parameter, it will be optimized (ie. only invoked once)."

But in both of the given examples, end() and size() will only be invoked once, as well.
A. Nonymous
Tuesday, February 06, 2007
 
 
"I think is much more readable to do it like this:
std::for_each(V.begin(), V.end(); DoSomething());"

Interesting.  Suppose DoSomething() takes an argument -- what does it look like then?  And suppose DoSomething() is a member method.  What does it look like then?  Oh, and please discuss the difference in syntax between the cases where the container holds objects, and where the container holds pointers to objects.  Then we can discuss readability.

SomeBody: You suggest "increment the stored size by the appropriate amount".  Which would be what in a case like
  l.splice(pos, l2, l2_iter1, l2_iter2)?
The "appropriate amount" can only be computed by traversing l2 from l2_iter1 to l2_iter2.  If splice() is to be constant time you can't do that.  This is an oft-discussed topic on comp.lang.c++.  See http://www.thescripts.com/forum/threadnav60248-1-10.html for a summary.
Will Dowling Send private email
Tuesday, February 06, 2007
 
 
"Suppose DoSomething() takes an argument..."

It looks like DoSomething is a functor object (at least that's what I can see from the example). In this case it would be defined like this:

class DoSomething
{
  public:
      DoSomething() {};

      void operator()(const int arg)
      {
        //  Do something with the integer argument.
      }

};

Now, this assumes that you are iterating over a vector of integers. If the vector stored pointers to integers it would only affect the signature of operator(). Now, the functor's constructor could take an argument, which would change the call to something like:

std::for_each(V.begin(), V.end(); DoSomething(some_arg));

What I don't like about this is that it moves the body of the loop away from the loop, thus forcing the casual reader to hunt down the definition of DoSomething.
A. Nonymous
Tuesday, February 06, 2007
 
 
I wasn't thinking about vectors of integers.  I was wondering if Megabyte Maniac could defend his claim of "much more readability" in a realistic case.  Like for instance where my loop looks like
  for (int i = 0, my_size = vf.size(); i < my_size; i++) {
    ...
    vf[i].mf(...)
    ...
  }
where foo is some class, vf is a std::vector<foo> and mf() is a member function of foo.  I'd be interested in seeing both the definition of DoSomething(), and the call to std::for_each().  And, what would be the difference in that call if vf is a std::vector<foo*> (the only difference in my code would be '->' for '.')

BTW I'm a big fan of higher-order operations (e.g. mapping as in this for_each example) in languages like ML.  But to me the attempt to graft this style of programming onto C++ with std::mem_fun_ref(), std::bind2nd(), etc. is just comical in its *un*readability.
Will Dowling Send private email
Tuesday, February 06, 2007
 
 
Martin Send private email
Tuesday, February 06, 2007
 
 
I did!  Thanks, Martin.

Will
Will Dowling Send private email
Tuesday, February 06, 2007
 
 
>> Which would be what in a case like
>>  l.splice(pos, l2, l2_iter1, l2_iter2)?

Uh, this would be an issue with that particularly signature of splice(), not size().  Implementations will obviously have to iterate through the list nodes making this example linear time rather than constant.  The C++ standard specifically states this.  Once you get to the call to size(), you've already taken the linear time hit on the splice() call.  size() is still constant time.  Note that this isn't an issue when splice() takes an entire list since the size is precalculated in this case and can just be added to the stored size of the list being spliced to.  In this case, splice() will be constant time, as expected. 

The link you provided just backs up what I'm saying.  The standard states that size should be constant time and that the above signature of splice should be linear time.  Implementations could do it the opposite way but that wouldn't make much sense so I imagine most do it as the standard recommends.  I know Visual C++ does, I can't speak for any others.
SomeBody Send private email
Tuesday, February 06, 2007
 
 
The stupid thing about STL to me is that

. I find use of it unreadable enough that I hide it's use in classes
. The implementations aren't efficient, aren't thread-safe, whatever.
. It usually isn't exactly what I need anyhow.

...so you end up writing your own low level thingies.
old.fart
Tuesday, February 06, 2007
 
 
Will:
> I wasn't thinking about vectors of integers

Yes, in this cases i prefear something like:
class DoSomething :
    public std::unary_function<WhateverVIsContaining,void>
{
public:
    DoSomething("argumetsList")
        : "membersList"("argumetsList")
    {};

    inline void operator()(const WhateverVIsContaining& argument) const
    {
        DoSomethingFunction("membersList",argument);
    };
private:
    "membersList";
};
ofcourse there is boost::bind which may do all this dirty job for you, see : http://boost.org/libs/bind/bind.html
But if the expresion is getting too complicate i usually prefer the upper approach instead of boost, since it's much more readable.
Megabyte Maniac Send private email
Wednesday, February 07, 2007
 
 
Actually, it is illegal for the size() method to be optimised out in this scenario - it will be called every time. Inlining is a function of your specific compiler and results do vary and are not deterministic. I.e. many compilers may just default to safe (non-inlined) behavriour if they are already "stressed" by being too deep inside a recursive template/method/function/etc.

In code you should always say exactly what you want, not what you mean. There is no intelligent being sitting behind your screen trying to figure out what you meant when you wrote the code. When you say size() then you instruct the compiler to call the function and return the result. If you want the compiler to cache the value and use it instead then that's the code you should write.

ie.
const int containerSize = myContainer.size();

Note that defining it const gives the compiler a (strong) hint that it is allowed to assume the value never changes, and therefore doesn't have to do a load everytime it refers to the variable. Most compilers will now place the value in a register and happily do exactly what you intended (because that's exactly what you asked for).

My 2 cents.
Coopsnakes Send private email
Saturday, February 24, 2007
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz