The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Function pointer performance

Hey all,
  I have written a C library to iteratively operate on some large datasets (30 million pixels).  The library is, I guess, pseudo-object-oriented.  I have a lot of structures that contain function pointers such that the entire thing resembles a C++ class (albeit stupider).

  What originally started off as a tiny little library has mushroomed into a massive, general analysis toolkit.  I'm starting to worry about the performance penalty of a 20-million function-pointer calls versus 20-million normal function calls.

  Here's where being a physicist and not a computer scientist bites me: I can write a tool to measure the performance, but that's not really the question that interests me.  What I want to know is, can I somehow *guess* what the performance penalty will be?  Does the machine have to look up the corresponding function every single time the pointer is dereferenced or is it intelligently cached somehow?  Does the compiler (gcc) optimize the calls somehow such that the performance hit would be negligible?  Or was it a profoundly stupid idea to put the function pointers everywhere just so the code would be a bit easier to read?

  I appreciate any insights anyone can give me into how to think about this problem.  Thanks!
Naru Situ Send private email
Thursday, August 30, 2007
 
 
If you want to really know, measure. That way you'll get an idea of how bad it is, relative to all the other stuff you are doing.

As to performance hit, a function pointer is just that, a pointer to the function. The code jumps to the address given in the pointer. It's a load from wherever the pointer is stored, but it's about the same as a virtual function call in C++.

Yes, it's slower than a directly linked function call, but I assume that you used function pointers for things that you'd need to change at runtime, so you couldn't have done it with direct linkage anyway.

So, in short, as long as you needed them to change behaviour, the function pointers are plenty fast. If you didn't need them, then they are hurting you, but not that much.
Michael Kohne Send private email
Thursday, August 30, 2007
 
 
The history of Software Engineering had demonstrated time and again that people are really bad at 'guessing' at the impact of things.

Measure it.  Characterize it.  Then you'll know the scale of the problem.

It's possible GCC does a function call in 500 uSecs, in which case taking 1 mSec per call might not be a problem -- or it might, it depends on your application.
AllanL5
Thursday, August 30, 2007
 
 
If you're using gcc, you can use gprof to profile it.  No need to write your own.  It's fairly easy - you compile with gcc using some extra flag (which escapes me), then run your program and it dumps out a performance data file, which you then read with gprof and it makes a nice table of which functions are being called, how long they take, etc.

And +1 for everyone who said "don't guess, measure."  If there's one rule of software performance, that's it.  If there are two rules of software performance, the other is "don't worry about it until you've measured and found a particular problem."
Michael G Send private email
Thursday, August 30, 2007
 
 
It's been a long time since I looked at stuff at this low a level, but IIRC, calling a function pointer basically takes two instructions to make the jump instead of one.  This follows dozens of instructions to load parameters from memory and push them onto the stack.  Basically, the overhead is minuscule compared to the overhead of calling any function at all. 

Don't sweat it - there are plenty of high-performance libraries that use function pointers as callbacks and so forth, and they really don't suffer on modern hardware.
Iago
Thursday, August 30, 2007
 
 
> I appreciate any insights anyone can give me into how to think about this problem.  Thanks!

If the function is non-trivial then the time required to invoke it (whether directly and/or via a function pointer) is probably negligible compared to the time it takes to run it.
Christopher Wells Send private email
Thursday, August 30, 2007
 
 
Calling through function pointers in C is not appreciably slower than direct function calls. There is a lot of work in a function call that is independant of the funciton pointer (basically building the stack frame for the called function and tearing it down at the end). I'd be very surprised if function calls through a pointer were more than 10% slower than direct calls (for non-trivial functions with several parameters).

As for the compiler converting calls through function pointers into direct calls, that shouldn't be possible. The reason you have used function pointers is so that the function to be called is determined at runtime rather than at compile time, so the compiler can't know what function to call (hence can't convert to a direct call). Now, most modern processors, however, probably do a pretty good job of predicting the jump target for your function call, even through a function pointer (assuming that you aren't changing the jump target on every call, which it sounds like you aren't). Modern branch prediction mechanisms, especially on x86 machines, are pretty sophisiticated.

This is essentially the same question as how much of a hit virtual functions are in C++ (or method calls in dynamically bound languages like Java or Objective C) represent. The general consensus is that the speed hit is small compared to the rest of the program and is well worth it (since you can't do runtime subroutine selection without it). Unless you have specific profiling data showing that the function pointer calls are your bottle neck, don't worry about it (but you should be profiling your code, so you know where the real bottle necks are).
Jeffrey Dutky Send private email
Thursday, August 30, 2007
 
 
Forget about function pointers for now. You need to run your code through a profiler. That will show where the hotspots are and you can then focus your attention on improving those areas.
Neville Franks Send private email
Thursday, August 30, 2007
 
 
The function call itself should not take more time.  A "function" is really just a pointer to a section of code.  A "pointer to a function" is also a pointer to a section of code, but you can change it in the program.  However, if you are using a pointer to a function that is stored in a structure, finding the specific structure, locating the element holding the function pointer, and possibly setting a new value if the function to be called changes are what will make the whole process longer.

I don't know what you can compare, because you would have to restructure your code.  My "guess" is that you are wasting your time worrying about this issue.  20 million times a couple of load and store operations is not much.  If your program is slow, follow the advice you have already gotten.  Profile it.
EMF
Thursday, August 30, 2007
 
 
Loading a function pointer takes a few instructions - I find it hard to imagine it would have much impact for 20 million invocations, even if there were all cache misses (going to slow main memory).

If it's too slow that's not the cause.
frustrated
Thursday, August 30, 2007
 
 
Yes, function pointers are relatively slow.  Depending on how they're used, the overhead of using a function pointer versus a direct call can be enormous. 

A function pointer call takes more cycles than a direct call.  The significance of this is going to depend on the total number of cycles your algorithm is using per function pointer call.  For example, you might have a function pointer for GetPixel that retrieves each pixel.  If your algorithm only is only doing a few cycles per pixel, you're going to take a noticeable hit from the extra cycles from the function pointer call.  If your algorithm is doing 1000 cycles per pixel, the function pointer call won't matter much relatively. 

You'll take the biggest performance hit in cases where a direct call can be inlined.  With the GetPixel example, you might be doing some basic array lookup to return the pixel value that would be more efficient if inlined.  By calling through a function pointer, you eliminate the possibility of the compiler inlining this function.  Some of the biggest performance hits that I've seen caused by function pointers result from this.

Modern processors are sophisticated but calling through a function pointer can still adversely affect things like branch prediction.  Don't assume that your compiler and processor are doing magical optimizations. 

I agree with the suggestions to run it through a profiler -- though some profilers get confused by function pointer calls or make it difficult to identify function call overhead.

By the way, I say this as someone who has actually done this sort of optimization work rather than someone making a bunch of uneducated guesses.
SomeBody Send private email
Thursday, August 30, 2007
 
 
According to the book effective c++, c++ function objects (aka functors) are more efficient than function pointers. Apart from being statefull and object oriented and type safe functors have the advantage that the operator () method can be inlined whereas you cannot inline functions called through function pointers.

But as everybody says profile first.
nullptr
Thursday, August 30, 2007
 
 
I just reread the OP's question:

"I'm starting to worry about the performance penalty of a 20-million function-pointer calls versus 20-million normal function calls."

Let's do a back of the envelope calculation.

Assume I have a 1 GHz processor (yours is possibly faster):
1 GHz = 1,000,000,000 Hz
1 CPUcycle = 0.000 000 001 s

20 million = 20,000,000
20 million CPU cycles = .02 s

Now suppose function pointer calls are really, really sucky.  Suppose, for instance, they take 100 CPU cycles to execute.  Then they would add 2 s to your time.

I doubt they're actually 100 CPU cycles, but maybe.  Or maybe 10 (that sounds like a more reasonable number to me) - 0.2s.  Or maybe I'm wrong and they're 1000 - 20s.

If you're doing realtime graphics, 0.2s might be okay, but 2s would be noticeable.  If you're doing some kind of batch job that calculates all night, adding 20s to it might not matter.  It all depends on what you're doing.

And finally, in an unrelated point: if you're really concerned about squeezing clock cycles, buy a copy of the Intel C compiler.  Change "gcc *" to "icc *" in your makefile.  We got 10% performance increase on already-optimized code just by doing that.
Michael G Send private email
Thursday, August 30, 2007
 
 
1. Profile the code, don't guess.
2. www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf - This is the official "Technical Report on C++ Performance". Pages 25 and 26 should provide some information.
to_be_defined
Friday, August 31, 2007
 
 
As SomeBody already has pointed out, the performance penalty doesn't result so much from the extra instruction to load the function pointer as such.

But a call through a function pointer disables the optimisation of the execution pipeline of any modern CPU. The branch prediction doesn't work, and neither does the speculative execution of the code after the branch. This means that essentially the complete execution pipeline is cleared after the function pointer call. Depending on your CPU architecture, especially its pipeline length, this effect can easily impact performance by a factor of 10-100 for the execution time of the call instruction only.

But of course I agree with everyone else: you'll have to measure to know whether this effect will be noticed at all.
kisch Send private email
Friday, August 31, 2007
 
 
After you clearly identified it as a bottleneck, by measure not by guessing, but now you should know this ;), usually you should begin with the algorithms and data structures, not with the low-level stuff. E.g. instead of getting one pixel per function call, you could use a buffer and get a complete row of pixels per function call.
Secure
Saturday, September 01, 2007
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz