The Joel on Software Discussion Group (CLOSED)

A place to discuss Joel on Software. Now closed.


switch() vs. lookup table?

Looking at the Python source revealed that Python's interpreter is really just one giant switch() statement in a for() loop:

http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Python/ceval.c?rev=2.238.2.6

I would have thought the pragmatic thing to do would be to put each operation in its own routine, then build a lookup table (an array of function pointers indexed by byte code) that dispatches to the corresponding routines:

(*opcode_lookup_table[POP_TOP])(...);

At the very least I figured it would be more efficient, but I looked at some old postings on the Python newsgroups and they claimed they could get better performance by moving around the cases (most common at the top) and avoiding the overhead of a function call in the process.

I was told over and over again to always use tables if possible, if for no other reason than efficiency. What's the deal?
disenchanted with lookup tables
Monday, November 01, 2004
 
 
Many people have been playing at writing faster and faster VMs for corewars (http://www.corewar.info).

Currently, the fastest way found is a local static array of computed gotos, which is something like what you'd hope a compiler would generate, but is best hand-tweaked.
i like i
Monday, November 01, 2004
 
 
Most compilers implement switch statements as lookup tables.
Mr Jack
Monday, November 01, 2004
 
 
That's why case statements have to be constants.
Mr Jack
Monday, November 01, 2004
 
 
Prime example of people wasting time trying to optimize the least significant thing. The time spent in the "switch" itself is likely far less than what each case block is actually doing (some contain multiple function calls, each of which calls other functions, etc.), so that's the least of their problems.

I once worked with a guy who made a custom structure for a time-consuming function, which he stuffed with the three parameters "so we'd only have to pass one pointer, which is quicker." Except within that function he had a huge nested for loop that iterated through an array of typically 100,000 elements in O(n^3) time, which he saw no problem with... yeah, but the call into the function was "quicker!"
Bill
Monday, November 01, 2004
 
 
"Currently, the fastest way found is a local static array of computed gotos, which is something like you'd hope a compiler would make, but are best hand-tweaked."

On a Pentium system, it's usually much faster to use compares and branches, because an array of gotos (a jump table) ends in an indirect jump that's guaranteed to stall the pipeline, whereas with compares you get the benefit of branch prediction. That's why the MS compiler implements it as compares, and not a jump table, if you're compiling for Pentium.

But trying to implement your own jump table on a Pentium system would be faster only if you had a LOT of cases (say, dozens or hundreds), AND they were all fairly likely to occur, in which case the pipeline is likely to stall anyway no matter what you do, meaning you're wasting all those comparisons. They'd have to profile it to know for sure, but if some of those cases are much more frequent than the others, it's likely faster to leave it as a switch and put the more likely ones at the top (if using Microsoft's compiler, in descending order of likelihood).
Joe
Monday, November 01, 2004
 
 
Don't forget that the Python interpreter has to run on a LOT of different architectures and OSes. Several significant optimizations have been deliberately left out of the interpreter specifically because, on some systems, they actually slowed things down.
Chris Tavares Send private email
Monday, November 01, 2004
 
 
I often start with a switch and then it's just not worth the effort to change it. So there may not be a good reason.
son of parnas
Monday, November 01, 2004
 
 
Perl uses function pointers for every opcode.

Honestly, the Python people probably don't know what the hell they're doing. If you have dozens of cases, no way is that faster than a function pointer table on most platforms and most compilers, especially the most common ones.

They probably don't have enough time to fix it, or they like the switch() logic too much. Python is dog-slow because of stuff like this; much slower even than Perl.

Go read some of those posts where they talk about the performance benefits from moving the cases around. If they just got rid of them altogether, it would be a non-issue.
lookup tables rock
Monday, November 01, 2004
 
 
"Honestly, the Python people probably don't know what the hell they're doing. If you have dozens of cases, no way is that faster than a function pointer table on most platforms and most compilers, expeially the most common ones."

And you've backed your hunch up with tests and such, right? Don't insult someone's intelligence or ability unless you've done some research, or you look like an ass.

The Python people *do* know what they're doing. This has been tried: http://python.org/sf/693638 , and it was 15% slower.
scruffie Send private email
Monday, November 01, 2004
 
 
> But trying to implement your own jump table on a pentium
> system would be faster only if you had a LOT of "cases"
> (say, dozens or hundreds), AND they were all fairly likely
> to occur

Yep, in a CW VM there are typically around 8,000 cases, although each program typically uses only about 50 of those.

And switch statements are the 'legacy' approach; computed gotos are typically about 200% faster, and haven't yet been fully explored. I think most VMs use computed gotos? Clearly not Python, lol.
i like i
Tuesday, November 02, 2004
 
 
Use code profiling (measuring the time spent executing your code; it's available in most compilers, and I use it in VC++). Profile both a switch and the corresponding lookup table, and you will discover that the times are very close, because compilers implement switches as jump tables.
Hany Grees Ayoub Send private email
Tuesday, November 02, 2004
 
 
"And you've backed your hunch up with tests and such, right? Don't insult someone's intelligence or ability unless you've done some research, or you look like an ass."

I have done research. Lookup tables are almost always faster than giant switch statements unless the compiler optimizes one into the other. This is basic CS. Why do we use hash tables instead of linear searches?

"The Python people *do* know what they're doing. This has been tried: http://python.org/sf/693638 , and it was 15% slower."

From that link:

"Quick results: after the rewrite, Python was about 15%
slower. *I think a bunch more optimizations could be
made, but I think the best case would be to rival the
existing speed--so I stopped.*" (Emphasis added.)

As others have pointed out, switches often get optimized into lookups, so the performance is usually comparable.

I never said the Python people were stupid, but I don't want new coders putting giant switch() statements in their code instead of using lookup tables because "that's the way Python does it." The Python codebase is messy, and whether you like it or not, it is very, very slow.

(Parrot VM uses lookup tables, BTW)
lookup tables rock
Tuesday, November 02, 2004
 
 
Can you quantify "very, very slow"? Why do you think (your) perceived slowness of Python is solely due to its use of a large switch() statement?

Remember, you're comparing two different *languages*, not just different implementations of the same one.
scruffie Send private email
Tuesday, November 02, 2004
 
 
"Can you quantify "very, very slow"? Why do you think (your) perceived slowness of Python is solely due to its use of a large switch() statement?"

I never said it was due solely to the switch statement. (Though most interpreter time is spent there.) The fact is that they're relying on compiler optimization to pick up the slack for poor programming practice. If the compiler doesn't optimize that switch statement into a lookup table, it takes scores of comparisons to finally reach the right case.

And Python *isn't* fast.

It's slower than Perl, it's slower than Java; it's slow, plain and simple, and it shouldn't be. For what it does, it should be at least as fast as Perl and comparable to Java, but it's neither. This is no big secret: the speed of Python has been bemoaned over and over again on the mailing lists.
lookup tables rock
Tuesday, November 02, 2004
 
 
>I never said it was due solely to the switch
>statement. (Though most interpreter time is
>spent there.)

Most interpreter time is spent there -- if you include branch time, right?  That stands to reason because that switch statement represents the root of all branches.  But when looking at profiler output in this case, to determine where to optimize, it's probably better to ignore the 'eval' function here and look at times for the immediate branches from it.  The 'eval' function will always have some (in the optimal case, negligible) higher average time than the average of the internal functions it dispatches to.  But is it an icecap on top of a mountain or an iceberg on top of a pebble?

I know I'm just stating the obvious here, but if I was working on optimizing the Python interpreter that's probably the first thing I'd look at (highest average-times for internal functions other than 'eval').
Kalani Send private email
Tuesday, November 02, 2004
 
 
" Prime example of people wasting time trying to optimize the least significant thing."

Yes and no.  In a VM, you typically call tiny functions that each do very little.  It's not the call/return that hurts you as much as the prologue and clean-up code for each function, which is often a significant percentage of the execution time. This has been researched to death, especially by people who've implemented threaded interpreters.

That said, the Python implementers have intentionally targeted clarity and reliability over speed (threaded code is nasty in C, and you have to rely on gcc extensions to do it right).  There's no fundamental difference between a switch statement and a table of function pointers.  Quite possibly the switch statement is faster, because the compiler has more information available to it.
J. Send private email
Tuesday, November 02, 2004
 
 
"That said, the Python implementers have intentionally targeted clarity and reliability over speed"

Which is exactly their problem!

The Perl 5 codebase is frightening and next-to-unreadable, but damn is it fast.
lookup tables rock
Tuesday, November 02, 2004
 
 
Just a question for those x86 assembly folks out there, rather than just sitting around during the pipeline stall, could you not use this idle time to do a few computations in parallel?

That is, instead of writing:

while (1) {
  int token = *token_ptr++;
  TokenFunc *func = func_table[token];
  func();
}

where the processor can't know what line 4 will do before having executed line 3, you could write (if your bytecode format allows for it):

int token = *token_ptr++;
TokenFunc *func = func_table[token];
while (1) {
  int next_token = *token_ptr++;
  TokenFunc *next_func = func_table[next_token];
  func();
  func = next_func;
}

where the call to func() can be predicted right after the first statement in the loop.
itch me
Wednesday, November 03, 2004
 
 
"Don't forget that the Python interpreter has to run on a LOT of different architectures and OS's."

As do Perl and Java...
David Burch Send private email
Wednesday, November 03, 2004
 
 

This topic is archived. No further replies will be accepted.

 