## techInterview Discussion

A part of techInterview.org: answers to technical interview questions.
Your host: Michael Pryor
Are there better ways to do this than the following?

```c
char arr[256] = {0, 1, 1, 2, 1, 2, 2, 3, 1, ...........};

int count_set_bits(int num)
{
    char byte0, byte1, byte2, byte3;
    byte0 = num & 0xff;
    byte1 = (num & 0xff00) >> 8;
    byte2 = (num & 0xff0000) >> 16;
    byte3 = (num & 0xff000000) >> 24;
    return (arr[byte0] + arr[byte1] + arr[byte2] + arr[byte3]);
}
```

I am aware of the version that uses a while loop and shifts, but it could take up to 32 iterations in the worst case. Thanks.
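The elided table in the question can also be generated at startup instead of being typed out by hand. This is a sketch rather than the original poster's code: `init_table` and the recurrence are my own (the bit count of `i` equals the bit count of `i/2` plus `i`'s low bit).

```c
#include <assert.h>

/* 256-entry popcount-per-byte table, filled at startup.
   Sketch only: init_table and the recurrence are illustrative,
   not part of the original post. */
static char arr[256];

static void init_table(void)
{
    /* arr[0] is 0 from static initialization; each other entry is
       the count for i >> 1 plus i's lowest bit. */
    for (int i = 1; i < 256; i++)
        arr[i] = arr[i >> 1] + (i & 1);
}

static int count_set_bits(unsigned int num)
{
    return arr[num & 0xff] + arr[(num >> 8) & 0xff]
         + arr[(num >> 16) & 0xff] + arr[(num >> 24) & 0xff];
}
```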
desperate_coder Wednesday, January 02, 2008
```c
int num_bits_set(int num)
{
    int count = 0;
    while (num) {
        num = num & (num - 1);
        count++;
    }
    return count;
}
```
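For reference, here is a self-contained version of the loop above, switched to unsigned arithmetic (my change, so that `num - 1` is well-defined for every input). It runs once per *set* bit, not once per bit, because `num & (num - 1)` clears exactly the lowest set bit.

```c
#include <assert.h>

/* Counts set bits, one loop iteration per set bit.
   num & (num - 1) clears the lowest set bit:
   e.g. 0b10100 & 0b10011 == 0b10000. */
static int num_bits_set(unsigned int num)
{
    int count = 0;
    while (num) {
        num = num & (num - 1);
        count++;
    }
    return count;
}
```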
dark horse Wednesday, January 02, 2008
The one true method of counting bits comes from MIT HAKMEM item 169 (assuming a 32-bit int):

```c
inline UINT32 Count(UINT32 n)
{
    UINT32 c = n;
    c -= (n >> 1) & 033333333333;
    c -= (n >> 2) & 011111111111;
    c = (c + (c >> 3)) & 030707070707;
    return c % 63;
}
```

This can be optimized to 11 register-to-register operations, and it has no conditional branches to mess up CPU pipelining.
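A compilable rendering of the HAKMEM routine, with `uint32_t` standing in for the post's `UINT32` typedef. Note the parentheses around `c >> 3`: in C, `+` binds tighter than `>>`, so writing `c + c >> 3` would compute `(c + c) >> 3` and give wrong answers. The subtractions compute the bit count of each octal digit in parallel; `% 63` then sums the base-64 digits.

```c
#include <stdint.h>

/* HAKMEM 169: each octal digit d of c holds popcount(d) after the two
   subtractions (d - d/2 - d/4); adjacent digit pairs are then summed
   and the base-64 digit sums are folded together by % 63. */
static uint32_t hakmem_count(uint32_t n)
{
    uint32_t c = n - ((n >> 1) & 033333333333u)
                   - ((n >> 2) & 011111111111u);
    return ((c + (c >> 3)) & 030707070707u) % 63u;
}
```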
@Dark Horse

That algorithm (Kernighan's?) is just awful! It's both slow and it relies on a subtle math trick, so it's not immediately obvious why it works. "Slow and obvious" and "fast and subtle" both have their places, but "slow and subtle" doesn't. The "slow and obvious" solution:

```c
UINT32 Count(UINT32 n)
{
    UINT32 c = 0;
    for ( ; n != 0; n >>= 1) {
        c += n & 1;
    }
    return c;
}
```
Yeah, but it's certainly faster than yours, even if it's not as obvious!
dark horse Friday, January 04, 2008
I did some quick-and-dirty testing of all three functions, and MIT HAKMEM 169 was consistently faster than the other two.

Running each function 2^32 times while counting the bits of i (where i is the loop control variable):

Q6600, VS2008, Release build:
- MIT HAKMEM 169: 1x
- Kernighan: 3x
- Simple loop: 5x

PIII 500 MHz, GCC 4.1.1, -O2:
- MIT HAKMEM 169: 1x
- Kernighan: 3x
- Simple loop: 9x
If you lay out your lookup table in a 16x16 grid, you'll see that both the first row and the first column are:

```
0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4
```

On top of that, if you find the number you're looking for in that grid and add together the first numbers in its row and column, you'll get the number of bits set in that number. The easiest way to get the row and column index is a simple divide and modulus. For example:

```
# say we're looking for the number of bits in 243
x = 243 / 16  => 15 (int)
y = 243 % 16  => 3

# "a" is the lookup table from above
a[3] + a[15] = 2 + 4 = 6

# Are there six bits set in the number 243?
243 = 11110011
# Indeed there are
```

You only need 2^(N/2) entries in your lookup table for an N-bit number, so you can get away with 16 entries instead of 256. The fastest method is still the fancy one from above, but I figured this one out while looking for patterns like Joel mentioned in his article. I wrote my own article about it a few months ago, which you can read here: http://www.theantichris.dreamhosters.com/2007/11/15/patterns-of-set-bits-in-a-byte/ Most of the code's in Ruby, but it's easy to understand, and there's another pattern to the numbers that I go into in more detail in the article. Hope this helps.
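The 16-entry-table trick described above can be sketched in C as follows (a minimal sketch; the names `nib` and `byte_bits` are mine, not from the post). The high nibble selects the row and the low nibble the column, and both are looked up in the same 16-entry table.

```c
/* Popcounts for 0..15: the shared first row/column of the 16x16 grid. */
static const char nib[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                             1, 2, 2, 3, 2, 3, 3, 4};

/* Bits set in a byte = bits in its high nibble (row index, b / 16)
   plus bits in its low nibble (column index, b % 16). */
static int byte_bits(unsigned char b)
{
    return nib[b >> 4] + nib[b & 0x0f];
}
```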
A 16-byte lookup table would actually be pretty good. The problem with lookup tables in general is that performance on modern processors is all about pipelining and multiple cache layers, and even a 256-byte lookup table is a problem in that regard (although a 256-byte lookup table is still pretty darn fast for most problems).
A 16-byte lookup table is small enough to stay in the upper cache layers and be fetched very fast. I don't know whether two 16-byte lookups would actually be faster than one 256-byte lookup, but it would be an interesting experiment (on a busy system; on a system that's doing nothing else, the 256-byte lookup probably wins, as there's no cache contention).
Skorj,

The method I mentioned would only need one lookup table, since the first row and column contain the same numbers. There's no need to duplicate them, so you should be able to do it in 16 bytes. By the same method, you could get all of the set bits in a 16-bit integer with 256 bytes, or a 32-bit int with 65536 bytes. If you look at the other pattern I mention in my article, and you're clever enough, you could probably come up with some sort of quadtree algorithm and just calculate the additions you'd need to get there from:

```
0 1 1 2
```

I'm not that clever yet.
I tried fiddling around with this problem.

I came up with this solution, which seems a bit more obvious than the HAKMEM one, even though it's maybe a bit slower:

```c
long countBit(long n)
{
    // possible pair values (0-2): 00 01 10
    n = (n & 0x55555555) + ((n >> 1) & 0x55555555);
    // possible quad-bit values (0-4): 0000 to 0100
    n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
    // possible byte values (0-8): 0000 0000 to 0000 1000
    n = (n & 0x0f0f0f0f) + ((n >> 4) & 0x0f0f0f0f);
    // no more masks needed here, since the result simply accumulates
    // in each byte; just mask the final byte
    n = n + (n >> 8);
    n = (n + (n >> 16)) & 0x000000ff;  // 0x00 to 0x20
    // I don't like the % operator. What's the average cycle count for %?
    // For a 64-bit int, extend the masks and add two binary operations,
    // moving this last mask to the end; same again for 128 bits:
    //   n = (n + (n >> 32)) & 0x000000ff;  // possible values (0-64): 0x00 to 0x40
    return n;
}
```

I also like the fact that it's constant time, while the loop can take 0 to n iterations, and the table lookup uses one more lookup per byte (meaning a >>, a +, an &, and an index operation per byte). My 2 cents.
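Since the SWAR masks above are easy to mistype, here is a quick self-check (my own sketch; `swar_matches_loop` is a hypothetical helper, not from the thread) comparing a cleaned-up 32-bit version of the routine against the simple shift-and-mask loop over a range of inputs.

```c
#include <stdint.h>

/* SWAR popcount: sum bits in pairs, then nibbles, then bytes,
   then fold the four byte counts into the low byte. */
static uint32_t swar_count(uint32_t n)
{
    n = (n & 0x55555555u) + ((n >> 1) & 0x55555555u);
    n = (n & 0x33333333u) + ((n >> 2) & 0x33333333u);
    n = (n & 0x0f0f0f0fu) + ((n >> 4) & 0x0f0f0f0fu);
    n = n + (n >> 8);
    n = (n + (n >> 16)) & 0x3fu;  /* final count is at most 32 */
    return n;
}

/* Obvious reference implementation: one shift per bit. */
static uint32_t loop_count(uint32_t n)
{
    uint32_t c = 0;
    for (; n; n >>= 1)
        c += n & 1u;
    return c;
}

/* Spot-check: all 16-bit values plus the all-ones word. */
static int swar_matches_loop(void)
{
    for (uint32_t i = 0; i < (1u << 16); i++)
        if (swar_count(i) != loop_count(i))
            return 0;
    if (swar_count(0xFFFFFFFFu) != 32u)
        return 0;
    return 1;
}
```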

Powered by FogBugz