techInterview Discussion

A part of techInterview.org: answers to technical interview questions.

Your host: Michael Pryor

Hash table implementation

Hi guys, I have this question and the solution.

Find the FIRST non-repeating character in a string.

Example: h h a a p p y x y z
Result : x (because y occurred twice, z is the SECOND non-repeating character)

I know the solution is to use a hash table such that:
h[key] = value

key = character
value = # of occurences

So we will have:
h[h] = 2
h[a] = 2
h[p] = 2
h[y] = 2
h[x] = 1
h[z] = 1

This requires 2 scans, 1st for the whole string and insert into hash table when h[key] does not already exist. The 2nd scan is to scan through the hash table to see which character/bucket has the first "1" in value.

In C/C++, how to create a hash table like this? Is there a built-in function or how to code something like this?

Many thanks.
nicky Send private email
Friday, March 17, 2006
 
 
If you're working exclusively with characters, then you have a range of 0..255 and you can just use an array with 256 entries to hold the frequency of each character.

I would probably iterate through the characters once to establish frequencies:

for (int i=0; i<len; i++)
  freq[str[i]]++;

then once more through the same characters, to find the first character with a frequency of 1:

for (int i=0; i<len; i++)
  if (freq[str[i]] == 1)
      return str[i]; // bingo!
Mr. Powers Send private email
Friday, March 17, 2006
 
 
As far as implementing a hashtable in C, you'd have to do it from scratch. But I'm not convinced you need a hash table at all. A bit array seems a better fit.

You can actually do this with a single scan through the string, if you start at the end of the string. Of course, in C, you need to scan through the string forwards in order to find the end (reason #14 why I hate C).

So, start at the end of the string, and scan towards the front. Use a bit array to keep track of which character values have occurred so far, and when you get to the beginning, the last "not seen before" value is the value you need.
Mark Bessey Send private email
Friday, March 17, 2006
 
 
OKay, WHat about finding the first non-repeating charater? Is there a better method than using a bit array?
Rave Hanker Send private email
Friday, March 17, 2006
 
 
Mr. Powers, yes it's only characters, are they only 256 characters only?

h h a a p p y x y z

freq [h] = 2
freq [a] = 2
freq [p] = 2
freq [y] = 2
freq [x] = 1
freq [z] = 1

Isn't this hash table?

Or you're using the ascii code of these characters?

(i'm just making up the ascii)
freq [21] = 2
freq [34] = 2
freq [45] = 2
freq [46] = 2
freq [50] = 1
freq [60] = 1

but after this when you scan through this freq array. suppose there's a "b" in the end of the string (hhaappyxyzb) and ascii of "b" is up front. when doing the sequential scan of freq and match for == 1 wouldn't "b" be returned first?

(although you can always keep another value with freq that increments every time a char is read, so when you read freq array you only need to find the smallest of this value with freq[n] == 1)




Mark, I am not sure I fully understand. Can you give me an example if possible?

Thank you guys for replying!
nicky Send private email
Saturday, March 18, 2006
 
 
nicky, yes, you index with the ascii value:

freq['b'] ++;

you don't iterate over freq[0]...freq[N]. You iterate over freq[str[0]] (the first character in the string) ... freq[str[N]] (last character in the string).

Going backwards as suggested is even better.
Mr. Powers Send private email
Sunday, March 19, 2006
 
 
This solution may be less acceptable for the Unicode case. Why do we even need hashtables? Here's a solution that can be adapted to Unicode (with some modifications to handle 3-byte characters):

// using char, though you should use wchar for Unicode
char FirstFirstNonRepeating(char *s)
{
    int len = strlen(s);
    if (len <= 1) return s[0];
    
    int index = 1;
    int char_count = 1;
    char current = s[0];
    do {
        while (index < len && s[index] == current) {
            ++index;
            ++char_count;
        }
        
        if (char_count == 1) {
            return s[index - 1];
        }
        else if (index >= len) {
            return '\0';
        }
        else {
            current = s[index];
            ++index;
            char_count = 1;
        }
    } while (true);
}
RaviSu Send private email
Wednesday, March 22, 2006
 
 
RaviSu wrote:

        if (char_count == 1) {
            return s[index - 1];
        }

There is no guarantee the repeating chars are adjacent.
Mr. Powers Send private email
Wednesday, March 22, 2006
 
 
Whoops, I misunderstood the problem. So the scan from the end  algorithm appears to be an elegant solution.
RaviSu Send private email
Thursday, March 23, 2006
 
 
I'm not sure I understand how scanning from the back may work better.

So you scan each character, flip the bits in the bit array. What should be done next?

Thanks.
PP Send private email
Thursday, March 23, 2006
 
 
You could create a linked list of the values.
This would give you a worst case of O(n).
Although each of the operations would be more expensive.
You could insert the values and modify values accordingly.
AJ
Thursday, March 23, 2006
 
 
I'm back...

Sorry for the wild-goose chase. The "Scan it from the other end" idea came to me in a flash of insight, after not getting enough sleep several nights in a row...

Prior experience shows that these are about 75% likely to be brilliant insights, and 25% likely to be completely bogus. Guess which one this was?

For fun, here's an implementation that doesn't use an external array to count characters:

char firstunique(const char *str)
{
  const char *candidate = str;
  while(*candidate != 0)
  {
    int is_unique = 1;
    const char *p = str;
    while(*p != 0)
    {
      if ((p != candidate) && (*p == *candidate))
      {
        is_unique = 0;
        break;
      }
      p++;
    }
    if (is_unique)
    {
      break;
    }
    candidate++;
  }
  return *candidate;
}

This has different performance characteristics than the (perhaps more-obvious) character-counting version, but it can also work with Unicode strings, if you change the variable definitions.

If I get a chance, I'll update this with the bit-vector version when I get a chance.
Mark Bessey Send private email
Thursday, March 23, 2006
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz