Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

hash occurance

Name: Anonymous 2010-07-29 23:14

Could some people with good pc run this code and say if they got any output.
I want to see how many times same hash is generated.
http://codepad.org/lsVJ2rfM

Any other ways to test it are welcome

Name: Anonymous 2010-07-30 0:25

The same hash is generated just about as often as you expect, for a decent hash function.  That doesn't mean that you've got a decent hash function, it just means that this particular test won't distinguish your hash function from a decent one.

However, your testing code is terrible.

First, you're testing for all hash collisions, which is NOT what you want.  You should discard any collision where both strings are identical, since OF COURSE they'd have the same

Second, let's get rid of that terrible quadratic algorithm you've got checking for hash collisions.  You know what? You could use a hash table instead, *ahem*, since you've already computed hashes on your strings.  This will give you a speed bump from "my god it never finishes" to "dear lord it might actually finish before I die".

Third, let's get rid of the allocations all over the place.  That "std::string" you use everywhere? Ditch it.  Use an array of "char" instead.  Notice that your names are never longer than 30 characters?  Well, then use an array of 31 char instead of "std::string" and suddenly you're not allocating all over the place.  This will give you a nice constant factor speedup bringing the program from "damn this program takes a long time" down to "okay actually that's pretty fast".

Fourth, let's do some minor tweaks.  Array of "char *"?  Remove an indirection by using "NAME_PARTS[64][4]" instead.  Preallocate all the memory.  Etc.

http://codepad.org/X1gTR0NJ (Mind you, use C99 and turn on optimizations.  If you don't have "stpcpy" you'll have to -D_GNU_SOURCE or rewrite it if that doesn't work).

After a glorious 977 ms, the program has found 132 collisions in 1048576 names.  How many collisions would you expect given a decent hash function?

Well, it's 32 bits wide, so the chances of any pair colliding, assuming an ideal hash function, are 2^-32.  In 2^20 names, there are about 2^39 different pairs of names.  When you multiply the two, you get 2^7, or 128 collisions.  How many collisions does the program find?  132.

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List