Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon.

Pages: 1-

hash occurance

Name: Anonymous 2010-07-29 23:14

Could some people with good pc run this code and say if they got any output.
I want to see how many times same hash is generated.
http://codepad.org/lsVJ2rfM

Any other ways to test it are welcome

Name: Anonymous 2010-07-29 23:26

word_lenght

Name: Anonymous 2010-07-29 23:31

>>1
Good lord Sepples is ugly

I'm glad I don't have to program professionally

Name: Anonymous 2010-07-30 0:20

Even making allowances for the language, this is ugly code. It's particularly cheeky to request people ``with good pc'' when you're doing things like using strings where enums would be appropriate.

Name: Anonymous 2010-07-30 0:25

The same hash is generated just about as often as you expect, for a decent hash function.  That doesn't mean that you've got a decent hash function, it just means that this particular test won't distinguish your hash function from a decent one.

However, your testing code is terrible.

First, you're testing for all hash collisions, which is NOT what you want.  You should discard any collision where both strings are identical, since OF COURSE they'd have the same

Second, let's get rid of that terrible quadratic algorithm you've got checking for hash collisions.  You know what? You could use a hash table instead, *ahem*, since you've already computed hashes on your strings.  This will give you a speed bump from "my god it never finishes" to "dear lord it might actually finish before I die".

Third, let's get rid of the allocations all over the place.  That "std::string" you use everywhere? Ditch it.  Use an array of "char" instead.  Notice that your names are never longer than 30 characters?  Well, then use an array of 31 char instead of "std::string" and suddenly you're not allocating all over the place.  This will give you a nice constant factor speedup bringing the program from "damn this program takes a long time" down to "okay actually that's pretty fast".

Fourth, let's do some minor tweaks.  Array of "char *"?  Remove an indirection by using "NAME_PARTS[64][4]" instead.  Preallocate all the memory.  Etc.

http://codepad.org/X1gTR0NJ (Mind you, use C99 and turn on optimizations.  If you don't have "stpcpy" you'll have to -D_GNU_SOURCE or rewrite it if that doesn't work).

After a glorious 977 ms, the program has found 132 collisions in 1048576 names.  How many collisions would you expect given a decent hash function?

Well, it's 32 bits wide, so the chances of any pair colliding, assuming an ideal hash function, are 2^-32.  In 2^20 names, there are about 2^39 different pairs of names.  When you multiply the two, you get 2^7, or 128 collisions.  How many collisions does the program find?  132.

Name: Anonymous 2010-07-30 0:45

The real question though, is:

[size=18]WHY ARE YOU IMPLEMENTING ANOTHER DUMB HASH FUNCTION‽‽[/size]

There are already a bunch of nice ones out there that people have already written, tested, and released as unrestricted free source code.  They're also much, much faster than yours.

Here, go use this hash function instead:

http://burtleburtle.net/bob/c/lookup3.c

Name: Anonymous 2010-07-30 1:19

>>6
I can think of a few reasons. The best being for the learning experience. Beyond that, it's things like tinfoilhattery about the NSA/CIA and it bottoms out somewhere around general ignorance / NIH syndrome (aka willful ignorance.)

Name: Anonymous 2010-07-30 1:39

>>7
The problem is everyone and their dog implements a new hash function, and 99% of them don't know jack shit about how to properly test them.  And this has NOTHING to do with the SHA family of hashes -- those are cryptographic hashes, which have a different set of design criteria.

Name: Anonymous 2010-07-30 2:26

//On the name of ALLAH and may the blessing and peace of Allah
//be upon the Messenger of Allah Mohamed Salla Allahu Aliahi Wassalam.
//Author : Fouad Teniou
//Date : 29/07/10
//version :0.8

Name: Anonymous 2010-07-30 8:35

>>5
I thoroughly enjoyed your post, but only read it once. Please keep posting, though, by all means!

Name: Anonymous 2010-07-30 8:43

>>8
Why is this a problem again?

Name: Anonymous 2011-02-03 4:41

Name: Anonymous 2011-02-03 8:22

Don't change these.
Name: Email:
Entire Thread Thread List