Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

motherfucking hahstables

Name: Anonymous 2010-06-14 10:34

Hello /prog/

Im trying to implement a hash-table algorithm, the problem is: i dont know shit about hash-tables.

Now i might seem noobish for asking, but can someone explain them to me, Wikipedia didnt help a dick.
All i could find out was that the key has to be encoded in some way.

On a side note, why the fuck is it called a linked list? Isnt it obvious that it is linked, else it wouldnt be a list would it.

Name: Anonymous 2010-06-14 11:07

Hashtables are reasonably trivial, but why do you want to implement them without understanding them. If you just need one, just use the one your language provides, or get a hashtable library.

Anyway, a linked list is either:
a) end of the list, NIL
b) a 2 element structure (which I will call a cons, with two fields, first being the car and the second being the cdr). One element, the car, traditionally points to the first element of the list, and the other element, the cdr, points to the rest of the list.

So a list like (1 2 3), can be written as:
(cons 1 (cons 2 (cons 3 nil)))
where (cons car cdr) = (car . cdr).
In non-lispy, languages, linked-lists tend to be reinvented a lot and people may put more fields in the linked-list type (what I called the cons cell), but there will always be a next field.

Linked lists are usually used to store lists of arbitrary length. Adding a new element in front is O(1), traversal is O(n), adding an element at the end is O(n) (unless, you make it a tconc, which stores a pointer to the end of the list, then it becomes a O(1)). There's also doubly linked lists which allow moving backwards from one element, but they're not used as often (there's another field which points back to the previous structure).

A hashtable is usually a data type which allows you to index into it by some data (either arbitrary or of some specific type) called the key and get a value associated with that key. They work internally by having a hash function, which when applied to the key, will give a hash (usually an integer of sorts). A hash should represent an object uniquely, so hashes will usually be designed to either test for object equivalence (they point to the same place, so hash the pointer), or object equality (same as hashing all the object data, doing a deep compare), or maybe some string hashing functions (case insentivity options), and so on. While the hash will usually be different for different objects, it doesn't have to be (collisions can occur and usually do, especially if the hash function sucks or the key size is too small), so there's also an equality function that comes together with the key function, it tests if 2 objects are equal given the criteria the user picked.

The internal representation of the hashtable can vary, but here's one possible internal representations (of many):
1 - hash function
2 - equality test
3 - value vector (could be a resizable vector of a certain size... when more values are added past its capacity, resize it)
4 - keyhash vector (runs parallel to the value vector, maybe always sorted to allow binary searches, but doesn't have to be)

To look up a value, you, hash the key, do a binary search for the hash in the keyhash vector, once found, use that index to look up the value in the value vector. Each value vector element is a linked list of values, and each value in this linked list would have the same hash. One then proceeds to iterate through the linked list until one finds the element which is equal(according to the equality test), and returns that element (it could even be a key-value pair, if you choose to store it). There's really many variations possible here. Another common one would be to reduce the hash size to something manageable (0x100 or 0x10000), don't have a hash vector, and the hash is used as an index in the value vector, after which you just look what you need up. Each approach has its advantages/disadvantages...

So why do you need to implement your own? Even if I'm using a language that doesn't have hashtables, I just pick a good hashtable library and just use that.

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List