Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

(Double) Linked Lists

Name: Anonymous 2008-04-27 16:01

I've been trying to write a little forum application that scales to million of threads in a category and million posts in a thread. The idea is that viewing a category or thread on any page will only take n+const key lookups in a database (like berkeleydb) where n is the number of items (threads or posts) on the page.

Implementing threads and posts as double linked lists makes it really easy to view the oldest and newest items (and adding new items) but obviously pagination doesn't work at all and also deletion is going to be a bitch because you have to "find" the item first.

I've been thinking about adding "shortcuts" to every 10th, 100th, 1000th, etc item to the next 10th, 100th, 1000th, etc item, but then deleting becomes very very expensive because you possibly have to update thousands of shortcuts. Is there a solution that doesn't suffer from having one really bad operation (or requires in-memory indices)? Or maybe a n+log(items)+const solution...

This is just a thought experiment about data structures but it has been driving me kinda nuts lately :p

Name: Anonymous 2008-04-29 20:34

You aren't going to find a suitable hierarchical DBMS for this. The relational functionality of an RDBMS isn't really meant for this. So the DB implementation is moot.

You need to focus on the structure of your data. And your best option is a modified preorder tree traversal.

Each node in your hierarchy stores a left and a right value. Together these numbers specify where in the tree the node is. You can use simple arithmetic to figure out all kinds of info on the node.

If a nodes right number = 1 + left then it has no children.
Everything with a left and right number between a given nodes left and right number is a descendant of the node.
The number of decedent is (right – left - 1) / 2
The path to the node is all nodes with a left number less than the nodes left number and a right number greater than the nodes right number.
And so on...

Updates to nodes are tricker, and if you are following you will realize there is some seeming inefficiencies in it. But there are ways of lessening that. And given the incredible efficiencies of look-ups, it should be appropriate for this implementation.

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List