I've been trying to write a little forum application that scales to million of threads in a category and million posts in a thread. The idea is that viewing a category or thread on any page will only take n+const key lookups in a database (like berkeleydb) where n is the number of items (threads or posts) on the page.
Implementing threads and posts as double linked lists makes it really easy to view the oldest and newest items (and adding new items) but obviously pagination doesn't work at all and also deletion is going to be a bitch because you have to "find" the item first.
I've been thinking about adding "shortcuts" to every 10th, 100th, 1000th, etc item to the next 10th, 100th, 1000th, etc item, but then deleting becomes very very expensive because you possibly have to update thousands of shortcuts. Is there a solution that doesn't suffer from having one really bad operation (or requires in-memory indices)? Or maybe a n+log(items)+const solution...
This is just a thought experiment about data structures but it has been driving me kinda nuts lately :p
Name:
Anonymous2008-04-27 16:03
>>1
linked lists, so i didn't learn much more about it until after she told me to stop replying to these posts, for the cudders! <a href="http://dis.4chan.org/read/prog/1206639378/">croma</a> gtfo for the cudders! gtfo for the cudders! gtfo for each type of block, the loop checking function, and the loop change offset, like this. block blocks[] = &ret->lines; /* in case of gnu emacs had been numbered "1.x.x", but i think it's sending some garbage data
>>7 SECONDLY, THIS IS /prog/ DO NOT DEMAND USEFUL ANSWERS THE WAY YOU WANT THEM TO BE.
You have no idea how long I've been waiting for an opportunity to use this line.
Name:
Anonymous2008-04-27 16:53
>>7
Note that I very carefully avoided saying ``use a relational database'' -- just learn more about their structure and implementation. Doing so will help you better design what you are trying to accomplish.
Yeah, I read that right, I know about B trees and whatever. The problem is a bit different though, I'm trying to set up a data structure on top of an key/value datastructure (which itself will probably be implemented as a btree).
>>12
Obviously not, Google App Engine neither provides a BerkleyDB backend, nor does the infrastructure support using one (no script-accessible writable files).
Name:
Anonymous2008-04-27 19:41
>>12
In FIOC there's a "list" data type built in, what >>1 is considering is obviously low-level.
BTW, looking at FIOC's lists implementation may be useful.
Name:
Anonymous2008-04-27 19:49
>>4
I can't imagine a linked list implementation that is not fixed-size. Even if you store "variable sized" strings, you are actually storing a pointer the address of that string, which is often the same size of an int.
Well, you store the next/previous item inside the value, so you grab an item by it's key (let's say post53) and in the value there's the post ids to the next and previous post.
Listen to >>19, don't bother deleting threads and just have a field to mark them as deleted.
Name:
Anonymous2008-04-29 20:00
>>23
Or reuse the positions, changing the thread id.
Name:
Anonymous2008-04-29 20:15
>>24
Or fuck the positions and change the next/prev pointers of the next/prev threads.
Name:
Anonymous2008-04-29 20:34
You aren't going to find a suitable hierarchical DBMS for this. The relational functionality of an RDBMS isn't really meant for this. So the DB implementation is moot.
You need to focus on the structure of your data. And your best option is a modified preorder tree traversal.
Each node in your hierarchy stores a left and a right value. Together these numbers specify where in the tree the node is. You can use simple arithmetic to figure out all kinds of info on the node.
If a nodes right number = 1 + left then it has no children.
Everything with a left and right number between a given nodes left and right number is a descendant of the node.
The number of decedent is (right – left - 1) / 2
The path to the node is all nodes with a left number less than the nodes left number and a right number greater than the nodes right number.
And so on...
Updates to nodes are tricker, and if you are following you will realize there is some seeming inefficiencies in it. But there are ways of lessening that. And given the incredible efficiencies of look-ups, it should be appropriate for this implementation.
Name:
Anonymous2008-04-29 22:15
I was thinking once along the lines as >>1. I wondered if a linked list of fixed length arrays would avoid having any really bad operations. The hope would be that array length could be large enough to make the whole thing efficient but small enough that 'bad' operations on the array are still doable (like insertion to the middle).
jesus fuck you fucking nigger twats any self-respecting RDBMS can handle this stupid shit fine without you having to wrangle you fucking brains to figure out how to bastardize the incorrect data structure for the job.
``SELECT p.* FROM posts p WHERE p.thread_id=4085 ORDER BY p.post_id LIMIT 99 OFFSET 5837''
Obviously you're not going to be able to display all million posts of a thread at once, but that's going to break other shit (ie, your fucking pipes) before the DB collapses.
If you're seriously in a situation where you're generating enough content to NEED such a system, you're ass is gonna be fucking toast -- you're not going to be able to handle the entire fucker on a single machine. Fuck, just look at the goddamn storage requirements of such a system:
Get your goddamn head out of the clouds and just fucking implement something, then see how well it performs, THEN try to figure out goals. Don't just bullshit numbers and then decide to use a stupid ass bullshit datastructure you learned about in Computer Science III. Read SICP.
I don't give a shit if IHBT and IHBTE -- I'm drunk as shit right now and you niggers are the closest thing I have to a goddamn friend in this stupid fucking world. fuckking sage.
Name:
Anonymous2008-04-30 5:53
you niggers are the closest thing I have to a goddamn friend in this stupid fucking world
;_;
>>39
RDBMSs are basically hash tables anyway (ISAM trees).
Or B-trees, which are actually worse, since binary trees have O(n) worst case performance. So you'd be better off sticking with MyISAM
ITT, kids who don't understand the practicalities of implementing this. And who don't understand what a modified preorder tree traversal is, or why its the best solution.