We all realize this is set-up, so please just gtfo. You ask a question you've answered a priori. Even the answer post might be written a priori; but not necessarily. What you gain from this, is that you're able to associate your stupid junk with programming posts, thus giving the illusion that programmers here actually talk about this bullshit (your stupid shit).
>>1
so the idea is that the input to this program will be an infix ('nu'), and it should return the words / indices which contain that infix ('anus' and 'nucleus'?)
not really gigabytes, as a test just a 1.2mb file with 100.000 lines. building an index of
infix1->pos1,pos2,...
...
uses around 150mb. which huge. there are two million infixes.
Name:
Anonymous2009-09-16 8:25
>>12
obviously for this method, you would be obtaining indices of matches as needed. if you absolutely most be storing all possible infixes, then use a suffix-tree.
too slow if there are a lot of matches, and if you use a short (2-3 letters) query.
Name:
Anonymous2009-09-16 22:26
>>4
You doubt suffix trees use less space than brute force storage? I wasn't able to verify, but I'm pretty sure it's O(input size). Your algorithm is O(2^(input size)). Suffix trees win.
Yeah but when searching for short patterns (length m) with many matches (n) it will be slower: O(m+n) Also: The tree has many many many nodes and leaves (especially if you allow unicode code points instead of just a-z), which results in huge memory consumption. If you have ten thousand different unicode code points, the first node will have ten thousand children. Good luck with that.