/prog/ - Tagging system

Name: Anonymous 2012-03-19 15:06

'sup /prog/

Say I'm starting up an image site of sorts, and want to implement a tagging system. How would I best go about doing that?
It needs to support multiple tags per item, preferably not limited in length, be capable of performing multiple types of searches, be user-editable even after initial placement, I want to be able to display the amount of items that use a certain tag, etc. etc.

I already found this, but I'm not sure if that's the way I want to go.
http://forge.mysql.com/wiki/TagSchema#Recommended_Architecture

Enlighten me /prog/.

Name: Anonymous 2012-03-19 15:37

I'd almost suggest XML, even though it'd gobble memory and add text-parsing overhead, but hey, what're you doing with multiple processors anyway?

A database could also work, since there'd probably be mandatory fields, like original upload name, stored name, upload date, last-modified date if you support stuff like runetranslations, other stuff, but this would be per-entry, not per-image.

If you reserve spaces like 0-15 for mandatory fields, then you could use a sparse tag tree for 128 tags, stored as elements of an array or something.

Might even be faster to mandate 0-31, as you could have like source type, source series/name, characters in image, etc. Then users could search different things, not strictly tags. You'd then also be able to index all the previous entries' names/values, so when someone was indexing something new, you wouldn't need like 6 entries for variants on MLP; you could just pick it from a drop-down.

Name: Anonymous 2012-03-19 15:44

Yeah, I'm mostly looking for a database-based solution. Other item-related variables such as title and author would be stored in there, too.
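For what it's worth, a minimal sketch of one common way to do the database route: items, tags, and a link table between them. It assumes SQLite through Python's sqlite3 module. Every table, column, and function name below is made up for illustration and is not taken from the linked wiki page.

```python
import sqlite3

# Hypothetical schema: one row per item, one row per distinct tag,
# and one row per (item, tag) pairing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    author TEXT NOT NULL
);
CREATE TABLE tags (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- TEXT has no fixed length limit
);
CREATE TABLE item_tags (
    item_id INTEGER NOT NULL REFERENCES items(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (item_id, tag_id)      -- an item carries each tag at most once
);
""")

def add_item(title, author):
    cur = conn.execute(
        "INSERT INTO items (title, author) VALUES (?, ?)", (title, author))
    return cur.lastrowid

def tag_item(item_id, tag_name):
    # Create the tag on first use, then link it to the item.
    conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag_name,))
    tag_id = conn.execute(
        "SELECT id FROM tags WHERE name = ?", (tag_name,)).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO item_tags (item_id, tag_id) VALUES (?, ?)",
        (item_id, tag_id))

def untag_item(item_id, tag_name):
    # User edits after the fact are just deletes/inserts on the link table.
    conn.execute(
        "DELETE FROM item_tags WHERE item_id = ? "
        "AND tag_id = (SELECT id FROM tags WHERE name = ?)",
        (item_id, tag_name))

def tag_count(tag_name):
    # Number of items currently using the tag.
    return conn.execute(
        "SELECT COUNT(*) FROM item_tags "
        "JOIN tags ON tags.id = item_tags.tag_id WHERE tags.name = ?",
        (tag_name,)).fetchone()[0]

def items_with_all(tag_names):
    # AND search: ids of items that carry every one of the given tags.
    names = sorted(set(tag_names))
    marks = ",".join("?" * len(names))
    rows = conn.execute(
        f"""SELECT item_tags.item_id FROM item_tags
            JOIN tags ON tags.id = item_tags.tag_id
            WHERE tags.name IN ({marks})
            GROUP BY item_tags.item_id
            HAVING COUNT(DISTINCT tags.name) = ?
            ORDER BY item_tags.item_id""",
        names + [len(names)]).fetchall()
    return [row[0] for row in rows]
```

An OR search would be the same query without the HAVING clause. Whether this layout is fast enough depends on the size of the site, so treat it as a starting point rather than a recommendation.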

Not sure I quite follow you (3rd and 4th paragraph). Not extremely well-versed in this area yet, but hey, it's supposed to be one big and fun learning experience anyway. I just don't want to gunk out a bulky, slow system is all.
So, mind elaborating?

Thanks a bunch!

Name: Anonymous 2012-03-19 15:56

The third and fourth paragraph would be focusing on the searching aspect.

Instead of searching tags in particular, which would have to check EVERY tag, you could have all the relevant tags for a search located in a certain box.

Like if location 8 held the upload date, then you could search for images by date: looking only in slot 8 would get you all those images (if you sorted on, say, a per-day basis), which would be much, much faster than searching all possible tags.

Similarly, having an author box would permit the grabbing of all authors, with searchable or even indexed entries, so someone could search like "Noill" and get all his entries immediately.
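A rough sketch of the "one box per field" idea, assuming plain in-memory dictionaries stand in for whatever storage you end up with. The names by_date, by_author, index_image, images_on and images_by are hypothetical.

```python
from collections import defaultdict
from datetime import date

# One lookup table per fixed field (hypothetical names).
# Each maps a field value to the set of image ids that have it.
by_date = defaultdict(set)
by_author = defaultdict(set)

def index_image(image_id, author, uploaded):
    # Register the image under its upload day and its author.
    by_date[uploaded].add(image_id)
    by_author[author].add(image_id)

def images_on(day):
    # Only the date table is consulted; no tag is ever scanned.
    return sorted(by_date.get(day, ()))

def images_by(author):
    # Only the author table is consulted.
    return sorted(by_author.get(author, ()))
```

Combining two fields is then a set intersection, e.g. by_author["Noill"] & by_date[date(2012, 3, 19)].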

It might be even more relevant to point out that the most practical option, for all non-tag fields, would be to create specific extra documents that serve as the index itself, so your searcher doesn't have to scan the whole database for known things; it can just search that 1-100 KB file. You'd probably want an automated sorter, so that every time changes are made the entries are updated, and you can garbage-collect against uploaded images every once in a while by holding uploads until the process is finished.

Name: Anonymous 2012-03-19 16:00

That's taking it a step further than I intended on going, but sounds pretty efficient. I'll look into that some more, and you may hear from me again if I bump into something.

Thanks a lot!

Name: Anonymous 2012-03-19 16:14

>>5
Something worth noting from the computer science perspective: once you make the relevant documents for the "necessary" field data, with one entry per image (whether or not it has a value initially), you don't need a database holding any of those values, as long as your updater/cleaner/etc. properly manages the indexed entries.

In short, you only need a database-like thing for the tags, and since the tags aren't organized, you could use something as simple as a text parser. Each line of a .txt file is an image entry. Reserve, say, the first 8 characters for the image number and start reading at position 9 or 10, so you never scan the number itself unless necessary. Separate each successive tag with something like &&. Then you just check each line (starting at position 9) for the matching term(s), grab the number if there's a match, etc.
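The line format described above could be sketched roughly like this. It assumes an 8-digit zero-padded image number, one space, then the tags joined by &&. The constant and function names are hypothetical.

```python
ID_WIDTH = 8               # characters reserved for the image number
TAGS_START = ID_WIDTH + 1  # skip the number and the single space after it
SEP = "&&"                 # tag separator from the description above

def format_line(image_number, tags):
    # e.g. image 1 with tags sky, orange -> "00000001 sky&&orange"
    return f"{image_number:0{ID_WIDTH}d} " + SEP.join(tags)

def search_lines(lines, wanted):
    # Return the image numbers of lines that contain every wanted tag.
    wanted = set(wanted)
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        # Only the tag portion is examined; the number is parsed on a match.
        tags = set(line[TAGS_START:].split(SEP))
        if wanted <= tags:
            hits.append(int(line[:ID_WIDTH]))
    return hits
```

Splitting on the separator rather than doing a raw substring check keeps a search for "sky" from also matching "skyline". It is still a linear scan over every line, so this only holds up while the file stays small.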

Then when you need to present the information, you grab the relevant information from all the separate documents and present it. Like you know, stored image number vs original filename, stored image number vs update time, etc.

The most brutal aspect would be reserving a certain amount of dataspace for each possible entry, like 256 or 512 bytes for tags, 32 or 40 characters for author, various sizes. Then you wouldn't even need image numbers, as the image could be identified solely by its index in the "database". The numbering of images would be convenient for storage/filesystem naming requirements though.
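A minimal sketch of the fixed-size record idea, assuming 40 bytes for the author and 256 bytes for the tags, packed with Python's struct module. The field sizes come from the numbers above; the names RECORD, pack_record, unpack_record and read_record are hypothetical.

```python
import struct

# Fixed layout: 40 bytes of author, then 256 bytes of &&-separated tags.
# "<" disables alignment padding, so every record is exactly 296 bytes.
RECORD = struct.Struct("<40s256s")

def pack_record(author, tags):
    author_bytes = author.encode("utf-8")
    tag_bytes = "&&".join(tags).encode("utf-8")
    # struct would silently truncate oversized fields, so refuse them here.
    if len(author_bytes) > 40 or len(tag_bytes) > 256:
        raise ValueError("field does not fit in its reserved space")
    return RECORD.pack(author_bytes, tag_bytes)  # short fields are NUL-padded

def unpack_record(blob):
    author_bytes, tag_bytes = RECORD.unpack(blob)
    author = author_bytes.rstrip(b"\0").decode("utf-8")
    tag_text = tag_bytes.rstrip(b"\0").decode("utf-8")
    tags = tag_text.split("&&") if tag_text else []
    return author, tags

def read_record(table, index):
    # The image's identity is its position: record i starts at i * RECORD.size.
    start = index * RECORD.size
    return unpack_record(table[start:start + RECORD.size])
```

The cost is exactly the one mentioned above: every entry takes the full reserved space whether it uses it or not, and a tag list that outgrows its box has nowhere to go.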

Name: Anonymous 2012-03-19 17:47

You guys here on /prog/ are way too smart. I'll be back when I've learned more.

Name: Anonymous 2012-03-19 17:50

>>2,4,6
Holy fuck, that was hard to read. I don't want to make you feel bad, but that explanation could surely be improved.

Name: Anonymous 2012-03-19 23:11

>>8
I'm ADHD, strongly disorganized and unmotivated. Damnit Jim I'm a computer scientist not a programmer!

Name: Anonymous 2012-03-20 2:19

Read SICP to learn about system design.

http://mitpress.mit.edu/sicp/full-text/book/book.html

Name: Dubs Guy 2012-03-22 13:51

DUBS, DUBS EVERYWHERE!
