FYI

Name: Anonymous 2012-01-19 9:21

Thanks to a scraper and an image database nearing 1TB, I have discovered that 60% of the images on /b/ are reposts.

Name: Anonymous 2012-01-19 9:27

>>1
stop the presses

Name: sage 2012-01-19 9:32

I thought it would have been greater than 60%.

And using an image db instead of an image checksum db is an idiotic idea.
You should feel bad.

Name: Anonymous 2012-01-19 9:34

>>3
Oh, My! Saging in the Name field!!
Now, I feel bad!

Name: Anonymous 2012-01-19 9:45

>>3
I have both. It checks the MD5 checksum provided in the 4chan HTML before downloading.
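A minimal sketch of that check-before-download idea, not >>5's actual scraper. It assumes each post exposes its image's MD5 as a base64 string; that encoding, and the names `ChecksumDB` and `md5_b64`, are hypothetical. Only digests are kept, so the same set of checksums also yields a repost count.

```python
import base64
import hashlib


def md5_b64(data: bytes) -> str:
    """Base64-encoded MD5 digest of raw image bytes (assumed format of the page's checksum)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


class ChecksumDB:
    """Hypothetical in-memory stand-in for a checksum database keyed by MD5 digest."""

    def __init__(self) -> None:
        self.seen: set[bytes] = set()
        self.total = 0
        self.reposts = 0

    def observe(self, checksum_b64: str) -> bool:
        """Record one post's checksum; return True only if the image is new and worth downloading."""
        digest = base64.b64decode(checksum_b64)
        self.total += 1
        if digest in self.seen:
            self.reposts += 1  # identical bytes already stored: a repost, skip the download
            return False
        self.seen.add(digest)
        return True

    def repost_rate(self) -> float:
        """Fraction of observed posts whose image had already been seen."""
        return self.reposts / self.total if self.total else 0.0


if __name__ == "__main__":
    db = ChecksumDB()
    # Stand-ins for image bytes; a real scraper would read checksums from the page instead.
    posts = [b"cat", b"dog", b"cat", b"cat"]
    print([db.observe(md5_b64(p)) for p in posts])  # [True, True, False, False]
    print(db.repost_rate())  # 0.5
```

An exact checksum only matches byte-identical files, so slightly altered reposts get a different MD5 and are counted as new images.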

Name: Anonymous 2012-01-19 10:25

The other 40% are slightly altered images from spambots.

</thread>

Name: Anonymous 2012-01-19 13:54

>>1
Is this database publicly accessible?

Name: Anonymous 2012-01-19 19:08

>>7
No. I will consider opening it for you in exchange for sexual services from a relatively hot female, or a server to host it on.

It doesn't take that long to make, though. Just get a scraper (I wrote my own in Haskell, but you could base yours off Taro's) and let it run for a few months and you're there.

Name: Anonymous 2012-01-19 20:23

>>8
You're going to get v& for CP.

Name: Anonymous 2012-01-19 20:34

>>9
to get vet?

Name: Anonymous 2012-01-19 20:51

Scrape my anal dubs <<<<<<

Name: Anonymous 2012-01-19 22:05

>>3
>I thought it would have been greater than 60%.
It very likely is. Unless >>1 went back far enough, his number is going to be smaller. How far back is enough? The data >>1 has can probably give you a decent approximation.
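A toy sketch of why a short history undercounts, not >>1's method: a post only counts as a repost if its image is already in the collected data, so images whose originals predate the scrape look new. The function name `measured_rate` and the letter stream are made up for illustration. The repost rate is measured over a fixed window while varying how many earlier posts are pre-loaded as "seen"; the point where the curve stops rising is one rough way to judge "far back enough".

```python
def measured_rate(stream: list[str], eval_start: int, lookback: int) -> float:
    """Repost rate over stream[eval_start:], where only the `lookback` posts just
    before eval_start (plus earlier posts inside the evaluated span) count as seen."""
    seen = set(stream[max(0, eval_start - lookback):eval_start])
    evaluated = stream[eval_start:]
    reposts = 0
    for image in evaluated:
        if image in seen:
            reposts += 1
        else:
            seen.add(image)
    return reposts / len(evaluated) if evaluated else 0.0


if __name__ == "__main__":
    # Toy, made-up stream of image checksums in posting order.
    stream = ["a", "b", "c", "d", "a", "b", "e", "c", "a", "f"]
    for lookback in (0, 1, 3, 5):
        # Prints 0.0, 0.2, 0.4, 0.6: more history exposes more reposts.
        print(lookback, measured_rate(stream, 5, lookback))
```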

Name: Anonymous 2012-01-20 0:24

>>10

v& -> v-and -> vanned -> ambushed and taken away in a van
