And here's yet another lame script for dumping images off 4chan, since we probably don't have enough of them already.
Made it over the weekend. Why?
Well, first of all, I was bored.
Second, all the others I could find kinda sucked.
Third, I love Python, and if it's not made in Python, it's useless :D
>>1 1fichier.com Baise en congé, ``pédé-tempête'' !!
Name:
Anonymous2011-09-04 14:38
Looks like an insult. Let me guess: if it was a .ru site, you'd be convinced I was Russian, eh? Because no one can use a site in an unknown language? Stop insulting your own intellect, mate.
>>10
A dumper is something that sends images, a scraper retrieves them.
I've been playing with a dumper for a few days that will hopefully be able to break CAPTCHA. I just need to get tesseract (an OCR engine) set up and then add that into my code.
>>12
Good luck, buddy. Don't come crying here when you fail though!
Name:
Anonymous2011-09-05 4:51
Considering the reCAPTCHA project is built on the parts of old books where industry-strength OCR fails hard, and crowdsources the digitization of those books...
Yeah, using some random open source OCR program will work well for this. Can't imagine what the problem could be. Your plan is full of win!
Finally, a grabber written by someone who is not a complete moron.
You earn one upboat.
self.L.debug("Image already exist! Skipping")
I would request the Content-Length from the server here and compare it with the size of the saved image. If the sizes differ, the image should be downloaded again.
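A minimal sketch of that Content-Length check, not taken from the script itself. The names remote_size and needs_download are made up for illustration, and it assumes the server answers HEAD requests and reports a Content-Length, which may not hold for every host.

```python
import os
import urllib.request


def remote_size(url, timeout=10):
    """Return the Content-Length the server reports for url, or None.

    Assumption: the server supports HEAD; this helper is hypothetical.
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        length = resp.headers.get("Content-Length")
    return int(length) if length is not None else None


def needs_download(path, expected_size):
    """True if the file is missing or its size differs from expected_size."""
    if not os.path.exists(path):
        return True
    if expected_size is None:
        # No Content-Length available: keep the file we already have.
        return False
    return os.path.getsize(path) != expected_size
```

Something like needs_download(path, remote_size(url)) could then replace the plain "already exists" test, so a truncated file gets fetched again instead of skipped.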
Name:
Anonymous2011-09-05 5:48
True, the fetcher is still rather stupid. It could, for example, check whether the last image shown for a thread on the index already exists on disk before requesting a full thread dump,
and it doesn't handle anything other than the main index page (not pages 1, 2, 3 and so on). That last part is just laziness on my part, though.
And the Content-Length part is a very good idea, thanks :) Does the 4chan server support HEAD requests? Will look into that when I get home.
Still adding some small features here and there (added a "minimum images" limit option for when threads should be downloaded, which could also be done way smarter), and refactoring / restructuring the code to make it easier to read and work with.
Hmm.. Your Content-Length idea could be combined with "only if the image is smaller or bigger than X" options.
Any more ideas for simple functions and features to put in? Or better ways to structure the code?
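A rough sketch of how the checks mentioned in this post might look as small pure functions. Everything here is hypothetical: should_fetch_thread and size_in_range are illustrative names, and the arguments (last image name, image count) are assumed to come from whatever the index parser already extracts.

```python
import os


def should_fetch_thread(last_image, image_count, out_dir, min_images=1):
    """Decide from index-page data whether to request the full thread.

    last_image: file name of the newest image the index shows (or None).
    image_count: number of images the index reports for the thread.
    """
    if image_count < min_images:
        return False
    if last_image is None:
        # Nothing to compare against, so fetch to be safe.
        return True
    # If the newest image is already on disk, assume nothing new was posted.
    return not os.path.exists(os.path.join(out_dir, last_image))


def size_in_range(size, min_size=None, max_size=None):
    """True if size (in bytes) lies within the optional inclusive bounds."""
    if size is None:
        # Unknown size: do not filter the image out.
        return True
    if min_size is not None and size < min_size:
        return False
    if max_size is not None and size > max_size:
        return False
    return True
```

Keeping these free of network calls makes them easy to test, and size_in_range can take the Content-Length value directly, so an image outside the limits is never downloaded at all.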
>>17
How about this: Stop typing like a retard and learn to form coherent paragraphs. The pretense of ``intelligence'' you like to flaunt around would look more realistic that way.
Name:
Anonymous2011-12-02 1:34
>>15
You know, one of the words is from a book and the other is generated. You only have to write the generated one correctly, it doesn't matter if you type in ``nigger'' for the one from a book.