Yet another lame image dumper

Name: Anonymous 2011-09-03 18:57

And here's yet another lame script for dumping images off 4chan.. Since we probably don't have enough of them already.

Made it over the weekend.. Why?
Well, first of all, I was bored.
Second, all the others I could find kinda sucked.
Third.. I love Python, and if it's not made in Python, it's useless :D

URL : http://bayfiles.com/file/Y6J/uBEivF/dump_4chan.zip (24kb)
URL2: http://cf4a7g.1fichier.com/ - no wait, but in cheese language

Command line script, tested on linux and windows.

Feedback wanted. Good feedback especially wanted.

Name: Anonymous 2011-09-03 19:13

Go back to the imageboards, please. Your shit is of no use here.

Name: Anonymous 2011-09-04 14:16

My shit, eh? Awfully defensive today, don't you think?

Name: Anonymous 2011-09-04 14:32

>>1
1fichier.com
Baise en congé, ``pédé-tempête'' !!

Name: Anonymous 2011-09-04 14:38

Looks like an insult.. Let me guess, if it was a .ru site, you'd be convinced I was Russian, eh? Because no one can use a site in an unknown language? Stop insulting your own intellect, mate.

Name: Anonymous 2011-09-04 14:42

>>4
Pitoyable.

Name: Anonymous 2011-09-04 15:27

The dots (.) in your regular expressions are special characters, and will match any character, but it looks like you only want them to match dots.
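
A small sketch of the difference. The script's actual expressions are not quoted in this thread, so the host name `images.example.org` and both patterns below are made up for illustration only:

```python
import re

# Hypothetical patterns; the script's real regexes are not shown in the thread.
unescaped = re.compile(r"images.example.org/src/\d+.jpg")
escaped = re.compile(r"images\.example\.org/src/\d+\.jpg")

good = "images.example.org/src/1315072620.jpg"
bad = "imagesXexampleYorg/src/1315072620Zjpg"

# An unescaped dot matches any character, so the bogus string slips through.
print(bool(unescaped.search(bad)))
# With the dots escaped, only literal dots are accepted.
print(bool(escaped.search(bad)))
print(bool(escaped.search(good)))
```

If the literal part comes from a variable, `re.escape()` does the escaping for you.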

Name: Anonymous 2011-09-04 15:37

Great point :) Somehow I always forget to escape them. Usually I'll notice right away, but not this time.

Thanks for the comment!

Name: Anonymous 2011-09-04 18:01

Does it post images or download them? If it's the latter then it's a scraper, not a dumper. A dumper would be something which posts images.

Name: Anonymous 2011-09-04 18:05

Well, it depends on how you look at it :p For me, "get a URL, store all images from there in that folder" is a dumper.

It's the old upload/download discussion. But yeah, "scraper" might be a bit less ambiguous in this context.

Name: Anonymous 2011-09-04 18:39

>>10
The technical term is reacher-grabber.

Name: Anonymous 2011-09-04 19:21

>>10
A dumper is something that sends images, a scraper retrieves them.

I've been playing with a dumper for a few days that will hopefully be able to break CAPTCHA. I just need to get tesseract (an OCR engine) set up and then add that into my code.

Name: Anonymous 2011-09-04 22:08

>>12
A dumper is a large ass, and so are you.

Name: Anonymous 2011-09-05 1:01

>>12
Good luck, buddy. Don't come crying here when you fail though!

Name: Anonymous 2011-09-05 4:51

Considering the reCAPTCHA project is built on the parts of old books where industry-strength OCR fails hard, and "crowdsources" the digitization of those books...

Yeah, using some random open source OCR program will work well for this. Can't imagine what the problem could be. Your plan is full of win!

Name: Anonymous 2011-09-05 5:33

import threading

Finally, a grabber written by someone who is not a complete moron.
You earn one upboat.

self.L.debug("Image already exist! Skipping")

I would ask the server for Content-Length here and compare it with the size of the saved image. If the sizes are different, then the image should be saved again.
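
Roughly what I have in mind, as a sketch only. It assumes Python 3 and `urllib`, and the names `remote_size` and `needs_download` are mine, not anything from the script:

```python
import os
import urllib.request


def remote_size(url, timeout=10):
    """Return the Content-Length the server reports for url via HEAD, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None


def needs_download(path, expected_size):
    """True if the file is missing, or its size differs from expected_size."""
    if not os.path.exists(path):
        return True
    if expected_size is None:
        # Server gave no size, so an existing file is trusted as-is.
        return False
    return os.path.getsize(path) != expected_size
```

Instead of skipping whenever the file exists, the fetcher would call `needs_download(path, remote_size(url))` and skip only when that returns False. This catches truncated downloads, though not a file that changed while keeping the same size.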

Name: Anonymous 2011-09-05 5:48

True, the fetcher is still rather stupid. It could, for example, check if the last image displayed in a thread on the index already exists on the disk before requesting a full thread dump,

and it doesn't handle anything other than the main index page (not pages 1, 2, 3 and so on). That last part is just laziness on my part, tho.
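
The "last image" check could look something like this. Just a sketch: `thread_needs_fetch`, `last_image_name` and `thread_dir` are placeholder names, and it assumes the index page has already been parsed for the newest image file name of each thread:

```python
import os


def thread_needs_fetch(last_image_name, thread_dir):
    """True unless the newest image shown on the index is already on disk."""
    if last_image_name is None:
        # No image shown on the index, so nothing can be concluded.
        return True
    return not os.path.exists(os.path.join(thread_dir, last_image_name))
```

It only saves a request when nothing new was posted. Older images that went missing from the folder would not be noticed this way.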

And the Content-Length part is a very good idea, thanks :) Does 4chan server give HEAD? Will look into that when I get home.

Still adding some small features here and there (added a "minimum images" limit option for when threads should be downloaded - this could also be done way smarter), and refactoring / restructuring code to make it easier to read and work with.

Hmm.. Your Content-Length idea could be combined with "only if the image is smaller or bigger than size X" options.
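
Something along these lines, maybe. A sketch only: `size_allowed`, `min_bytes` and `max_bytes` are made-up names, and `size` is assumed to be whatever number the server reported in Content-Length (or None if it gave none):

```python
def size_allowed(size, min_bytes=None, max_bytes=None):
    """Return True if size lies within the optional bounds (inclusive)."""
    if size is None:
        # Unknown size: let it through. That is an arbitrary choice.
        return True
    if min_bytes is not None and size < min_bytes:
        return False
    if max_bytes is not None and size > max_bytes:
        return False
    return True
```

That way tiny thumbnails or huge files could be skipped before any image data is transferred.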

Any more ideas for simple functions and features to put in? Or better ways to structure the code?

Name: Anonymous 2011-09-05 6:43

>>17
How about this: Stop typing like a retard and learn to form coherent paragraphs. The pretense of ``intelligence'' you like to flaunt around would look more realistic that way.

Name: Anonymous 2011-09-05 6:54

>>18

I am sorry, my good sir. I am just writing as I think. Any pretense, or lack thereof, is entirely inside your own judgemental head.

Name: Anonymous 2011-09-05 7:13

>>17
Does 4chan server give HEAD?
too lewd; didn't read

Name: Anonymous 2011-09-05 15:23

Does 4chan server give HEAD?

Sure it does.

Name: Anonymous 2011-12-02 1:34

>>15
You know, one of the words is from a book and the other is generated. You only have to write the generated one correctly, it doesn't matter if you type in ``nigger'' for the one from a book.
