Pretty sure both Firefox and Opera can do this if you're browsing 4chan with them. If for whatever reason you want something standalone, you'll have to write it yourself (there are probably hundreds of these utils written already; some people posted a few on /prog/ not long ago if you're willing to look). There are also true ENTERPRISE imageboard archivers like http://code.google.com/p/fuuka/ if you want to clone boards live as posts are updated, as well as track deleted posts and even let users post their own stuff. I remember writing my own dumper years ago, but I never ended up using it since my browser is really enough.
Name: Anonymous
2010-05-16 21:04
Here's a simple FIOC script I made four score and seven years ago. It rips all the images from a thread into a predefined directory (by default it's "/Downloaded"; you can change it to whatever you want), and it can also delete any duplicates.
yes it's shit i know
#!/usr/bin/env python
import re
from urllib import urlopen, URLopener
import os
from hashlib import md5
import glob
import sys

thread = None
if len(sys.argv) > 1: thread = sys.argv[1]
else: sys.exit("Usage: %s <thread url>" % sys.argv[0])

def dupeChecker(directory="/Downloaded/"):
    deleteBool = input("Delete known dupes? 0 for no, 1 for yes: ")
    hashArray = []
    inputArray = []
    knownDupes = []
    for image in glob.glob("%s*" % directory):
        # Reads the data contained in a file and gathers an md5 hash,
        # then appends the hash and filename to two parallel lists
        try:
            contents = file(image).read()
            m = md5()
            m.update(contents)
            hashArray.append(m.digest())
            inputArray.append(image)
        except IOError:
            continue
    # Cycles through the hash list, searching for matches
    for i, item in enumerate(hashArray):
        for j in range(i + 1, len(hashArray)):
            if item == hashArray[j] and inputArray[j] not in knownDupes:
                print "Dupe found: ", inputArray[i], "==", inputArray[j]
                knownDupes.append(inputArray[j])
                # If the user chose to, it'll delete the dupe
                if deleteBool == 1:
                    print inputArray[j], "is being deleted..."
                    os.remove(inputArray[j])

def imageDownloader(threadName):
    alreadyDownloaded = []
    j = 0
    # Loads the page, then divides it up along quotation marks
    page = urlopen(threadName).read()
    page = page.split('"')
    # Fetches the image directory:
    # start by splitting the URL at each /, then cut it off when it reaches
    # 'res' (which marks the thread number)
    imageDir = threadName.split('/')
    i = len(imageDir)
    for word in imageDir:
        if "res" in word:
            i = imageDir.index(word)
    # Join them and add 'src' to get the image directory
    imageDir = '/'.join(imageDir[0:i]) + "/src/"
    if "boards" in imageDir:
        imageDir = imageDir.replace('boards', 'images')
    # Now search the page for anything in the image directory, then download it
    for imageName in page:
        # Looks for something starting with imageDir and ending with an extension:
        # [0-9]* fills in the image number ([0-9] matches any digit, * repeats it),
        # \. escapes the period (a bare . would match any character),
        # and [a-z]{3} matches a run of three letters after it
        if re.search(r"^%s[0-9]*\.[a-z]{3}$" % imageDir, imageName):
            if imageName not in alreadyDownloaded:
                try:
                    image = URLopener()
                    # Saves the image to the Downloaded folder
                    image.retrieve(imageName, "/Downloaded/" + imageName[-13:])
                    alreadyDownloaded.append(imageName)
                    print imageName, "has been downloaded"
                    j += 1
                except IOError: pass
    print "%d images downloaded successfully" % j

imageDownloader(thread)
doCheck = input("Would you like to search for dupes? 0 for no, 1 for yes: ")
if doCheck == 1:
    dupeChecker()
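Side note on the dupe check: comparing every hash against every other one is O(n^2). A dict keyed on the digest does the same job in one pass. A rough modern-Python sketch of that approach (the function name is mine, not from the script above):

```python
import hashlib
import os

def find_dupes(directory, delete=False):
    """One-pass dupe scan: remember the first file seen for each md5 digest."""
    seen = {}    # digest -> first filename with that content
    dupes = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        if digest in seen:
            print("Dupe found:", seen[digest], "==", path)
            dupes.append(path)
            if delete:
                os.remove(path)
        else:
            seen[digest] = path
    return dupes
```

Same behavior (first copy kept, later copies flagged or deleted), just without re-reading or re-comparing anything.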
I wrote this a while back; it's in PHP, but it can run locally if you have Web Sharing enabled on your Mac and you edit the config to support PHP (it's already installed, just remove a # in httpd.conf).
It basically goes through a page and finds all links to stuff whose URL starts with http://image.4chan.org/.
It puts it all in a temporary folder, then BZIP2's it up and sends it to the browser as a download (I designed it for a remote server).
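I don't have the PHP handy, but the flow it describes (grab every link with the image-host prefix, fetch them into a temp folder, compress, serve the archive) sketches out like this in Python. The regex and function names are mine, not from the PHP, and a .tar.bz2 stands in for whatever bzip2 packaging the original did:

```python
import os
import re
import tarfile

def extract_image_links(html, prefix="http://image.4chan.org/"):
    # Grab every href/src attribute value that starts with the image host
    pattern = r'(?:href|src)="(%s[^"]+)"' % re.escape(prefix)
    return sorted(set(re.findall(pattern, html)))

def bundle(paths, archive_name):
    # Pack the fetched files into a .tar.bz2 for the browser to download
    with tarfile.open(archive_name, "w:bz2") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.basename(path))
```

Download each extracted link into a temp folder (e.g. with urllib), then point bundle() at the results to get the archive you'd send back.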