masterbloodfer.org leecher (python)

Name: Anonymous 2006-08-30 18:47

Here's a simple program to leech doujins off of Bloodfer's new site.

Simply pass the script the URL of the thumbnail page of the doujin you want and it will leech all the pages and save them to a subdirectory.

Example: leech "http://www.masterbloodfer.org:100/172x29-08-2006/%5BHAPPY%20WATER%5D%20Colorful%20Bleach%20(BLEACH)/indc.html"

This requires Python and wget. It should work on Windows too if you add wget to your PATH or hard-code the path to wget(?).

Here it is:

#!/usr/bin/env python

import os, sys, urllib, re, subprocess

try:
    url = sys.argv[1]
except IndexError:
    print 'Usage:    leech url'
    sys.exit()

try:
    print 'Fetching index page...'
    page = urllib.urlopen(url)
except ValueError:
    print 'Invalid URL:    %s' % url
    sys.exit()

lines = re.split('\n', page.read())

url_root = re.sub(r'\w+\.html$', '', url)
urls = []

for line in lines:
    match = re.search(r'<H1 class="TITLE">(.+)</H1>', line)
    if match:
        doujin_name = match.groups()[0]

    match = re.search(r'<FONT SIZE="2" COLOR="#FFFFFF">([a-zA-Z0-9_\-\+\(\) ]+\.\w+)</FONT><BR>', line)
    if match:
        image_name = match.groups()[0]
        image_url = url_root + image_name
        urls.append(image_url)

os.mkdir(doujin_name)
os.chdir(doujin_name)
f = open('list.txt', 'w')
for item in urls:
    f.write('%s\n' % item)
f.close()

user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)'
p = subprocess.Popen(['wget', '--user-agent=%s' % user_agent, '--input-file=list.txt', '-nd', '--referer=%s' % url])
p.wait()

os.remove('list.txt')
print '\n'
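
The parsing loop in the script above can be pulled out into a function that is testable without touching the network. This is only a sketch in Python 3 (the script itself is Python 2), and it assumes the page markup is exactly what the script's two regexes expect. The name parse_index is made up for illustration and is not part of the posted script.

```python
import re

# Same two patterns the posted script uses, written as raw strings.
TITLE_RE = re.compile(r'<H1 class="TITLE">(.+)</H1>')
IMAGE_RE = re.compile(
    r'<FONT SIZE="2" COLOR="#FFFFFF">([a-zA-Z0-9_\-+() ]+\.\w+)</FONT><BR>'
)


def parse_index(html, index_url):
    """Return (title, image_urls) for a thumbnail page.

    title is None when no heading line matches, so the caller can
    decide what to do instead of hitting a NameError later.
    parse_index is a hypothetical helper, not from the original post.
    """
    # Drop the trailing "something.html" to get the image directory.
    url_root = re.sub(r'\w+\.html$', '', index_url)
    title = None
    image_urls = []
    for line in html.splitlines():
        match = TITLE_RE.search(line)
        if match:
            title = match.group(1)
        match = IMAGE_RE.search(line)
        if match:
            image_urls.append(url_root + match.group(1))
    return title, image_urls
```

Calling parse_index(page_text, url) gives back the directory name and the list that the script writes to list.txt.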

Name: Anonymous 2006-08-30 19:34

sorry, but i don't want to download python, it is full of aids

Name: Anonymous 2006-08-30 20:53

you're sorta wasting our time here... FAG

Name: Anonymous 2006-08-31 4:43

Fetching index page...
Traceback (most recent call last):
  File "masterborg.py", line 34, in ?
    os.mkdir(doujin_name)
NameError: name 'doujin_name' is not defined
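
The traceback above is what you get when no line of the fetched page matches the title pattern: doujin_name is never assigned, so os.mkdir fails on the first use. One possible guard, sketched in Python 3. The helper name safe_dir_name, the "untitled" fallback, and the character replacement are all assumptions for illustration; the posted script does none of this.

```python
import re


def safe_dir_name(title, fallback="untitled"):
    """Pick a directory name even when the title was not found.

    Hypothetical helper: returns the fallback for a missing or blank
    title, and replaces characters that Windows rejects in file names.
    """
    if not title or not title.strip():
        return fallback
    # Replace \ / : * ? " < > | with underscores.
    return re.sub(r'[\\/:*?"<>|]', '_', title.strip())
```

With this, os.mkdir(safe_dir_name(title)) still runs when the title is missing, although an empty URL list usually means the index page itself was not fetched correctly.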

Name: Anonymous 2006-08-31 5:36

>>1
use wget, noob.

Name: Anonymous 2006-08-31 6:14

>>5
truth

Name: Anonymous 2006-08-31 10:16

>>5 >>6
But in the end it does not really matter.
Oh, wait, I do not git it

Name: Anonymous 2006-08-31 14:49

>>1
>>4
Dude, your script fails with error 400 - Bad request.
It seems like urllib is to blame here.
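
The thread never establishes why the server answers 400. Two things that differ between the urllib fetch and the wget call are the request headers (wget is given a browser User-Agent and a Referer) and how the path is escaped. The sketch below, in Python 3, sets both headers and percent-encodes the path before making the request object. Whether either one is the actual cause here is an assumption, and build_request is a made-up name.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from urllib.request import Request


def build_request(url, user_agent, referer):
    """Build a urllib Request with browser-like headers.

    Hypothetical helper. The path is percent-encoded; '%' is kept
    safe so already-encoded sequences are not encoded twice.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%()")
    clean_url = urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )
    return Request(
        clean_url,
        headers={"User-Agent": user_agent, "Referer": referer},
    )
```

The result can be passed to urllib.request.urlopen in place of the bare URL string.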

Name: Anonymous 2006-08-31 15:13 (sage)

#!/usr/bin/env python

import os, sys, re, subprocess

try:
    url = sys.argv[1]
except IndexError:
    print('Usage:    leech url')
    sys.exit(1)

user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)'

# Fetch the index page with wget instead of urllib.
print('Fetching index page...')
try:
    p = subprocess.Popen(['wget', '--user-agent=%s' % user_agent, '--output-document=tmp.html', '-nd', '--referer=%s' % url, url])
except OSError:
    # Popen raises OSError when the wget binary cannot be started.
    print('Could not run wget; is it on your PATH?')
    sys.exit(1)
if p.wait() != 0:
    print('Could not fetch:    %s' % url)
    sys.exit(1)

page = open('tmp.html', 'r')
lines = page.read().splitlines()
page.close()
os.remove('tmp.html')

# Strip the trailing page name to get the directory the images live in.
url_root = re.sub(r'\w+\.html$', '', url)
urls = []
doujin_name = None

for line in lines:
    match = re.search(r'<H1 class="TITLE">(.+)</H1>', line)
    if match:
        doujin_name = match.group(1)
    match = re.search(r'<FONT SIZE="2" COLOR="#FFFFFF">([a-zA-Z0-9_\-\+\(\) ]+\.\w+)</FONT><BR>', line)
    if match:
        image_name = match.group(1)
        image_url = url_root + image_name
        urls.append(image_url)

# Without a title there is no directory name to use.
if doujin_name is None:
    print('No title found on the index page; nothing to download.')
    sys.exit(1)

os.mkdir(doujin_name)
os.chdir(doujin_name)
f = open('list.txt', 'w')
print(urls)
for item in urls:
    f.write('%s\n' % item)
f.close()

p = subprocess.Popen(['wget', '--user-agent=%s' % user_agent, '--input-file=list.txt', '-nd', '--referer=%s' % url])
p.wait()

os.remove('list.txt')
print('\n')
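
Both versions of the script shell out to wget and assume it is on the PATH. A small Python 3 sketch of building the argument list in one place and checking for the binary first. The function names and the explicit_path parameter are made up for illustration; the wget flags are the ones the script already passes.

```python
import shutil


def find_wget(explicit_path=None):
    """Return a path to wget, or None if it cannot be found.

    explicit_path is a hypothetical override for the hard-coded
    path mentioned in the first post (for example on Windows).
    """
    return explicit_path or shutil.which("wget")


def build_wget_args(user_agent, referer, list_file="list.txt", wget="wget"):
    """Argument list for the batch download, same flags as the script."""
    return [
        wget,
        "--user-agent=%s" % user_agent,
        "--referer=%s" % referer,
        "--input-file=%s" % list_file,
        "-nd",  # do not recreate the remote directory hierarchy
    ]
```

If find_wget() returns None the script can stop with a readable message instead of an OSError from subprocess.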

Name: Anonymous 2010-06-27 15:14

my homework is to read the first chapter of SICP: Can someone do that for me please so I don't have to??

Name: Anonymous 2010-06-28 10:58

beware the army of 12 year old autistics

Name: Anonymous 2011-02-04 17:45

Name: Anonymous 2011-03-28 7:58

How did I writed code tags?

Name: Sgt.Kabukiman 2012-05-28 20:01

Bringing /prog/ back to its people
All work and no play makes Jack a dull boy
