
masterbloodfer.org leecher (python)

Name: Anonymous 2006-08-30 18:47

Here's a simple program to leech doujins off of Bloodfer's new site.

Pass the script the URL of a doujin's thumbnail page and it will leech every page into a subdirectory named after the doujin's title.
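Roughly, the script finds the pages by scanning the thumbnail page's HTML line by line for two patterns: the title inside `<H1 class="TITLE">`, and each image file name inside a white `<FONT SIZE="2">` caption. Here is a sketch of that matching pulled out into a function. The regexes are the script's own; the name `parse_index` and the sample HTML are made up for illustration, and the patterns assume the site's 2006-era markup.

```python
import re

# Same patterns the script uses; they assume the site's markup at the time.
TITLE_RE = re.compile(r'<H1 class="TITLE">(.+)</H1>')
IMAGE_RE = re.compile(
    r'<FONT SIZE="2" COLOR="#FFFFFF">([a-zA-Z0-9_\-\+\(\) ]+\.\w+)</FONT><BR>')


def parse_index(html):
    """Return (title, image file names) found in a thumbnail page.

    Hypothetical helper: title is None if no TITLE heading is present.
    """
    title = None
    names = []
    for line in html.splitlines():
        match = TITLE_RE.search(line)
        if match:
            title = match.group(1)
        match = IMAGE_RE.search(line)
        if match:
            names.append(match.group(1))
    return title, names


# Made-up sample in the shape the regexes expect.
sample = '''<H1 class="TITLE">Colorful Bleach</H1>
<IMG SRC="t01.jpg"><FONT SIZE="2" COLOR="#FFFFFF">01.jpg</FONT><BR>
<IMG SRC="t02.jpg"><FONT SIZE="2" COLOR="#FFFFFF">02.jpg</FONT><BR>'''
print(parse_index(sample))
```

If the site changes its markup, these two patterns are the first thing that will stop matching.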

Example: leech "http://www.masterbloodfer.org:100/172x29-08-2006/%5BHAPPY%20WATER%5D%20Colorful%20Bleach%20(BLEACH)/indc.html"
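The script builds each image URL by chopping the index file name (here indc.html) off this URL and appending the image name. A sketch of the same step using the standard library's `urllib.parse.urljoin`, which resolves a relative name against the page URL the way a browser does; `quote` also percent-encodes spaces in the file name. The helper `image_url` and the file name `01.jpg` are hypothetical.

```python
from urllib.parse import quote, urljoin


def image_url(index_url, image_name):
    """Resolve an image file name against the thumbnail page's directory."""
    # urljoin swaps the last path segment (e.g. indc.html) for the new name;
    # quote() turns spaces and other unsafe characters into %XX escapes.
    return urljoin(index_url, quote(image_name))


index = ('http://www.masterbloodfer.org:100/172x29-08-2006/'
         '%5BHAPPY%20WATER%5D%20Colorful%20Bleach%20(BLEACH)/indc.html')
print(image_url(index, '01.jpg'))
```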

This requires Python and wget. It should also work on Windows if wget is on your PATH, or if you hard-code the full path to wget in the script (untested).
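On Windows the hard-coding may not be needed: `shutil.which` searches the PATH (and on Windows also tries the extensions in PATHEXT, such as .exe), so the script could look wget up once and only fall back to a fixed path if that fails. A sketch; `find_wget` and the `WGET_FALLBACK` path are placeholders, not a known install location.

```python
import os
import shutil

# Placeholder location, only consulted when wget is not on the PATH.
WGET_FALLBACK = r'C:\path\to\wget.exe'


def find_wget(fallback=WGET_FALLBACK):
    """Return a path to wget, or None if it cannot be found."""
    found = shutil.which('wget')  # honours PATH, and PATHEXT on Windows
    if found:
        return found
    if os.path.isfile(fallback):
        return fallback
    return None


print(find_wget() or 'wget not found: add it to PATH or edit WGET_FALLBACK')
```

The script would then pass the returned path instead of the bare string 'wget' as the first item of the argument list it hands to subprocess.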

Here it is:

#!/usr/bin/env python3

import os, sys, re, subprocess
import urllib.error, urllib.request

try:
    url = sys.argv[1]
except IndexError:
    print('Usage:    leech url')
    sys.exit(1)

try:
    print('Fetching index page...')
    with urllib.request.urlopen(url) as page:
        lines = page.read().decode('utf-8', 'replace').splitlines()
except (ValueError, urllib.error.URLError) as e:
    print('Could not fetch %s: %s' % (url, e))
    sys.exit(1)

# The images sit in the same directory as the index page, so strip the
# trailing "name.html" to get the base URL.
url_root = re.sub(r'\w+\.html$', '', url)

doujin_name = None
urls = []

for line in lines:
    # The page title becomes the name of the download directory.
    match = re.search(r'<H1 class="TITLE">(.+)</H1>', line)
    if match:
        doujin_name = match.group(1)

    # Each page's image file name appears in a white FONT caption.
    match = re.search(r'<FONT SIZE="2" COLOR="#FFFFFF">([a-zA-Z0-9_\-\+\(\) ]+\.\w+)</FONT><BR>', line)
    if match:
        urls.append(url_root + match.group(1))

if doujin_name is None or not urls:
    print('No title or images found; has the page layout changed?')
    sys.exit(1)

os.makedirs(doujin_name, exist_ok=True)
os.chdir(doujin_name)
with open('list.txt', 'w') as f:
    for item in urls:
        f.write('%s\n' % item)

# Pose as a browser and send the index page as the referer.
user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)'
subprocess.call(['wget', '--user-agent=%s' % user_agent,
                 '--input-file=list.txt', '-nd', '--referer=%s' % url])

os.remove('list.txt')
print()
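If you'd rather not depend on wget at all, the download loop can be done with the standard library. This is a sketch under the assumption that the server only checks the User-Agent and Referer headers, the same two things the script passes to wget. The function names are hypothetical, and unlike wget this does no retries or resuming.

```python
import os
import shutil
import urllib.parse
import urllib.request

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)'


def build_request(url, referer):
    """Request carrying the same User-Agent and Referer the script gives wget."""
    return urllib.request.Request(
        url, headers={'User-Agent': USER_AGENT, 'Referer': referer})


def local_name(url):
    """Save-as name, roughly like wget -nd: last path segment, %-decoded."""
    return urllib.parse.unquote(os.path.basename(urllib.parse.urlsplit(url).path))


def download_all(urls, referer, dest='.'):
    """Fetch each URL in turn and write it into dest (no retries, no resume)."""
    for url in urls:
        path = os.path.join(dest, local_name(url))
        print('Fetching', url)
        with urllib.request.urlopen(build_request(url, referer)) as resp:
            with open(path, 'wb') as out:
                shutil.copyfileobj(resp, out)
```

In the script, calling `download_all(urls, url)` right after the `os.chdir` would stand in for writing list.txt and running wget.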

Name: Anonymous 2006-08-31 5:36

>>1
use wget, noob.
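For comparison, a wget-only version along the lines the reply suggests might use recursive retrieval limited to one level plus an accept list of image extensions. That only works if the thumbnail page links directly to the full-size images, which is an assumption (the script reads file names from captions instead), and it may also grab the thumbnails. It is built from Python here so it can drop into the same script; `wget_mirror_cmd` is a hypothetical name.

```python
import subprocess

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)'


def wget_mirror_cmd(index_url, dest):
    """Build a wget command that grabs images linked one level from index_url."""
    return [
        'wget',
        '-r', '-l', '1',           # recurse, but only one link deep
        '-nd',                     # don't recreate the server's directories
        '-A', 'jpg,jpeg,png,gif',  # keep only files with these suffixes
        '-P', dest,                # save into this directory
        '--user-agent=%s' % USER_AGENT,
        '--referer=%s' % index_url,
        index_url,
    ]


def run(index_url, dest):
    # subprocess.call waits for wget to finish and returns its exit status.
    return subprocess.call(wget_mirror_cmd(index_url, dest))
```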
