Name: Anonymous 2006-08-30 18:47
Here's a simple program to leech doujins off of Bloodfer's new site.
Simply pass the script the URL of the thumbnail page of the doujin you want and it will leech all the pages and save them to a subdirectory.
Example: leech "http://www.masterbloodfer.org:100/172x29-08-2006/%5BHAPPY%20WATER%5D%20Colorful%20Bleach%20(BLEACH)/indc.html"
This requires Python and wget. It should work on Windows too if you add wget to your PATH or hard-code the path to wget in the script(?).
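On the wget-on-Windows point, here is a rough sketch of how the lookup could go. It is Python 3 (shutil.which does not exist in the Python 2 the script is written for), and find_wget and its fallback argument are made-up names for illustration, not part of the script.

```python
import os
import shutil


def find_wget(fallback=None):
    """Return a path to wget, or None if it cannot be found.

    `fallback` is a hypothetical hard-coded location to try when
    wget is not on PATH.
    """
    # shutil.which searches the directories listed in PATH
    found = shutil.which('wget')
    if found:
        return found
    # Not on PATH: use the hard-coded path only if that file really exists
    if fallback and os.path.isfile(fallback):
        return fallback
    return None
```

Whatever it returns would take the place of the bare 'wget' string in the command list; if it returns None, wget is not installed where the script can see it.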
Here's the script:
#!/usr/bin/env python
# Python 2 script: fetch a thumbnail index page, then download every image with wget.
import os, sys, urllib, re, subprocess

try:
    url = sys.argv[1]
except IndexError:
    print 'Usage: leech url'
    sys.exit(1)

try:
    print 'Fetching index page...'
    page = urllib.urlopen(url)
except (ValueError, IOError):
    # urllib.urlopen raises IOError when the page cannot be opened
    print 'Invalid URL: %s' % url
    sys.exit(1)

lines = page.read().splitlines()
# Strip the "name.html" part to get the directory the images live in
url_root = re.sub(r'\w+\.html', '', url)
doujin_name = None
urls = []
for line in lines:
    # The page title becomes the name of the download directory
    match = re.search(r'<H1 class="TITLE">(.+)</H1>', line)
    if match:
        doujin_name = match.group(1)
    # Image file names appear inside these FONT tags
    match = re.search(r'<FONT SIZE="2" COLOR="#FFFFFF">([a-zA-Z0-9_\-\+\(\) ]+\.\w+)</FONT><BR>', line)
    if match:
        urls.append(url_root + match.group(1))

if not doujin_name or not urls:
    print 'No title or images found at: %s' % url
    sys.exit(1)

if not os.path.isdir(doujin_name):
    os.mkdir(doujin_name)
os.chdir(doujin_name)

# Hand wget the list of image URLs through a temporary file
f = open('list.txt', 'w')
for item in urls:
    f.write('%s\n' % item)
f.close()

user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)'
p = subprocess.Popen(['wget', '--user-agent=%s' % user_agent, '--input-file=list.txt', '-nd', '--referer=%s' % url])
p.wait()
os.remove('list.txt')
print '\n'
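If you just want to see the scraping step on its own, here is a small sketch that pulls the title and image URLs out of a page held in a string. It is Python 3, unlike the script above. parse_index is a made-up name, and the two patterns are copied from the script's regexes, so it assumes the index page really uses that markup.

```python
import re

# Patterns taken from the script above; they assume the site's markup is unchanged.
TITLE_RE = re.compile(r'<H1 class="TITLE">(.+)</H1>')
IMAGE_RE = re.compile(
    r'<FONT SIZE="2" COLOR="#FFFFFF">([a-zA-Z0-9_\-+() ]+\.\w+)</FONT><BR>'
)


def parse_index(html, index_url):
    """Return (title, image_urls) for an index page given as a string."""
    # Everything before the last "/" is the directory holding the images
    url_root = index_url.rsplit('/', 1)[0] + '/'
    title = None
    image_urls = []
    for line in html.splitlines():
        match = TITLE_RE.search(line)
        if match:
            title = match.group(1)
        match = IMAGE_RE.search(line)
        if match:
            image_urls.append(url_root + match.group(1))
    return title, image_urls
```

Feeding it a saved copy of an index page is an easy way to check the patterns still match before you let wget loose.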