if verify_trips and len(tripv) > 0:
+ # The server returns 403 for overly long URLs;
+ # if there are too many trips to check, fetch the whole thread instead
+ tripv_url = read_url + thread[0] + '/'
+ if len(tripv) < 200:
+ tripv_url += ','.join(tripv)
try:
- hp = urlopen(read_url + thread[0] + '/' + ','.join(tripv))
+ hp = urlopen(tripv_url)
except:
print "Couldn't access HTML interface to verify tripcodes.",\
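For what it's worth, a middle ground between one giant URL and refetching the whole thread would be to batch the tripcodes across several requests. A hypothetical sketch (the `read_url`/thread-id naming mirrors the patch, but none of this is from the actual progscrape source, and the batch size is a guess):

```python
def batch_urls(read_url, thread_id, tripv, batch_size=150):
    """Yield one verification URL per batch of tripcodes,
    keeping each URL short enough to avoid the 403."""
    base = read_url + thread_id + '/'
    for i in range(0, len(tripv), batch_size):
        yield base + ','.join(tripv[i:i + batch_size])
```

Each yielded URL could then be fetched in turn, at the cost of more round trips.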
Actually, I guess that last update isn't strictly pipelining, though a lot of people seem to be calling it that. It's just a persistent connection, which is still an improvement.
AFAIK there's no documented way to do HTTP pipelining using just the Python standard library.
>adding real pipelining would require restructuring most of the program
That's a sure sign that the program should have been properly structured from the start instead of being a mess of spaghetti code.
>>21
It fetches a page, does things with it, and then does it again for the next one until all pages have been fetched. The fact that adapting that to fetch all of those pages asynchronously with a priority queue (when --verify-trips is turned on) is less than straightforward isn't a comment on the design of the program so much as it is just the difference between one algorithm and a completely different one.
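To make the "completely different algorithm" point concrete, here is a hypothetical illustration of the reshaped control flow: pending fetches sit in a priority queue so that tripcode-verification pages (priority 0) jump ahead of ordinary scrape pages (priority 1). Everything here, names included, is a made-up sketch, not progscrape code:

```python
import heapq

def drain_in_priority_order(jobs):
    """jobs: list of (priority, page) pairs; lower priority
    numbers are fetched first. Returns the fetch order."""
    heap = list(jobs)
    heapq.heapify(heap)
    order = []
    while heap:
        _, page = heapq.heappop(heap)
        order.append(page)
    return order
```

The sequential scraper's "next page" loop has no equivalent of this reordering, which is why bolting it on touches most of the control flow.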
>>25
The issues of parallelisation and HTTP pipelining are orthogonal. It's pretty easy to add multithreading to progscrape, and a few people have, over the years.
Though it would fuck up the progress bar.
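A minimal multithreading sketch along the lines described: a fixed pool of workers pulls page identifiers off a queue. The `fetch` callable is a stand-in for the real page fetch; nothing here is from any actual patch to progscrape:

```python
import queue
import threading

def scrape_parallel(pages, fetch, workers=4):
    """Fetch all pages with a small pool of worker threads."""
    q = queue.Queue()
    for p in pages:
        q.put(p)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                p = q.get_nowait()   # grab the next unfetched page
            except queue.Empty:
                return               # no work left, thread exits
            r = fetch(p)
            with lock:               # results dict is shared state
                results[p] = r

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```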
>>27
As written, the progress bar just goes up one line (using the ANSI console codes Xarn is so fond of) and prints some text. If each thread has its own progress bar, that will be a bit of a mess.
Name: Anonymous : 2010-08-03 12:14
MESS MY ANUS
Also, I've found that your tripcode searcher does not allow searching for numeric trips like 9000 or so. Please fix it.
>>31
Hey, that's like a less readable, slower version of Xarn's, except that it's not as easy to distribute over a cluster, suffers from NIH syndrome, and doesn't compile.