Hey Xarn, apply this patch!

Name: Anonymous 2010-07-27 6:37

--- progscrape.py    2010-07-26 13:42:46.000000000 +0200
+++ /tmp/progscrape.py    2010-07-27 11:00:18.991885078 +0200
@@ -299,8 +299,14 @@
 
 
         if verify_trips and len(tripv) > 0:
+            # We get 403 with long URLs
+            # if too many trips to check, fetch the whole thread
+            tripv_url = read_url + thread[0] + '/'
+            if len(tripv) < 200:
+                tripv_url += ','.join(tripv)
             try:
-                hp = urlopen(read_url + thread[0] + '/' + ','.join(tripv))
+                hp = urlopen(tripv_url)
 
             except:
                 print "Couldn't access HTML interface to verify tripcodes.",\


This will solve the 403 errors.
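
For anyone reading the patch without the source at hand, the URL logic amounts to roughly the following. This is a standalone sketch, not code from progscrape.py: the function name `build_tripv_url` and the `max_trips` parameter are made up for illustration, and the default of 200 is simply the threshold used in the patch.

```python
def build_tripv_url(read_url, thread_id, tripv, max_trips=200):
    """Return the URL to request when verifying tripcodes for one thread.

    Hypothetical helper mirroring the patch: with fewer than max_trips
    entries, request only those posts; otherwise request the whole
    thread so the URL stays short.
    """
    url = read_url + thread_id + '/'
    if len(tripv) < max_trips:
        # Short list: ask for just the listed posts, comma-separated.
        url += ','.join(tripv)
    return url
```

Note that the check counts entries rather than measuring the URL's length, so 200 only works as a rough stand-in for "short enough not to trigger the 403".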

Name: Anonymous 2010-07-27 22:23

>>21
It fetches a page, does things with it, and then does the same for the next one until all pages have been fetched. Adapting that to fetch all of those pages asynchronously with a priority queue (when --verify-trips is turned on) is less than straightforward, but that isn't a comment on the design of the program so much as the difference between one algorithm and a completely different one.
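
To make the difference concrete, here is a rough sketch of the two control flows. None of it is taken from progscrape.py: `fetch`, `handle`, and both function names are placeholders, and the second loop is still single-threaded. It only shows how a priority queue changes the shape of the loop, not how to make the requests asynchronous.

```python
import heapq


def fetch_sequential(pages, fetch, handle):
    """Fetch each page in order and process it before moving on."""
    for page in pages:
        handle(page, fetch(page))


def fetch_by_priority(initial, fetch, handle):
    """Serve requests from a priority queue; handling one may add more.

    initial: iterable of (priority, page) pairs, lowest priority first.
    handle:  returns an iterable of (priority, page) follow-up requests,
             e.g. a tripcode check triggered by the page just processed.
    """
    queue = list(initial)
    heapq.heapify(queue)
    while queue:
        _priority, page = heapq.heappop(queue)
        for item in handle(page, fetch(page)):
            heapq.heappush(queue, item)
```

In the first version the order of work is fixed up front. In the second, every processed page can reorder what comes next, which is why it isn't a small change to the existing loop.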
