Hey Xarn, apply this patch!

Name: Anonymous 2010-07-27 6:37

--- progscrape.py    2010-07-26 13:42:46.000000000 +0200
+++ /tmp/progscrape.py    2010-07-27 11:00:18.991885078 +0200
@@ -299,8 +299,14 @@
 
 
         if verify_trips and len(tripv) > 0:
+            # We get 403 with long URLs
+            # if too many trips to check, fetch the whole thread
+            tripv_url = read_url + thread[0] + '/'
+            if len(tripv) < 200:
+                tripv_url += ','.join(tripv)
             try:
-                hp = urlopen(read_url + thread[0] + '/' + ','.join(tripv))
+                hp = urlopen(tripv_url)
 
             except:
                 print "Couldn't access HTML interface to verify tripcodes.",\


This will solve the 403 errors.
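For anyone who wants to try the URL logic outside progscrape.py, here is a minimal standalone sketch of what the patch does. The function name build_tripv_url, the max_trips parameter and the example.org URL are made up for illustration; only the cutoff of 200 and the read_url + thread id + '/' layout come from the patch. It assumes tripv is a list of post-number strings, which is what the ','.join(tripv) call suggests.

```python
def build_tripv_url(read_url, thread_id, tripv, max_trips=200):
    """Build the URL used to fetch the posts whose tripcodes need verifying.

    Hypothetical helper mirroring the patch: the post list is appended
    only when it is short enough, otherwise the bare thread URL is used.
    """
    url = read_url + thread_id + '/'
    # Same cutoff as the patch. Note that it limits the number of posts,
    # not the length of the URL in characters.
    if len(tripv) < max_trips:
        url += ','.join(tripv)
    return url


# Hypothetical base URL, for illustration only.
short_url = build_tripv_url('http://example.org/read/prog/', '123', ['1', '5'])
long_url = build_tripv_url('http://example.org/read/prog/', '123',
                           [str(n) for n in range(1, 301)])
```

With a short list the post numbers are appended to the URL; with 200 or more entries the URL stops at the thread id, so the whole thread is fetched instead.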

Name: Anonymous 2010-08-04 16:25

>>69
I'm just saying that you really don't know much about it if you can't work out simple cooperation between threads enough to get a progress bar working.

Xarn's is "slow", if you want to call it that, by design. I could write a single-threaded scraper that would beat the pants off it (and the choice of FIOC isn't the issue here) but it would hammer the server in just the way Xarn wanted to avoid.
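To make the progress bar point concrete, here is a rough sketch of the kind of thread cooperation meant above. This is not how progscrape.py does it; Progress, run_workers and the work callback are invented names, and the only idea shown is a counter guarded by a lock that every worker bumps when it finishes an item.

```python
import threading


class Progress(object):
    """Counter shared by worker threads; the lock keeps updates consistent."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self._lock = threading.Lock()

    def tick(self):
        # Called by a worker each time it finishes one item.
        with self._lock:
            self.done += 1
            return self.done

    def bar(self, width=20):
        # Render a text bar from a consistent snapshot of the counter.
        with self._lock:
            done = self.done
        filled = width * done // self.total if self.total else width
        return '[' + '#' * filled + '-' * (width - filled) + '] %d/%d' % (done, self.total)


def run_workers(items, work, n_threads=4):
    """Run work(item) for every item on n_threads threads, tracking progress."""
    progress = Progress(len(items))
    pending = list(items)
    pending_lock = threading.Lock()

    def worker():
        while True:
            # Take the next item under the lock so no item is handled twice.
            with pending_lock:
                if not pending:
                    return
                item = pending.pop()
            work(item)
            progress.tick()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return progress
```

A real scraper would print progress.bar() from the main thread while the workers run, and would still need a delay between requests if the goal is to avoid hammering the server.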
