  File "x.py", line 102, in <module>
    page = urllib.urlopen(read_url + thread[0] + '/1-').read()
  File "/usr/lib/python2.5/urllib.py", line 82, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.5/urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.5/urllib.py", line 328, in open_http
    errcode, errmsg, headers = h.getreply()
  File "/usr/lib/python2.5/httplib.py", line 1199, in getreply
    response = self._conn.getresponse()
  File "/usr/lib/python2.5/httplib.py", line 928, in getresponse
    response.begin()
  File "/usr/lib/python2.5/httplib.py", line 385, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.5/httplib.py", line 343, in _read_status
    line = self.fp.readline()
  File "/usr/lib/python2.5/socket.py", line 372, in readline
    data = recv(1)
IOError: [Errno socket error] (104, 'Connection reset by peer')
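The traceback above shows the server dropping the connection (errno 104) while the client was still reading the HTTP status line. One common mitigation is to retry the request a few times with a growing delay. This is a minimal sketch assuming Python 3, where a reset surfaces as `ConnectionResetError` (or wrapped in `urllib.error.URLError`), both subclasses of `OSError`; the function name `fetch_with_retry` and its parameters are hypothetical, not part of x.py or progscrape.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, first_delay=1.0, backoff=2.0,
                     sleep=time.sleep):
    """Call fetch(url), retrying on network errors with exponential backoff.

    ConnectionResetError and urllib.error.URLError both derive from OSError
    in Python 3, so catching OSError covers a reset like errno 104.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:
            if attempt == attempts:
                raise  # out of retries: let the caller see the real error
            sleep(delay)  # wait before trying again
            delay *= backoff  # 1s, 2s, 4s, ... with the defaults
```

With a real network call this might be used as `fetch_with_retry(lambda u: urllib.request.urlopen(u).read(), url)`. Passing `sleep` in as a parameter is only there so the backoff can be tested without actually waiting.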
8896 threads to update.
Updating thread 1275783543...
Updating thread 1275783433...
Updating thread 1202944294...
Traceback (most recent call last):
  File "progscrape.py", line 186, in <module>
    db.execute(u'INSERT INTO posts (thread, id, author, email, trip, time, body) VALUES (?, ?, ?, ?, ?, ?, ?)', b)
sqlite3.IntegrityError: columns thread, id are not unique
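That `IntegrityError` means a row with the same `(thread, id)` pair was already in the table when the INSERT ran. A minimal sketch of one way to tolerate re-inserting a post, assuming (hypothetically) that `posts` has a composite primary key on `(thread, id)`; the real progscrape schema and its eventual fix may differ, and `open_db`/`store_post` are illustrative names only.

```python
import sqlite3


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    # Hypothetical schema mirroring the column list in the INSERT above;
    # the composite primary key is what raises IntegrityError on a repeat.
    db.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        " thread INTEGER, id INTEGER, author TEXT, email TEXT,"
        " trip TEXT, time INTEGER, body TEXT,"
        " PRIMARY KEY (thread, id))"
    )
    return db


def store_post(db, row):
    """Insert one post; skip it silently if (thread, id) is already stored."""
    db.execute(
        "INSERT OR IGNORE INTO posts "
        "(thread, id, author, email, trip, time, body) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        row,
    )


if __name__ == "__main__":
    db = open_db()
    post = (1202944294, 1, "Anonymous", "", "", 1202944294, "first")
    store_post(db, post)
    store_post(db, post)  # ignored instead of raising IntegrityError
    print(db.execute("SELECT COUNT(*) FROM posts").fetchone()[0])
```

`INSERT OR REPLACE` would instead overwrite the stored row with the newer copy. Either choice hides the duplicate rather than explaining it, so if the scraper should never see the same post twice, the error may be pointing at an upstream bug worth finding first.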
Name: Anonymous 2010-06-05 23:12
>>18
I don't know how you managed that. Are you running the latest version? I restarted /prog/scrape from scratch a while ago and it's been running smoothly.
>>30
No, it was in FIOC. And I put so much effort into commenting my regexen, too! And into making it VROOM VROOM. But no-one cared. I sad :(
Name: Anonymous 2010-06-09 18:55
>>31
I was thinking about writing a Distributed Internet Content Archival System: Solution for Home Businesses, by which I mean a botnet-enabled 4chan archiver. It would have two main modes of operation, normal (legit) mode where the user willingly installs it on their system, and a silent (botnet) mode where the user is unaware of the program running. I started thinking about details, but now I wonder whether many people (aside from me) would actually be interested in using this.
>>38
The fact that it wasn't just one small script you can put somewhere was probably the most significant factor against it, though the fact that progscrape actually works just fine probably didn't help either.
An additional consideration is that even progscrape isn't that widely used. The Github project only has one follower (besides Xarn), after all.
Name: Anonymous 2010-06-09 22:29
progscrape actually works just fine
No, it breaks when parsing subject.txt, and is 8 times slower. Anyway, I completely understand that someone might not want to use something I wrote.
Name: Anonymous 2010-06-09 22:52
>>40
It doesn't break when parsing subject.txt at all; it correctly points out that /prog/'s subject.txt has an invalid entry. This is a Shiitchan bug, not a progscrape bug.
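The distinction being drawn is between crashing on a bad line and reporting it. A minimal sketch of the "report, don't crash" approach, assuming subject.txt holds one thread per line with fields separated by `<>` and a fixed field count (seven is assumed here; the exact Shiichan layout is not confirmed by this thread). `parse_subject_txt` is a hypothetical name, not progscrape's parser.

```python
def parse_subject_txt(text, expected_fields=7):
    """Split subject.txt into records, collecting malformed lines separately.

    Assumes '<>'-separated fields and a fixed field count per line.
    Returns (records, bad), where bad holds (line_number, raw_line) pairs.
    """
    records, bad = [], []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # ignore blank lines entirely
        fields = line.split("<>")
        if len(fields) != expected_fields:
            bad.append((lineno, line))  # flag it instead of raising
            continue
        records.append(fields)
    return records, bad
```

The caller can then log the entries in `bad` and carry on with the valid threads, which keeps one broken entry on the server side from stopping the whole update.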
The fact that it's eight times slower is a debatable point. Xarn said that he thinks hammering dis with extra threads is poor form, and progscrape's speed is just fine for people who run it every day. The only people affected by its single-threadedness are people running it for the first time, and even then it only takes a few hours to finish.
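The trade-off described above, one request at a time to avoid hammering the server, can be made explicit by enforcing a minimum gap between requests. A minimal sketch, assuming a caller-supplied `fetch(thread_id)` function; `polite_fetch_all` and `min_interval` are hypothetical names, and this is not how progscrape itself paces requests.

```python
import time


def polite_fetch_all(thread_ids, fetch, min_interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Fetch threads sequentially, leaving at least min_interval seconds
    between the starts of consecutive requests."""
    results = {}
    last_start = None
    for tid in thread_ids:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)  # only sleep for whatever gap is still missing
        last_start = clock()
        results[tid] = fetch(tid)
    return results
```

If a request itself takes longer than `min_interval`, no extra sleep happens, so the pacing only slows the scraper down when it would otherwise exceed the chosen request rate.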
>>39 The Github project only has one follower
IMO, that fact is irrelevant. It's relatively easy to imagine that most /prog/riders aren't going to be willing to compromise their anonymity just to know when Xarn updates his script. You'll notice that even in contest threads most entries are anonymous, and those that aren't are only identified by a tripcode.
>>42
The imageboards may have turned anonymity into a cult, but /prog/ usually seems to realize there's a time and a place for things, and understands why world4ch is anonymous by default. We have no problem shedding anonymity for things like IRC or Web 2.0 social media, so I don't see why Github should be any different.
Pseudonymity is every bit as good as anonymity in the context of Github, anyway.
But progscrape actually has two followers besides Xarn.
Name: Anonymous 2010-10-04 2:00
Wait, it's called /prog/scrape?
I always thought it was progscape.
Name: Anonymous 2010-10-04 3:42
>>44 sage.
>>35 If you need some bots or are having trouble thinking of a unique way to spread them, make a new thread. I'll give you a hand on how to set up an effective server, a thousand bots.
>>44
The script is progscrape.py and the Github repository is progscrape because of technical limitations, but the name of the project is /prog/scrape. As close an approximation to this name as possible should be used.