
Online /prog/scrape

Name: Anonymous 2010-06-05 11:56

Is there a web-accessible /prog/scrape that'll allow me to scrape world4ch without downloading and updating the database at http://cairnarvon.rotahall.org/2008/11/30/progscrape/?

Name: Anonymous 2010-06-05 22:16



  File "x.py", line 102, in <module>
    page = urllib.urlopen(read_url + thread[0] + '/1-').read()
  File "/usr/lib/python2.5/urllib.py", line 82, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.5/urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.5/urllib.py", line 328, in open_http
    errcode, errmsg, headers = h.getreply()
  File "/usr/lib/python2.5/httplib.py", line 1199, in getreply
    response = self._conn.getresponse()
  File "/usr/lib/python2.5/httplib.py", line 928, in getresponse
    response.begin()
  File "/usr/lib/python2.5/httplib.py", line 385, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.5/httplib.py", line 343, in _read_status
    line = self.fp.readline()
  File "/usr/lib/python2.5/socket.py", line 372, in readline
    data = recv(1)
IOError: [Errno socket error] (104, 'Connection reset by peer')
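
For what it's worth, a reset like the one in this traceback is often transient, so one common workaround is to retry the fetch with a growing delay. A minimal Python 3 sketch follows; `fetch_with_retry` and its parameters are hypothetical names, not anything taken from x.py or progscrape.py.

```python
import time


def fetch_with_retry(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable returning the page body, e.g.
    lambda: urllib.request.urlopen(url).read(). Every name here is
    hypothetical and made up for this sketch.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            # OSError covers ConnectionResetError and urllib's URLError.
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay...
            sleep(base_delay * 2 ** attempt)
```

Backing off between attempts also keeps the scraper from hammering the server while it is already refusing connections.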

Name: Anonymous 2010-06-05 22:45

>>17
not updating database considered harmful


Fetching subject.txt... Got it.
subjects.txt fail: Anonymous<><>1220909598<><a href="read/prog/1220718054/17">&gt;&gt;17</a><br/>Don't ask me, aniki.<> <>131.116.254.199<><><>1220718054<>6<><>1231185336

8896 threads to update.
Updating thread 1275783543...
Updating thread 1275783433...
Updating thread 1202944294...
Traceback (most recent call last):
  File "progscrape.py", line 186, in <module>
    db.execute(u'INSERT INTO posts (thread, id, author, email, trip, time, body) VALUES (?, ?, ?, ?, ?, ?, ?)', b)
sqlite3.IntegrityError: columns thread, id are not unique
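
The IntegrityError says a row with that (thread, id) pair is already in the posts table. One way to make a re-run tolerate that, assuming the pair is the table's uniqueness key, is SQLite's INSERT OR IGNORE. A minimal Python 3 sketch follows; the column list comes from the traceback, while the schema, the helper names, and the sample row are assumptions and may not match what progscrape.py actually does.

```python
import sqlite3

# Same column list as the INSERT in the traceback, with OR IGNORE added.
INSERT_SQL = ('INSERT OR IGNORE INTO posts '
              '(thread, id, author, email, trip, time, body) '
              'VALUES (?, ?, ?, ?, ?, ?, ?)')


def make_db():
    """Create an in-memory database with an assumed schema."""
    db = sqlite3.connect(':memory:')
    db.execute('''CREATE TABLE posts (
        thread INTEGER, id INTEGER, author TEXT, email TEXT,
        trip TEXT, time INTEGER, body TEXT,
        PRIMARY KEY (thread, id))''')  # assumption: uniqueness on (thread, id)
    return db


def store_posts(db, posts):
    """Insert posts, silently skipping any (thread, id) already stored."""
    db.executemany(INSERT_SQL, posts)
    db.commit()


def count_posts(db):
    return db.execute('SELECT COUNT(*) FROM posts').fetchone()[0]
```

Storing the same made-up row twice, e.g. `(1202944294, 1, 'Anonymous', '', '', 1202944294, 'example body')`, leaves one row in the table instead of raising. INSERT OR REPLACE would be the alternative if the newer copy of a post should win.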

Name: Anonymous 2010-06-05 23:12

>>18
I don't know how you managed that. Are you running the latest version? I restarted /prog/scrape from scratch a while ago and it's been running smoothly.

Name: Anonymous 2010-06-05 23:18

>>19

Mhmm. I just grabbed it from Sarn's blog.

Name: Anonymous 2010-06-05 23:21

>>19

Oh, I should add that that's updating Sarn's database. I eventually get the same problem as >>17-kun if done from scratch

Name: Anonymous 2010-06-05 23:43

>>20
By blog do you mean Github? Because the one on the blog itself is a few Shiitchan fuck-ups out of date.
http://github.com/Cairnarvon/progscrape/blob/master/progscrape.py

Name: Anonymous 2010-06-05 23:47

Yeah, I've got the one from Github attempting to run over the database.

I'll try from scratch; the first time I did it with the non-Github version.

Name: Anonymous 2010-06-06 2:38

>>23
Going from scratch works just fine. A nice 141 MB database file.

Name: Anonymous 2010-06-06 10:58

>>24

Yeah, it did fine for me as well.
Mine's only 134.8 :'(

And I still can't find that thread of vagina programs

Name: Anonymous 2010-06-06 11:30

>>25
FIND MY VAGINA

Name: Anonymous 2010-06-09 16:29

>>11

:(

What's wrong with it?

Name: Anonymous 2010-06-09 17:13

>>27
Please review PEP 8, for a start.

Name: Anonymous 2010-06-09 17:36

My world4chscrape is better than Xarn's, and yet it was ignored :(

Name: Anonymous 2010-06-09 17:42

>>29
I bet it wasn't in Python. /prog/ actually secretly loves the FIOC.

Name: Anonymous 2010-06-09 17:46

Somebody should cron this on their server and write a web 2.0 database searcher

Name: Anonymous 2010-06-09 17:51

>>31
CRON MY VAGINA

Name: Anonymous 2010-06-09 17:55

Writing a scrape in bash.

Name: Anonymous 2010-06-09 18:48

>>30
No, it was in FIOC. And I put so much effort into commenting my regexen, too! And into making it VROOM VROOM. But no-one cared. I sad :(

Name: Anonymous 2010-06-09 18:55

>>31
I was thinking about writing a Distributed Internet Content Archival System: Solution for Home Businesses, by which I mean a botnet-enabled 4chan archiver. It would have two main modes of operation, normal (legit) mode where the user willingly installs it on their system, and a silent (botnet) mode where the user is unaware of the program running. I started thinking about details, but now I wonder whether many people (aside from me) would actually be interested in using this.

Name: Anonymous 2010-06-09 19:19

>>35
s/Archival/Kompression/ and you've got DICKS

Name: Anonymous 2010-06-09 19:20

>>29,34
Was that the ENTERPRISE FIOC one with its OOP and its half a dozen files?

Name: Anonymous 2010-06-09 19:50

>>37
I didn't do any intentional ENTERPRISING, but yeah, I did create one class. And there was only a quarter of a dozen files :<

Name: Anonymous 2010-06-09 22:00

>>38
The fact that it wasn't just one small script you can put somewhere was probably the most significant factor against it. Though the fact that progscrape actually works just fine probably didn't help.
An additional consideration is that even progscrape isn't that widely used. The Github project only has one follower (besides Xarn), after all.

Name: Anonymous 2010-06-09 22:29

progscrape actually works just fine
No, it breaks when parsing subject.txt, and is 8 times slower.
Anyway, I completely understand that someone might not want to use something I wrote.

Name: Anonymous 2010-06-09 22:52

>>40
It doesn't break when parsing subject.txt at all; it correctly points out that /prog/'s subject.txt has an invalid entry. This is a Shiitchan bug, not a progscrape bug.
The fact that it's eight times slower is a debatable point. Xarn said that he thinks hammering dis with extra threads is poor form, and progscrape's speed is just fine for people who run it every day. The only people affected by its single-threadedness are people running it for the first time, and even then it only takes a few hours to finish.
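
On the subject.txt point, here is a rough Python 3 sketch of how a scraper can flag a malformed line rather than choke on it. It assumes a well-formed line has seven `<>`-separated fields (subject, name, icon, thread id, reply count, last poster, last post time); that layout is from memory and is not confirmed anywhere in this thread, and `split_subject_line` is a made-up name, not progscrape's actual parser.

```python
EXPECTED_FIELDS = 7  # assumption: see the field layout described above


def split_subject_line(line):
    """Return the list of fields, or None if the line looks malformed."""
    fields = line.rstrip('\n').split('<>')
    if len(fields) != EXPECTED_FIELDS:
        return None  # caller can log the line and move on
    return fields


# The entry quoted in >>18 splits into 12 fields, so it gets rejected.
bad = ('Anonymous<><>1220909598<><a href="read/prog/1220718054/17">'
       '&gt;&gt;17</a><br/>Don\'t ask me, aniki.<> <>131.116.254.199'
       '<><><>1220718054<>6<><>1231185336')
print(split_subject_line(bad))
```

A field count is only a coarse check; a stricter version would also verify that the id, reply count, and time fields are numeric.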

Name: Anonymous 2010-06-09 23:04

>>39
The Github project only has one follower
IMO, that fact is irrelevant. It's relatively easy to imagine that most /prog/riders aren't going to be willing to compromise their anonymity just to know when Xarn updates his script. You'll notice that even in contest threads most entries are anonymous, and those that aren't are only identified by a tripcode.

Name: Anonymous 2010-06-09 23:14

>>42
The imageboards may have turned anonymity into a cult, but /prog/ usually seems to realize there's a time and a place for things, and understands why world4ch is anonymous by default. We have no problem shedding anonymity for things like IRC or Web 2.0 social media, so I don't see why Github should be any different.
Pseudonymity is every bit as good as anonymity in the context of Github, anyway.

But progscrape actually has two followers besides Xarn.

Name: Anonymous 2010-10-04 2:00

Wait, it's called /prog/scrape?

I always thought it was progscape.

Name: Anonymous 2010-10-04 3:42

>> 44 sage.

>> 35 If you need some bots or are having trouble thinking of a unique way to spread them, make a new thread. I'll give you a hand on how to set up an effective server, a thousand bots.

Name: Anonymous 2010-10-04 8:17

>>45
Less of you.

Name: Anonymous 2010-10-04 15:04

>> 46 Less of telling people how to do things? Or just breaking the law?

I'm so intrigued to know so I may make an apt response.

Name: Anonymous 2010-10-04 15:39

>>44
The script is progscrape.py and the Github repository is progscrape because of technical limitations, but the name of the project is /prog/scrape. As close an approximation to this name as possible should be used.

Name: Anonymous 2010-10-04 17:27

/prog/scrape

Name: Anonymous 2010-10-05 2:37

╱ℙ????????????╱????????????????????????

Name: Anonymous 2010-12-06 9:51

Back to /b/, ``GNAA Faggot''

Name: Anonymous 2010-12-17 1:35

Erika once told me that Xarn is a bad boyfriend

Name: Anonymous 2010-12-20 23:58

Name: Anonymous 2011-01-31 19:53

<-- check em dubz

Name: Anonymous 2013-02-09 6:05

