
Xarn

Name: Xarn !rllCC7PgaM 2008-03-12 16:58

It's been a long time since the last discussion on this topic.

Hello Xarn

Name: Anonymous 2010-03-25 15:09

>>35
Just write your own, you lazy bum. I did and it was much faster than Xarn's.

Name: Anonymous 2010-03-25 21:47

>>41
You can test it yourself, but it took ~16 min for a full scrape from scratch, as opposed to ~2 hours with Xarn's scraper [1] (and at that time there were more posts to scrape).
It uses 16 worker threads by default and goes through the JSON interface (which Xarn might not have been aware of) whenever it can. It also downloads only the missing posts (post selection also works with the JSON interface).
And it has a nice progress bar.
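
For anyone who wants to see the shape of that approach, here is a minimal sketch of "fetch only the missing posts with a pool of worker threads". It is not the actual scraper: `missing_ids`, `scrape_missing` and the `fetch_post` callable are hypothetical names, and the real request to the JSON interface is left to whatever `fetch_post` you plug in.

```python
from concurrent.futures import ThreadPoolExecutor


def missing_ids(stored_ids, highest_id):
    """Return post numbers 1..highest_id that are not stored yet, in order."""
    stored = set(stored_ids)
    return [n for n in range(1, highest_id + 1) if n not in stored]


def scrape_missing(fetch_post, stored_ids, highest_id, workers=16):
    """Fetch every missing post using a pool of worker threads.

    fetch_post is a hypothetical callable (post number -> post data);
    a real scraper would have it query the board's JSON interface.
    """
    todo = missing_ids(stored_ids, highest_id)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order even though fetches overlap.
        results = list(pool.map(fetch_post, todo))
    return dict(zip(todo, results))
```

Most of the time goes into waiting on the network, so threads are enough here, and skipping posts that are already stored keeps repeat runs short.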

1: http://cairnarvon.rotahall.org/2008/12/25/χριστούγεννα/

Name: Anonymous 2010-03-26 13:38

>>44
The JSON interface didn't exist when Xarn first wrote /prog/scrape.
I heard so too.

The problem with the JSON interface is that there's no way to distinguish a genuine tripcode from a fake one.
That's why my scraper also downloads the HTML versions of posts with tripcodes to verify them.
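
A rough sketch of the selection step, under loud assumptions: each JSON post is taken to be a dict with a "name" string, and anything that looks like a tripcode (genuine or typed in by hand) is taken to show up as a "!" in that name. Both the field name and the helper names are made up for illustration; the actual scraper may do this differently.

```python
def needs_html_check(post):
    """True if the name looks like it carries a tripcode.

    Assumption: post is a dict with a "name" string, and a tripcode,
    real or faked, appears as "!" somewhere in that name.
    """
    return "!" in post.get("name", "")


def posts_to_verify(posts):
    """Return the posts whose HTML version should also be downloaded."""
    return [p for p in posts if needs_html_check(p)]
```

Only the flagged posts need the slower HTML fetch; everything else can stay on the JSON path.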
