
Hey Xarn, apply this patch!

Name: Anonymous 2010-07-27 6:37

--- progscrape.py    2010-07-26 13:42:46.000000000 +0200
+++ /tmp/progscrape.py    2010-07-27 11:00:18.991885078 +0200
@@ -299,8 +299,14 @@
 
 
         if verify_trips and len(tripv) > 0:
+            # We get 403 with long URLs
+            # if too many trips to check, fetch the whole thread
+            tripv_url = read_url + thread[0] + '/'
+            if len(tripv) < 200:
+                tripv_url += ','.join(tripv)
             try:
-                hp = urlopen(read_url + thread[0] + '/' + ','.join(tripv))
+                hp = urlopen(tripv_url)
 
             except:
                 print "Couldn't access HTML interface to verify tripcodes.",\


This will solve the 403 errors.
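Here is the patch's logic pulled out into a function, for anyone who wants to check its behaviour in isolation. It's a minimal sketch in Python 3 (progscrape.py itself is Python 2). The name build_tripv_url and the example URL are made up for illustration. The 200-post cutoff, read_url, thread[0] and tripv come straight from the diff. Note that the cutoff counts posts to verify, not characters in the URL.

```python
def build_tripv_url(read_url, thread_id, tripv, max_posts=200):
    """Return the URL used to fetch the posts whose tripcodes need checking.

    Mirrors the patch: list the post numbers explicitly when there are
    fewer than max_posts of them; otherwise request the whole thread so
    the URL stays short enough not to trigger the 403.
    """
    url = read_url + thread_id + '/'
    if len(tripv) < max_posts:
        url += ','.join(tripv)
    return url


if __name__ == '__main__':
    base = 'http://example.org/read/prog/'  # hypothetical base URL
    # Few posts: they are listed explicitly.
    print(build_tripv_url(base, '1280000000', ['3', '7', '12']))
    # 300 posts: falls back to fetching the whole thread.
    print(build_tripv_url(base, '1280000000', [str(n) for n in range(1, 301)]))
```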

Name: Anonymous 2010-08-03 16:23

>>39
It's slower because they both use OpenSSL's DES_crypt, but Xarn doesn't wrap his in superfluous crap, doesn't unnecessarily generate a completely random string (with associated function calls; the attempt at inlining doesn't actually inline those functions) for every iteration, and doesn't use regexes by default.

It's not as easy to distribute over a cluster because processes are just easier to distribute than threads. That's a fact of life.

It doesn't compile because it inexplicably doesn't use POSIX regexes.

Readability can be subjective when comparing, say, K&R indentation to Sun indentation. It isn't when comparing K&R indentation to whatever ridiculous Lisp-wannabe style that piece of shit uses.

As far as NIH syndrome goes, random number generation is one thing C has plenty of options for, but he still chose to implement a silly knock-off.
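This makes the "random string every iteration" point concrete. A brute-force tripcode searcher needs a new candidate key for every attempt. One cheap way to get one is to step through keys like an odometer, which usually changes only the last character, instead of building a fresh random string each time. This is a sketch under assumptions: ALPHABET and next_key are hypothetical, and the thread doesn't say what key alphabet either tool actually uses. Whether random keys are worth their cost is exactly what the replies dispute.

```python
# Hypothetical key alphabet (ASCII order); the real searchers may differ.
ALPHABET = './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
# Successor of each character except the last one in ALPHABET.
NEXT = {c: ALPHABET[i + 1] for i, c in enumerate(ALPHABET[:-1])}


def next_key(key):
    """Return the key after `key`, treating it as a counter in base
    len(ALPHABET) with the least significant character last.

    Most calls change only the last character; a carry moves left only
    when a position wraps around. After the last key of a given length,
    the first key of the next length is returned.
    """
    chars = list(key)
    i = len(chars) - 1
    while i >= 0:
        c = chars[i]
        if c in NEXT:
            chars[i] = NEXT[c]
            return ''.join(chars)
        chars[i] = ALPHABET[0]  # this position wraps; carry to the left
        i -= 1
    # Every position wrapped: move on to keys one character longer.
    return ALPHABET[0] + ''.join(chars)


if __name__ == '__main__':
    key = 'Xz'
    for _ in range(3):
        key = next_key(key)
        print(key)
```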

Name: Anonymous 2010-08-03 16:54

> It's slower because they both use OpenSSL's DES_crypt, but Xarn doesn't wrap his in superfluous crap, doesn't unnecessarily generate a completely random string (with associated function calls; the attempt at inlining doesn't actually inline those functions) for every iteration, and doesn't use regexes by default.
there are very good reasons for generating a random string. this has been discussed extensively on other boards. the fact that your compiler is shit doesn't mean that every compiler is.

> It's not as easy to distribute over a cluster because processes are just easier to distribute than threads. That's a fact of life.
you obviously didn't look at it:
Usage: %s [-c] [-p processes] [-t threads] [regex]
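This gives some context on the -p processes / -t threads split. If candidates are enumerated in order rather than drawn at random, the workers need disjoint shares of the search space. A common way to do that without any coordination is interleaving: worker w of N tries candidates w, w + N, w + 2N, and so on. This is a hypothetical sketch, and worker_id and candidate_indices are invented names. The thread doesn't say how, or whether, the tool partitions its work, and with purely random keys no partitioning is needed at all. A candidate index would be turned into an actual key by writing it in base len(alphabet).

```python
def worker_id(process_idx, thread_idx, threads_per_process):
    """Flatten a (process, thread) pair into one worker number."""
    return process_idx * threads_per_process + thread_idx


def candidate_indices(wid, total_workers, limit):
    """Indices below `limit` that worker `wid` should try.

    Workers interleave: worker 0 takes 0, N, 2N, ..., worker 1 takes
    1, N + 1, 2N + 1, ..., where N = total_workers. The shares are
    disjoint and together cover every index, so no locking is needed.
    """
    return range(wid, limit, total_workers)


if __name__ == '__main__':
    processes, threads = 2, 3  # as if run with -p 2 -t 3
    n = processes * threads
    for p in range(processes):
        for t in range(threads):
            wid = worker_id(p, t, threads)
            print(p, t, list(candidate_indices(wid, n, 20)))
```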

> It doesn't compile because it inexplicably doesn't use POSIX regexes.
pcre is faster and more useful than any free implementation of posix regexes.

> As far as NIH syndrome goes, random number generation is one thing C has plenty of options for, but he still chose to implement a silly knock-off.
name one that's as fast as that and has a period at least as long.
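The thread never names the generator in question. Purely as an illustration of what "fast with a long period" can look like, here is Marsaglia's xorshift64 with the shift triple 13, 7, 17. It produces each value with three shifts and three XORs and has period 2**64 - 1 from any nonzero seed. This is a swapped-in example, not the tool's code. In C the masking is unnecessary because uint64_t arithmetic wraps, but Python integers are unbounded.

```python
MASK64 = (1 << 64) - 1


class XorShift64:
    """Marsaglia's xorshift64 generator (shift triple 13, 7, 17).

    The state must be nonzero. From any nonzero seed the sequence visits
    every nonzero 64-bit value once before repeating (period 2**64 - 1).
    """

    def __init__(self, seed):
        self.state = seed & MASK64
        if self.state == 0:
            raise ValueError('seed must be nonzero modulo 2**64')

    def next_u64(self):
        x = self.state
        x ^= (x << 13) & MASK64  # mask emulates 64-bit wraparound
        x ^= x >> 7
        x ^= (x << 17) & MASK64
        self.state = x
        return x


if __name__ == '__main__':
    rng = XorShift64(1)
    print([rng.next_u64() for _ in range(3)])
```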

Name: Anonymous 2010-08-03 17:12

>>42
Are you trying to become the new FrozenVoid?

Name: Anonymous 2010-08-03 17:14

>>43
Are you?

Name: Anonymous 2010-08-03 17:16

>>43
At least FrozenVoid was entertaining. Hotaru is just petty and obnoxious.

Name: Anonymous 2010-08-03 17:21

>>42
> name one that's as fast as that and has a period at least as long.
No, no, that's not how it works - you're meant to do that, otherwise you're just pulling 66arguments99 out of your anus.

Name: Anonymous 2010-08-03 18:13

>>38-46
back to /newpol/, please.

Name: Anonymous 2010-08-03 18:17

>>46
That's all he does.

Name: Anonymous 2010-08-03 18:43

Why do Xarn threads always end with a bunch of posts trying to defend Xarn against non-existent attacks?

Name: Anonymous 2010-08-03 18:53

>>49
Oh, we're only one-twentieth done yet.

Name: Anonymous 2010-08-03 18:57

>>49
Taking offense at this Hotaru idiot does not imply a defense of Xarn.

Name: Anonymous 2010-08-03 19:07

>>51
Hotaru only made 3 posts in this thread (>>18,19,47), and no one took offense at any of them.

Name: Anonymous 2010-08-03 19:15

>>52
Poor attempt at trying to save face, or you don't actually know who we're talking about.

Name: Anonymous 2010-08-03 19:18

>>53
Those three posts are the only ones in this thread that were made by the Hotaru that is hotaru2k3 on GitHub. Are you talking about some other Hotaru?

Name: Anonymous 2010-08-03 19:21

>>52,54
stop feeding the trolls.

Name: Anonymous 2010-08-03 19:30

>>55
Stop feeding [spoiler]MY ANUS

Name: Anonymous 2010-08-03 19:34

>>31,52,54,55
Same fucking person. That's pathetic, Robert.

Name: Anonymous 2010-08-03 19:43

>>56-57
Your BBCODE segfaulted. Your ``same person" detector is probably just as broken.

Name: Anonymous 2010-08-03 19:49

>>58
Not as broken as yours.

Name: Anonymous 2010-08-03 19:56

>>57-59
Same person.

Name: Anonymous 2010-08-03 20:03

>>60-1,62-
SPAWHBTFTDOAET

Name: Anonymous 2010-08-03 20:39

>>34
Wait, is this Hotaru the same person who writes stupid shit in lowercase all the time, and then babbles incoherent shit about the superiority of ancient civilizations when someone calls him out on being an incompetent assclown who's too lazy to use his shift key?

Name: Anonymous 2010-08-03 20:48

>>62
I don't know, and I don't care. No one here cares. Feel free to discuss your irrelevant fictitious personas elsewhere.

Name: Anonymous 2010-08-03 20:59

>>62
He's pretending not to be, apparently, but yes.

Name: Anonymous 2010-08-04 13:47

>>61
SPAWHBTFTDOAET MY ANUS

Name: Anonymous 2010-08-04 14:15

>>65
Dipshit.

Name: Anonymous 2010-08-04 14:43

MrVacBob really needs to fix the spambot detection on /prog/. Spambots that bump ancient threads don't get banned, while Hotaru has been banned for almost a week (and is still banned).
Hotaru could not possibly have made any posts in this thread after >>28.

Name: Anonymous 2010-08-04 14:49

>>67
So says Hotaru, who incidentally doesn't seem to understand how to structure a multithreaded task very well.

Name: Anonymous 2010-08-04 14:58

>>68
Is that why it's so much faster than Xarn's FIOC?

Name: Anonymous 2010-08-04 16:02

>>69
/prog/scrape just doesn't do multithreading at all, for moral reasons. Even very shitty multithreading will usually be faster than that.

Name: Anonymous 2010-08-04 16:12

> moral reasons
Sure, it's not like it's possible to do multithreading and still only use one or two connections to the server or anything.

Name: Anonymous 2010-08-04 16:17

>>71
You're an idiot.

Name: Anonymous 2010-08-04 16:25

>>69
I'm just saying that you really don't know much about it if you can't work out simple cooperation between threads enough to get a progress bar working.

Xarn's is "slow", if you want to call it that, by design. I could write a single-threaded scraper that would beat the pants off it (and the choice of FIOC isn't the issue here) but it would hammer the server in just the way Xarn wanted to avoid.

Name: Anonymous 2010-08-04 16:28

>>70
Ethical reasons.

>>71
Since >>72 didn't elaborate, I will: waiting on requests is where the vast majority of the overhead is. You can't compute a download.
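The point about waiting can be shown without touching the network. In this sketch fake_fetch is a hypothetical stand-in that just sleeps where a real scraper would block on an HTTP request. The thread pool's size doubles as the cap on simultaneous connections, which is how multithreading can coexist with the "one or two connections" limit mentioned above. Whether two connections is acceptable load for the server is a separate question, and the thread argues about that too.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(thread_id, latency=0.05):
    """Stand-in for an HTTP request: the thread mostly just waits."""
    time.sleep(latency)
    return thread_id


def scrape(thread_ids, max_connections=2):
    """Fetch every thread with at most max_connections requests in flight.

    Only max_connections worker threads exist, so the server never sees
    more simultaneous requests than that. Results keep the input order.
    """
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(fake_fetch, thread_ids))


if __name__ == '__main__':
    ids = list(range(20))
    for conns in (1, 2):
        start = time.perf_counter()
        scrape(ids, conns)
        elapsed = time.perf_counter() - start
        print(conns, 'connection(s):', round(elapsed, 2), 's')
```

With the waits overlapping, two connections should take roughly half the wall-clock time of one, even though no extra computation is being done.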

Name: Anonymous 2010-08-04 16:38

/prog/scrape isn't actually slow in any real sense. The only time you might actually want multithreading is when you're scraping a whole board, and you'll only be doing that once (or not at all, http://github.com/downloads/Cairnarvon/progscrape/prog.db.lzma).

Name: Anonymous 2010-08-04 17:04

>>73
And you'd be 403'd before it got anything accomplished.

Name: Anonymous 2010-08-04 18:37

>>76
I doubt that. I would like to believe it but I just don't.

Name: Anonymous 2010-08-04 20:01

>>73
> I'm just saying that you really don't know much about it if you can't work out simple cooperation between threads enough to get a progress bar working.
Of course it's possible, but as written, making progscrape multithreaded just requires replacing the if use_json: and the corresponding else: with two def statements, then wrapping calls to those in a class inheriting from Thread and running instances of it on slices of the to_update list: maybe five lines of code. Fixing the progress bar afterwards easily takes as much code again.
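Here is a rough sketch of that restructuring, under loud assumptions. fetch_json and fetch_html are hypothetical stand-ins for the bodies of the old if use_json: / else: branches, and Progress, Updater and update_all are invented names. What comes from the post is the shape: two functions, a Thread subclass, and slices of to_update. The lock-protected counter is the extra piece needed to keep a progress display correct once several threads report completions.

```python
import threading


def fetch_json(thread_id):
    """Hypothetical stand-in for the old 'if use_json:' branch."""
    return ('json', thread_id)


def fetch_html(thread_id):
    """Hypothetical stand-in for the old 'else:' branch."""
    return ('html', thread_id)


class Progress:
    """Completion counter that several threads can update safely."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self._lock = threading.Lock()

    def advance(self):
        # Holding the lock makes the increment atomic, so two threads
        # finishing at the same moment can't lose an update.
        with self._lock:
            self.done += 1
            return self.done


class Updater(threading.Thread):
    """Works through one slice of the to_update list."""

    def __init__(self, ids, fetch, progress, results):
        super().__init__()
        self.ids = ids
        self.fetch = fetch
        self.progress = progress
        self.results = results

    def run(self):
        for thread_id in self.ids:
            # Each worker writes distinct keys, so sharing one dict is
            # safe under CPython's GIL.
            self.results[thread_id] = self.fetch(thread_id)
            # A real progress bar would redraw here from the count that
            # advance() returns and self.progress.total.
            self.progress.advance()


def update_all(to_update, use_json=True, n_threads=2):
    fetch = fetch_json if use_json else fetch_html
    progress = Progress(len(to_update))
    results = {}
    # Worker i gets every n_threads-th item, starting at index i.
    workers = [Updater(to_update[i::n_threads], fetch, progress, results)
               for i in range(n_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results, progress.done
```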

Name: Anonymous 2010-08-04 21:55

>>78
Oh shit. That would take a Python programmer the better part of a week.

Name: Anonymous 2010-08-05 1:30

>>79
To be fair, the equivalent code in any other language would take months.
