Hey Xarn, apply this patch!

Name: Anonymous 2010-07-27 6:37

--- progscrape.py    2010-07-26 13:42:46.000000000 +0200
+++ /tmp/progscrape.py    2010-07-27 11:00:18.991885078 +0200
@@ -299,8 +299,14 @@
 
 
         if verify_trips and len(tripv) > 0:
+            # We get 403 with long URLs
+            # if too many trips to check, fetch the whole thread
+            tripv_url = read_url + thread[0] + '/'
+            if len(tripv) < 200:
+                tripv_url += ','.join(tripv)
             try:
-                hp = urlopen(read_url + thread[0] + '/' + ','.join(tripv))
+                hp = urlopen(tripv_url)
 
             except:
                 print "Couldn't access HTML interface to verify tripcodes.",\


This will solve the 403 errors.
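A minimal standalone sketch of the same fallback, for anyone who wants to try the logic outside progscrape. It is written for Python 3, while the patch targets Python 2. The function name `build_tripv_url` and the `max_posts` parameter are made up for illustration, and it assumes `read_url` ends with a slash as the patch implies.

```python
def build_tripv_url(read_url, thread_id, tripv, max_posts=200):
    """Return the URL used to verify tripcodes in one thread.

    read_url  -- base URL of the HTML interface (assumed to end in '/')
    thread_id -- thread identifier as a string
    tripv     -- list of post numbers (strings) whose tripcodes need checking
    max_posts -- hypothetical cutoff; the patch hardcodes 200
    """
    url = read_url + thread_id + '/'
    # A long comma-separated post list makes the server answer 403, so the
    # list is only appended when it is short. Otherwise the bare thread URL
    # is returned and the whole thread gets fetched.
    if 0 < len(tripv) < max_posts:
        url += ','.join(tripv)
    return url
```

With a short list this gives `.../1247978789/1,5,9`; with 200 or more posts it gives just `.../1247978789/`.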

Name: Anonymous 2010-07-27 6:45

Forgot to point the offensive thread.

http://dis.4chan.org/read/prog/1247978789

Name: Anonymous 2010-07-27 7:55

FORGET MY ANUS

Name: Xarn !Rmk.XarnE2!OR/nEWfAt6nbhpH 2010-07-27 14:03

Alright.

Name: Xarn !Rmk.XarnE2!OR/nEWfAt6nbhpH 2010-07-27 15:22

>>5
That's reasonable. Any other suggestions?

Name: Anonymous 2010-07-27 16:55

Bamp for useful thread.

Name: Anonymous 2010-07-27 17:03

How about using multiple threads?

Name: Anonymous 2010-07-27 17:11

>>8
Threading is pointless in the common use case, and approaching DOSing when scraping an entire board.

Name: Anonymous 2010-07-27 17:33

>>6
Add some blinking lights so that the program makes my boss think I'm doing important work.

Name: Anonymous 2010-07-27 17:59

>>9
How about at least using pipelining then?

Name: Anonymous 2010-07-27 18:05

>>11
That's unpythonic.

Name: Anonymous 2010-07-27 18:15

>>12
Oh you. Quit talking up the idea.

Name: Xarn !Rmk.XarnE2!OR/nEWfAt6nbhpH 2010-07-27 18:52

>>11
I'm not entirely convinced that makes a lot of difference, but alright.

Name: Xarn !Rmk.XarnE2!OR/nEWfAt6nbhpH 2010-07-27 19:07

Actually, I guess that last update isn't strictly pipelining, though a lot of people seem to be calling it that. It's just a persistent connection, which is still an improvement.
AFAIK there's no documented way to do HTTP pipelining using just the Python standard library.
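A rough sketch of the difference, assuming Python 3's `http.client` (the renamed `httplib`). The helper name `fetch_all` is hypothetical and this is not progscrape's actual code. Each request waits for the previous response, so it reuses one connection but does not pipeline.

```python
import http.client


def fetch_all(conn, paths):
    """Fetch each path in turn over one already-open connection."""
    bodies = []
    for path in paths:
        conn.request('GET', path)
        resp = conn.getresponse()
        # The response must be read completely before the next request
        # can be sent on the same connection, hence no pipelining.
        bodies.append(resp.read())
    return bodies


# Usage (not executed here; host and path are placeholders):
# conn = http.client.HTTPConnection('dis.4chan.org')
# pages = fetch_all(conn, ['/read/prog/1247978789/1'])
# conn.close()
```

The saving comes from skipping the TCP handshake for every page, not from overlapping requests.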

Name: Anonymous 2010-07-27 19:19

Sage for Xarn

Name: Anonymous 2010-07-27 20:18

>>16
Age for anti-Xarn.

Name: Anonymous 2010-07-27 20:35

AFAIK there's no documented way to do HTTP pipelining using just the Python standard library.
not in the Python standard library, but PycURL can do it: http://pycurl.sourceforge.net/doc/curlmultiobject.html

it should be trivial to check if PycURL is installed and use libhttp otherwise.
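something like this would do for the check, as a sketch in Python 3 (where httplib is called `http.client`). `HAVE_PYCURL` and `pick_backend` are made-up names, and the actual fetching code for either backend is left out.

```python
import http.client  # standard library fallback, always available

try:
    import pycurl  # optional third-party dependency
    HAVE_PYCURL = True
except ImportError:
    pycurl = None
    HAVE_PYCURL = False


def pick_backend():
    """Return the name of the HTTP backend that would be used."""
    return 'pycurl' if HAVE_PYCURL else 'http.client'
```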

Name: Anonymous 2010-07-27 21:05

>>18
s/libhttp/httplib/

Name: Anonymous 2010-07-27 21:29

There's also the issue that adding real pipelining would require restructuring most of the program and not really be that useful.

Name: Anonymous 2010-07-27 21:57

adding real pipelining would require restructuring most of the program
That's a sure sign that the program should have been structured from the start instead of a mess of spaghetti code.

Name: Anonymous 2010-07-27 22:23

>>21
It fetches a page, does things with it, and then does it again for the next one until all pages have been fetched. The fact that adapting that to fetch all of those pages asynchronously with a priority queue (when --verify-trips is turned on) is less than straightforward isn't a comment on the design of the program so much as it is just the difference between one algorithm and a completely different one.

Name: Anonymous 2010-07-27 22:40

>>22
Design of a program in no way relates to algorithms and data structures chosen
Clever, almost.

Name: Anonymous 2010-07-27 23:26

>>23
Reading comprehension isn't your forté. You may want to work on that.

Name: Anonymous 2010-07-28 0:26

>>22
If your program performs a trivially parallelizable task and is not itself trivially parallelizable, you're doing something very wrong.

Name: Anonymous 2010-07-28 0:44

>>25
The issues of parallelisation and HTTP pipelining are orthogonal. It's pretty easy to add multithreading to progscrape, and a few people have, over the years.
Though it would fuck up the progress bar.

Name: Anonymous 2010-07-28 2:24

>>26
Though it would fuck up the progress bar.
HIBT? It can't possibly be that difficult to make a progress bar work with threads.

Name: Anonymous 2010-07-28 13:52

>>27
As written, the progress bar just goes up one line (using the ANSI console codes Xarn is so fond of) and prints some text. If each thread has its own progress bar that will be a bit of a mess.
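Roughly like this, as a sketch of the technique and not the actual progscrape code. The function names and the message text are invented for illustration.

```python
import sys

CURSOR_UP = '\033[1A'   # ANSI escape: move the cursor up one line
ERASE_LINE = '\033[2K'  # ANSI escape: erase the whole current line


def progress_line(done, total):
    """Build the string that overwrites the previous progress line."""
    return '%s%s%d/%d threads scraped\n' % (CURSOR_UP, ERASE_LINE, done, total)


def show_progress(done, total, out=sys.stdout):
    """Redraw the progress line in place on an ANSI-capable terminal."""
    out.write(progress_line(done, total))
    out.flush()
```

Two threads calling `show_progress` independently would each jump up one line and overwrite whatever the other just printed, which is the mess in question.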

Name: Anonymous 2010-08-03 12:14

MESS MY ANUS

Also, I've found that your tripcode searcher does not allow searching for numeric trips like 9000 or so. Please fix it.

Name: Anonymous 2010-08-03 12:20

>>28
Perhaps Xarn should use ncurses.

Name: Anonymous 2010-08-03 12:56

Name: Anonymous 2010-08-03 13:11

>>31
Pig disgusting abuse of whitespace.

Name: Anonymous 2010-08-03 13:56

>>31
Fuck off hotaru.

Name: Anonymous 2010-08-03 14:03

>>33
Hotaru doesn't use capital letters like that, and will probably be annoyed by the fact that I capitalized ``Hotaru".

Name: Anonymous 2010-08-03 14:05

>>33,34
Don't be mean, this Hotaru is only seven years old.

Name: Anonymous 2010-08-03 14:38

>>35
He wrote a tripcode searcher when he was two years old?

Name: Anonymous 2010-08-03 15:16

>>29
Which tripcode searcher is that? Because the one on Github does just fine.

Name: Anonymous 2010-08-03 15:24

>>31
Hey, that's like a less readable, slower version of Xarn's, except that it's not as easy to distribute over a cluster, suffers from NIH syndrome, and doesn't compile.

Name: Anonymous 2010-08-03 15:27

slower
not as easy to distribute over a cluster
doesn't compile
Obviously false.

less readable
Completely subjective.

NIH syndrome
I don't think that means what you think it means.

Name: Anonymous 2010-08-03 15:42

>>39
Completely subjective.
Hello, FV.

Name: Anonymous 2010-08-03 16:23

>>39
It's slower because they both use OpenSSL's DES_crypt, but Xarn doesn't wrap his in superfluous crap, doesn't unnecessarily generate a completely random string (with associated function calls; the attempt at inlining doesn't actually inline those functions) for every iteration, and doesn't use regexes by default.

It's not as easy to distribute over a cluster because processes are just easier to distribute than threads. That's a fact of life.

It doesn't compile because it inexplicably doesn't use POSIX regexes.

Readability can be subjective when comparing, say, K&R indentation to Sun indentation. It isn't when comparing K&R indentation to whatever ridiculous Lisp-wannabe style that piece of shit uses.

As far as NIH syndrome goes, random number generation is one thing C has plenty of options for, but he still chose to implement a silly knock-off.

Name: Anonymous 2010-08-03 16:54

It's slower because they both use OpenSSL's DES_crypt, but Xarn doesn't wrap his in superfluous crap, doesn't unnecessarily generate a completely random string (with associated function calls; the attempt at inlining doesn't actually inline those functions) for every iteration, and doesn't use regexes by default.
there are very good reasons for generating a random string. this has been discussed extensively on other boards. the fact that your compiler is shit doesn't mean that every compiler is.

It's not as easy to distribute over a cluster because processes are just easier to distribute than threads. That's a fact of life.
you obviously didn't look at it:
Usage: %s [-c] [-p processes] [-t threads] [regex]

It doesn't compile because it inexplicably doesn't use POSIX regexes.
pcre is faster and more useful than any free implementation of posix regexes.

As far as NIH syndrome goes, random number generation is one thing C has a plenty of options for, but he still chose to implement a silly knock-off.
name one that's as fast as that and has a period at least as long.

Name: Anonymous 2010-08-03 17:12

>>42
Are you trying to become the new FrozenVoid?

Name: Anonymous 2010-08-03 17:14

>>43
Are you?

Name: Anonymous 2010-08-03 17:16

>>43
At least FrozenVoid was entertaining. Hotaru is just petty and obnoxious.

Name: Anonymous 2010-08-03 17:21

>>42
name one that's as fast as that and has a period at least as long.
No, no, that's not how it works - you're meant to do that, otherwise you're just pulling 66arguments99 out of your anus.

Name: Anonymous 2010-08-03 18:13

>>38-46
back to /newpol/, please.

Name: Anonymous 2010-08-03 18:17

>>46
That's all he does.

Name: Anonymous 2010-08-03 18:43

Why do Xarn threads always end with a bunch of posts trying to defend Xarn against non-existent attacks?

Name: Anonymous 2010-08-03 18:53

>>49
Oh, we're only one-twentieth done yet.

Name: Anonymous 2010-08-03 18:57

>>49
Taking offense at this Hotaru idiot does not imply a defense of Xarn.

Name: Anonymous 2010-08-03 19:07

>>51
Hotaru only made 3 posts in this thread (>>18,19,47), and no one took offense at any of them.

Name: Anonymous 2010-08-03 19:15

>>52
Poor attempt at trying to save face, or you don't actually know who we're talking about.

Name: Anonymous 2010-08-03 19:18

>>53
Those three posts are the only ones in this thread that were made by the Hotaru that is hotaru2k3 on GitHub. Are you talking about some other Hotaru?

Name: Anonymous 2010-08-03 19:21

>>52,54
stop feeding the trolls.

Name: Anonymous 2010-08-03 19:30

>>55
Stop feeding [spoiler]MY ANUS

Name: Anonymous 2010-08-03 19:34

>>31,52,54,55
Same fucking person. That's pathetic, Robert.

Name: Anonymous 2010-08-03 19:43

>>56-57
Your BBCODE segfaulted. Your ``same person" detector is probably just as broken.

Name: Anonymous 2010-08-03 19:49

>>58
Not as broken as yours.

Name: Anonymous 2010-08-03 19:56

>>57-59
Same person.

Name: Anonymous 2010-08-03 20:03

>>60-1,62-
SPAWHBTFTDOAET

Name: Anonymous 2010-08-03 20:39

>>34
Wait, is this Hotaru the same person who writes stupid shit in lowercase all the time, and then babbles incoherent shit about the superiority of ancient civilizations when someone calls him out on being an incompetent assclown who's too lazy to use his shift key?

Name: Anonymous 2010-08-03 20:48

>>62
I don't know, and I don't care. No one here cares. Feel free to discuss your irrelevant fictitious personas elsewhere.

Name: Anonymous 2010-08-03 20:59

>>62
He's pretending not to be, apparently, but yes.

Name: Anonymous 2010-08-04 13:47

>>61
SPAWHBTFTDOAET MY ANUS

Name: Anonymous 2010-08-04 14:15

>>65
Dipshit.

Name: Anonymous 2010-08-04 14:43

MrVacBob really needs to fix the spambot detection on /prog/. Spambots that bump ancient threads don't get banned, while Hotaru has been banned for almost a week (and is still banned).
Hotaru could not possibly have made any posts in this thread after >>28.

Name: Anonymous 2010-08-04 14:49

>>67
So says Hotaru, who incidentally doesn't seem to understand how to structure a multithreaded task very well.

Name: Anonymous 2010-08-04 14:58

>>68
Is that why it's so much faster than Xarn's FIOC?

Name: Anonymous 2010-08-04 16:02

>>69
/prog/scrape just doesn't do multithreading at all, for moral reasons. Even very shitty multithreading will usually be faster than that.

Name: Anonymous 2010-08-04 16:12

moral reasons
Sure, it's not like it's possible to do multithreading and still only use one or two connections to the server or anything.

Name: Anonymous 2010-08-04 16:17

>>71
You're an idiot.

Name: Anonymous 2010-08-04 16:25

>>69
I'm just saying that you really don't know much about it if you can't work out simple cooperation between threads enough to get a progress bar working.

Xarn's is "slow", if you want to call it that, by design. I could write a single-threaded scraper that would beat the pants off it (and the choice of FIOC isn't the issue here) but it would hammer the server in just the way Xarn wanted to avoid.

Name: Anonymous 2010-08-04 16:28

>>70
Ethical reasons.

>>71
Since >>72 didn't elaborate, I will: waiting on requests is where the vast majority of the overhead is. You can't compute a download.

Name: Anonymous 2010-08-04 16:38

/prog/scrape isn't actually slow in any real sense. The only time you might actually want multithreading is when you're scraping a whole board, and you'll only be doing that once (or not at all, http://github.com/downloads/Cairnarvon/progscrape/prog.db.lzma).

Name: Anonymous 2010-08-04 17:04

>>73
And you'd be 403'd before it got anything accomplished.

Name: Anonymous 2010-08-04 18:37

>>76
I doubt that. I would like to believe it but I just don't.

Name: Anonymous 2010-08-04 20:01

>>73
I'm just saying that you really don't know much about it if you can't work out simple cooperation between threads enough to get a progress bar working.
Of course it's possible, but as written, turning progscrape multithreaded just requires replacing the if use_json: and the corresponding else: with two def statements, and then wrapping calls to those in a class inheriting from Thread and using that with slices of the to_update list, being maybe five lines of code. Fixing the progress bar after that is easily as much code again.
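A sketch of that shape in Python 3, with one shared counter standing in for the fixed progress bar. `Scraper`, `Progress`, `scrape_all` and the `fetch` callable are hypothetical names. Only `to_update` comes from progscrape, and its real contents are not modelled here.

```python
import threading


class Progress(object):
    """Single shared counter, so there is one bar instead of one per thread."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self.lock = threading.Lock()

    def tick(self):
        # The lock keeps concurrent updates from being lost; a real
        # version would also redraw the bar while holding it.
        with self.lock:
            self.done += 1


class Scraper(threading.Thread):
    """Worker that handles one slice of the update list."""

    def __init__(self, chunk, fetch, progress):
        threading.Thread.__init__(self)
        self.chunk = chunk
        self.fetch = fetch
        self.progress = progress

    def run(self):
        for item in self.chunk:
            self.fetch(item)
            self.progress.tick()


def scrape_all(to_update, fetch, workers=2):
    """Split to_update into `workers` slices and process them in parallel."""
    progress = Progress(len(to_update))
    pool = [Scraper(to_update[i::workers], fetch, progress)
            for i in range(workers)]
    for worker in pool:
        worker.start()
    for worker in pool:
        worker.join()
    return progress.done
```

Keeping `workers` at one or two limits the load on the server, which is the concern raised earlier in the thread.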

Name: Anonymous 2010-08-04 21:55

>>78
Oh shit. That would take a Python programmer the better part of a week.

Name: Anonymous 2010-08-05 1:30

>>79
To be fair, the equivalent code in any other language would take months.

Name: Anonymous 2010-12-21 17:54

Name: Anonymous 2011-06-16 13:49

Bump for thread with Xarn.

Name: Anonymous 2011-07-27 8:27

I am the OP and posted the proposed patch just one year ago, and Xarn did apply it. I am proud.

Name: Anonymous 2011-07-27 8:29

>>84
Good thing he is not on /prog/ anymore. He was the worst shitposter, one year ago this place was horrible.

Name: Anonymous 2011-07-27 8:53

403 errors... that's a lot of bugs patched in such a small piece of code... congratulations

Name: Anonymous 2011-07-27 8:59

>>85
More horrible than right now? Unbelievable!

Name: Anonymous 2011-07-27 10:19

>>85,87
fuck off and die, xarn hater

Name: Anonymous 2011-07-27 11:05

>>88
Fuck off, XARN!

Also nice doubles, Xarn ;)

Name: Anonymous 2012-06-27 7:00

>>85 now is even worse

Name: Anonymous 2012-06-27 7:04

wow is this the only thread that isn't afflicted by that nippon gibberish?

Name: Anonymous 2012-06-27 7:09

瘣ᖖ∣⒃݈醘悅劈ቴ焑虥╴㕱镱咄夀肕⍸╢饃└覔蝀㔂掄皑䌢䎄茢聤ᄠ褸隀瘑饕蕸⁧甲ᄧ̲ŕ㞉ᤒ醗陨犁࡙Ă頒癲Ȩ䀒⎘坸噃禇袘䜢ᔙ䌢ቴ犁捓䑠捣㔡▙怡襃嘰皀鞅璈蜃膄偅䍈䉀频ܗ䒙ᚉ攨䡅鞅愱┸蜤͠脃猧卲㤅䤵鄐䉕镔蝷蔠憙ᜁ化憗⦈聡䔹呕㌙ፉ቉袖㤃Ũ愑ƙ嚓ᑰ㍰餲晡┰急䥴॔⁹ᒑ刑聈睱࠶耷═悓⢗靰妉夘攤✘脹暑镳捨朵䕔腵ᤆ夢┱䉁ও塁螙㥅Ɓ奤螁枖鄇☦䖃舨堕敒ᅣ蕩䑶᥁⚁⁘逄錣䞑桢䅠划鈇ᜣ㦅ń聥☥㙱Җ爢ᅙ錰锨គ恔ކ噳昘週啥㉔鉴ᄙࡐ衶願⡤䦐砥㑃ᔧ否均噁‴㈴銉鉳蜠蚇耥㥥ᜉ複㙙⠴摈攘ဒ頶❣禃挧䈅萙䖃倲⠅煇醃荑錒逅ᅵΖږᐠ肐頹螃霙ᖃ癱㥅ԅᠡ䆙㝤₉ш虷奁晓㦘℉吩癗㙲〱靥䔵ↆ隐䆉祆蔀捳袔舗⡩፳刖撑㌥葇☳☄蔴ո焃瞉⠴̑䀕鑉梈噑慧襥ℰ〙呩阨呑㍵

Name: Anonymous 2012-06-27 7:34

>>91 oh well, shit
