Show 4chan your GitHub Projects

Name: Anonymous 2012-03-04 21:19

4chanImageGroper is an image crawler that grabs all the images in a thread and saves them on your computer.
It's written in Ruby.

https://github.com/serv/4chanImageGroper

--

Show off yours and let's watch interesting ones.

Name: Anonymous 2012-03-04 21:24

Ruby The Booby

Name: Anonymous 2012-03-04 21:46

back to the imageboards, ``son"

Name: Anonymous 2012-03-04 22:11

>>3
it is ''on topic´´ you ''fag´´

Name: Anonymous 2012-03-04 22:27

>>1
not using downthemall for 4chan threads
Shit, 4chan is just a single page with direct links to images. You should spend time writing crawlers for things dTa can't handle, like gelbooru or imagefap, where image links are on a second level page from a gallery/search results.

Name: Anonymous 2012-03-04 22:38

Don't even need downthemall. I can just use ctrl+shift+L on Opera.

Name: Anonymous 2012-03-04 22:39

>>6
Opera is shit.

Name: Anonymous 2012-03-04 23:36

>ruby

lol how cute. do you wanna be a real programmer when you grow up?

Name: Anonymous 2012-03-04 23:43

>>7
Opera is the best.

Name: Anonymous 2012-03-04 23:48

lines 25-137
Jesus fucking Christ.

Name: Anonymous 2012-03-04 23:49

Name: Anonymous 2012-03-04 23:56

>>10
And this is why Lisp is dead.

Name: Anonymous 2012-03-04 23:56

Ruby
Here's why one should be wise regarding Ruby:
- Ruby indulges obfuscation: Ruby 1.9 has no true keyword arguments, so you'll have to use hash parameters as a substitute. This is an idiom that comes from Perl. Ugly, Perl-looking code, like proc {|obj, *args| obj.send(self, *args)} or (0..127).each { |n| p n.chr }, is considered beautiful. Other confusing Perl borrowings are postfix `if` and `while` (line = file.readline while line != "needle" if valid line) and quirky variable names (partially due to naive environment design): @instance_var, @@class_var, CONSTANT_VAR, $global_var, :sym, &proc, $~[1], $!, $>, $@, $&, $+, $0, $~, $', $`, $:, $., $* and $?. If A is [1,2,3] and B is [10,20,30], then A+B is [1,2,3,10,20,30], when you probably wanted [11,22,33]. If `a` and `b` are undefined, then "a = b" produces an error, but "a = a" gives `nil`.
- Faulty syntax. Ruby can't distinguish a method call from an operator: "a +b" can be both "a(+b)" and "a + b" - remove the space to the left of "+" or add a space to the right of "+", and it will be parsed as an addition. Same with "puts s *10", which is parsed as puts(s(*10)). Ruby's expressions terminate at a newline and you have to explicitly state that the expression is not over, using a trailing + or \. That makes it easy to make a dumb syntactic mistake by forgetting to continue the line. It also encourages putting everything onto a single line, producing messy-looking code. A good amount of your code will consist of "begin end begin begin end end..." noise.
- Slow: JIT-compiling implementations exist, but they're still slow and incomplete, due to Ruby's complexity and bad design, which make Ruby difficult to optimize compared to other dynamic languages, like Lisp. For example, Ruby has to accommodate somebody in another thread changing the definition of a class spontaneously, forcing the compiler to be very conservative. Compiler hints, like `int X` from C/C++ or `declare (int X)` from Lisp, aren't possible either.
- Ruby's GC is a naive mark-and-sweep implementation that stores the mark bit directly inside objects. A GC cycle will thus result in all objects being written to, making their memory pages `dirty` and tying Ruby's speed to the number of allocated objects. Ruby simply was not designed to support hundreds of thousands of object allocations per second. Unfortunately, that's exactly what frameworks like Ruby on Rails do. The more objects you allocate, the more time you "lose" at code execution. For instance, something as simple as 100.times { 'foo' } allocates 100 string objects, because strings are mutable and therefore each version requires its own copy. A simple Ruby on Rails 'hello world' already uses around 332000 objects.
- OOP: Matz had a bit too much of the "OOP is the light and the way" philosophy in him; in effect, Ruby doesn't have stand-alone functions and Ruby's blocks can't be used in exactly the same way as usual closures. Even higher-order functions are attached to objects and produce verbose code: "names.map { |name| name.upcase }", instead of a simple "map upcase names".
- Ruby (like most other scripting languages) does not require variables to be declared, as (let (x 123) ...) in Lisp or int x = 123 in C/C++. If you want a variable private to a block, you need to pick a unique variable name, holding the entire symbol table in your head. Ruby introduces new variables by just parsing their assignments, meaning "a = 1 if false; a" won't raise an error. All that means Ruby can't detect even a trivial typo - it will produce a program which will continue working for hours until it reaches the typo. Local and global scopes are unintuitive. Certain operations (like the regular expression operator) create implicit local variables for even more confusion.
- "def method_missing(*args)" is a black hole; it makes the language's semantics overly cryptic. Debugging code that uses method_missing is painful: at best you get a NoMethodError on an object that you didn't expect, and at worst you get a SystemStackError.
- Non-orthogonal: {|bar| bar.foo}, proc {|bar| bar.foo}, lambda {|bar| bar.foo}, def baz(bar) bar.foo end - all copy the same functionality, where Lisp gets along with only `lambda`. Some of Ruby's features duplicate each other: print "Hello", puts "Hello", $stdout<<"Hello", printf "Hello", p "Hello", write "Hello" and putc "Hello" -- all output text to stdout; there is also sprintf, which duplicates the functionality of printf and string splicing. begin/do/then/end, {} and `:` also play a role in bloating the syntax; however, in some cases, precedence issues cause do/end and {} to act differently ({} binds more tightly than do/end). More bloat comes from || and `or`, which serve the same purpose.
- Ruby as a language supports continuations via the callcc method. Ruby's callcc is incredibly slow, implemented via stack copying. JRuby and IronRuby don't have continuations at all, and it's quite unlikely they will ever get them. There were also gaps in support in mainline Ruby, where Ruby 1.9 did not support continuations for a while. If you want your code to be portable, I'd suggest not using Ruby.
- Ruby was created "because there was no good scripting language that could handle Japanese text". Today it's mostly Rails hype, with no outstanding feature that makes the language, like the brevity of APL or the simplicity and macros of Lisp. "There is some truth in the claim that Ruby doesn't really give us anything that wasn't there long ago in Lisp and Smalltalk, but they weren't bad languages." -- Matthew Huntbach
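
A couple of the behaviours listed above are easy to check for yourself. This is only a minimal sketch, assuming a reasonably current MRI; the method names elementwise_sum and unassigned_local are made up for illustration and are not part of any library.

```ruby
# Array#+ concatenates; an element-wise sum needs zip + map.
def elementwise_sum(a, b)
  a.zip(b).map { |x, y| x + y }
end

# The parser declares a local as soon as it sees the assignment,
# even when that assignment never runs.
def unassigned_local
  value = 1 if false   # never executed
  value                # nil, not a NameError
end

p [1, 2, 3] + [10, 20, 30]                  # [1, 2, 3, 10, 20, 30]
p elementwise_sum([1, 2, 3], [10, 20, 30])  # [11, 22, 33]
p unassigned_local                          # nil
```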

Name: Anonymous 2012-03-05 0:26

>>13 nice wall of text bro.

Name: Anonymous 2012-03-05 0:57

>>10
>>12

Op here.

They are just regex to grab the appropriate image url.

If you have a better way of doing this, I'd be happy to hear it and make the change.

Name: Anonymous 2012-03-05 1:10

>>15
Construct the regex programmatically. Please.

Name: Anonymous 2012-03-05 1:19

>>16

I tried to do this initially, but I couldn't find any way to insert string variables within Regex.
This is why I resorted to listing out all the possibilities.
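
For what it's worth, Ruby regex literals accept #{...} interpolation just like double-quoted strings, and Regexp.escape and Regexp.union take care of the quoting. A rough sketch, assuming the images.4chan.org URL layout quoted elsewhere in this thread; image_regex is a hypothetical helper name, not something from OP's repo.

```ruby
# Hypothetical helper: build an image-URL pattern for one board.
# Host and path layout are assumptions taken from URLs quoted in this thread.
def image_regex(board)
  host = Regexp.escape('http://images.4chan.org/')  # escapes the dots
  exts = Regexp.union('jpg', 'jpeg', 'gif', 'png')  # alternation, not a char class
  # #{...} interpolates into a regex literal; an interpolated Regexp
  # is embedded as a non-capturing group.
  %r{#{host}#{Regexp.escape(board)}/src/\d+\.#{exts}}
end

html = '<a href="http://images.4chan.org/sp/src/1330928910330.jpg" target="_blank">'
puts html.scan(image_regex('sp'))
```

Because the pattern has no capturing groups, scan returns the whole matched URLs.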

Name: Anonymous 2012-03-05 1:25

>>15
http:\/\/images\.4chan\.org\/.*\/\d+\.[jpg|jpeg|gif|png]

That should handle all the relevant cases. Your regexes are too specific. They only need to be specific enough.

Name: Anonymous 2012-03-05 1:32

>>18

Thanks!
The regex is grabbing beyond the url currently. It looks something like this.

http://images.4chan.org/sp/src/1330928910330.jpg" target="_blank">1330928910.jpg</a>-(11 KB, 198x250, <span title="DAT SMILE.jpg">DAT SMILE.jpg</span>)</span><br><a href="http://images.4chan.org/sp/src/1330928910330.j

Can you make it so that it only grabs

http://images.4chan.org/sp/src/1330928910330.jpg

Name: Anonymous 2012-03-05 1:34

>>15-16
OP clearly has no idea how regexes work. A pattern like /[http]{4}/ will match the strings hhhh and ptth in addition to the intended http. Furthermore he doesn't seem to understand that a dot (.) will match any character that could possibly show up in a URL, not just punctuation.

A sane version would look something like this:
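def whichBoard(html_content, url)
  host = Regexp.escape('http://images.4chan.org/')
  %w(a b c d e f g gif h hr co ic k).each do |boardname|
    # match "/g/" rather than any URL that merely contains the letter g
    if url["/#{boardname}/"]
      # non-capturing group, so scan returns whole URLs instead of extensions
      regex = %r[#{host}#{boardname}/src/[0-9]{13}\.(?:jpe?g|png|gif)]
      return html_content.scan(regex)
    end
  end
  [] # no known board in the URL
end # end end end end end end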


OP, please learn to use your tools.
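
The two pitfalls described above can be seen in a few lines. A minimal sketch; the constant names are made up for the demonstration.

```ruby
CHAR_CLASS = /\A[http]{4}\z/  # four characters, each one of h, t or p
LITERAL    = /\Ahttp\z/       # the literal string "http"

%w[http hhhh ptth].each do |s|
  puts "#{s}: class=#{CHAR_CLASS.match?(s)} literal=#{LITERAL.match?(s)}"
end

LOOSE_DOT  = /images.4chan.org/    # . matches any character except newline
STRICT_DOT = /images\.4chan\.org/  # \. matches only a literal dot

puts LOOSE_DOT.match?('imagesX4chanYorg')   # true
puts STRICT_DOT.match?('imagesX4chanYorg')  # false
```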

Name: Anonymous 2012-03-05 1:35

>>19
Whoops. I just realized it would do that.

http:\/\/images\.4chan\.org\/.*?\/\d+\.[jpg|jpeg|gif|png]

I would help you debug but I'm banned from the image boards.

Name: Anonymous 2012-03-05 1:41

Whoops again.

http:\/\/images\.4chan\.org\/.*?\/\d+\.(jpe?g|gif|png)
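
To see why each revision was needed, here is a small sketch comparing greedy, lazy and stricter patterns. SAMPLE_HTML is a shortened stand-in for the markup quoted in >>19, not the real page, and the constant names are made up.

```ruby
# Shortened stand-in for the markup in >>19: two links to the same image.
SAMPLE_HTML = '<a href="http://images.4chan.org/sp/src/1330928910330.jpg" target="_blank">x</a>' \
              '<br><a href="http://images.4chan.org/sp/src/1330928910330.jpg">'

GREEDY = %r{http://images\.4chan\.org/.*/\d+\.(?:jpe?g|gif|png)}
LAZY   = %r{http://images\.4chan\.org/.*?/\d+\.(?:jpe?g|gif|png)}
STRICT = %r{http://images\.4chan\.org/[a-z]+/src/\d+\.(?:jpe?g|gif|png)}

puts SAMPLE_HTML[GREEDY]           # runs from the first link to the end of the second
puts SAMPLE_HTML[LAZY]             # stops at the first complete URL
puts SAMPLE_HTML.scan(STRICT).uniq # every image URL, duplicates removed

# A bracket expression is a set of single characters, not a list of alternatives.
puts 'x.jpg'[/\.[jpg|jpeg|gif|png]/]  # prints ".j"
```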

Name: Anonymous 2012-03-05 1:42

>>21
Still too general, and wrong. Change .*? to [a-z]+, and change the square brackets at the end to parentheses.

Name: Anonymous 2012-03-05 1:47

>>20
Extract the board name from the original URL provided. There's no need to test them all like that.
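
Something along these lines, as a sketch only. It assumes thread URLs look like http://boards.4chan.org/sp/res/12345, which nobody in this thread has confirmed; board_from and image_urls are hypothetical names.

```ruby
require 'uri'

# Hypothetical helper: the board name is assumed to be the first path segment
# of the thread URL, e.g. "sp" in http://boards.4chan.org/sp/res/12345.
def board_from(thread_url)
  URI(thread_url).path.split('/').reject(&:empty?).first
end

# Build one pattern for that board instead of testing a hard-coded list.
def image_urls(html, thread_url)
  board = Regexp.escape(board_from(thread_url))
  html.scan(%r{http://images\.4chan\.org/#{board}/src/\d+\.(?:jpe?g|png|gif)}).uniq
end

page = '<a href="http://images.4chan.org/sp/src/1330928910330.jpg">'
puts image_urls(page, 'http://boards.4chan.org/sp/res/12345')
```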

Name: Anonymous 2012-03-05 1:48

>>23
[a-z]+
That won't work without a forward slash.
>>22 should work.

Name: Anonymous 2012-03-05 1:48

>>20
>>21
Thanks for the help. debugging now

Name: Anonymous 2012-03-05 2:03

OP here.

Thanks for the help, everyone.
The code looks more concise now.
https://github.com/serv/4chanImageGroper/blob/master/4chan_image_crawler.rb

Do you guys know how I can go about creating a Chrome extension or Firefox extension from Ruby code? Or is it even possible?

Name: Anonymous 2012-03-05 2:08

Chrome extensions are written in JavaScript. You should learn that anyway.

Name: Anonymous 2012-03-05 3:18

List of top AIDS programming languages in 2012:
1. Ruby
2. Javascript
3. C++

Name: Anonymous 2012-03-05 3:43

>>29
Fuck off ENTERPRISE ``C'' guy. I bet your code is full of hanging pointers and buffer overflows.

Name: Anonymous 2012-03-05 3:46

>>30
It is. That doesn't invalidate my point though.

Name: Anonymous 2012-03-05 3:48

>>33
GitDubs

Name: Anonymous 2012-03-05 4:03

Fork me on Gitdub.

Name: Anonymous 2012-03-05 8:39

>>33
OH, BABY!

Name: Anonymous 2012-03-05 9:23

Ruby?
Not nginx quality, sorry

Name: Anonymous 2012-03-05 12:41

require 'open-uri'; Dir.mkdir(ARGV[1]) rescue nil
re = %r[(?<=File: <a href=")(http://images\.4chan\.org/\w+/src/(\d+\.\w+))]
# URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs
URI.open(ARGV[0]).read.scan(re).each do |url, filename|
  File.binwrite(File.join(ARGV[1], filename), URI.open(url, 'rb').read)
end

Name: Anonymous 2012-03-05 13:53

>>36
Writing code in Ruby contributes only to the regression of computer science as an artform. You should have written this in Lisp.

Name: Anonymous 2012-03-05 14:08

>>37
Writing code in Lisp contributes only to the regression of computer science as an artform. You should have written this in D.

Name: Anonymous 2012-03-05 14:12

>>38
Writing code in D contributes only to the regression of computer science as an artform. You should have written this in Symta.

Name: Anonymous 2012-03-05 14:26

>>39
Writing code in Symta contributes only to the regression of computer science as an artform. You should have written this in Pascal.

Name: Anonymous 2012-03-05 14:28

>>40
Writing code in Pascal contributes only to the regression of computer science as an artform. You should have written this in COBOL.

Name: Anonymous 2012-03-05 14:29

>>41
Writing code in COBOL contributes only to the regression of computer science as an artform. You should have written this in FORTRAN.

Name: Anonymous 2012-03-05 14:33

>>42
Writing code in FORTRAN contributes only to the regression of computer science as an artform. You should have written this in TWOTRAN.

Name: Anonymous 2012-03-05 14:34

>>43
But TWOTRAN only has doubles!

Name: Anonymous 2012-03-05 14:35

>>43
 Writing code in TWOTRAN contributes only to the regression of computer science as an artform. You should have written this in Java.

Name: Anonymous 2012-03-05 15:00

>>45
Writing code in Java contributes only to the regression of computer science as an artform. You should have written this in Ruby.

And so the cycle is complete.

Name: Anonymous 2012-03-05 15:01

>>44
Writing code in doubles contributes only to the HOLY FUCKING SHIT DOUBLES NICE JOB BRO 10/10 WOULD READ AGAIN!

Name: Anonymous 2012-03-05 15:03

not a single github was given...

pretty much sums up how on topic /prog/ is

Name: Anonymous 2012-03-05 15:06

>>37
L.I.S.P. doesn't have regexes or http librays so you'd have to parts the URL and the ICP streams yourself. It would be a lot longer than in ruby. Maybe he should write in perl or a shell script instead.
--
Sent from my iPhone

Name: Anonymous 2012-03-05 15:08

>>49
Sage

Back to Reddit, please!

Name: Anonymous 2012-03-05 17:38

You do realise you could have the same effect with a wget one liner right...

Name: Anonymous 2012-03-05 17:46

>>51
Yes but it wouldn't be slow as shit and require downloading an interpreter nobody uses.

Name: Anonymous 2012-03-05 17:52

>>48
Nobody gives a fuck. Code is code, whether you post it in this thread or on gist.

Name: Anonymous 2012-03-05 18:41

>>52
wget is too slow and it only comes preinstalled on Mac and Linux. Screw that.
Ruby is lightning fast and only comes preinstalled on Macs, I'll use that instead !

Name: Anonymous 2012-03-05 18:42

Name: Anonymous 2012-03-05 20:12

>>55
WoW epic /d/ubs bro xD!!!!111one
