
/Prog/ challenge #4362

Name: Anonymous 2013-08-17 13:12

Make an HTML parser in regex.
Deadline: before the thread is deleted by the mods

Name: Anonymous 2013-08-18 21:05

>>40
(a b c $ d e f) expands to (a b c (d e f)). It starts a paren and closes it at the next closing paren.
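
A rough sketch of that expansion, in case it helps. This is not how Gauche or any Lisp actually implements `$`; `readSexp` and `printSexp` are made-up helper names, and atoms are kept as plain strings.

```javascript
// Hypothetical reader: parses one S-expression and expands `$` so that
// everything after it, up to the enclosing ")", becomes a nested list.
function readSexp(src) {
  const tokens = src.match(/[()]|[^\s()]+/g) || [];
  let pos = 0;

  // Reads items until the ")" that closes the current list.
  // The ")" itself is left for the caller to consume.
  function readItems() {
    const items = [];
    while (pos < tokens.length && tokens[pos] !== ")") {
      const tok = tokens[pos++];
      if (tok === "(") {
        items.push(readItems());
        if (tokens[pos] !== ")") throw new SyntaxError("missing )");
        pos++; // consume the matching ")"
      } else if (tok === "$") {
        // `$` opens a list that is closed by the enclosing ")".
        items.push(readItems());
      } else {
        items.push(tok);
      }
    }
    return items;
  }

  const items = readItems();
  if (pos < tokens.length) throw new SyntaxError("unexpected )");
  return items[0];
}

// Prints nested arrays back as an S-expression.
function printSexp(x) {
  return Array.isArray(x) ? "(" + x.map(printSexp).join(" ") + ")" : x;
}

console.log(printSexp(readSexp("(a b c $ d e f)"))); // (a b c (d e f))
```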

Name: Anonymous 2013-08-18 21:11

>>41
Oh, like Haskal's $? Why not, I guess. I personally don't mind using Paredit and having ))))), but that /b/uddy over there isn't convinced yet.

Name: Anonymous 2013-08-18 21:17

how about FIOC webpages

Name: Anonymous 2013-08-18 21:19

>>42
Yeah, that's the origin of the symbol I'm sure. Gauche scheme has it, but you can implement it as a macro in lisp or in r6rs using syntax parse. It might be possible with syntax-rules but I'd rather not try.

http://practical-scheme.net/gauche/index.html

Name: Anonymous 2013-08-18 21:22

>>43
too much white space. That would be ok for editing, or for the page preprocessor language, but not for sending to the client.

Name: Anonymous 2013-08-18 21:24

>>43
Sexp can get autoindented nicely.

Name: Anonymous 2013-08-20 14:22

Can someone do an example of a real site rather than shitty ``Hello World!'' sites?

Name: Anonymous 2013-08-20 15:06

The answer is JSON

Name: Anonymous 2013-08-20 15:39

Anus-to-Anus JavaScript

Name: Anonymous 2013-08-20 17:32

>>47
<html>
<body>
<p>Hello World!</p>
</body>
</html>

Name: Anonymous 2013-08-20 17:39

>>34
what about attributes

Name: Anonymous 2013-08-20 17:41

>>51
Who needs 'em?

Name: Anonymous 2013-08-20 17:44

>>51
Why would we need attributes, when functions that take arguments can be expressed as S-expressions?

You seem like the type who would go to Israel and bitch about the lack of Russian Buddhist niggers, just because you were told in your third world shithole that they were an essential part of every community.

Name: Anonymous 2013-08-20 19:07

>>52,53
how are you gonna define the width and height of elements? or make an <a href="url">click here</a>?

Name: Anonymous 2013-08-20 19:12

>>54
Ignoring the part where you're not supposed to control those manually and use a stylesheet instead (not that I like stylesheets but whatever),

(button "Submit") shows a button with the text Submit
(button "Submit" 50 100) shows a button with the text Submit, 50px tall, 100px wide.

For links, (link "http://example.com" "Example text") or just (link "http://example.com").

Basically, (variadic) functions (or S-exprs, whatever, same shit) with optional arguments.
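
Something like this, as a minimal sketch. It assumes the forms get rendered down to ordinary HTML strings, which is my assumption and not part of the proposal; `button`, `link` and `escapeHtml` are hypothetical names, and the argument order (text, height, width) just follows the examples above.

```javascript
// Hypothetical helper: escapes text so it is safe inside HTML.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// (button "Submit") or (button "Submit" 50 100): text, height, width.
function button(text, height, width) {
  const style =
    height !== undefined && width !== undefined
      ? ` style="height:${height}px;width:${width}px"`
      : "";
  return `<button${style}>${escapeHtml(text)}</button>`;
}

// (link "http://example.com" "Example text"); the text defaults to the URL.
function link(url, text = url) {
  return `<a href="${escapeHtml(url)}">${escapeHtml(text)}</a>`;
}

console.log(button("Submit"));
console.log(button("Submit", 50, 100));
console.log(link("http://example.com", "Example text"));
console.log(link("http://example.com"));
```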

Name: Anonymous 2013-08-20 20:15

>>55
but then you need to know the order, instead of just knowing the name of the attributes
that's a lot worse

Name: Anonymous 2013-08-20 20:19

>>56
Yeah, you'd have to know the order of the attributes. What about that? What do you do when you have to use sprintf|strcmp in C or atan2 in any other programming language and you forget the order? Bitch about it, or read the reference?

Of course, this calls for the creation of a standard, but if it means getting rid of the </dsfasdfa></asdasdasd>, I don't mind.

Name: Anonymous 2013-08-20 21:52

>>57
Well not really much to remember for sprintf or strcmp...

Name: Anonymous 2013-08-21 15:10

Ending of google's html in lisp would be:
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))

Name: Anonymous 2013-08-21 15:24

>>58
Not really much to remember for web S-exprs either. Stop bitching.

Name: Anonymous 2013-08-21 15:31

>>59
</div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div>

Name: Anonymous 2013-08-21 18:47

quads

Name: Anonymous 2013-08-24 0:12

>>61
No

Name: Anonymous 2013-08-24 0:32

>>63
Yes

Name: Anonymous 2013-08-24 0:33

>>63
What the fuck are you even saying?

Name: Anonymous 2013-08-24 0:42

let html = "<html><head><title>hello</title></head><body><h1>Hello, world!</h1></body></html>";

// Each pass converts the outermost remaining <tag>...</tag> pairs.
// This input is three levels deep, so the fourth pass is a no-op.
for (let pass = 0; pass < 4; pass++) {
  html = html.replace(/<([\w\d]+)>(.*)<\/(\1)>/g, "{'$1':{'@text':$2}}");
}
console.log(html);


C:\Users\Regis\basingao>node htmlparser.js
{'html':{'@text':{'head':{'@text':{'title':{'@text':hello}}}}{'body':{'@text':{'h1':{'@text':Hello, world!}}}}}}
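
For what it's worth, the same pattern seems to fall over on anything less tidy than that input. A quick sketch, with the pattern wrapped in a hypothetical `regexToJsonish` helper (the name and the `passes` parameter are mine):

```javascript
// Same pattern as above, wrapped in a hypothetical helper.
function regexToJsonish(html, passes = 4) {
  for (let i = 0; i < passes; i++) {
    html = html.replace(/<([\w\d]+)>(.*)<\/(\1)>/g, "{'$1':{'@text':$2}}");
  }
  return html;
}

// Two sibling <p> elements: the greedy (.*) runs from the first <p>
// to the last </p>, so both paragraphs end up inside a single match.
console.log(regexToJsonish("<p>a</p><p>b</p>"));
// {'p':{'@text':a</p><p>b}}

// A tag with an attribute never matches <([\w\d]+)>, so it is left as-is.
console.log(regexToJsonish('<a href="url">click here</a>'));
// <a href="url">click here</a>
```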

Name: Anonymous 2013-08-24 0:53

>>59

Which would still be readable.

Name: Anonymous 2013-08-24 1:28

>>40
You don't get it because you're a dumb goy. Sexps are designed so that one missing parenthesis fucks everything up. That's why we Jews created them. It's absolute genius. We wanted to give you dumb goyim a headache. We could never let M-expressions become popular. People might actually use Lisp! People might actually create an AI that does a better job at economics than us! Imagine that. ``Your'' countries could have been out of debt years ago. We couldn't let that happen. Now they're our countries. Now you know why we torment you with )))))))))))))). Dumb fucking goy!

Name: Anonymous 2013-08-24 1:53

You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. HTML-plus-regexp will liquify the n​erves of the sentient whilst you observe, your psyche withering in the onslaught of horror. 
Rege̿̔̉x-based HTML parsers are the cancer that is killing StackOverflow it is too late it is too late we cannot be saved the trangession of a chi͡ld ensures regex will consume all living tissue (except for HTML which it cannot, as previously prophesied) dear lord help us how can anyone survive this scourge using regex to parse HTML has doomed humanity to an eternity of dread torture and security holes using regex as a tool to process HTML establishes a breach between this world and the dread realm of c͒ͪo͛ͫrrupt entities (like SGML entities, but more corrupt) a mere glimpse of the world of reg​ex parsers for HTML will ins​tantly transport a programmer's consciousness into a world of ceaseless screaming, he comes, the pestilent slithy regex-infection wil​l devour your HT​ML parser, application and existence for all time like Visual Basic only worse he comes he comes do not fi​ght he com̡e̶s, ̕h̵i​s un̨ho͞ly radiańcé destro҉ying all enli̍̈́̂̈́ghtenment, HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo​͟ur eye͢s̸ ̛l̕ik͏e liq​uid pain, the song of re̸gular exp​ression parsing will exti​nguish the voices of mor​tal man from the sp​here I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t​he final snuffing of the lie​s of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST the pon̷y he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
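
The non-joke version of the point, as a minimal sketch: let the regex do only the tokenizing and keep an explicit stack for the nesting. `parseTags` is a hypothetical name, and this handles only bare <tag>, </tag> and text. No attributes, comments, void elements or error recovery, so it is nowhere near a real HTML parser.

```javascript
// Regex finds tokens; the stack tracks which elements are still open.
function parseTags(src) {
  const root = { tag: "#root", children: [] };
  const stack = [root];
  const tokenRe = /<(\/?)(\w+)>|([^<]+)/g;
  let last = 0; // end of the previous token, used to detect skipped input
  let m;
  while ((m = tokenRe.exec(src)) !== null) {
    if (m.index !== last) {
      throw new SyntaxError(`unsupported input at index ${last}`);
    }
    last = tokenRe.lastIndex;
    const [, closing, tag, text] = m;
    const top = stack[stack.length - 1];
    if (text !== undefined) {
      top.children.push(text);
    } else if (closing) {
      // A closing tag must match the innermost open element.
      if (stack.length === 1 || top.tag !== tag) {
        throw new SyntaxError(`unexpected </${tag}>`);
      }
      stack.pop();
    } else {
      const node = { tag, children: [] };
      top.children.push(node);
      stack.push(node);
    }
  }
  if (last !== src.length) {
    throw new SyntaxError(`unsupported input at index ${last}`);
  }
  if (stack.length !== 1) {
    throw new SyntaxError(`unclosed <${stack[stack.length - 1].tag}>`);
  }
  return root.children;
}

console.log(JSON.stringify(parseTags("<p>a</p><p>b</p>")));
```

Unlike a single greedy pattern, this keeps sibling elements with the same name apart, because the stack, not the regex, decides which closing tag belongs to which opening tag.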

Name: Anonymous 2013-08-24 1:59

>>68
I can't even keep track of how many layers of sarcasm and trolling there are anymore!

Name: Anonymous 2013-08-24 3:03

>>70
It's kind of true. Would you trust a web developer to give you a page where none of the () are fucked up? Those idiots can't get something this simple right. So we have a data format with <redundancy></redundancy>, so when they fuck up, we can still sort of piece together what they were trying to say.

Name: Anonymous 2013-08-24 3:10

>>71
> Would you trust a web developer to give you a page where none of the () are fucked up?
Yes because that would almost always completely fuck up the page, unlike XML HTML where if you miss a closing tag the browser will just ignore it.
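
A sketch of what that strictness could look like: a hypothetical `checkParens` that rejects the whole page on the first imbalance. It assumes no parentheses appear inside string literals, which a real reader would have to handle.

```javascript
// Returns null when the parentheses balance, otherwise an error message.
function checkParens(src) {
  const open = []; // indices of "(" that have not been closed yet
  for (let i = 0; i < src.length; i++) {
    if (src[i] === "(") {
      open.push(i);
    } else if (src[i] === ")") {
      if (open.length === 0) return `unexpected ")" at index ${i}`;
      open.pop();
    }
  }
  return open.length ? `unclosed "(" at index ${open[open.length - 1]}` : null;
}

console.log(checkParens('(html (body (p "Hello World!")))')); // null
console.log(checkParens('(html (body (p "Hi"))')); // unclosed "(" at index 0
```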

Name: Anonymous 2013-08-24 10:24

what do you guys think about Jade?
http://jade-lang.com/
it's whitespace sensitive

Name: Anonymous 2013-08-24 11:26

While the JVM backend mostly implements the same features as the Parrot backend, many small IO bits are still missing, rendering some crucial parts like the module installer unusable.

Name: Anonymous 2013-08-24 11:45

>>68
> Sexps are designed so that one missing parenthesis fucks everything up
That's how things should be. And you use a good text editor for avoiding that.

Or are you going to tell me C/sepples compilers are forgiving with lack of semicolons or curly braces? No, Ahmed Al-Salamijihad, that's not how it works.

>>71-72
There are many shit Java programmers, and I'm pretty sure their programs compiled. That must mean they didn't skip a single ; or } and your argument is shit. Web developers being retarded doesn't have anything to do with the shittiness and non-determinism of HTML and other steaming turds of markup.

Name: Anonymous 2013-08-24 12:53

>>75
There is a bit of a difference. A java programmer types away at their code and then compiles it. They compile it again and again until the compiler says there's no syntax errors. Then the code stays the same and the source is not referenced again.

Now imagine the same java programmer, but the java programmer is writing a java program that takes queries and generates java code. Ey gets it to compile. Ey tests er program on 10 or 15 samples, investigates the results briefly and calls it done. Now it's on the net, serving queries. 10000 users make 40 requests each, with each request slightly different. 10% of those requests are test cases the java coder didn't check, and malformed java code is returned.

Name: Anonymous 2013-08-24 12:56

Who the fuck is Ey?

Name: Anonymous 2013-08-24 12:59

>>76
And what makes you think malformed Java code should be ``fuzzily compiled''? If it's malformed, it's malformed, period. If Ey-san can't even do that correctly, then he doesn't do it.

>>77
Probably some guy from Vietnam.

Name: Anonymous 2013-08-24 12:59

spammer, please continue.

Name: Anonymous 2013-08-24 13:01

>>77
Check your privilege, cisgender scum.
