Make an HTML parser in regex.
Deadline: before the thread is deleted by the mods
Name:
Anonymous 2013-08-17 13:15
THIS THREAD HAS CHANGED TOPIC
New title: /prog/ challenge #4363
Redefine the web completely by replacing HTML with S-exps and Javashit with Scheme.
Deadline: before the report advocate realizes his futile efforts are futile
I'd make a regex to parse html, but unfortunately some people think it's a good idea to write things like <br> instead of <br />, or <img> instead of <img />. Really, these people are the worst.
>>25
XML and HTML are both terrible in different ways. XHTML, at least, had the benefit of removing many of the terrible parts of HTML at only the cost of adding the terrible parts of XML, which were already there for the most part. And at least with XHTML you get the ability to point expat at the thing and not worry about whether someone closed their tags or not.
>>30
Paredit makes S-expressions work entirely in your favor; the only people who complain are the ones who haven't tried it in earnest.
Now, the only downside I can see is if you for some reason have to read Lisp code in a shitty editor or printed, so you can't actually trust the indentation, or follow it easily, but that's rare.
Name:
Anonymous 2013-08-18 20:30
>>30
One point of XML is so humans don't require a software editor to parse the code. If you really like Lisp so much, it won't take you any time to write some Lisp macros that write proper XML and XHTML.
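For what it's worth, the "macros that write XML" idea fits in a few lines in most languages. A rough sketch in JavaScript rather than Lisp, assuming the S-expression has already been read into nested arrays of the form [tag, ...children]. The names `toXml` and `escapeXml` and that array layout are made up for illustration, and attributes are ignored entirely.

```javascript
// Hypothetical helpers, not from any library.
// Assumed layout: [tagName, ...children], where a child is a string or another array.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function toXml(node) {
  // Strings are text nodes; arrays are elements.
  if (typeof node === "string") return escapeXml(node);
  const [tag, ...children] = node;
  return `<${tag}>${children.map(toXml).join("")}</${tag}>`;
}

const doc = ["anchor", ["list", ["item", "1"], ["item", "2"], ["item", "3"]]];
console.log(toXml(doc));
```

Every closing tag is generated from the same array element as its opening tag, so the output cannot end up mismatched.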
>>29
How is this
(anchor
  (list
    (item "1")
    (item "2")
    (item "3")))
any worse than
<anchor>
  <list>
    <item>1</item>
    <item>2</item>
    <item>3</item>
  </list>
</anchor>
Do a character count, a line count, try to type both of these manually, and tell me which one is worse. If you hate parens so much, you could use S-exprs with angular/curly braces, a specific byte outside of ASCII, SJIS emoticons, little Unicode penises or whatever you want, and they will still be better than the markup languages shat by the W3shit.
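The count is easy to check mechanically. A small sketch that measures both snippets from the post, written here without indentation so that whitespace does not skew the numbers. The `measure` helper is a made-up name.

```javascript
// The two snippets from the post, one array entry per line.
const sexp = [
  '(anchor',
  '(list',
  '(item "1")',
  '(item "2")',
  '(item "3")))',
].join("\n");

const xml = [
  "<anchor>",
  "<list>",
  "<item>1</item>",
  "<item>2</item>",
  "<item>3</item>",
  "</list>",
  "</anchor>",
].join("\n");

// Hypothetical helper: character count (newlines included) and line count.
function measure(text) {
  return { chars: text.length, lines: text.split("\n").length };
}

console.log("S-exp:", measure(sexp));
console.log("XML:  ", measure(xml));
```

This only measures this one toy example; a page with long text content would narrow the gap, since the text itself costs the same in both notations.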
Name:
Anonymous 2013-08-18 20:49
(anchor
  $ list
    (item "1")
    (item "2")
    $ item "3")
Name:
Anonymous 2013-08-18 20:49
>>34
Now try writing a Lisp parens example for a front page like Yahoo.
Name:
Anonymous 2013-08-18 20:51
>>29
A modification of >>34-san's post: Forced-indentation S-expressions.
anchor
  list
    item "1"
    item "2"
    item "3"
which sort of reminds me of YAML, known to be fully parsable both by humans and machines. Not that I'm suggesting YAML or FIS-exprs, but anything will always be better than XML based markup languages.
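Turning the forced-indentation form back into ordinary S-expressions takes one stack. A minimal sketch, assuming spaces-only indentation where a deeper indent means "child of the nearest shallower line above". The function name `indentToSexp` is hypothetical, and nothing here handles tabs, comments, or multi-line strings.

```javascript
// Hypothetical converter: indentation-based outline -> S-expression text.
function indentToSexp(source) {
  const lines = source.split("\n").filter((line) => line.trim() !== "");
  const open = []; // indentation widths of the forms that are currently open
  let out = "";
  for (const line of lines) {
    const indent = line.length - line.trimStart().length;
    // Close every open form that sits at the same depth or deeper.
    while (open.length > 0 && open[open.length - 1] >= indent) {
      open.pop();
      out += ")";
    }
    if (out !== "") out += " ";
    out += "(" + line.trim();
    open.push(indent);
  }
  // Close whatever is still open at end of input.
  return out + ")".repeat(open.length);
}

const outline = [
  "anchor",
  "  list",
  '    item "1"',
  '    item "2"',
  '    item "3"',
].join("\n");

console.log(indentToSexp(outline));
```

The closing parentheses are derived entirely from the indentation, which is exactly why such a format falls apart once the indentation cannot be trusted.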
>>30
XML has never been easy to parse by humans. Try opening an Ant configuration file or any of those TURNKEY SOLUTIONS and tell me if you could read it after pasting it on Notepad or printing it.
Name:
Anonymous 2013-08-18 20:55
>>36
Still would turn out to be better than HTML. Did you even read my post? Yes, there would be ))))))))))))), but if you don't like them and you like languages with C-like syntax so much, then you could do
(anchor
  (list
    (item "1")
    (item "2")
    (item "3")
  )
)
and the ``closing tags'' still take less space than </list></anchor></body></html></document></program></end></eof></really-end-this-time>.
Name:
Anonymous 2013-08-18 20:57
>>38
If you want to get rid of ))))))))), see >>35
>>39
I don't get it. Why is that $ before the third item? It doesn't seem to open or close anything specific, it seems to be randomly added inside the list.
Name:
Anonymous 2013-08-18 21:05
>>40 (a b c $ d e f) expands to (a b c (d e f)). It starts a paren and closes it at the next closing paren.
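The rule as stated can be sketched as a token rewrite: every $ emits an opening paren, and the next closing paren of the enclosing list closes all the $ forms opened inside it. This is an illustration in JavaScript, not how any Scheme implements it. `expandDollar` is a made-up name, and the tokenizer only knows parens, double-quoted strings and bare atoms.

```javascript
// Hypothetical expander for the "$" shorthand described in the post.
function expandDollar(source) {
  const tokens = source.match(/"(?:[^"\\]|\\.)*"|[()]|[^\s()"]+/g) || [];
  const pending = []; // per open "(": how many "$" forms are waiting to be closed
  let out = "";
  const emit = (text) => {
    // No space right after "(" and none before ")".
    const needsSpace = out !== "" && !out.endsWith("(") && !text.startsWith(")");
    out += (needsSpace ? " " : "") + text;
  };
  for (const tok of tokens) {
    if (tok === "(") {
      pending.push(0);
      emit("(");
    } else if (tok === "$") {
      if (pending.length === 0) throw new SyntaxError("$ outside of a list");
      pending[pending.length - 1] += 1;
      emit("(");
    } else if (tok === ")") {
      if (pending.length === 0) throw new SyntaxError("unexpected )");
      // One ")" for the list itself plus one per "$" opened inside it.
      emit(")".repeat(pending.pop() + 1));
    } else {
      emit(tok);
    }
  }
  if (pending.length > 0) throw new SyntaxError("missing )");
  return out;
}

console.log(expandDollar("(a b c $ d e f)"));
console.log(expandDollar('(anchor $ list (item "1") (item "2") $ item "3")'));
```

Applied to the earlier anchor/list example, the second call yields the fully parenthesized form.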
>>41
Oh, like Haskal's $? Why not, I guess. I personally don't mind using Paredit and having ))))), but that /b/uddy over there isn't convinced yet.
Name:
Anonymous 2013-08-18 21:17
how about FIOC webpages
Name:
Anonymous 2013-08-18 21:19
>>42
Yeah, that's the origin of the symbol, I'm sure. Gauche Scheme has it, but you can implement it as a macro in Lisp or in R6RS using syntax-case. It might be possible with syntax-rules but I'd rather not try.
>>51
Why would we need attributes, when functions that take arguments can be expressed as S-expressions?
You seem like the type who would go to Israel and bitch about the lack of Russian Buddhist niggers, just because you were told in your third world shithole that they were an essential part of every community.
Name:
Anonymous 2013-08-20 19:07
>>52,53
how you're gonna define width and height of elements? or make an <a href="url">click here</a>?
Name:
Anonymous 2013-08-20 19:12
>>54
Ignoring the part where you're not supposed to control those manually and use a stylesheet instead (not that I like stylesheets but whatever),
(button "Submit") shows a button with the text Submit.
(button "Submit" 50 100) shows a button with the text Submit, 50px tall, 100px wide.
Basically, (variadic) functions (or S-exprs, whatever, same shit) with optional arguments.
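The same shape in JavaScript, as a sketch only: a function with optional positional arguments that returns a plain description of the element. The `button` function and the object it returns are invented for illustration. The argument order (text, height, width) follows the post.

```javascript
// Hypothetical element constructor mirroring (button "Submit") and
// (button "Submit" 50 100): height and width are optional, in that order.
function button(text, height, width) {
  const size = {};
  if (height !== undefined) size.height = `${height}px`;
  if (width !== undefined) size.width = `${width}px`;
  return { type: "button", text, ...size };
}

console.log(button("Submit"));
console.log(button("Submit", 50, 100));
```

Omitted arguments simply leave the corresponding keys out, so the renderer (not shown) would be free to fall back to a stylesheet.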
Name:
Anonymous 2013-08-20 20:15
>>55
but then you need to know the order, instead of just knowing the name of the attributes
that's a lot worse
Name:
Anonymous 2013-08-20 20:19
>>56
Yeah, you'd have to know the order of the attributes. What about that? What do you do when you have to use sprintf|strcmp in C or atan2 in any other programming language and you forget the order? Bitch about it, or read the reference?
Of course, this calls for the creation of a standard, but if it means getting rid of the </dsfasdfa></asdasdasd>, I don't mind.
>>57
Well not really much to remember for sprintf or strcmp...
Name:
Anonymous 2013-08-21 15:10
Ending of google's html in lisp would be:
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
let html = "<html><head><title>hello</title></head><body><h1>Hello, world!</h1></body></html>";
// Each pass rewrites one nesting level, so repeat until nothing changes.
// (\w already covers digits, so [\w\d] was redundant.)
const tag = /<(\w+)>(.*)<\/\1>/g;
let previous;
do {
  previous = html;
  html = html.replace(tag, "{'$1':{'@text':$2}}");
} while (html !== previous);
console.log(html);
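The rewrite above only works because, in that particular input, no tag name repeats and no tag carries attributes. A quick sketch of where the same pattern goes wrong. The sample inputs are made up, and the /g flag is dropped so that match() returns the capture groups.

```javascript
// Same shape as the pattern in the post.
const tagPattern = /<(\w+)>(.*)<\/\1>/;

// Siblings with the same name: the greedy (.*) runs to the LAST </b>,
// so two separate elements are captured as one.
const siblings = "<b>one</b> and <b>two</b>";
console.log(siblings.match(tagPattern)[2]);

// Switching to a lazy (.*?) breaks the opposite case: with a tag nested
// inside itself, the match stops at the FIRST </div>, which belongs to
// the inner element.
const lazyPattern = /<(\w+)>(.*?)<\/\1>/;
const nested = "<div><div>inner</div>tail</div>";
console.log(nested.match(lazyPattern)[2]);

// Attributes are not matched at all.
console.log(tagPattern.test('<a href="url">click here</a>'));
```

Neither the greedy nor the lazy quantifier can be right for both inputs, because picking the matching close tag requires counting nesting depth.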
>>40
You don't get it because you're a dumb goy. Sexps are designed so that one missing parenthesis fucks everything up. That's why we Jews created them. It's absolute genius. We wanted to give you dumb goyim a headache. We could never let M-expressions become popular. People might actually use Lisp! People might actually create an AI that does a better job at economics than us! Imagine that. ``Your'' countries could have been out of debt years ago. We couldn't let that happen. Now they're our countries. Now you know why we torment you with )))))))))))))). Dumb fucking goy!
Name:
Anonymous 2013-08-24 1:53
You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. HTML-plus-regexp will liquify the nerves of the sentient whilst you observe, your psyche withering in the onslaught of horror. 
Rege̿̔̉x-based HTML parsers are the cancer that is killing StackOverflow it is too late it is too late we cannot be saved the trangession of a chi͡ld ensures regex will consume all living tissue (except for HTML which it cannot, as previously prophesied) dear lord help us how can anyone survive this scourge using regex to parse HTML has doomed humanity to an eternity of dread torture and security holes using regex as a tool to process HTML establishes a breach between this world and the dread realm of c͒ͪo͛ͫrrupt entities (like SGML entities, but more corrupt) a mere glimpse of the world of regex parsers for HTML will instantly transport a programmer's consciousness into a world of ceaseless screaming, he comes, the pestilent slithy regex-infection will devour your HTML parser, application and existence for all time like Visual Basic only worse he comes he comes do not fight he com̡e̶s, ̕h̵is un̨ho͞ly radiańcé destro҉ying all enli̍̈́̂̈́ghtenment, HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo͟ur eye͢s̸ ̛l̕ik͏e liquid pain, the song of re̸gular expression parsing will extinguish the voices of mortal man from the sphere I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful the final snuffing of the lies of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL IS LOST the pon̷y he comes he c̶̮omes he comes the ichor permeates all MY FACE MY FACE ᵒh god no NO NOO̼OO NΘ stop the an*̶͑̾̾̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e not rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
>>68
I can't even keep track of how many layers of sarcasm and trolling there are anymore!
Name:
Anonymous 2013-08-24 3:03
>>70
It's kind of true. Would you trust a web developer to give you a page where none of the () are fucked up? Those idiots can't get something this simple right. So we have a data format with <redundancy></redundancy>, so when they fuck up, we can still sort of piece together what they were trying to say.
>>71 Would you trust a web developer to give you a page where none of the () are fucked up?
Yes, because that would almost always completely fuck up the page, unlike HTML, where if you miss a closing tag the browser will just ignore it.
While the JVM backend mostly implements the same features as the Parrot backend, many small IO bits are still missing, rendering some crucial parts like the module installer unusable.
Name:
Anonymous 2013-08-24 11:45
>>68 Sexps are designed so that one missing parenthesis fucks everything up
That's how things should be. And you use a good text editor for avoiding that.
Or are you going to tell me C/sepples compilers are forgiving with lack of semicolons or curly braces? No, Ahmed Al-Salamijihad, that's not how it works.
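The strict behaviour being argued for here is cheap to implement, which is part of the point. A minimal sketch of the check an editor or a strict reader would run, assuming only parentheses and double-quoted strings matter. `checkParens` and its result shape are invented for this example.

```javascript
// Hypothetical strict check: report the first paren problem instead of guessing.
function checkParens(source) {
  const stack = []; // offsets of "(" that have not been closed yet
  let inString = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "(") {
      stack.push(i);
    } else if (ch === ")") {
      if (stack.length === 0) {
        return { ok: false, error: `unexpected ) at offset ${i}` };
      }
      stack.pop();
    }
  }
  if (stack.length > 0) {
    return { ok: false, error: `unclosed ( at offset ${stack[stack.length - 1]}` };
  }
  return { ok: true };
}

console.log(checkParens('(anchor (list (item "1")))'));
console.log(checkParens('(anchor (list (item "1"))'));
```

A lenient reader would instead append the missing parens at the end of input, which is roughly the guess a browser makes for unclosed tags.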
>>71-72
There are many shit Java programmers, and I'm pretty sure their programs compiled. That must mean they didn't skip a single ; or } and your argument is shit. Web developers being retarded doesn't have anything to do with the shittiness and non-determinism of HTML and other steaming turds of markup.
Name:
Anonymous 2013-08-24 12:53
>>75
There is a bit of a difference. A Java programmer types away at their code and then compiles it. They compile it again and again until the compiler says there are no syntax errors. Then the code stays the same and the source is not referenced again.
Now imagine the same Java programmer, but the Java programmer is writing a Java program that takes queries and generates Java code. Ey gets it to compile. Ey tests er program on 10 or 15 samples, investigates the results briefly and calls it done. Now it's on the net, serving queries. 10000 users make 40 requests each, with each request slightly different. 10% of those requests are test cases the Java coder didn't check, and malformed Java code is returned.
Name:
Anonymous 2013-08-24 12:56
Who the fuck is Ey?
Name:
Anonymous 2013-08-24 12:59
>>76
And what makes you think malformed Java code should be ``fuzzily compiled''? If it's malformed, it's malformed, period. If Ey-san can't even do that correctly, then he doesn't do it.
>>78
But then 50% of the web won't work, the page will say syntax error and that is that, 10% of the time. Which browser would you use, one that can recover from minor deformations or one that rejects the page entirely? Put aside purity. These dipshits created the page you need to access to take care of your shit.
Name:
Anonymous 2013-08-24 13:08
>>81 But then 50% of the web won't work
Good fucking riddance.
>>81
Believe me, if we used strict languages to describe data and the interfaces for representing it, we wouldn't have hipsters, shabbos goyim and Redditards trying to write pages. Instead, the people who actually know their shit would be the ones making the web.
But no, the solution for improving the web is keeping retards from failing, and encouraging them to submit their halfassed work to the entire world. (Sorry, that doesn't make sense at all.)
>>83
You rely on CLOUD ENABLED WEB SERVICES for carrying out your daily activities? Poor misguided soul.
Before you ``call out on my retardation'', I host my own mail server and I use the Internet for this board and maybe Danbooru. I read my news on my trusty old printed newspaper, thank you very much.
Name:
Anonymous 2013-08-24 13:14
>>86
And if we used strict languages to allow people to post on this board, it would be devoid of activity.
Name:
Anonymous 2013-08-24 13:15
>>89
BBCode won't allow the slightest failure, though.
Name:
Anonymous 2013-08-24 13:17
>>88
Some people are forced to use crappy websites as part of their job, or insert any other obligation here. And I would like to point out that this very page has malformed HTML in it. Look at the tag matching in >>1.
>>91
I thought everyone here agreed on the shittiness of Shiichan, though. Now if Shii had done a proper job.
If HTML were as strict as C, this site would probably be free of imageredditors and it would have been born probably one or two months later than its original conceiving date. I fail to see what the problem with that is.
Just because everything is built with dog turds and toothpicks nowadays doesn't mean we couldn't have built the same things with better materials.
Name:
Anonymous 2013-08-24 13:54
If you want strictness, use XHTML. You'll get a blank screen until it's completely rendered and nothing if there's an error, but that's what some people want.