I have a pretty good educational understanding of computer grammar theory, so I get the concepts behind parsers and context-free grammars, but I've never actually implemented a parser. And now I need to write one.
Specifically, I'm looking to write a loader for .md5mesh files (they are text format). Let's say I'm able to get a grammar for it written on paper that matches that file format's general structure. Where do I start in code? This is fortunately not as complicated as a programming language's language, since it's a format that's about data and not code, but it's still a somewhat more complicated 3D format than, say, the Alias Wavefront .obj format (where reading and parsing is as simple as a for loop and sscanf -- but that format doesn't support skeletal animation and other things the way .md5mesh does).
Name:
Anonymous2006-10-18 19:32
Are you sure there aren't already a few parsers written for that format that you could use?
Flex/Bison is pretty nice. Certainly beats doing it by hand, unless you're using a functional language.
Name:
Anonymous2006-10-18 20:15
If you're using Perl, Python, Ruby, PHP or JavaScript, you may want to consider simple regexes inside simple loops.
Name:
Anonymous2006-10-18 20:22
in shitty languages like: Perl, Python, Ruby, PHP or JavaScript regexes run faster than iterating per character.
Name:
Anonymous2006-10-18 20:33
Simple regexes in simple loops?
Trust me, using Bison is both easier and more fullproof than that.
Besides, what do you think Flex is?
Name:
Anonymous2006-10-18 20:41
>>6
A regex-based lexer. Only it produces C code, and uses fast but crappy deterministic finite automata regexes. Perl-compatible regular expressions are far more powerful, supporting backtracking, ungreedy operators, backreferences, conditional matching, positive and negative lookaheads and lookbehinds in any point, and recursivity. Plus, in these languages, you receive captured pattern as lists, which are often exactly what you wanted.
When you have this, you don't need a LALR parser for most languages, as you can get the job done pretty easily with very simple loops of regexes, doing far more than flex and in far fewer lines of code.
Name:
Anonymous2006-10-18 20:57
From a quick Google this looks like a line by line format without expressions or anything complicated, so parsing it should be trivial...
Name:
Anonymous2006-10-19 0:12
>>7
In case you didn't notice the implication in >>6, I've used both approaches. I'm sticking to bison and flex, kthnx. It's nowhere near as error-prone as using regex expressions.
Yeah, sure, it takes more lines... readable and easily provable lines, okay? It's little more than a direct translation of BNF.
There is no contest here. Take Perl's dick out your ass, seriously.
I've seen some people hand-write parsers. I'm guessing that is for the "special" programmers and most of us have to get by on and by happy with bison/flex?
I never figured out how to use bison and flex, though. Their documentation is shitty and written for people who already know how to use it. lol GNU culture.
Good no-nonsense intro to using flex and bison. It doesn't have a lot, but I found it short enough to get me off the ground without having to pile-drive through a mountain of theory.
Name:
Anonymous2006-10-19 7:26
>>9
Regexes become understandable and harder to fail at once you've gotten used to how the backtracking automata works. They have the advantage of being almost universally available and extremely productive. They are several order of magnitudes more comfortable and faster to use when you're doing Python, Ruby, or similar.
BTW, I'm not a Perl guy, and I don't like Perl's $%/%&(%//#<>+ syntax for *EVERYTHING*. But I love PCRE. It's a matter of using the right tool for every job.
Name:
Anonymous2006-10-19 8:55
>>15
Have you ever written a parser for anything non-trivial in BNF? I'm assuming you've done it several times with regex.
This isn't a question of power, it's a question of readability. I have doubts about anyone who thinks a modern "regex" is anywhere in the same league as bachus-naur form. Parsing with regular expressions is plain ugly, and I have yet to see an example otherwise.
But, feel free to enlighten.
Name:
Anonymous2006-10-19 9:10
Regular expressions are not Turing complete, dipshits. They're not powerful enough to describe a context sensitive language. Go learn automata theory.
by doing some structural pre-parsing you can reduce the parsing time just by reducing the search space so heavily.
Name:
Anonymous2006-10-19 14:03
strtok() and GTFO
Name:
Anonymous2006-10-19 18:40
>>17
a) Most languages are context free, not context sensitive.
b) >>18 is right. Regular expressions in the real world have evolved far beyond their rather limited formal language definition.
>>1 failed to specify a language, so I'm going to be an arsehole and suggest Haskell and a common parser combinator library for it known as Parsec. Having written a parser for C in less than 400 lines, I can with confidence say that it totally fucking rocks.
Name:
Anonymous2006-10-28 15:30
>>21
a) (i) Bullshit.
(ii) Regular expressions can't parse context free languages either.
b) Then regular expressions in the "real world" (aka: Perl) aren't. They're something else that some insane Christian linguist decided to misname as regular expressions.
I just wrote a parser for C#. You can specify the grammar in almost EBNF. I suppose it looks like Spirit.
Name:
Anonymous2006-10-28 18:36
>>16
I agree that it's less clear or obvious, but you can (and should) use the extended syntax where you can use whitespace liberally, then indent your regex properly, align |s, etc. I've had to modify such regexes and, if you are used to the syntax and don't let the /&$()=()/$&? look scare you, it can be done reasonably well.
Parsing with PCRE (assuming your language has a decent interface to them) has the advantage of being ridiculously simple and quick to write; you just can't do the same with a traditional lexer and a parser generator even if you typed everything as fast as you can without stopping one second to think.
>>17
Get your head out of your university teacher's ass and discover that what we call "regular expressions" are no longer regular expressions. They feature recursion, conditional matching, and backreferences and backward and forward lookaheads for context. I don't have a proof that they are Turing complete and don't need any, because when I want to use them for something that could get hard to write in a single regex, I write a 3 lines loop around a regular expression (or nest regular expressions using a 3 lines loop).
And BTW, we call them regexes because they started as such. But to avoid confusion, you can say "PCRE" (Perl-Compatible Regular Expressions, also a famous C library).
Name:
Anonymous2006-10-28 18:39
>>27
Forgot to say, the extended syntax also supports comments. You can do something like:
>>25
Wake up and smell the coffee. Regular expressions in computation theory and regular expressions used by scripting languages are two different thing. They look similar on the surface, and one evolved from the other, they share the same name, but they're different.
Saying that perl regular expressions aren't regular expressions when the whole fucking world names them that is being a pendantic ass. We know that already. Care to show off more?
In case it wasn't bloody stinking obvious, we're talking about the perl-type regex here.
Name:
Anonymous2006-10-28 21:43
Incidentally, in Perl 6 they've purged all reference to "regular expression" and only call them regexes now.
regexes != regular expressions ;)
Name:
Anonymous2006-10-28 23:22
Are they going to come out with Perl6 this decade? Pugs is nice an all, but...
Bringing /prog/ back to its people
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy