/prog/ - OMeta/JS Parser/Generator DSL

Name: Anonymous 2011-08-23 3:12

This is a proof of concept s-expression parser in OMeta/JS I just made in about 10 minutes, produces Javascript nested lists as the resulting AST, although I could easily produce output text in a different language like C or C++. OMeta/JS is a Parsing Expression Grammar DSL which uses Javascript as the host language.



ometa SexprParser {

    start      = sexpr*,

    sexpr      = spaces (atom | list),

    atom       = symbol | number,

    list       = "(" sexpr:h sexpr*:t spaces ")"            -> [h].concat(t)

               | "(" sexpr:e spaces ")"                     -> [e]

               | "(" blank ")"                              -> [],

    symbol     = firstAndRest(`letter, `letterOrDigit):s    -> s.join(''),

    digit      = ^digit:d                                   -> d.digitValue(),

    number     = number:n digit:d                           -> (n * 10 + d)

               | digit

}



$> SexprParser.matchAll("""

(print (add 1 2))

(map xarnicate

    (you me TheSussman)

    (100 200 300))

""", `start)



$> [[print, [add, 1, 2]], [map, xarnicate, [you, me, TheSussman], [100, 200, 300]]]

Name: Anonymous 2011-08-23 3:57

I also forgot to ask, but is there anything mature like this that is native to Common Lisp that can handle arbitrary context-free grammars or parsing expression grammars? I know there's a port of OMeta to CL, but I couldn't get it to work, and the code looks shoddy.

Name: Anonymous 2011-08-23 5:42

>>2
Try ESRAP: https://github.com/nikodemus/esrap

Also, look here: http://www.cliki.net/parser%20generator

Name: Anonymous 2011-08-23 5:54

this looks interesting OP. how big is the generated parser + the ometa/js library? just wondering if ometa/js is practical to use.

Name: Anonymous 2011-08-23 7:05

>>3
Thanks, I'll give it a look.

>>4
The generated parser above ended up only being a few dozen lines of code, it's just imperatively specifying a bunch of parse rules/productions which are used to build the underlying state machine at runtime. It inherits most of its functionality from a base class in OMeta/JS library. The library itself looks to be few thousand lines of Javascript which isn't too bad and you can run it in a web browser. I'm just running it from within a V8 shell on the command line.

The cool thing about packrat parsers/pegs is that they unify the lexer, parser, analyzer/optimizer and generator all under the same mechanism, and its easy to chain multiple DSLs/IM processors together that start at a very high-level and output lower-level code such as C or assembly at the final stage.

You can even implement an interpreter. Notice in my SexprParser example, how I'm evaluating the value of integers using the (n * 10 + d) production--you could extend that technique to evaluate all expressions as they're being parsed.

Apparently, the guy who came up with this language used it in a group project where they reimplemented the million LoC Cairo compositing engine using a high-level DSL, and their DSL stack was written using OMeta which generated x86 assembly listings as output. The top-level program and the code for their DSL tool chain was altogether just several LoC, not including the LoC for OMeta itself.

Look here for more if you're interested: http://tinlizzie.org/ometa/

Name: Anonymous 2011-08-23 7:07

>>5
The top-level program and the code for their DSL tool chain was altogether just several LoC
I meant several hundred LoC.

Name: Anonymous 2011-08-23 17:17

>>5

thanks for the info. i read (well, skimmed) the linked dissertation but it didn't mention things like "spaces", "firstAndRest", etc that you used in your example. did you define those earlier or are those part of the "standard library" and explained somewhere?

Name: Anonymous 2011-08-23 19:23

>>7
They're part of the base class grammar they're just parametric rules that you can override or define your own. You can create your own character classes this way. For example, I think you can define your own versions as:

ometa AGrammar {
    characterRange char:x char:y = char:c ?(x <= c && c <= y) -> c,
    firstAndRest anything:x anything:y = (x y*):r -> r,
    lowerCase = characterRange('a', 'z'):c -> c,
    upperCase = characterRange('A', 'Z'):c -> c,
    digit = characterRange('0', '9'):c -> c,
    letter = (lowerCase | upperCase):c -> c,
    letterOrDigit = (letter | digit):c -> c,
    symbol = firstAndRest(letter, letterOrDigit):s -> s.join('')
}

Name: Anonymous 2011-08-24 6:22

sexp = '

(define (square x) (* x x))

(print

  (sqrt

    (+ (square (- a b))

       (square (- c d))

       (* 500 200))))'

p eval "(#{sexp})".gsub(/[^\s()]+/){|s|(s=~/\d+/?s: ":'#{s}'")+?,}.gsub(/[()]/,?(=>?[,?)=>'],').chop

Name: Anonymous 2011-08-24 16:52

The information from the web site is quite limited. Where can I find better/more resources from OMeta? Seems interresting!

Name: Anonymous 2011-08-24 16:56

Can parse HTML with regaxum?

Name: Anonymous 2011-08-24 19:11

>>9
That only handles s-expressions using functions that already exist, it can't handle different language grammars.

Name: Anonymous 2011-08-24 19:19

>>10
Once you know what Parsing Expression Grammars and Packrat Parsers are, there's not much more to it. You can consult Wikipedia for that. After that, it's just trial and error figuring out the syntax.

I've also switched to using the FIOC version of OMeta, PyMeta2, because I remembered FIOC has some nice templating engines, namely Jinja2 which I can use for back-end code generation in tandem with OMeta for AST parsing, analysis and optimization.

OMeta/JS Parser/Generator DSL

1 Name: Anonymous 2011-08-23 3:12

2 Name: Anonymous 2011-08-23 3:57

3 Name: Anonymous 2011-08-23 5:42

4 Name: Anonymous 2011-08-23 5:54

5 Name: Anonymous 2011-08-23 7:05

6 Name: Anonymous 2011-08-23 7:07

7 Name: Anonymous 2011-08-23 17:17

8 Name: Anonymous 2011-08-23 19:23

9 Name: Anonymous 2011-08-24 6:22

10 Name: Anonymous 2011-08-24 16:52

11 Name: Anonymous 2011-08-24 16:56

12 Name: Anonymous 2011-08-24 19:11

13 Name: Anonymous 2011-08-24 19:19