Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

Help with unambiguous grammar

Name: long long 2012-01-13 22:59

I'm creating yet another fail C-family compiler.
Objectives: simple language, generics, type inference, and unambiguous grammar.
I would like to hear your opinions, any help is very appreciated.

bison parser:
%union {
    char *str;
}

%right '=' "+=" "-=" "*=" "/=" "%=" "~=" "&=" "|=" "^=" "<<=" ">>="
%left ':'
%right '?'
%left '|'
%left '^'
%left '&'
%left "or"
%left "and"
%left "==" "!="
%left '<' "<=" '>' ">="
%left "<<" ">>"
%left '+' '-'
%left '*' '/' '%'
%right "not" '~' PREFIX
%left "++" "--"
%left '['
%left '.'
%right '$'

%token <str> ID INT HEX BIN DOUBLE FLOAT COMPLEX STRING CHAR
%token-table

%%

program:
       | program import
       | program decl
       ;

import: "import" module
      | "import" module "as" identifier
      ;

module: identifier
      | module '.' identifier
      ;

decl: declvar
    | declfunc
    | declstruct
    ;

declvar: "var" identifier
       | "var" identifier '=' expr
       | "def" identifier '=' expr
       ;

declstruct: "def" identifier '{' declattrs '}'
          ;

declattrs: declattr
         | declattrs declattr
         ;

declattr: "var" identifier
        | declfunc
        ;

declfunc: "def" identifier '(' declargs ')' block
        ;

declargs: declarg
        | declargs ',' declarg
        ;

declarg: identifier
       | "var" identifier
       | identifier identifier
       ;

block: '{' '}'
     | '{' stmts '}'
     ;

stmts: stmt
     | block
     | stmts stmt
     | stmts block
     ;

stmt: declvar
    | purestmt
    | identifier ':' purestmt
    ;

purestmt: assign
        | call
        | if
        | for
        | while
        | switch
        | "goto" identifier
        | "return" expr
        ;

assign: lvalue '=' expr
      | lvalue "+=" expr
      | lvalue "-=" expr
      | lvalue "*=" expr
      | lvalue "/=" expr
      | lvalue "%=" expr
      | lvalue "~=" expr
      | lvalue "&=" expr
      | lvalue "|=" expr
      | lvalue "^=" expr
      | lvalue "<<=" expr
      | lvalue ">>=" expr
      | lvalue "++"
      | lvalue "--"
//      | "++" lvalue %prec PREFIX
//      | "--" lvalue %prec PREFIX
      ;

lvalue: identifier
      | lvalue '.' identifier
      | lvalue '[' expr ']'
      | '$' lvalue
      ;

identifier: ID
          ;

expr: literal
    | lvalue
    | assign
    | call
    | '~' expr
    | '@' lvalue
    | '-' expr %prec PREFIX
    | expr '+' expr
    | expr '-' expr
    | expr '*' expr
    | expr '/' expr
    | expr '%' expr
    | expr '&' expr
    | expr '|' expr
    | expr '^' expr
    | "not" expr
    | expr "==" expr
    | expr "!=" expr
    | expr '<' expr
    | expr "<=" expr
    | expr '>' expr
    | expr ">=" expr
    | expr "<<" expr
    | expr ">>" expr
    | expr "and" expr
    | expr "or" expr
    | expr '?' expr ':' expr
    | "proc" '(' declargs ')' block
    | '(' expr ')'
    ;

literal: INT
       | HEX
       | BIN
       | FLOAT
       | DOUBLE
       | COMPLEX
       | STRING
       | CHAR
       ;

call: onecall
    | call '.' onecall
    ;

onecall: lvalue '(' args ')'
       ;

args:
    | expr
    | args ',' expr
    ;

if: "if" '(' expr ')' block
  | "if" '(' expr ')' block "else" block
  ;

for : "for" '(' forarg "in" foriter ')' loop
    | "for" '(' forinit ';' expr ';' forincr ')' loop
    ;

forarg: identifier
      | "var" identifier
      ;

forinit: forinitarg
       | forinit ',' forinitarg
       ;

forinitarg: assign
          | "var" identifier '=' expr
          ;

forincr: assign
       | forincr ',' assign
       ;

foriter: lvalue
       | call
       ;

loop: ';'
    | loopblock
    ;

loopblock: '{' '}'
         | '{' loopstmts '}'
         ;

loopstmts: loopstmt
         | loopstmts loopstmt
         ;

loopstmt: "break"
        | "continue"
        | loopblock
        | stmt
        ;

while: "while" '(' expr ')' loop
     | "do" loopblock "while" '(' expr ')'
     ;

switch: "switch" '(' expr ')' '{' cases '}'
      ;

cases: allcases "default" ':' stmts
     ;

allcases: "case" expr ':' casestmts
        | allcases "case" expr ':' casestmts
        ;

casestmts: casestmt
         | casestmts casestmt
         ;

casestmt: "break"
        | caseblock
        | stmt
        ;

caseblock: '{' '}'
         | '{' casestmts '}'
         ;

Name: Anonymous 2012-01-14 13:30

Ok, I added C-like array initializers (var x : list[int] = {1,2,3}; var g : Game = {"loser", 0}) and pointer types (var y : ptr[int]) to the grammar.

But for now I'll take some time (maybe one week?) to mature ideas before writing the AST to LLVM backend.

Also guys, thanks for your time and help.

Name: Anonymous 2012-01-14 15:13

>>41
Didn't you have type inference?

What about var x = {1,2,3}?

Name: Anonymous 2012-01-14 15:35

>>42
Yes, it's just an bad example :P
Here, a parseable n-body code (translated from http://shootout.alioth.debian.org/u32/program.php?test=nbody&lang=gcc&id=1):
let PI = 3.141592653589793
let SOLAR_MASS = 4 * PI * PI
let DAYS_PER_YEAR = 365.24

def main(args : array[string]) {
    let n = parseInt(args[1])
    var system = createSystem()
   
    printf("%.9f\n", energy(system))
   
    for (var i = 0; i < n; i++) {
        advance(system, 0.01)
    }
   
    printf("%.9f\n", energy(system))
}

def createSystem() {
    var sun : Body = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, SOLAR_MASS }
    var jupiter : Body = {
         4.84143144246472090e+00,
        -1.16032004402742839e+00,
        -1.03622044471123109e-01,
         1.66007664274403694e-03 * DAYS_PER_YEAR,
         7.69901118419740425e-03 * DAYS_PER_YEAR,
        -6.90460016972063023e-05 * DAYS_PER_YEAR,
         9.54791938424326609e-04 * SOLAR_MASS
    }
    var saturn : Body = {
         8.34336671824457987e+00,
         4.12479856412430479e+00,
        -4.03523417114321381e-01,
        -2.76742510726862411e-03 * DAYS_PER_YEAR,
         4.99852801234917238e-03 * DAYS_PER_YEAR,
         2.30417297573763929e-05 * DAYS_PER_YEAR,
         2.85885980666130812e-04 * SOLAR_MASS
    }
    var uranus : Body = {
         1.28943695621391310e+01,
        -1.51111514016986312e+01,
        -2.23307578892655734e-01,
         2.96460137564761618e-03 * DAYS_PER_YEAR,
         2.37847173959480950e-03 * DAYS_PER_YEAR,
        -2.96589568540237556e-05 * DAYS_PER_YEAR,
         4.36624404335156298e-05 * SOLAR_MASS
    }
    var neptune : Body = {
         1.53796971148509165e+01,
        -2.59193146099879641e+01,
         1.79258772950371181e-01,
         2.68067772490389322e-03 * DAYS_PER_YEAR,
         1.62824170038242295e-03 * DAYS_PER_YEAR,
        -9.51592254519715870e-05 * DAYS_PER_YEAR,
         5.15138902046611451e-05 * SOLAR_MASS
    }
    var system = { sun, jupiter, saturn, uranus, neptune }
    var px = 0.0
    var py = 0.0
    var pz = 0.0
    for (var body in system) {
        px += body.vx * body.mass
        py += body.vy * body.mass
        pz += body.vz * body.mass
    }
    offsetMomentum(sun, px, py, pz)
    return system
}

struct Body {
    x  : double
    y  : double
    z  : double
    vx : double
    vy : double
    vz : double
    mass : double
}

def offsetMomentum(p, x, y, z) {
    p.vx = -x / SOLAR_MASS
    p.vy = -y / SOLAR_MASS
    p.vz = -z / SOLAR_MASS
}

def advance(system, dt) {
    for (a in system) {
        for (b in system) {
            let dx = a.x - b.x
            let dy = a.y - b.y
            let dz = a.z - b.z

            let dSquared = dx * dx + dy * dy + dz * dz
            let distance = sqrt(dSquared)
            let mag = dt / (dSquared * distance)

            a.vx -= dx * b.mass * mag
            a.vy -= dy * b.mass * mag
            a.vz -= dz * b.mass * mag

            b.vx += dx * a.mass * mag
            b.vy += dy * a.mass * mag
            b.vz += dz * a.mass * mag
        }
    }
    for (body in system) {
        body.x += dt * body.vx
        body.y += dt * body.vy
        body.z += dt * body.vz
    }
}

def energy(system) {
    var e = 0.0
    for (a in system) {
        e += 0.5 * a.mass
                 * ( a.vx * a.vx
                   + a.vy * a.vy
                   + a.vz * a.vz )
        for (b in system) {
            let dx = body.x - b.x
            let dy = body.y - b.y
            let dz = body.z - b.z
            let distance = sqrt(dx*dx + dy*dy + dz*dz)
            e -= (a.mass * b.mass) / distance
        }
    }
    return e
}

Name: Anonymous 2012-01-14 15:40

>>43 Some remarks:

I miss sexprs here T-T:
        e += 0.5 * a.mass
                 * ( a.vx * a.vx
                   + a.vy * a.vy
                   + a.vz * a.vz )


C programmers will ask for multiple variable declarations, if they have the same type. I thought something like this.
struct Body {
    x, y, z, vx, vy, vz, mass : double
}

Name: Anonymous 2012-01-14 16:05

Does "proc" work like this?
def outer() {
     return proc (x) {
          return x*2
     }
}
outer()(10)

Name: Anonymous 2012-01-14 16:17

>>43
Not enough π decimals! 3.1415926535897932384626433

It looks good, and now I see why * isn't the dereference operator, you can drop the semicolons and make a\n*b mean a*b.

Name: long long 2012-01-14 17:04

>>45
Yes, but I was thinking about currying:
def mul(x) {
    return proc (y) { return x*y }
}
def main() {
    printf("%d\n", mul(2)(4))
}

8
But I didn't think about name binding yet.

>>46 :P
Also, I was seriously thinking in overloading the . operator to derefence pointers when accessing members, and turning $ to:
lvalue: '$' identifier
       | '$' '(' lvalue ')'

and these expressions would be equivalent:
$a*a
a.b.ca->b->c
$(a.b.c)*(a->b->c)

The verbose alternative can show what names are pointers in expressions like $a.b.$c.d, but I don't think this is a really great advantage, because if someone changes a member to a pointer, then we need to update all the code accessing it.

Name: Anonymous 2012-01-14 22:32

2012
not using off-side rule

Name: Anonymous 2012-01-14 22:51

>>48
If you're talking about indentation-based statement grouping, it has some implications that aren't good, at least for language tools.
Since often there isn't enough visual clue to distinguish different amount or types of whitespaces, I don't mind to ignore them entirely on the lexer.

Name: Anonymous 2012-01-14 23:04

>>48
FIOC is shit

Name: long long 2012-01-14 23:41

Another valid program, mandelbrot, gave [or reassured] me some ideas:
* C-compatible ABI (a must)
* importing C headers direcly (it would be good)
* implicit conversion between pointer types / void pointer?
* cpp-like preprocessor, only for #pragmas?
* optional non-qualified imported identifiers if there aren't conflicts?
* OS/arch dependent modules, like the std.sse below?
* modules could specify proper compilation flags, like
        std.sse specifying -mfpmath=sse -msse
        openmp specifying -openmp
        gl specifying -lgl
     which could also ease library dependency chains.

import openmp
import std.sse as sse
import posix
import stdlib

//
// Translated from http://shootout.alioth.debian.org/u32/program.php?test=mandelbrot&lang=gcc&id=4
//

let zero : sse.v2df = { 0.0, 0.0 }
let four : sse.v2df = { 4.0, 4.0 }

var bytes_per_row : int
var inverse_w : double
var inverse_h : double
var N : int

var Crvs : ptr[sse.v2df]
var bitmap : ptr[byte]

def main(args : array[string]) {
    N = atoi(args[1])
   
    bytes_per_row = (N + 7) >> 3

    inverse_w = 2.0 / (bytes_per_row << 3)
    inverse_h = 2.0 / N
   
    if (posix_memalign(@Crvs, sizeof(sse.v2df), sizeof(sse.v2df) * N / 2)) {
        return EXIT_FAILURE
    }

    #omp parallel for
    for (var i = 0; i < N; i += 2) {
        var Crv : sse.v2df = { (i+1.0)*inverse_w-1.5, (i)*inverse_w-1.5 }
        Crvs[i >> 1] = Crv
    }

    bitmap = calloc(bytes_per_row, N)
    if (bitmap == NULL) {
        return EXIT_FAILURE
    }

    #omp parallel for schedule(static,1)
    for (var i = 0; i < N; i++) {
        calc_row(i)
    }

    printf("P4\n%d %d\n", N, N)
    fwrite(bitmap, bytes_per_row, N, stdout)

    free(bitmap)
    free(Crvs)
    return EXIT_SUCCESS
}

def calc_row(y) {
    let row_bitmap = bitmap + (bytes_per_row * y)
    let Civ_init : sse.v2df = { y*inverse_h-1.0, y*inverse_h-1.0 }

    for (var x = 0; x < N; x += 2)
    {
        var Crv = Crvs[x >> 1]
        var Civ = Civ_init
        var Zrv = zero
        var Ziv = zero
        var Trv = zero
        var Tiv = zero
        var i = 50
        var two_pixels : int
        var is_still_bounded : sse.v2df

        do {
            Ziv = (Zrv*Ziv) + (Zrv*Ziv) + Civ
            Zrv = Trv - Tiv + Crv
            Trv = Zrv * Zrv
            Tiv = Ziv * Ziv

            is_still_bounded = sse.cmplepd(Trv + Tiv, four)

            two_pixels = sse.movmskpd(is_still_bounded)
           
            i--
        } while (i > 0 and two_pixels)

        two_pixels <<= 6

        let b : byte = two_pixels >> (x & 7)
        row_bitmap[x >> 3] |= b
    }
}

Name: Anonymous 2012-01-15 7:31

>>47
How about just '$' lvalue? Also, I think $a.b.$c.d is better.

>>51
* implicit conversion between pointer types / void pointer?
With parametric polymorphism, you can just return a polymorphic pointer:
malloc : (int) -> ptr[α]
The type α can be inferred from context and use.
Now your code is type-safe, congratulations.
* cpp-like preprocessor, only for #pragmas?
Pragma comments like Haskell.

Name: Anonymous 2012-01-15 8:49

>>51
This is a fucking nightmare, you're butchering this more and more, so every object file has to carry what it should be linked with? How else is the linker supposed to know what link with when you decide to merge all the object files into a shared object or something of that sort?

Name: Anonymous 2012-01-15 11:26

>>52
You can write a function and pass pointers or references to it:
def getm(obj) { return obj.m }
var s = {1,2,3}
getm(s)  // getm(s :     array[int] ) → int
getm(@s) // getm(s : ptr[array[int]]) → int


>>51
I was thinking in compiling to some sort of intermediate representation [just one step back from the llvm-as], that could be portably executed in a VM [for easy distribution], and also natively compiled [for -march=native -mtune=native speed].
This representation also will help distributing libraries with generic functions/structs, whose types will be inferred at compile time, generating instances. Just include a simple “make” in the compiler: you can pass source and output directories, and the source tree is recursively searched for source files, which will be compiled if its or any of its imports timestamp [recursively and cached] are newer than the matching object file in the output directory. Then we can remember the template occurrences with same type parameters in the source tree and compile each just once.

I did this for my C++ projects, 200 lines of python [except the template caching part] and it runs very nice, I can even parallelize all sources until all memory is consumed by the compiler.

$ cd ~/foo
$ make
./make.py -ssrc -obuild -j3 gcc debug
compiling src/a.cc     to build/gcc-debug/a.o
compiling src/b.cc     to build/gcc-debug/b.o
compiling src/bar/b.cc to build/gcc-debug/bar_b.o
(waiting sources compilation)
linking build/*.o      to build/gcc-debug/foo.exe


Without breaking a sweat. Compiler flags [for platform dependent CFLAGS/LDFLAGS] are put in a separate make.ini file, under different sections ([default] [win32] [linux64]).

Name: long long 2012-01-15 11:42

>>53 Sorry for mistyping, the half bottom of >>54 is for you.

Name: Anonymous 2012-01-15 16:14

>>55
What you posted has nothing to do with the issue in >>53 but okay.

Name: Anonymous 2012-01-15 16:24

>>56
Oh I see.
Maybe with some quirky #pragmas?

// module gl
#cflags.linux -lGL
#cflags.win32 -lopengl32

def glRotatef(a, x, y, z) {
·
·
·

Name: Anonymous 2012-01-15 17:17

>>57
Seems like a bad idea.

Name: OP 2012-01-26 3:49

>>59
Using for a week now, I think I'll drop the C curly braces, and switch to Algol's keyword .. end blocks. Thoughts?

Name: Anonymous 2012-01-26 4:04

>>59

whatever you'd like. Some people say end block are more readable, and other people find curlys to have a more definite structure, both for their eyes and for their text editors.

Name: Anonymous 2012-01-26 4:26

>>60
I don't know... I think that curlies could be used for something more useful, like another data structure literal (hashes?).

--
Updates:
Maybe type initializers and conversions looks better as “function calls”:
var car = Vehicle(1, 2, "Mazda")
let x = byte(car.speed & 0b11110000)


Also, I don't know if it's better to use C++'s vtables or Clay's variant types. I designed something around the second, but there's some flaws here and there yet...

Name: Anonymous 2012-01-26 6:54

>>59
Go with forced indentation, chicks dig it.

Name: Anonymous 2012-01-26 20:22

oh god, another sepples. kill it before it reproduces.

Name: Anonymous 2012-01-26 20:37

>>63
You now know how Israeli snipers feel when they see a Palestinian.

Name: Anonymous 2012-01-26 20:40

>>66
nice dubs bro

Name: Anonymous 2012-01-26 20:41

>>65
thanks

Name: Anonymous 2012-01-26 20:43

For every natural number greater than or equal to 2 there exists at least one base in which the number constitutes ``dubs''.

Name: Anonymous 2012-02-03 11:42

What's up?

How about this?

var n:int
scanf("%d", @n)
puts("[output]")
if n
is 0,1: puts("")
is 2: printf("a")
      continue
is 3: puts("b")
else: puts("c")
end


giving the result below?

$ cat | ./a.out
0
1
2
3
4^D
[output]


ab
b
c

Name: Anonymous 2012-02-04 1:50

ponup

Name: Anonymous 2012-02-04 4:12

>>67
nice base 66 dubs

Name: Anonymous 2012-02-04 4:29

>>70
I just realized... all numbers are [dubs-base0,...,-dubs-basen] dubs for some n.

Name: Anonymous 2012-02-04 15:24

>>71

except for 1, silly billy.

Name: Anonymous 2012-02-04 20:40

>>72
Or 2.

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List