
Help with unambiguous grammar

Name: long long 2012-01-13 22:59

I'm creating yet another fail C-family compiler.
Objectives: simple language, generics, type inference, and unambiguous grammar.
I would like to hear your opinions, any help is very appreciated.

bison parser:
%union {
    char *str;
}

%right '=' "+=" "-=" "*=" "/=" "%=" "~=" "&=" "|=" "^=" "<<=" ">>="
%left ':'
%right '?'
%left '|'
%left '^'
%left '&'
%left "or"
%left "and"
%left "==" "!="
%left '<' "<=" '>' ">="
%left "<<" ">>"
%left '+' '-'
%left '*' '/' '%'
%right "not" '~' PREFIX
%left "++" "--"
%left '['
%left '.'
%right '$'

%token <str> ID INT HEX BIN DOUBLE FLOAT COMPLEX STRING CHAR
%token-table

%%

program:
       | program import
       | program decl
       ;

import: "import" module
      | "import" module "as" identifier
      ;

module: identifier
      | module '.' identifier
      ;

decl: declvar
    | declfunc
    | declstruct
    ;

declvar: "var" identifier
       | "var" identifier '=' expr
       | "def" identifier '=' expr
       ;

declstruct: "def" identifier '{' declattrs '}'
          ;

declattrs: declattr
         | declattrs declattr
         ;

declattr: "var" identifier
        | declfunc
        ;

declfunc: "def" identifier '(' declargs ')' block
        ;

declargs: declarg
        | declargs ',' declarg
        ;

declarg: identifier
       | "var" identifier
       | identifier identifier
       ;

block: '{' '}'
     | '{' stmts '}'
     ;

stmts: stmt
     | block
     | stmts stmt
     | stmts block
     ;

stmt: declvar
    | purestmt
    | identifier ':' purestmt
    ;

purestmt: assign
        | call
        | if
        | for
        | while
        | switch
        | "goto" identifier
        | "return" expr
        ;

assign: lvalue '=' expr
      | lvalue "+=" expr
      | lvalue "-=" expr
      | lvalue "*=" expr
      | lvalue "/=" expr
      | lvalue "%=" expr
      | lvalue "~=" expr
      | lvalue "&=" expr
      | lvalue "|=" expr
      | lvalue "^=" expr
      | lvalue "<<=" expr
      | lvalue ">>=" expr
      | lvalue "++"
      | lvalue "--"
//      | "++" lvalue %prec PREFIX
//      | "--" lvalue %prec PREFIX
      ;

lvalue: identifier
      | lvalue '.' identifier
      | lvalue '[' expr ']'
      | '$' lvalue
      ;

identifier: ID
          ;

expr: literal
    | lvalue
    | assign
    | call
    | '~' expr
    | '@' lvalue
    | '-' expr %prec PREFIX
    | expr '+' expr
    | expr '-' expr
    | expr '*' expr
    | expr '/' expr
    | expr '%' expr
    | expr '&' expr
    | expr '|' expr
    | expr '^' expr
    | "not" expr
    | expr "==" expr
    | expr "!=" expr
    | expr '<' expr
    | expr "<=" expr
    | expr '>' expr
    | expr ">=" expr
    | expr "<<" expr
    | expr ">>" expr
    | expr "and" expr
    | expr "or" expr
    | expr '?' expr ':' expr
    | "proc" '(' declargs ')' block
    | '(' expr ')'
    ;

literal: INT
       | HEX
       | BIN
       | FLOAT
       | DOUBLE
       | COMPLEX
       | STRING
       | CHAR
       ;

call: onecall
    | call '.' onecall
    ;

onecall: lvalue '(' args ')'
       ;

args:
    | expr
    | args ',' expr
    ;

if: "if" '(' expr ')' block
  | "if" '(' expr ')' block "else" block
  ;

for : "for" '(' forarg "in" foriter ')' loop
    | "for" '(' forinit ';' expr ';' forincr ')' loop
    ;

forarg: identifier
      | "var" identifier
      ;

forinit: forinitarg
       | forinit ',' forinitarg
       ;

forinitarg: assign
          | "var" identifier '=' expr
          ;

forincr: assign
       | forincr ',' assign
       ;

foriter: lvalue
       | call
       ;

loop: ';'
    | loopblock
    ;

loopblock: '{' '}'
         | '{' loopstmts '}'
         ;

loopstmts: loopstmt
         | loopstmts loopstmt
         ;

loopstmt: "break"
        | "continue"
        | loopblock
        | stmt
        ;

while: "while" '(' expr ')' loop
     | "do" loopblock "while" '(' expr ')'
     ;

switch: "switch" '(' expr ')' '{' cases '}'
      ;

cases: allcases "default" ':' stmts
     ;

allcases: "case" expr ':' casestmts
        | allcases "case" expr ':' casestmts
        ;

casestmts: casestmt
         | casestmts casestmt
         ;

casestmt: "break"
        | caseblock
        | stmt
        ;

caseblock: '{' '}'
         | '{' casestmts '}'
         ;
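
For anyone reading the precedence table: Bison gives operators declared later a higher precedence, so in the table above "and" and "or" bind tighter than '|', '^' and '&', and looser than the comparisons. Below is a rough precedence-climbing sketch in Python that mirrors only the binary-operator lines of that table. The names LEVELS, PREC and parse are made up for illustration, the input is assumed to be pre-split into tokens, and this is not the parser Bison would generate.

```python
# Binary operators copied from the %left lines of the grammar, lowest
# precedence first (Bison: later declaration = higher precedence).
LEVELS = [
    ["|"],
    ["^"],
    ["&"],
    ["or"],
    ["and"],
    ["==", "!="],
    ["<", "<=", ">", ">="],
    ["<<", ">>"],
    ["+", "-"],
    ["*", "/", "%"],
]
PREC = {op: level for level, ops in enumerate(LEVELS, 1) for op in ops}


def parse(tokens):
    """Return a fully parenthesised string for a flat list of tokens."""
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = climb(1)
            pos += 1  # skip the closing ")"
            return inner
        return tok

    def climb(min_level):
        nonlocal pos
        left = primary()
        while pos < len(tokens) and tokens[pos] in PREC:
            op = tokens[pos]
            level = PREC[op]
            if level < min_level:
                break
            pos += 1
            # All of these operators are %left, so the right operand
            # must bind strictly tighter than the current operator.
            right = climb(level + 1)
            left = f"({left} {op} {right})"
        return left

    return climb(1)


print(parse("a | b and c".split()))
print(parse("a or b | c".split()))
```

Under these assumptions "a or b | c" groups as ((a or b) | c), which may or may not be what is intended for a C-family language.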

Name: Anonymous 2012-01-13 23:00

I would suggest reading SICP

Name: Anonymous 2012-01-13 23:02

Objectives: simple language, generics, type inference, and unambiguous grammar.
Sounds great but you're going to ruin it somehow since you're more likely than not a mental midget.

Name: Anonymous 2012-01-13 23:03

A few small remarks: I've changed the addressof operator to @, and the dereference operator to $. They look more natural IMHO.

Also, I don't know how to make unambiguous prefix ++ and -- operators. =T

Name: Anonymous 2012-01-13 23:03

* little people

Name: Anonymous 2012-01-13 23:04

>>4
I've changed the addressof operator to @, and the dereference operator to $
You've already ruined it, great job.

Name: Anonymous 2012-01-13 23:06

>>2 Sir! I'm doing it, sir!
>>3 Yes I know... But at least I expect to learn something from it, and spend more time programming than posting shit.

Name: Anonymous 2012-01-13 23:09

>>4
Also, I don't know how to make unambiguous prefix ++ and -- operators.
You fix them by removing them.

Name: Anonymous 2012-01-13 23:10

>>8 That's why I commented them out. So there isn't any other way?

Name: Anonymous 2012-01-13 23:24

>>9
They should already be unambiguous. What exactly is the problem? The typical way to disambiguate the operators is by placing spaces to separate the tokens, as in i++ + + ++j. Did you make sure the lvalue wasn't being reduced to an expr before the parser receives the "++" token?
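
The token-separation point can be sketched with a longest-match ("maximal munch") tokenizer. This is only an illustration in Python: OPERATORS is a hypothetical subset of the operators in the grammar and tokenize is a made-up helper, not the lexer OP is using.

```python
# Hypothetical subset of the operators used in the grammar.
OPERATORS = ["<<=", ">>=", "++", "--", "+=", "-=", "<<", ">>", "+", "-", "="]


def tokenize(src):
    """Split src into tokens, always taking the longest operator that matches."""
    ops = sorted(OPERATORS, key=len, reverse=True)  # try longer operators first
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch.isalnum() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(src[i:j])
            i = j
        else:
            for op in ops:
                if src.startswith(op, i):
                    tokens.append(op)
                    i += len(op)
                    break
            else:
                raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


print(tokenize("i++ + ++j"))  # ['i', '++', '+', '++', 'j']
print(tokenize("i+++++j"))    # ['i', '++', '++', '+', 'j']
```

Without the spaces the lexer greedily produces "++ ++ +", so the parser never sees the intended "++ + ++"; the spaces are what keep the two readings apart.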

Name: Anonymous 2012-01-14 4:15

Is this some hobby project to teach yourself something?

Name: Anonymous 2012-01-14 7:01

>>4
I've changed the addressof operator to @, and the dereference operator to $. They look more natural IMHO
That's ridiculous. Why change something that works, something everybody already knows and understands?

Name: Anonymous 2012-01-14 10:14

>>10 Yes, it is being reduced to expr. =T

>>11 A hobby, yes. But I'm taking it seriously for real usage.

>>12 I tweaked the precedence a little, so they don't behave as & and *. I'm just trying to get something like $a.$b.$b = 1 which is equivalent to *(a->b->b) = 1;
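
To make the intended equivalence concrete, here is a small Python sketch that rewrites such a chain into C. It assumes one particular reading: a '$' in front of a component dereferences the value reached so far, after the field has been selected. The posted grammar only allows a plain identifier after '.', so this reading and the helper name to_c are assumptions, not something the grammar defines.

```python
def to_c(chain):
    """Translate a chain such as '$a.$b.$b' into an equivalent C expression.

    Assumed reading: '$name' selects the field 'name' (or names the
    variable, for the first component) and then dereferences the result.
    """
    expr = None
    for part in chain.split("."):
        deref = part.startswith("$")
        name = part[1:] if deref else part
        expr = name if expr is None else f"{expr}.{name}"
        if deref:
            expr = f"(*{expr})"
    return expr


print(to_c("$a.$b.$b"))  # (*(*(*a).b).b)
```

In C, (*(*(*a).b).b) is the same as *(a->b->b), because (*p).f is p->f and postfix '.' binds tighter than unary '*'.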

Other thoughts:
I haven't decided [yet] how to put template parameters on function calls and types. Functions definitely will be first-class citizens, but I'm not sure about types...

Name: Anonymous 2012-01-14 10:51

>>13
$a.$b.$b = 1
Pig disgusting and not well known.

*(a->b->b) = 1
Beautiful and well known to anyone who is familiar with C or C++.

Name: Anonymous 2012-01-14 11:05

>>13
I haven't decided [yet] how to put template parameters on function calls and types. Functions definitely will be first-class citizens, but I'm not sure about types...
Don't make the same mistake as C and C++:
1. Types after the variable: x τ or possibly x : τ.
2. Return type after the argument list: f(x : int) -> int.
3. In a type annotation, an unbound type is a type variable: f(x : α) -> int has type ∀α.α→int.
4. Distinguish the type declaration from the function declaration (which is why you should be using a colon between a variable and its type)
All of these should compile:
f(x : α, y : β) -> int { return 1; }
f : (α, β) -> int;
f(x : α, y : β) -> int { return 1; }

f : (α, β) -> int;
f(x, y) { return 1; }

6. Make structs take type variables:
struct list(α) { car : α; cdr : maybe(list(α)) }

Name: Anonymous 2012-01-14 11:10

>>15
That's disgusting.

Name: Anonymous 2012-01-14 11:14

>>14
Not really.
Some languages (e.g. sh) use $IDENTIFIER as variable names, so I'm using a similar approach.
It's also easier to spot when reading code, and provides a uniform syntax too (instead of C's ->).

Name: Anonymous 2012-01-14 11:21

>>17
You're a fucking moron.

Name: Anonymous 2012-01-14 11:36

>>15
3. In a type annotation, an unbound type is a type variable: f(x : α) -> int has type ∀α.α→int.
Terrible! So instead of showing an error if you have a typo, it will accept it as a type variable. I like the way Haskell does this (types are capitalized, type variables are lowercase) but that will probably look pretty bad in a C-like language. Maybe the other way around? f : (int, A) -> A
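
A rough Python sketch of the difference between the two rules. KNOWN_TYPES is a hypothetical set of built-in types and both function names are made up; the second function follows the reversed-Haskell idea (lowercase for concrete types, capitalized for type variables).

```python
KNOWN_TYPES = {"int", "float", "double", "char"}  # hypothetical built-ins


def classify_implicit(name):
    """Rule from >>15: any unbound type name is silently a type variable."""
    return "type" if name in KNOWN_TYPES else "type variable"


def classify_by_case(name):
    """Capitalized names are type variables; lowercase names must be known types."""
    if name[0].isupper():
        return "type variable"
    if name in KNOWN_TYPES:
        return "type"
    raise NameError(f"unknown type {name!r}")


print(classify_implicit("imt"))  # the typo is accepted as a type variable
print(classify_by_case("A"))     # type variable
```

With the case rule, classify_by_case("imt") raises NameError, so the typo is reported instead of being generalized.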

Name: Anonymous 2012-01-14 11:44

I am going to be a contrarian... I like the @ and $ operators, OP. You reference and dereference so often in low level code that they really should have their own dedicated operators. It is an excellent idea.

Name: Anonymous 2012-01-14 11:44

>>19
Point taken.
I was thinking the same, the forced capitalization of types would look bad in a C-like language. I haven't thought of doing the reverse, though.

Name: Anonymous 2012-01-14 11:50

>>16 Not at all.

>>15
x : τ
I was using this before, it was just dropped to support the ?: operator. Also, var declarations looked strange:

  def f(x:α, y:β) {
    var c = x + y
    d:float = distance(x, y)
  }


There's a const declaration too (def c = 4), but I don't think it is really needed; a compiler should be able to detect the lack of assignments, right?
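
A minimal Python sketch of that check, assuming a function body has already been flattened into (kind, name) pairs. The pair format and the name never_reassigned are invented for the example, and it ignores scoping and writes through pointers (anything whose address is taken with @ would need more care).

```python
def never_reassigned(statements):
    """Return the names that are declared but never assigned again.

    statements is a list of (kind, name) pairs, where kind is
    "var" for a declaration and "assign" for a later assignment.
    """
    declared = []
    reassigned = set()
    for kind, name in statements:
        if kind == "var":
            declared.append(name)
        elif kind == "assign":
            reassigned.add(name)
    return [name for name in declared if name not in reassigned]


body = [("var", "c"), ("var", "d"), ("assign", "d")]
print(never_reassigned(body))  # ['c']
```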

Damn, I'll just change the var declarations to ML let.
Oh! struct list(α) { ... } looks good, thank you for the advice.

Name: Anonymous 2012-01-14 11:51

>>22
If you were to continue with name:type I would do something like:


d:float = distance(x, y)
c := x + y

Name: Anonymous 2012-01-14 11:57

>>22
I was using this before, it was just dropped to support the ?: operator.
Make if..else an expression and not a statement. I beg you.
Also, var declarations looked strange:
var x:τ = 2+2?
Also, with type inference, that's not going to be much of an annoyance.
a compiler should be able to detect the lack of assignments, right?
Right, but mutability (or immutability) annotations help a whole lot.

Name: Anonymous 2012-01-14 12:02

>>18 Thank you, pal! Nice argument.
>>19,21 Maybe adding [another] operator? Like def f(x:T?) { return x }.
>>23 Not sure but I'll think about it.

Name: Anonymous 2012-01-14 12:06

>>24 Ok. What about let to annotate scoped constants?
def f(x, y) {
    let s = log(x, y)
    for (var t in range(x)) {
        ...
    }
}

Name: Anonymous 2012-01-14 12:09

>>19,21,25
Or def f(var x : τ) {...}
Or def f(x : var τ) {...}

Name: Anonymous 2012-01-14 12:18

>>24
About if..else, I don't know, since blocks don't require exprs. Maybe return 0 or define an ifblock terminating in an expr?

Name: Anonymous 2012-01-14 12:20

>>25
Maybe adding [another] operator?
That would be the ML solution, where 'a -> int is the type ∀a.a→int. Personally, I find it disgusting.
Maybe an explicit forall quantifier, or f[a](x : a) -> int, but then the function declaration syntax becomes too bloated, and separate-type-declaration-unfriendly.

>>26
def was ok. I'd say to not introduce another keyword for function, variable, constant definitions. let or def for functions and constants, var for variables is my suggestion.

Name: kodak_gallery_programmer !!kCq+A64Losi56ze 2012-01-14 12:23

I think my IQ just dropped 20 points after reading this crap.

Name: Anonymous 2012-01-14 12:24

I like scala's approach, var for variables and val for constants (values).

Name: Anonymous 2012-01-14 12:27

It only took a couple of posts for this new language to fail horribly.

Name: Anonymous 2012-01-14 12:27

>>30
No you're a mental midget as well. How does it feel?

Name: Anonymous 2012-01-14 12:29

So this is a broken Frankenstein/Haskell of some sort?

Name: Anonymous 2012-01-14 12:29

>>28
The empty block returns either the unit type (that is, ()), or the void type, since this is a C-like language.
let x = if (true) {} else {} has the same effect as let x = f(), where f returns unit/void.

Also, remember: dangling else is considered harmful.
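
The empty-block rule can be sketched as a tiny evaluator in Python. Everything here is an assumption made for illustration: UNIT stands in for the unit/void value, a block is modelled as a list of callables, and a block's value is taken to be the value of its last expression.

```python
UNIT = ()  # stand-in for the unit/void value


def eval_block(block, env):
    """A block is a list of callables; its value is the last result, or UNIT if empty."""
    result = UNIT
    for expr in block:
        result = expr(env)
    return result


def eval_if(cond, then_block, else_block, env):
    """if..else as an expression: it evaluates to the value of the chosen block."""
    return eval_block(then_block if cond(env) else else_block, env)


# let x = if (true) {} else {}
x = eval_if(lambda env: True, [], [], {})
print(x == UNIT)  # True
```

Requiring both branches to be braced blocks, as the posted grammar already does, also avoids the dangling else.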

Name: Anonymous 2012-01-14 12:29

>>33
nice dubs

Name: Anonymous 2012-01-14 12:30

>>30
Dear Mr. kodak_gallery_programmer!!kCq+A64Losi56ze,

Sorry, I can't give your lost IQ back, but instead, I would advise you to not spend your time on this textboard.

Thank you for your attention,
long long int

Name: Anonymous 2012-01-14 12:30

>>34
8/10

Name: Anonymous 2012-01-14 12:32

>>37
I spend more time on comp.lang.c because my gay lover Eric Sosman hangs out there.

Name: Anonymous 2012-01-14 13:10

>>39

Uncle Eric is mentioned in the following url...

http://www.redbooks.ibm.com/abstracts/sg247162.html
