I'm creating yet another fail C-family compiler.
Objectives: simple language, generics, type inference, and unambiguous grammar.
I'd like to hear your opinions; any help is very much appreciated.
Objectives: simple language, generics, type inference, and unambiguous grammar.
Sounds great but you're going to ruin it somehow since you're more likely than not a mental midget.
>>8 It's why I commented them out. So there isn't any other way?
Name:
Anonymous 2012-01-13 23:24
>>9
They should already be unambiguous. What exactly is the problem? The typical way to disambiguate the operators is by placing spaces to separate the tokens, as in i++ + + ++j. Did you make sure the lvalue wasn't being reduced to an expr before the parser receives the "++" token?
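The spacing trick works because C-family lexers use maximal munch: at each position the lexer takes the longest operator that matches, regardless of what the parser wants. A minimal Python sketch over a deliberately tiny, hypothetical token set (identifiers, `++` and `+` only) shows why `i+++++j` lexes as `i ++ ++ + j`, which C then rejects because the operand of the second `++` (`i++`) is not an lvalue, while the spaced version lexes as intended:

```python
import re

# Hypothetical, tiny operator set. Sorting longest-first makes the regex
# alternation below try "++" before "+", which is maximal munch.
OPERATORS = sorted(["+", "++"], key=len, reverse=True)
TOKEN_RE = re.compile(
    r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<op>"
    + "|".join(re.escape(op) for op in OPERATORS)
    + r"))"
)


def tokenize(src: str) -> list[str]:
    """Split src into tokens, always taking the longest operator that matches."""
    src = src.rstrip()
    tokens: list[str] = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {src[pos]!r}")
        tokens.append(m.group("ident") or m.group("op"))
        pos = m.end()
    return tokens
```

The lexer never backtracks to find a split the grammar would accept; the whitespace in `i++ + + ++j` is what forces the token boundaries.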
Name:
Anonymous 2012-01-14 4:15
Is this some hobby project to teach yourself something?
Name:
Anonymous 2012-01-14 7:01
>>4 I've changed the address-of operator to @ and the dereference operator to $. They look more natural, IMHO.
That's ridiculous. Why change something that works? Why change something everybody already knows and understands?
>>11 A hobby, yes. But I'm taking it seriously for real usage.
>>12 I tweaked the precedence a little, so they don't behave like & and *. I'm just trying to get something like $a.$b.$b = 1, which is equivalent to *(a->b->b) = 1;
Other thoughts:
I haven't decided [yet] how to put template parameters on function calls and types. Functions definitely will be first-class citizens, but I'm not sure about types...
Name:
Anonymous 2012-01-14 10:51
>>13 $a.$b.$b = 1
Pig disgusting and not well known.
*(a->b->b) = 1
Beautiful and well known to anyone who is familiar with C or C++.
>>13 I haven't decided [yet] how to put template parameters on function calls and types. Functions definitely will be first-class citizens, but I'm not sure about types...
Don't make the same mistake as C and C++:
1. Types after the variable: x τ or possibly x : τ.
2. Return type after the argument list: f(x : int) -> int.
3. In a type annotation, an unbound type is a type variable: f(x : α) -> int has type ∀α.α→int.
4. Distinguish the type declaration from the function declaration (which is why you should be using a colon between a variable and its type).
5. All of these should compile:
f(x : α, y : β) -> int { return 1; }
f : (α, β) -> int;
f(x, y) { return 1; }
6. Make structs take type variables: struct list(α) { car : α; cdr : maybe(list(α)) }
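Rule 6 maps fairly directly onto Python's generic classes, which may help picture how a parameterized struct behaves. A minimal sketch, with `ConsList` as a hypothetical stand-in for `struct list(α)` and `Optional` (i.e. `None` for the empty tail) standing in for `maybe`; note that Python only checks the type parameter statically (e.g. with mypy), not at runtime:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

A = TypeVar("A")  # plays the role of the type variable α


@dataclass
class ConsList(Generic[A]):
    """Roughly: struct list(α) { car : α; cdr : maybe(list(α)) }"""
    car: A
    cdr: Optional[ConsList[A]] = None


def length(xs: Optional[ConsList[A]]) -> int:
    """Polymorphic over α: works for a list of any element type."""
    n = 0
    while xs is not None:
        n += 1
        xs = xs.cdr
    return n


ints: ConsList[int] = ConsList(1, ConsList(2, ConsList(3)))
names: ConsList[str] = ConsList("car", ConsList("cdr"))
```

One `length` serves every instantiation, which is the payoff of letting structs take type variables.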
>>14
Not really.
Some languages (e.g. sh) use $IDENTIFIER to expand variables, so I'm taking a similar approach.
It's also easier to spot when reading code, and it provides a uniform syntax too (instead of C's ->).
>>15 3. In a type annotation, an unbound type is a type variable: f(x : α) -> int has type ∀α.α→int.
Terrible! Instead of reporting an error when you make a typo, it will silently accept it as a type variable. I like the way Haskell does this (types are capitalized, type variables are lowercase), but that would probably look pretty bad in a C-like language. Maybe the other way around? f : (int, A) -> A
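The typo objection can be made concrete with a tiny classifier for the names in a signature. This is a hypothetical sketch, not a real type checker: `KNOWN_TYPES` and both policy functions are assumptions for illustration. Under rule 3 a typo such as `itn` silently becomes a type variable; under the reversed case convention suggested here (capitalized names are type variables) the same typo is reported:

```python
# Hypothetical set of types declared in scope.
KNOWN_TYPES = {"int", "float", "double", "byte"}


def classify_unbound_is_var(names: list[str]) -> dict[str, str]:
    """Rule 3: any name that is not a known type is treated as a type variable."""
    return {n: ("type" if n in KNOWN_TYPES else "type variable") for n in names}


def classify_by_case(names: list[str]) -> dict[str, str]:
    """Reversed Haskell convention: capitalized names are type variables;
    any other name must be a known type, otherwise it is an error."""
    result: dict[str, str] = {}
    for n in names:
        if n in KNOWN_TYPES:
            result[n] = "type"
        elif n[:1].isupper():
            result[n] = "type variable"
        else:
            result[n] = "error: unknown type"
    return result
```

For a signature like `f(x : itn) -> int`, where `int` was intended, only the case-based policy catches the mistake.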
Name:
Anonymous 2012-01-14 11:44
I am going to be a contrarian... I like the @ and $ operators, OP. You reference and dereference so often in low level code that they really should have their own dedicated operators. It is an excellent idea.
>>19
Point taken.
I was thinking the same: forced capitalization of types would look bad in a C-like language. I hadn't thought of doing the reverse, though.
>>15 x : τ
I was using this before; I only dropped it to support the ?: operator. Also, var declarations looked strange:
def f(x:α, y:β) {
var c = x + y
d:float = distance(x, y)
}
There's a const declaration too (def c = 4), but I don't think it's really needed: a compiler should be able to detect that a variable is never reassigned, right?
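Detecting this is a standard analysis: a local that is bound exactly once, and not as a loop variable, can be treated as a constant without a separate `def` annotation. A minimal sketch that uses Python's own `ast` module on Python source as a stand-in for the new language's AST; it counts every name in store position (plain, annotated and augmented assignments, loop targets) and is only an illustration of the idea:

```python
import ast
from collections import Counter


def const_candidates(src: str) -> set[str]:
    """Names bound exactly once syntactically and not used as loop variables."""
    stores: Counter = Counter()
    loop_vars: set[str] = set()
    for node in ast.walk(ast.parse(src)):
        # Assign, AugAssign, AnnAssign and for-targets all use Store context.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            stores[node.id] += 1
        # A loop target is rebound on every iteration, so it is never constant.
        if isinstance(node, (ast.For, ast.AsyncFor)):
            loop_vars |= {n.id for n in ast.walk(node.target) if isinstance(n, ast.Name)}
    return {name for name, count in stores.items() if count == 1 and name not in loop_vars}


SRC = """
def f(x, y):
    s = x + y
    e = 0
    for t in range(x):
        e += t * s
    return e
"""
```

Here `s` is reported, while `e` (reassigned by `+=`) and `t` (a loop variable) are not. In the new language, taking a variable's address with @ would also have to count as a potential write, since the pointer could be used to modify it; that is one reason explicit mutability annotations remain useful.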
Damn, I'll just change the var declarations to ML let.
Oh! struct list(α) { ... } looks good, thank you for the advice.
Name:
Anonymous 2012-01-14 11:51
>>22
If you were to continue with name:type I would do something like:
>>22 I was using this before, it was just dropped to support the ?: operator.
Make if..else an expression and not a statement. I beg you.
>>22 Also, var declarations looked strange:
var x:τ = 2+2?
Also, with type inference, that's not going to be much of an annoyance.
>>22 a compiler should be able to detect the lack of assignments, right?
Right, but mutability (or immutability) annotations help a whole lot.
Name:
Anonymous 2012-01-14 12:02
>>18 Thank you, pal! Nice argument.
>>19,21 Maybe adding [another] operator? Like def f(x:T?) { return x }.
>>23 Not sure, but I'll think about it.
Name:
Anonymous 2012-01-14 12:06
>>24 Ok. What about let to annotate scoped constants?
def f(x, y) {
let s = log(x, y)
for (var t in range(x)) {
...
}
}
Name:
Anonymous 2012-01-14 12:09
>>19,21,25
Or def f(var x : τ) {...}
Or def f(x : var τ) {...}
Name:
Anonymous 2012-01-14 12:18
>>24
I don't know about if..else, since blocks aren't required to end in an expr. Maybe return 0, or define an if-block that must terminate in an expr?
>>25 Maybe adding [another] operator?
That would be the ML solution, where 'a -> int is the type ∀a.a→int. Personally, I find it disgusting.
Maybe an explicit forall quantifier, or f[a](x : a) -> int, but then the function declaration syntax becomes too bloated and unfriendly to separate type declarations.
>>26 def was OK. I'd say don't introduce yet another keyword for function, variable, and constant definitions. My suggestion: let or def for functions and constants, var for variables.
>>28
The empty block evaluates to either the unit value (that is, ()) or void, since this is a C-like language. let x = if (true) {} else {} has the same effect as let x = f(), where f returns unit/void.
Also, remember: dangling else is considered harmful.
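A minimal sketch of what "if..else as an expression" means for an evaluator: every block yields the value of its last expression, an empty block yields the unit value (modelled here as Python's `()`), and a missing else branch behaves like an empty block. The node classes are hypothetical and not taken from any existing compiler:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union

UNIT = ()  # the unit value: what an empty block evaluates to


@dataclass
class Const:
    value: object


@dataclass
class Block:
    exprs: list[Expr] = field(default_factory=list)


@dataclass
class If:
    cond: Expr
    then: Block
    else_: Block = field(default_factory=Block)  # no else == empty else


Expr = Union[Const, Block, If]


def evaluate(e: Expr) -> object:
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Block):
        result: object = UNIT
        for sub in e.exprs:  # a block's value is its last expression's value
            result = evaluate(sub)
        return result
    if isinstance(e, If):
        return evaluate(e.then if evaluate(e.cond) else e.else_)
    raise TypeError(f"unknown node {e!r}")
```

With this rule `let x = if (true) {} else {}` simply binds the unit value, and a block ending in a statement-like expression needs no special case.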
def energy(system) {
var e = 0.0
for (a in system) {
e += 0.5 * a.mass
* ( a.vx * a.vx
+ a.vy * a.vy
+ a.vz * a.vz )
for (b in system) {
let dx = a.x - b.x
let dy = a.y - b.y
let dz = a.z - b.z
let distance = sqrt(dx*dx + dy*dy + dz*dz)
e -= (a.mass * b.mass) / distance
}
}
return e
}
I miss sexprs here T-T:
e += 0.5 * a.mass
* ( a.vx * a.vx
+ a.vy * a.vy
+ a.vz * a.vz )
C programmers will ask to declare several variables at once when they share a type. I thought of something like this:
struct Body {
x, y, z, vx, vy, vz, mass : double
}
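For reference, the same energy computation as a runnable Python sketch, with a dataclass standing in for `struct Body`. One design note: the inner loop here starts after the current body, so a body is never paired with itself (a zero distance) and each pair's potential energy is subtracted once; a `for (b in system)` inner loop visits every pair twice, including each body paired with itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Body:
    # struct Body { x, y, z, vx, vy, vz, mass : double }
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float
    mass: float


def energy(system: list[Body]) -> float:
    """Kinetic energy of every body minus the potential energy of every distinct pair."""
    e = 0.0
    for i, a in enumerate(system):
        e += 0.5 * a.mass * (a.vx * a.vx + a.vy * a.vy + a.vz * a.vz)
        for b in system[i + 1:]:  # only pairs (a, b) with b after a
            dx, dy, dz = a.x - b.x, a.y - b.y, a.z - b.z
            e -= (a.mass * b.mass) / math.sqrt(dx * dx + dy * dy + dz * dz)
    return e
```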
Name:
Anonymous 2012-01-14 16:05
Does "proc" work like this?
def outer() {
return proc (x) {
return x*2
}
}
outer()(10)
>>43
Not enough π decimals! 3.1415926535897932384626433
It looks good, and now I see why * isn't the dereference operator: you can drop the semicolons and make a\n*b mean a*b.
Name:
long long 2012-01-14 17:04
>>45
Yes, but I was thinking about currying:
def mul(x) {
return proc (y) { return x*y }
}
def main() {
printf("%d\n", mul(2)(4)) // prints 8
}
But I haven't thought about name binding yet.
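The `mul` example is an ordinary lexical closure: `proc (y)` has to capture the `x` of the enclosing call. Python has the same construct, and it also shows the name-binding decision that is still open: a closure that captures a *variable* sees later reassignments of it, while capturing the *value* at creation time does not. A minimal sketch (the adder functions are illustrative only):

```python
from typing import Callable


def mul(x: int) -> Callable[[int], int]:
    # The inner function captures x from mul's scope, like proc (y) above.
    def proc(y: int) -> int:
        return x * y
    return proc


def make_adders_by_reference() -> list[Callable[[int], int]]:
    # Each lambda captures the loop *variable* i, not its value, so once the
    # comprehension finishes they all see its final value (2).
    return [lambda n: n + i for i in range(3)]


def make_adders_by_value() -> list[Callable[[int], int]]:
    # A default argument is evaluated when each lambda is created,
    # so every closure keeps its own copy of i.
    return [lambda n, i=i: n + i for i in range(3)]
```

Python chose capture by reference; a language with `let`/`var` could instead capture `let` bindings by value, since they can never change.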
>>46 :P
Also, I was seriously thinking about overloading the . operator to dereference pointers when accessing members, and turning $ into:
lvalue: '$' identifier
| '$' '(' lvalue ')'
and these expressions would be equivalent:
$a → *a
a.b.c → a->b->c
$(a.b.c) → *(a->b->c)
The verbose alternative shows which names are pointers in expressions like $a.b.$c.d, but I don't think that's a really great advantage: if someone changes a member to a pointer, all the code accessing it has to be updated.
Name:
Anonymous 2012-01-14 22:32
2012 not using off-side rule
Name:
Anonymous 2012-01-14 22:51
>>48
If you're talking about indentation-based statement grouping, it has some implications that aren't good, at least for language tools.
Since there often isn't enough visual cue to distinguish different amounts or kinds of whitespace, I don't mind ignoring it entirely in the lexer.
Another valid program, mandelbrot, gave me some ideas [or reassured me about them]:
* C-compatible ABI (a must)
* importing C headers directly (it would be good)
* implicit conversion between pointer types / void pointer?
* cpp-like preprocessor, only for #pragmas?
* optional non-qualified imported identifiers if there aren't conflicts?
* OS/arch dependent modules, like the std.sse below?
* modules could specify proper compilation flags, like std.sse specifying -mfpmath=sse -msse, openmp specifying -fopenmp, and gl specifying -lGL,
which could also ease library dependency chains.
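One way per-module flags could ease dependency chains: the build tool walks the import graph and collects each module's flags once, dependencies first. The module table below is hypothetical (module names from the list, illustrative GCC-style flags), and the sketch leaves open where such metadata would actually be stored:

```python
from typing import Optional

# Hypothetical module metadata: the flags each module needs and what it imports.
MODULES = {
    "std.sse": {"flags": ["-mfpmath=sse", "-msse"], "imports": []},
    "openmp": {"flags": ["-fopenmp"], "imports": []},
    "gl": {"flags": ["-lGL"], "imports": []},
    "posix": {"flags": [], "imports": []},
    "stdlib": {"flags": [], "imports": []},
    "mandelbrot": {"flags": [], "imports": ["openmp", "std.sse", "posix", "stdlib"]},
    "glutil": {"flags": [], "imports": ["gl"]},
    "game": {"flags": [], "imports": ["glutil", "std.sse"]},
}


def collect_flags(module: str, seen: Optional[set] = None) -> list[str]:
    """Depth-first walk of the import graph; each flag is emitted once."""
    if seen is None:
        seen = set()
    if module in seen:
        return []
    seen.add(module)
    flags: list[str] = []
    for dep in MODULES[module]["imports"]:
        for flag in collect_flags(dep, seen):
            if flag not in flags:
                flags.append(flag)
    for flag in MODULES[module]["flags"]:
        if flag not in flags:
            flags.append(flag)
    return flags
```

`game` never mentions `gl` directly, yet it still receives `-lGL` through `glutil`, which is the dependency-chain benefit.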
import openmp
import std.sse as sse
import posix
import stdlib
#omp parallel for schedule(static,1)
for (var i = 0; i < N; i++) {
calc_row(i)
}
printf("P4\n%d %d\n", N, N)
fwrite(bitmap, bytes_per_row, N, stdout)
free(bitmap)
free(Crvs)
return EXIT_SUCCESS
}
def calc_row(y) {
let row_bitmap = bitmap + (bytes_per_row * y)
let Civ_init : sse.v2df = { y*inverse_h-1.0, y*inverse_h-1.0 }
for (var x = 0; x < N; x += 2)
{
var Crv = Crvs[x >> 1]
var Civ = Civ_init
var Zrv = zero
var Ziv = zero
var Trv = zero
var Tiv = zero
var i = 50
var two_pixels : int
var is_still_bounded : sse.v2df
>>47
How about just '$' lvalue? Also, I think $a.b.$c.d is better.
>>51 * implicit conversion between pointer types / void pointer?
With parametric polymorphism, you can just return a polymorphic pointer: malloc : (int) -> ptr[α]
The type α can be inferred from context and use.
Now your code is type-safe, congratulations.
>>51 * cpp-like preprocessor, only for #pragmas?
Use pragma comments, like Haskell does.
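Inferring α in `malloc : (int) -> ptr[α]` from context is ordinary first-order unification, the core step of Hindley-Milner-style inference: the result type `ptr[α]` is unified with whatever type the context expects, and that binds α. A minimal sketch, with types encoded as hypothetical tuples such as `("ptr", "double")` and type variables written ML-style as strings starting with an apostrophe; it omits the occurs check a real implementation needs:

```python
from typing import Union

Type = Union[str, tuple]  # "int", "'a", or a constructor like ("ptr", inner)

MALLOC_RESULT = ("ptr", "'a")  # result type of malloc : (int) -> ptr['a]


def is_var(t: Type) -> bool:
    return isinstance(t, str) and t.startswith("'")


def resolve(t: Type, subst: dict) -> Type:
    """Follow variable bindings until an unbound variable or a concrete type."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def unify(a: Type, b: Type, subst: dict) -> dict:
    """Return subst extended so that a and b become equal, or raise TypeError."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) and a[0] == b[0]:
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a!r} with {b!r}")
```

For `let p : ptr[double] = malloc(n)`, unifying `MALLOC_RESULT` with `("ptr", "double")` binds `'a` to `double`, while assigning the result to a plain `int` fails to unify and is reported.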
Name:
Anonymous 2012-01-15 8:49
>>51
This is a fucking nightmare, you're butchering this more and more. So every object file has to carry what it should be linked with? How else is the linker supposed to know what to link with when you decide to merge all the object files into a shared object or something of that sort?
Name:
Anonymous 2012-01-15 11:26
>>52
You can write a function and pass pointers or references to it:
def getm(obj) { return obj.m }
var s = {1,2,3}
getm(s) // getm(s : array[int]) → int
getm(@s) // getm(s : ptr[array[int]]) → int
>>51
I was thinking of compiling to some sort of intermediate representation [just one step back from llvm-as] that could be portably executed in a VM [for easy distribution], and also natively compiled [for -march=native -mtune=native speed].
This representation will also help distribute libraries with generic functions/structs, whose type parameters will be inferred at compile time, generating the needed instances. Just include a simple “make” in the compiler: you pass source and output directories, the source tree is searched recursively for source files, and each one is compiled if its timestamp, or that of any of its imports [checked recursively, and cached], is newer than the matching object file in the output directory. Then we can remember template instantiations with the same type parameters across the source tree and compile each one just once.
I did this for my C++ projects: 200 lines of Python [excluding the template caching part], and it runs very nicely. I can even compile all sources in parallel until the compiler consumes all the memory.
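The rebuild rule described here can be sketched in a few lines of Python. This is an illustrative sketch, not the poster's make.py: the import graph and timestamps are passed in as plain dictionaries (a real tool would parse imports and call `os.path.getmtime`), the import graph is assumed to be acyclic, and the template cache is modelled as a list of (template, type arguments) keys:

```python
from functools import lru_cache
from typing import Callable, Optional


def make_planner(imports: dict, mtime: dict) -> Callable[[str, Optional[float]], bool]:
    """Return a function deciding whether a source file's object file is stale."""

    @lru_cache(maxsize=None)
    def newest_input(src: str) -> float:
        # Newest timestamp among src and everything it imports, recursively;
        # lru_cache memoizes the answer per file (the "cached" part).
        return max([mtime[src]] + [newest_input(dep) for dep in imports.get(src, [])])

    def needs_rebuild(src: str, obj_mtime: Optional[float]) -> bool:
        # Rebuild if there is no object file yet, or any input is newer than it.
        return obj_mtime is None or newest_input(src) > obj_mtime

    return needs_rebuild


def unique_instantiations(occurrences: list) -> list:
    """Keep the first occurrence of each (template, type arguments) pair,
    so each instance is compiled just once."""
    return list(dict.fromkeys(occurrences))
```

For example, with `a` importing `b` and `b` importing `c`, touching `c` makes `a`'s object file stale even though `a` itself did not change.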
$ cd ~/foo
$ make
./make.py -ssrc -obuild -j3 gcc debug
compiling src/a.cc to build/gcc-debug/a.o
compiling src/b.cc to build/gcc-debug/b.o
compiling src/bar/b.cc to build/gcc-debug/bar_b.o
(waiting sources compilation)
linking build/*.o to build/gcc-debug/foo.exe
Without breaking a sweat. Compiler flags [for platform-dependent CFLAGS/LDFLAGS] are put in a separate make.ini file, under different sections ([default] [win32] [linux64]).
whatever you'd like. Some people say end blocks are more readable, while others find that curlies give a more definite structure, both to their eyes and to their text editors.
Name:
Anonymous 2012-01-26 4:26
>>60
I don't know... I think that curlies could be used for something more useful, like another data structure literal (hashes?).
--
Updates:
Maybe type initializers and conversions look better as “function calls”:
var car = Vehicle(1, 2, "Mazda")
let x = byte(car.speed & 0b11110000)
Also, I don't know whether it's better to use C++'s vtables or Clay's variant types. I designed something around the latter, but there are still some flaws here and there...
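A rough illustration of the trade-off between the two designs, in Python for brevity. With vtable-style dispatch each type carries its own methods, so adding a new type is easy but adding a new operation touches every class. With a closed variant type, one function switches over all the alternatives, so adding an operation is easy but adding an alternative touches every such function. The `Shape` types are hypothetical, and this models only the dispatch structure, not Clay's actual semantics or any memory layout:

```python
import math
from dataclasses import dataclass
from typing import Union


# vtable style: behaviour lives with each type, and calls dispatch through
# the object's class (Python's analogue of a vtable lookup).
class Shape:
    def area(self) -> float:
        raise NotImplementedError


class Circle(Shape):
    def __init__(self, r: float) -> None:
        self.r = r

    def area(self) -> float:
        return math.pi * self.r ** 2


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


# Variant style: plain data alternatives plus one function that switches on them.
@dataclass
class CircleV:
    r: float


@dataclass
class SquareV:
    side: float


ShapeV = Union[CircleV, SquareV]


def area(s: ShapeV) -> float:
    if isinstance(s, CircleV):
        return math.pi * s.r ** 2
    if isinstance(s, SquareV):
        return s.side ** 2
    raise TypeError(f"not a shape variant: {s!r}")
```

A compiler for variant types can also check that every alternative is handled, which is harder to guarantee with open class hierarchies.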