
Any decent modern general-purpose languages?

Name: Anonymous 2012-07-25 10:55

Assembly: Unportable. No standardised syntax.
Classical Visual Basic: Some good parts. Shit overall.
C: Shitty standard library. Deficient type system. Can't into Unicode. "Unportable assembly."
D and C++: Obfuscated boilerplate languages.
Java and C#: Forced OOP.
Common Lisp: Archaic cons-based library. Writing complex macros is a PitA due to the unlispy quotation syntaxes.
Scheme: CL without namespaces.
Clojure and Erlang: Concurrency is unneeded outside of a few very specific applications. Parallelism is where it's at.
OCaml: Great language, only one, deficient, implementation.
Haskell: Academic sex toy.
Forth: Reinventing the wheel over and over.
Ruby: Implicit declarations. Slow as fuck.
Python: Implicit declarations. FioC.
Perl: Brain damage.
PHP: Pretty much shit.
JavaScript: "" == false

It's impossible to list them all but, please, what decent modern general-purpose languages exist?

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2012-08-01 7:24

>The disadvantage of trying to take advantage of that is the cost of synchronization.
???
As I mentioned above, all recent CPUs automatically parallelise when there are no dependencies between instructions.

>For general graphs, finding an optimal coloring is NP-hard, and deciding whether a coloring with fewer than k colors exists is NP-complete.
I've never really understood this obsession with graph colouring for RA. Perhaps it's too general for register allocation, or register-memory/register-immediate/memory-immediate architectures like x86 can avoid this issue. In the flow graph, think of a wire connecting two nodes for each variable's value that needs to be preserved. Then the "thickness" of the "bundle" between each pair of nodes is how many units (bytes, e.g. if each wire represents 8 bits) of data need to be preserved between them. Assign the ones used in the innermost loops/most often to registers first. The rest can go in memory, or be swapped with another if there's a change in use (x86 conveniently provides an xchg instruction for this).
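
The "innermost loops first" heuristic can be sketched in a few lines. This is a toy model, not actual compiler code: the variable names, the 10-per-depth weighting, and the register naming are all invented for illustration.

```python
# Toy sketch: rank variables by use count weighted by loop depth,
# then hand registers to the heaviest users first; the rest spill.
# The 10**depth weighting factor is an arbitrary illustrative choice.

def allocate(uses, num_regs):
    """uses maps each variable to the loop depths of its uses."""
    weight = {v: sum(10 ** d for d in depths) for v, depths in uses.items()}
    ranked = sorted(weight, key=weight.get, reverse=True)
    regs = {v: f"r{i}" for i, v in enumerate(ranked[:num_regs])}
    return regs, ranked[num_regs:]          # (register map, spilled vars)

regs, spilled = allocate(
    {"i": [2, 2, 2], "sum": [2, 1], "tmp": [0], "n": [1]}, num_regs=2)
print(regs, spilled)  # {'i': 'r0', 'sum': 'r1'} ['n', 'tmp']
```

The loop counter used three times at depth 2 outranks everything else, so it gets a register; the depth-0 temporary goes to memory.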

This algorithm is quite different from any I've seen before (except perhaps linear scan, although that one is not particularly efficient) and has some traits of the optimise-first approach: instead of attempting to allocate everything to registers and "spill" afterwards, determine the maximum storage needed first, then allocate to both memory and registers. For example you have 80 bits of "real" local variables (i.e. values that actually get used and need preservation, which may not correspond to the number of declarations in the source code) and 56 bits of registers. A dumb graph-colouring algorithm is NOT going to see the 24 bits until it's too late. Then it goes back, "spills", tries again, etc. The algorithm I described above is going to allocate the 24 bits somehow (pre-allocation via stack, push/pop, etc.), and fit everything in naturally.
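
The "determine the maximum storage first" step, in miniature. The numbers match the 80/56-bit example above; the variable names and bit sizes are invented for illustration.

```python
# Measure total live-data demand up front, compare against register
# capacity, and pre-allocate the overflow as a stack frame before any
# assignment happens -- no spill-and-retry loop needed.

def storage_plan(var_bits, reg_bits):
    demand = sum(var_bits.values())
    overflow = max(0, demand - reg_bits)
    return {"demand": demand,
            "in_registers": demand - overflow,
            "stack_frame_bits": overflow}

plan = storage_plan({"a": 32, "b": 32, "c": 16}, reg_bits=56)
print(plan)  # {'demand': 80, 'in_registers': 56, 'stack_frame_bits': 24}
```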

>At function definitions you could specify a maximum timeout for the function.
That's not what I meant. It's a compile-time optimisation constraint to prevent compilation from taking too long, at the expense of possible lost pre-computation. E.g. you write a program that computes 64K digits of pi. This is self-contained and should terminate, so theoretically could be computed by the compiler. With a low precomputation limit, it doesn't finish so at best the compiler can save the partial state and you end up with e.g. a program that contains 4K digits already computed and another 60K to go. Set it high enough and you get a program that only contains the 64K digits you want because the computation terminated in the compiler.
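
A minimal model of that precomputation limit. The step function and the budget counter are invented for illustration; a real compiler would meter something like evaluation steps or wall-clock time.

```python
# Run a pure, input-free computation for at most `budget` steps.
# If it finishes, "compile" it down to its constant result; otherwise
# keep the partial state so a residual program can resume from it.

def precompute(step, state, budget):
    """step: state -> (new_state, done). Returns ('const', final_state)
    or ('residual', partial_state) if the budget ran out."""
    for _ in range(budget):
        state, done = step(state)
        if done:
            return ("const", state)
    return ("residual", state)

# Example: sum 1..100, one addition per "compile-time step".
def step(s):
    i, acc = s
    return (i + 1, acc + i), i == 100

print(precompute(step, (1, 0), budget=10))    # ('residual', (11, 55))
print(precompute(step, (1, 0), budget=1000))  # ('const', (101, 5050))
```

With a small budget the compiler emits the partial state (the first 10 terms already summed); with a large one the whole computation folds to a constant.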

>If the printf implementation prepares the "2178309\n" string by writing into a buffer, the compiler would need to understand this and evaluate the resulting state of the buffer at compile time.
It's just the collection of visible state changes induced by the program. You could just collect the library function calls it made, along with their associated data, and emit code to do only that, if the program depended on no input. Programs are deterministic. If they have no input they will always produce the same output (or not halt). If they halt with no input then capture their output and that is the embodiment of the process it performs.
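
That "capture the visible effects" idea can be modeled directly. A sketch with invented names: the program gets a recording stub in place of the real output routine, and the residual program just replays the trace.

```python
# Execute an input-free program against a recording stub of the
# library, then emit a trivial program that replays the recorded calls.

def run_and_capture(program):
    trace = []
    program(lambda s: trace.append(s))   # stub standing in for write()
    return trace

def residualize(trace):
    # The whole computation collapses to a replay of its effect trace.
    return lambda write: [write(s) for s in trace]

def fib_printer(write):                  # deterministic, takes no input
    a, b = 0, 1
    for _ in range(5):
        write(str(a))
        a, b = b, a + b

replay = residualize(run_and_capture(fib_printer))
out = []
replay(out.append)
print(out)  # ['0', '1', '1', '2', '3']
```

The residual program performs none of the arithmetic; it is "the embodiment of the process", i.e. just the output.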

Name: Anonymous 2012-08-02 0:59

>As I mentioned above, all recent CPUs automatically parallelise when there are no dependencies between instructions.
Yes, so it would be silly to explicitly use threads to evaluate expressions whose evaluations are only a few primitive opcodes long. But if the values are returned by procedures that take inputs, take a while, and have execution times that vary with their inputs, spawning child threads to compute the values and then joining them before the calculation of the common dependency allows the dependencies to be calculated in parallel.
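
For concreteness, a sketch of that pattern; the sleep stands in for a procedure whose cost varies with its input.

```python
# Spawn threads only when the operands are expensive procedure calls;
# thread overhead would swamp anything a few opcodes long.
from concurrent.futures import ThreadPoolExecutor
import time

def slow_square(x):
    time.sleep(0.05)            # placeholder for real, input-dependent work
    return x * x

with ThreadPoolExecutor() as pool:
    fa = pool.submit(slow_square, 3)    # independent operand
    fb = pool.submit(slow_square, 4)    # independent operand
    total = fa.result() + fb.result()   # join at the common dependency
print(total)  # 25
```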

>I've never really understood this obsession with graph colouring for RA. Perhaps it's too general for register allocation,
To answer this question, one must ask if it is possible to create a chunk of code that will yield a variable interference graph isomorphic to an arbitrary graph.

>or register-memory/register-immediate/memory-immediate architectures like x86 can avoid this issue.
Only if the number of registers is two.

>In the flow graph, think of a wire connecting two nodes for each variable's value that needs to be preserved. Then the "thickness" of the "bundle" between each pair of nodes is how many units (bytes, e.g. if each wire represents 8 bits) of data need to be preserved between them.
Larger data types can be represented as a series of independent variables, where each variable represents a fixed-size portion of the data type. If allocated in memory, an extra restriction that each portion must be in successive memory addresses can be added.

>Assign the ones used in the innermost loops/most often to registers first.
Why? And what's a loop in this context? What code representation are you operating on?

>The rest can go in memory, or be swapped with another if there's a change in use (x86 conveniently provides an xchg instruction for this).
Yes, if there is spillage, there will need to be registers reserved for swapping values between the swapping register and the variable's permanent memory address.

>This algorithm is quite different from any I've seen before (except perhaps linear scan, although that one is not particularly efficient) and has some traits of the optimise-first approach:
Without a proof of optimality, it's crap.

>instead of attempting to allocate everything to registers and "spill" afterwards, determine the maximum storage needed first, then allocate to both memory and registers. For example you have 80 bits of "real" local variables (i.e. values that actually get used and need preservation, which may not correspond to the number of declarations in the source code) and 56 bits of registers.
Multiple variables can still be allocated in a single memory address, much as multiple variables can be allocated in a single register. An optimal graph coloring will give you the smallest number of slots needed to allocate your variables. Some of these slots will be in registers, and if there are too many slots, some will be in memory as well. But in any case, a minimal number of slots, and hence a minimal amount of memory, will be used.
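
As a concrete (if toy) version of the slots idea: greedy coloring of an interference graph, with the first colors mapped to registers and the rest to memory. Greedy coloring is not guaranteed minimal on arbitrary graphs; the variable names and the 8-byte slot size are illustrative.

```python
# Color the interference graph, then interpret colors as slots:
# colors below num_regs become registers, the rest stack locations.

def color(interference):
    colors = {}
    for v in sorted(interference):            # fixed order, deterministic
        taken = {colors[u] for u in interference[v] if u in colors}
        colors[v] = next(c for c in range(len(interference)) if c not in taken)
    return colors

def to_slots(colors, num_regs):
    return {v: f"r{c}" if c < num_regs else f"[sp+{(c - num_regs) * 8}]"
            for v, c in colors.items()}

g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
cols = color(g)
print(to_slots(cols, num_regs=2))
# {'a': 'r0', 'b': 'r1', 'c': '[sp+0]', 'd': 'r1'}
```

Note that d shares r1 with b because they never interfere, which is exactly the "multiple variables in one slot" point.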

>A dumb graph-colouring algorithm is NOT going to see the 24 bits until it's too late. Then it goes back, "spills", tries again, etc.
The graph coloring algorithm doesn't know about registers or spills. It just gives you colors. How you interpret and use these colors is a separate issue.

>The algorithm I described above is going to allocate the 24 bits somehow (pre-allocation via stack, push/pop, etc.), and fit everything in naturally.
"Naturally". I wish that word were stricken from anything technical. If it ain't optimal, it's crap.

>That's not what I meant. It's a compile-time optimisation constraint to prevent compilation from taking too long, at the expense of possible lost pre-computation. E.g. you write a program that computes 64K digits of pi. This is self-contained and should terminate, so theoretically could be computed by the compiler. With a low precomputation limit, it doesn't finish so at best the compiler can save the partial state and you end up with e.g. a program that contains 4K digits already computed and another 60K to go. Set it high enough and you get a program that only contains the 64K digits you want because the computation terminated in the compiler.
My bad. That's a good point to be aware of when pushing those computations to compile time.

>It's just the collection of visible state changes induced by the program. You could just collect the library function calls it made, along with their associated data, and emit code to do only that, if the program depended on no input. Programs are deterministic. If they have no input they will always produce the same output (or not halt). If they halt with no input then capture their output and that is the embodiment of the process it performs.
Yeah, constant folding on function calls can go a long way. It would be most powerful if it could adapt to almost any user function, as well as library functions.


Anyways, I'm going to try to get optimal register allocation in polynomial time. Wish me luck!

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2012-08-02 7:22

>Why? And what's a loop in this context? What code representation are you operating on?
Obviously because they will be read/written the most and thus deserve to be in registers, at least for the duration of that loop. It should also be obvious what a loop is: it's whatever corresponds to a loop in the source language. A loop within a loop is similarly self-explanatory. The representation doesn't matter.

>Yes, if there is spillage, there will need to be registers reserved for swapping values between the swapping register and the variable's permanent memory address.
This is where it seems you're still stuck with "the old way": a variable does not ever need to be at a "permanent memory address". There doesn't even need to be any correlation between how variables are defined and used in the source and the output, as long as they are semantically equivalent. Think of a variable as just a label for a "flow" of data between two or more operations. That flow might not exist if the operations that it connects aren't important to the final result of the function.

>Without a proof of optimality, it's crap.
That's what academics think. The real world says otherwise.

>Multiple variables can still be allocated in a single memory address, much as multiple variables can be allocated in a single register.
But multiple registers OR memory addresses may be allocated to one "variable" too.

>The graph coloring algorithm doesn't know about registers or spills. It just gives you colors. How you interpret and use these colors is a separate issue.
That's exactly the point I'm making: it's too general.

>Anyways, I'm going to try to get optimal register allocation in polynomial time. Wish me luck!
According to the RA article on Wikipedia this is possible with SSA, and there's nothing about anything being NP-complete on its page. Unfortunately they're both full of academic verbiage so it's difficult to determine what they're really talking about.

Two more examples of why more "targeted" algorithms can work better: instructions that require operands in particular registers, and those that prefer particular registers. With traditional graph colouring, you can force allocation of a certain register, but there's no way to specify a preference. An Asm programmer can think "there's a divide coming up: let's arrange things so that its input will be in the right registers when the time comes" and override that choice with something more important when necessary. Traditional RA algorithms can't do that.
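
One way such preferences could sit on top of coloring, as a sketch: hard constraints pin a variable to a register, soft preferences merely bias the choice when the preferred register is free. Register names are x86-flavoured but the strategy is invented for illustration, and it assumes enough registers to avoid spilling.

```python
# Pinned variables keep their register no matter what (e.g. div wanting
# eax/edx on x86); preferred registers are taken only if still free.

def color(interference, pinned, preferred, registers):
    alloc = dict(pinned)                     # hard requirements first
    for v in sorted(interference):
        if v in alloc:
            continue
        taken = {alloc[u] for u in interference[v] if u in alloc}
        free = [r for r in registers if r not in taken]
        want = preferred.get(v)
        alloc[v] = want if want in free else free[0]
    return alloc

g = {"q": {"r", "x"}, "r": {"q", "x"}, "x": {"q", "r"}}
alloc = color(g,
              pinned={"q": "eax", "r": "edx"},  # required by the divide
              preferred={"x": "ecx"},           # merely nice to have
              registers=["eax", "ecx", "edx", "ebx"])
print(alloc)  # {'q': 'eax', 'r': 'edx', 'x': 'ecx'}
```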

tl;dr: Register allocation is one area where compilers could improve significantly but, due to a reluctance to challenge existing norms, have not. It's only recently that stuff like linear-scan (closer to what someone with common sense, but not exposure to previous biases, would come up with) and SSA started getting visibility. These basics should've been where we started.

To whoever started the insistence that register allocation is as general a problem as graph colouring and propagated it: FUCK YOU.

Name: Anonymous 2012-08-02 23:36

>>215

>Obviously because they will be read/written the most and thus deserve to be in registers, at least for the duration of that loop. It should also be obvious what a loop is: it's whatever corresponds to a loop in the source language. A loop within a loop is similarly self-explanatory. The representation doesn't matter.
It sounds like you are operating on the source language directly. In the version I was thinking of, it was to take abstract assembly that uses an unbounded number of variables, and then output the same assembly using as few variables as possible. When operating at the assembly level, loop constructs can become very complex and general. I'm not sure which approach would be better. It feels like it would be better to do it after the compiler has done its other optimizations, but I could be wrong.

>This is where it seems you're still stuck with "the old way": a variable does not ever need to be at a "permanent memory address". There doesn't even need to be any correlation between how variables are defined and used in the source and the output, as long as they are semantically equivalent. Think of a variable as just a label for a "flow" of data between two or more operations. That flow might not exist if the operations that it connects aren't important to the final result of the function.
Well that's interesting. Is there an advantage to allocating a single variable in multiple locations? Each location is essentially a variable. So this could be modeled as variable assignment. I could see this getting very interesting if each storage location had a different cost of access. But assuming each register is equally cheap to read and write to, all you have to worry about is swapping values in and out of memory and the behavior of the special purpose registers, in which case, you might not need to go there.

>That's what academics think. The real world says otherwise.
The real world is satisfied with crap.

>But multiple registers OR memory addresses may be allocated to one "variable" too.
I'm not sure there is an advantage in doing that. The algorithm would be creating a new variable that is equivalent to a previously defined one. If the goal is to create an equivalent program using the fewest variables, and hence needing the fewest storage locations, then creating new variables doesn't seem like a good idea. But if the goal is to increase performance, then maybe it's possible in some circumstances to use more storage locations. At that point you'd need to work with a model closely tied to the processor. It wouldn't be enough to just assume that using the minimum number of storage locations will give you the most speed, although you'd think that would be true if there was no spillage as a result.

>That's exactly the point I'm making: it's too general.
Whether or not graph coloring is too general for register allocation depends on the class of graphs created in the construction of the variable interference graph. If it can form an arbitrary graph, then optimal register allocation is just as hard as graph coloring. But if the class of graphs is actually more limited, then it may be possible to use a special-purpose graph coloring algorithm that takes advantage of properties present in the restricted class of graphs.

>According to the RA article on Wikipedia this is possible with SSA, and there's nothing about anything being NP-complete on its page. Unfortunately they're both full of academic verbiage so it's difficult to determine what they're really talking about.
Good news! In the version I am working with, I am getting lines in the assembly where each variable must be holding its current value. If in a single line there are multiple variables that must hold their values, then these variables cannot be allocated in the same place. This yields a set of cliques. Now finding a maximum clique in a graph is NP-hard, and the encoding we get for the variable interference graph is all of its cliques. So if it is possible to optimally color a graph using knowledge of all its cliques, then that could be used to optimally color the variable interference graph. I got something working for one example, but once I apply it to general graphs I get lost. Maybe there's a structure to the variable interference graph that I'm unknowingly taking advantage of.
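
One structure that might be at work, assuming the assembly is straight-line so each variable is live over one contiguous range of lines: the per-line live sets are then exactly the maximal cliques, the interference graph is an interval graph, and a greedy sweep in order of range start colors it optimally (this is the property linear scan exploits). A sketch with invented live ranges:

```python
# Color interference intervals greedily in order of first-use line.
# For interval graphs this order is optimal: it uses exactly as many
# colors as the largest per-line live set (the largest clique).

def color_intervals(live):
    """live: {var: (first_line, last_line)}, lines inclusive."""
    colors = {}
    for v, (start, _end) in sorted(live.items(), key=lambda kv: kv[1][0]):
        # colors of already-placed variables still live at `start`
        taken = {colors[u] for u, (s, e) in live.items()
                 if u in colors and e >= start}
        colors[v] = next(c for c in range(len(live)) if c not in taken)
    return colors

live = {"a": (1, 4), "b": (2, 3), "c": (3, 6), "d": (5, 7)}
print(color_intervals(live))  # {'a': 0, 'b': 1, 'c': 2, 'd': 0}
```

Three colors, matching the clique {a, b, c} live at line 3; once control flow branches, live ranges stop being intervals and this guarantee is lost.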

>Two more examples of why more "targeted" algorithms can work better: instructions that require operands in particular registers, and those that prefer particular registers. With traditional graph colouring, you can force allocation of a certain register, but there's no way to specify a preference.
The problem becomes much more complex, but you can use a utility function to rank how well a coloring satisfies the preferences. The cost is that you then have to evaluate that utility function over candidate colorings.

>An Asm programmer can think "there's a divide coming up: let's arrange things so that its input will be in the right registers when the time comes" and override that choice with something more important when necessary. Traditional RA algorithms can't do that.
They can understand that the registers used by the divide instruction must be the ones that store the variables resulting from the division. This can be represented by allocating variables for the result of the division and forcing their colors to correspond to the registers the instruction puts them in.

>tl;dr: Register allocation is one area where compilers could improve significantly but, due to a reluctance to challenge existing norms, have not. It's only recently that stuff like linear-scan (closer to what someone with common sense, but not exposure to previous biases, would come up with) and SSA started getting visibility. These basics should've been where we started.
Indeed. But that makes it more exciting doesn't it? The late start makes it more likely that you might discover something new.

>To whoever started the insistence that register allocation is as general a problem as graph colouring and propagated it: FUCK YOU.
So sorry! But really, graph coloring is just solving the same problem in a more abstract setting. If you want to build more restrictions from the platform into the algorithm, just add more information to your definition of the graph to reflect them. These additions are usually small, and the advantage of using graphs is that all the research and development that has gone into graph algorithms is easy for you to leverage. But there is the problem of graphs being such general objects: it could be that your domain is much less general than graph coloring, in which case it may be more useful to operate within the domain itself and never create a graph.

Name: Anonymous 2012-08-03 1:06

>>213-217
tl;dr
