Assembly: Unportable. No standardised syntax.
Classical Visual Basic: Some good parts. Shit overall.
C: Shitty standard library. Deficient type system. Can't into Unicode. ``Unportable assembly.''
D and C++: Obfuscated boilerplate languages.
Java and C#: Forced OOP.
Common Lisp: Archaic cons-based library. Writing complex macros is a PitA due to the unlispy quotation syntaxes.
Scheme: CL without namespaces.
Clojure and Erlang: Concurrency is unneeded outside of a few very specific applications. Parallelism is where it's at.
OCaml: Great language, only one, deficient, implementation.
Haskell: Academic sex toy.
Forth: Reinventing the wheel over and over.
Ruby: Implicit declarations. Slow as fuck.
Python: Implicit declarations. FioC.
Perl: Brain damage.
PHP: Pretty much shit.
JavaScript: "" == false
It's impossible to list them all but, please, what decent modern general-purpose languages exist?
C is too old, and its semantics were built on 1970s instruction sets. We need a language as close to the metal as C with modern design, but not going overboard with the shitty hacks like SEPPLES.
Go.
Statically typed, duck typing, no shitty OOP without sacrificing usability, simple, easy to write, looks good, compiled, quite fast, garbage collected, good library, easy to FFI with C, great concurrency, no boilerplate or garbage syntax, easy (and fast) compilation without the need to write Makefiles, ...
>>8
You can use everything except statements in Python's lambda. Python can be fully functional (in both meanings) with only expressions.
If you really want to use statements so much, just use a closure. Functions are first class objects, lambda is not really necessary.
>>12
I was talking about defining a function inside another function and then returning it.
This inner function has access to the scope of the outer function, so you can do something like this:

def a(b):
    def c(d):
        return b + d
    return c

f = a(3)
print f(2)  # 5
print f(3)  # 6
g = a(6)
print g(2)  # 8
print f(2)  # still 5
As you can see, the inner function closes over the variables in the outer one, thus ``closure''.
Basically, anything you could do with a lambda, you can also do with a standard function.
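To illustrate the point, a quick sketch: a lambda is just an anonymous single-expression function, and a def can always replace it (and, unlike a lambda, may contain statements). The names here are made up for the example.

```python
square = lambda x: x * x      # body must be a single expression

def square_fn(x):             # equivalent named function...
    result = x * x            # ...which may contain statements (an assignment here)
    return result

print(square(7))     # 49
print(square_fn(7))  # 49
```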
GCC implements taking the address of a nested function using a technique called trampolines. This technique was described in Lexical Closures for C++ (Thomas M. Breuel, USENIX C++ Conference Proceedings, October 17-21, 1988).
yes apparently
Name:
Anonymous2012-07-25 13:24
>>5
totally agree, C and C++'s day as general-purpose programming languages is over; they will be relegated to the systems/embedded programming they were meant for. Languages like D, Go and Rust are the future
Name:
Anonymous2012-07-25 14:02
>>1
fuck you for thinking javascript is shit. you're obviously not worthy of calling yourself a programmer
>>20 Languages like D, Go and Rust are the future
That's funny because they are all terribly useless. Absolutely no one uses them in production, for good reason. Go is a shit language; it's like C mixed with Javascript and Pascal, but crippled. Google doesn't even use it for anything important. The only reason anyone has heard of it is because of cat-v Plan 9 faggots astroturfing on the web. Rust and D are even more obscure and shitty. D is like C++ except for retards. Not to mention that none of these have compilers that produce code as fast as C++ compilers, or even the JVM for that matter.
Name:
Anonymous2012-07-25 14:12
D is like C++ except for retards
lol'd
have an upboat
Name:
Anonymous2012-07-25 14:58
>>24
Go is hardly useless. It is enough like C to feel familiar while not bothering to implement features irrelevant to modern programming. That makes it simple to program with (unlike stuff like C++) without any major drawbacks (such as Java's forced OOP or scripting languages' inability to do any lower-level stuff, not to mention their egregiously slow speed). Go sources are usually more portable than your standard C or C++ code too.
For example, the lack of pointer arithmetic and manual memory management means that the programmer has a lot less stuff to worry about.
In addition to removing completely unnecessary (and outright dangerous) features, Go is also quite strict. You mustn't have unused variables, code style is enforced, there is no implicit type casting (int and int32, despite being the same underneath, are separate types requiring explicit casting), etc. While that may seem excessively restrictive to your standard volatile C obfuscator, it inevitably results in code that is generally cleaner, safer and better.
Garbage collection should be the norm in userland applications anyway, because the tiny amount of speed gained by manual memory management is utterly negligible in all but the most performance-critical code, while it makes segfaults and memory leaks all but gone (yes, >>22, GC is actually useful).
In speed, it's true that Go is somewhat slower than C/C++, but that's only to be expected with GC, type reflection and other features. It still at least matches, if not outright exceeds, most equivalent Java programs in both programming and execution speed.
Google stated that they use Go in several of their internal applications. In addition to that, Go is a good choice for Google App Engine.
In conclusion, Go is a true current-generation programming language that's focused on ease of use and good programs, not on irrelevant innovation that's ultimately destructive, since it precludes the good, simple and obvious ways of coding in order to make the programmer use the novelties.
>>28
Can I write uniform-looking and easily-readable code (from the perspective of someone who's not a Perl wizard) that also executes fast enough (Java-tier speed, not Python-tier) with Perl?
Name:
Anonymous2012-07-25 15:36
>>28
can i write a function that accepts two arrays as arguments?
>>33
If you change a variable afterwards, that's your fault.
Don't do it if you don't want that.
Name:
Anonymous2012-07-25 16:16
>>34
Which defeats the purpose of emulating closures in the first place.
Name:
Anonymous2012-07-25 16:19
And just for the record, Java requires variables to be final when emulating certain types of closures.
Name:
Anonymous2012-07-25 16:21
Every programming language is crap unless it's Lisp, which is shit.
Programming is a disgusting practice and you should be ashamed.
Name:
Anonymous2012-07-25 16:55
>>24 That's funny because they are all terribly useless. Absolutely no one uses them in production, for good reason. Go is a shit language; it's like C mixed with Javascript and Pascal, but crippled. Google doesn't even use it for anything important. The only reason anyone has heard of it is because of cat-v Plan 9 faggots astroturfing on the web. Rust and D are even more obscure and shitty. D is like C++ except for retards. Not to mention that none of these have compilers that produce code as fast as C++ compilers, or even the JVM for that matter.
Languages like C and C++ had their day in the 80s, when you had to write directly to hardware on single-tasking, OS-less home computers. When you wrote a program back then, the program was the OS; everyone used asm. Using an archaic language like C++ now is sheer stupidity. Manually managing memory made sense back when you were writing directly to hardware addresses and had full control of the CPU, but that kind of programming is gone with modern OSes. You can call it astroturfing all you want, but languages like Go, D and Rust are going to take over and eliminate all the stupid time-wasting errors that could be eliminated with garbage collection. Basically, you should be able to look at code and know what it does, not poke around with a debugger and static analysis to try to find out what the program might do because the code is too unsafe to trust on its own. There are a lot of people who make their living coding C/C++ whose work will be made obsolete by modern compiled languages, languages that should have caught on 15 years ago.
Name:
Anonymous2012-07-25 16:58
Objective-C.
Name:
Anonymous2012-07-25 17:07
>>39
a language that has C as a strict subset with unsafe OO bolted on
gtfo
Name:
Anonymous2012-07-25 17:38
>>39
+1 for properly spelling out the name of Objective-C
>>38 Basically you should be able to look at code and know what it does, not poke around with a debugger and static analysis to try to find out what the program might do because the code is too unsafe to trust on its own.
Oh lawd the ironing. With native code you can see every single instruction from compiled output, knowing that this copy will not change at all. And you're really delusional. Hardware gets better not to make it easier for weak programmers, but to allow for better software. It's not like everyone is going to buy expensive hardware just so that shitty bloated GCs can run at a maximum. They will buy expensive hardware so that they can run software that delivers in parity with the hardware. For example, I'm not going to buy a new graphics card and use more power just to play a game that has graphics no better than last year. What a fucking waste. Apply the same idea to enterprise servers and we're talking about billions of dollars.
>>42
pure disinformation
As I said, the days when a program loaded into RAM and took over the computer are over. You can't track a program in memory anymore; it's all artificial segmented memory hidden from you by the OS and CPU hardware, not to mention most of the program runs in dynamic libraries.
People keep trying to connect garbage collection with slow VM languages like Java and C#, but compiled gc languages are a whole new ballgame.
>>44
Go is slow as fuck. It has a broken conservative garbage collector. Try a language with a real GC instead of something designed to be retrofitted onto a language with unsafe pointers. IHBT
Name:
Anonymous2012-07-25 23:44
Try newLISP. It has pretty much everything you need for general-purpose programming. Strings are UTF-8, it has pattern matching, a pretty batteries-included standard library (sockets, regex, etc.), a trivial C interface, and not to mention it has true code = data, unlike Scheme and CL. The only downside is that it's GPL.
>>10,26,47
Pig disgusting! Pop infix syntax everywhere — worse than C. Terrible GC. And as I said of Clojure and Erlang, concurrency is not the next big thing, parallelism is.
You don't understand Lisp
Could you elaborate? I'm fine with the special-casing of two-element records for linked lists, but I don't understand why the user is encouraged to use them for things such as association-list elements or trees.
Writing complex macros is a PitA due to the unlispy quotation syntaxes.
I want a language in which you can go as high or as low-level as you like. If you just want to write programs quickly you can use lambdas, dynamic types, etc. but you should also be able to drop down to the level of inline asm for those parts where the compiler isn't good enough. The closest to that at the moment is probably C++.
>>45
And it's withstood the test of time. Besides the classic 8-bits (6502, Z80, 8051, etc.) what other CPU architectures have lasted as long?
Name:
Anonymous2012-07-26 4:53
Racket
Name:
Anonymous2012-07-26 5:08
>>60 Could you elaborate? I'm fine with the special-casing of two-element records for linked lists, but I don't understand why the user is encouraged to use them for things such as association-list elements or trees.
Association lists are sometimes useful. A proper Lisp (pun intended) would provide you with a library with at least hash tables and various trees (for speed, really).
>>61
I, too, want a winged ponycorn that shoots laser beams from its eyes, Cudder.
Name:
Anonymous2012-07-26 6:48
>>60 concurrency is not the next big thing, parallelism is
Go's purpose is not to be ``the next big thing'', it's to include already existing functionality in a well-designed way. Concurrency is just fine for real world programming.
>>61
That's utterly ridiculous. Why would you rather have a jack-of-all-trades language like the mess that C++ is than learn a few (even two would suffice) languages that would together cover the same field without any one of them being shit because it tries to be everything at once? Wanting something like that from a programming language shows that you don't have a clue what you want.
>>65 >>66
It's a perfectly reasonable desire. You can master one language and use it as much as possible. The only drawback is that the language becomes more complex like C++, but no one forces you to use all or even half of the features.
than learn a few (even two would suffice) languages that would together cover the same field
The point here is easy integration. For whatever reason, language designers often think their language exists in a void and no one else should be using anything other than their language. If they think otherwise, then almost always, access to "outsider" languages is "at a distance", via an API. Take Python as an example. I haven't used it, but a quick search reveals things like http://docs.python.org/extending/extending.html that shows you need to go through an elaborate process just to write a few lines in an alternate language.
An exact opposite example would be asm { } blocks in C/C++. If I'm writing C and want to manually optimise some parts, I can put in an asm { ... } right there. Trying to put some C in Python, or even Asm in Python? No easy way. You have to go through the whole process above. It's complex enough to be discouraging for simple cases, reducing the opportunities that programmers have.
I may seem biased towards C/C++, but that's naturally because I'm an experienced long-time user and understand the entire process behind its design. And yet, like >>68 mentions, I have not used all the features of C++ like advanced templating. I've found that those who complain about it don't understand why it's the way it is. Once you understand how C++ maps to C, you might find it quite interesting and enlightening. Again, it allows you to choose the level of abstraction you want. I can use cout if I feel... "classy" and want to use the (IMHO bloated) OOP output, printf if I want to be C-like, and int 0x80 with function 4 (the write syscall) if I want to be as thin as possible and hit the kernel directly.
True utility comes from being able to solve the problem in the way you want, not the way the language designer enforces. Also remember that programming languages are tools, and like physical tools, acquire dirt and wear from use. The only perfect, clean tool is one that is never used.
Your experience has made you close-minded. Going to asm for efficiency is something you should leave to the compiler, as it can be done well by the compiler, and by adding your own bits of assembly you alienate other architectures. And hand-coded asm could be suboptimal. You should only use asm to access features that are not otherwise available, like in a wrapper library that configures interrupts or something.
You should only use asm to access features that are not otherwise available
although it would be appropriate to use assembly for routines that can be implemented with special assembly ops that the compiler isn't clever enough to employ. I like to think that a sufficiently clever compiler can exist, but I don't think it can for every language, in particular C. So I actually take most of >>72 back, but only for advanced opcodes that can only be used in special situations, like SIMD stuff.
>>71 Trying to put some C in Python, or even Asm in Python? No easy way. You have to go through the whole process above.
Do you ever do any actual research into your bullshit claims you fucking retard?
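For what it's worth, the stdlib ctypes module already covers the easy cases without any extension-module boilerplate. A minimal sketch (POSIX-only assumption: CDLL(None) exposes the symbols already loaded into the process, which includes libc):

```python
import ctypes

# On POSIX systems, CDLL(None) hands back the running process's own
# symbol table, so libc functions are callable with no compile step.
libc = ctypes.CDLL(None)

# Declare strlen's signature so ctypes marshals arguments correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))  # 5
```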
>>71 Once you understand how C++ maps to C, you might find it quite interesting and enlightening. And cry a little inside because you can't have your precious life back.
>>79
No matter how much Larry Wall and Guido van Rossum want you to think otherwise, the implementation is, in fact, not the language.
Else, every language is assembly and you have no excuse to write C.
Name:
Anonymous2012-07-27 5:56
>>79
So, it just means they're full of security flaws hidden in C undefined behaviour.
Name:
Anonymous2012-07-27 6:11
>>71
So you would rather use a terribly designed language than to write a few lines of code required to use the FFI? Seriously?
You shouldn't need inline assembly anywhere except in kernels and some cases of extreme optimization. Requiring a modern language to support that is outright ludicrous.
By the way, check this:

package main

// #include <stdio.h>
// int f(int a, int b) { return a * b; }
import "C"
import "fmt"

func main() {
	// cgo can't call variadic C functions such as printf,
	// so print C.f's result from the Go side instead:
	fmt.Println(C.f(2, 3))
}
Yes, I just defined and called a C function in inline Go. See that? It's not that hard to use C stuff.
However, I see no reason at all to require my language to support anything more than that. Inline ASM? C as a subset of the language? What purpose would any of that serve in the long run except to complicate my job?
By the way, there's no such thing as an asm block in C. What you're referring to is a C extension included in whatever C-based language you're using.
Name:
Anonymous2012-07-27 13:37
>>82
Weave (part of SciPy) and Instant (part of FEniCS) are just two Python modules which can do the same thing; they also allow easy access to NumPy arrays.
Name:
Anonymous2012-07-27 14:04
>>72
Compilers don't do magic. They follow a set of rules like any software to turn common constructs into tighter code. But compilers are not aware of the context of the problem you're trying to solve, or the algorithm you're trying to express. Suppose you're in a very hot block of code. You find a more clever solution than the ASM output. That's where you use inline assembly.
>>84
However, it's extremely rare that such an optimization would be worth more than general portability. It certainly shouldn't be a requirement for a general-purpose language (note that ``general'' does not mean ``specific set of hardware''). So, when you REALLY have to use that, you can use whatever convoluted way you can think of, since it's quite unlikely that you'll be doing it regularly.
>>85
And that's exactly what I said. That block isn't a part of the C language, but is a very common compiler extension.
>>94 Order of evaluation of arguments, let may evaluate its clauses in any order
I don't know why some people think that ``undefined behaviour'' is a bad thing. Actually, it's good, because it forces you to think in a correct way.
Name:
Anonymous2012-07-27 19:55
Why would such a minor freckle like this matter? Are you mutating states, /prog/?
Name:
Anonymous2012-07-27 20:04
>>101
No, it's not. Everything should be well-defined and nothing left up to the implementation.
>>102
In fact I am mutating states whenever I must, which happens about 1% of the time. And in that 1% of the time, only 1% of the time does the side-effect escape the function.
Name:
Anonymous2012-07-27 20:14
>>103
I think that you don't understand why something is undefined behavior
Name:
Anonymous2012-07-27 20:18
JACKSON 5 + 100 GET
20 TIMES BETTER THAN JACKSON 5 GET
Name:
Anonymous2012-07-27 20:27
>>103 Everything should be well-defined and nothing left up to the implementation.
No languages have ``nothing left up to the implementation'' unless they are ``defined'' by their implementation. See: Perl.
>>103
I guess you want Java, “run once, write everywhere”.
Name:
!L33tUKZj5I2012-07-27 23:53
The best programming language there ever was, or ever will be, is Spectrum BASIC.
Every machine out there should have an interpreter.
Name:
Anonymous2012-07-28 3:11
>>103
Who made up that rule? If I made a Shitgol and left the entire language as an exercise to the implementations and Shitgol compiler writers and Shitgol users were okay with that then it should be fine. Standard specs only exist to prevent excess incompatible extensions between vendors.
Name:
Anonymous2012-07-28 4:43
>>104
I do understand, it's because compiler writers want extra room for optimizations. For instance, a compiler could schedule the evaluation of a procedure's arguments in parallel.
>>106
Most languages aren't Lisp, thus they're crap. Pointing out that most languages have undefined behaviour means nothing.
>>107,109
I never thought I'd say this, but Java does have something good about it. I find reproducibility to be fairly important -- if a program is executed on two different implementations, it should have the same result (leaving aside things like reference->integer conversions which aren't even guaranteed to be the same across two runs of the same implementation). Leaving undefined behaviour in your language is a great way to make your users shoot themselves in the foot.
>>72,73,82
Clearly none of you are experienced with Asm, or looked at compiler output (or you have, but can't understand/see a way to do it better, because of the first point). Compilers are a good way to generate code quickly; even at the highest optimisation level none of them can reason about things like register allocation in the same way that a knowledgeable programmer can. People have been saying compilers are better at generating code than programmers, but that's only if you compare to average or below-average Asm programmers.
Go learn Asm and compile any program you want, then look at the Asm output. Find a function and see if you can do it better. Unless you have some ultra-powerful compiler I haven't seen before, chances are you can beat the compiler in size, speed, or both. Perhaps because of the source language semantics the compiler can't do something, but YOU (should) know how your program operates and can easily take advantage of things like cross-function register allocation, convenient invariants (you can prove a register will always have e.g. 0, and use it appropriately; the compiler can't), and advanced stack manipulation.
>>82
Saying a language is "terribly designed" when it has the capabilities I need (and then some) is simply opinion. And it's not "write a few lines of code required to use the FFI", it's having to do that plus all the other "administrative" cruft surrounding it. In other words the integration is not as seamless as it could be. You said it yourself -- "FFI". Not "embed code directly in the compiler/JIT's output".
>>83
This is interesting, but note that it's not a core language feature. Someone clearly had the need to do that.
>>88
True portability is a myth. You can never get that in a compiled language, no matter how hard the standards bodies try. The closest thing to it is Java, and everyone should be familiar with its characteristics. If portability is ALL I care about for an application, then that's what I'd choose. My definition of "general purpose language" is one that allows the programmer to use whatever level of abstraction he/she finds appropriate, and switch between them effortlessly. How many architectures out there, in active use, have 37-bit integers, 11-bit bytes, or other oddities that the portability advocates keep using as examples? If you were working on such an uncommon system, portability is going to be the least of your worries. Right now, if you're targeting regular desktops, there's basically two ISAs and two OSs: x86-32/x86-64, Windows/*nix.
>>89
This is the sort of stuff I was talking about.
For instance, a compiler could schedule the evaluation of a procedure's arguments in parallel.
Unless those arguments require extensive computation, chances are that the overhead of preparing to get that done would outweigh the cost of just running them all in 1 thread serially. Modern processors are now quite good at parallel instruction execution, IIRC these 3 instructions
add b[edx], 5
sub eax, ecx
sub ebx, ecx
are executed in parallel since ~Core 2, with the second two running while the first is waiting on two memory accesses.
>>111
While you could hyper-optimize that code, do you really need that? Just when was the last time you really needed something like that instead of the code that would work at least on x86 and ARM?
Name:
Anonymous2012-07-28 6:41
>>11
Except Python does not do tail call elimination. Or have its own stack for function calls. Enjoy your 998 calls or
>RuntimeError: maximum recursion depth exceeded
Name:
Anonymous2012-07-28 6:51
>>113
CLISP doesn't either, yet it is still perfectly usable.
Name:
Anonymous2012-07-28 6:52
>>111
Compilers can track the state of numerous registers far more effectively than a human can. They can take a higher level approach to assigning code than a human would reasonably do with a pen.
>>111 This is interesting, but note that it's not a core language feature. Someone clearly had the need to do that.
If you drop numpy arrays you can also drop numpy as a dependency and then all that is left is distutils which is part of the core language.
Name:
Anonymous2012-07-28 12:33
>>84 Compilers don't do magic. They follow a set of rules like any software to turn common constructs into tighter code.
Depending on the set of optimization algorithms they employ, they can appear to be magical.
But compilers are not aware of the context of the problem you're trying to solve, or the algorithm you're trying to express.
Then you are using the wrong language. You should only express the program as a minimally described idea. The compiler can then find an implementation of the idea that obtains an optimal efficiency on that target architecture and platform.
Suppose you're in a very hot block of code. You find a more clever solution than the ASM output. That's where you use inline assembly.
I'd rather let my genetic peephole simulated-annealing optimizer run on it with test input for two days.
Name:
Anonymous2012-07-28 12:50
>>118 let my genetic peephole simulated-annealing optimizer run on it with test input for two days.
please be trolling
Clearly none of you are experienced with Asm, or looked at compiler output (or you have, but can't understand/see a way to do it better, because of the first point). Compilers are a good way to generate code quickly; even at the highest optimisation level none of them can reason about things like register allocation in the same way that a knowledgeable programmer can.
People have been saying compilers are better at generating code than programmers, but that's only if you compare to average or below-average Asm programmers.
Use a better compiler.
Go learn Asm and compile any program you want, then look at the Asm output. Find a function and see if you can do it better. Unless you have some ultra-powerful compiler I haven't seen before, chances are you can beat the compiler in size, speed, or both. Perhaps because of the source language semantics the compiler can't do something,
Use a better language. The only defined behavior present should be critical to the correctness of the program.
but YOU (should) know how your program operates and can easily take advantage of things like cross-function register allocation, convenient invariants (you can prove a register will always have e.g. 0, and use it appropriately; the compiler can't), and advanced stack manipulation.
Saying a language is "terribly designed" when it has the capabilities I need (and then some) is simply opinion. And it's not "write a few lines of code required to use the FFI", it's having to do that plus all the other "administrative" cruft surrounding it. In other words the integration is not as seamless as it could be. You said it yourself -- "FFI". Not "embed code directly in the compiler/JIT's output".
This is an implementation issue, not a language issue.
Unless those arguments require extensive computation, chances are that the overhead of preparing to get that done would outweigh the cost of just running them all in 1 thread serially.
This is a decision that can be made by the compiler, although it would be annoying if, in order to take advantage of this optimization, you had to stuff large expressions to evaluate in parallel inside function-call parameter lists. In a purely functional setting, you can simply define your variables relative to other prior evaluations, and the compiler can create a dynamic evaluator that evaluates expressions as soon as their dependencies are finished, using a number of threads optimal for the target architecture.
Modern processors are now quite good at parallel instruction execution, IIRC these 3 instructions
add b[edx], 5
sub eax, ecx
sub ebx, ecx
are executed in parallel since ~Core 2, with the second two running while the first is waiting on two memory accesses.
cool stuff.
Name:
Anonymous2012-07-28 13:07
>>111 True portability is a myth.
I would say it's the least common denominator of architectures' capabilities. Obviously, when you need maximum performance/power efficiency, you'll end up writing one specialized version for each.
>>125 >>124-san has a point. “Clever” designs suffer from not-invented-here syndrome more often, i.e. more bugs and less support. Masters know not only how but when to use each tool.
>>130
There are only 10 kinds of people in the world, those who fully understand octal, those who understand it but feign ignorance so their peers won't call them faggots, those who confuse it with hexadecimal, those who confuse it with decimal, those who confuse it with binary, those who refuse to understand it, those who are incapable of understanding it, and those who only use it for UNIX permissions.
I found it rather humorous, but I find your point assignment to be absolutely enraging, and I hope you slowly die in a fire in which you seek shelter in a pool of mud, and temporarily find relief, but are inevitably slowly cooked by the surrounding flames.
>>112
Actually, the majority of stuff I work on is x86, Windows or Linux. If I was using some other architecture then I'd write in Asm for that too. (Although MIPS and ARM are a bit... boring.)
>>115,118,120
I was expecting replies like this. You still put blind faith in compilers and think they're god-like and better than anyone else. That's what the academia always says. But consider that if it was true, I wouldn't be writing this today. I've probably read through more compiler output than anyone else here, and quite frankly, most of it sucks. ICC is the best I've seen, MSVC is a close second, and GCC is far behind. Delphi is... about the worst I've ever seen.
>>123
The majority of those are provided in different versions for each platform. Like I said above, if you really want portability, use Java.
>>124,125,126
If they're lazy idiots they can let the compiler hold their hand and do everything for them. If they're intelligent they'll find a way to do things more efficiently. It's about letting the programmer exploit her capabilities to the limit.
>>141
It can't produce code as well as something like ICC in some cases. Then again, ICC probably has a massive budget pumped into it by comparison, so it kinda makes sense.
I was expecting replies like this. You still put blind faith in compilers and think they're god-like and better than anyone else. That's what the academia always says. But consider that if it was true, I wouldn't be writing this today. I've probably read through more compiler output than anyone else here, and quite frankly, most of it sucks. ICC is the best I've seen, MSVC is a close second, and GCC is far behind. Delphi is... about the worst I've ever seen.
Yes, we don't have a qualifying implementation yet. But someday we might. And with the benefits it would bring, I don't understand why more people aren't concerned with improving the tools they use every day. Instead, they just crank out applications using what is currently available and popular. A possible solution to a problem brought up in this thread:
* include a construct in the language that places constraints on data types. This constraint can be formally checked with static code analysis, tested dynamically with asserts in testing, and then used as a hint for optimization when building the release version. The compiler could consider a violation of the constraint as undefined behavior, and assume its truth when generating optimal assembly. This way, you can get your architecture specific optimizations without creating undocumented and undetectable restrictions on the inputs.
Would be nifty:
* Automate profiling of generated machine code. Select constructs that give the best performance based upon empirical data collected from a large set of test runs.
If they're lazy idiots they can let the compiler hold their hand and do everything for them. If they're intelligent they'll find a way to do things more efficiently. It's about letting the programmer exploit her capabilities to the limit.
Lazy and idiotic are all relative. Computers are idiots. But they are far from lazy. A VERY determined idiot may be able to write a better program than you if given a disproportionately larger amount of time to do it in.
exploit her capabilities to the limit.
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
The problem of emitting efficient SIMD code has confounded compiler-authors for many years; gcc at least does not appear to attempt to use SIMD instructions.
I am offended that PHP is listed as a programming language. PHP is not a programming language. It's a miscarriage of engineering, a monstrously deformed abortion that by some cruel fluke of nature has managed to cling to life.
>>158
i don't see why javascript SHOULDN'T be up there, even if PHP weren't listed
Name: Anonymous 2012-07-29 16:53
>>157 >>158
I realize that it mostly works and is useful, and I sometimes use it myself when I need a tiny server backend for something. What jostles my flaps is the horrible choices in its design that result in ephemeral and obscure bugs creeping into large projects. Of the useful, widespread programming languages it has the worst intrinsic flaws and insanities, which aren't the result of deliberate design decisions so much as incompetence. And so I am buttfurious on the internet about it.
I don't understand why more people aren't concerned with improving the tools they use every day
Because they think compilers are somehow magical and mysterious entities. One of my (many) somewhat-sluggish projects has been to write a better C compiler, one that generates code closer to how an assembly language programmer would write it---by necessity, not by pattern.
* Automate profiling of generated machine code. Select constructs that give the best performance based upon empirical data collected from a large set of test runs.
MSVC has PGO, which is not bad, but is a bit of a pain to use and thus not used much except in special cases. Optimisation should be more "ubiquitous", not a special-case thing to be applied after you discover that things aren't quite right. I know they still teach that "premature optimisation" bullshit; but going back to change something that's been written already is a waste of resources. Optimisation should be integrated into design, thought of as "what's the best way to do this" throughout so that the final product doesn't need much more done to it.
>>151
That quote may be true for those who don't understand fundamentals like binary and machine instructions, but being clever---and thorough with the design---when writing code can in many cases completely avoid ever having to debug it, as you will have essentially proved it correct before ever running it.
>>156
Whatever you call it, it's become ubiquitous for websites for a reason. The same with all the other languages you complain about. "Purity" or whatever you want to call it has no real practical advantage. Pragmatism does.
>>172 Whatever you call it, it's become ubiquitous for websites for a reason.
The Jews at Zend bribed Web hosts.
Name: Anonymous 2012-07-30 7:35
Its only real selling point was that it had a foothold on at least 1% of all the internet's domains by 1998. In 1998 you had a choice of ASP classique, ColdFusion and PHP3 for 1. sticking your spaghetti code all over your HTML, which was the fad back then, and 2. running in-process in the web server instead of CGI. The next closest competitor, Python, had WSGI come about by 2003 and mod_python on a CD with a book in 2000, while PHP was already at version 3 by 1998. There were others a little later (JSP rushed out in 1999), but by then the code-in-HTML language market was cornered by PHP on Linux and ASP on Windows.
Its other selling point: hosting companies liked PHP because of the shitty safe_mode when it came to cramming thousands of sites on a server.
>>172 "premature optimisation" bullshit Optimisation should be integrated into design
Truth: asm is less readable and thus less maintainable.
Ideally, programs should be executable algorithms, expressed in adequate abstractions. Optimization should be done just when necessary, because it's architecture-dependent and almost write-only.
Or are you telling me to follow master FrozenVoid? IHBT
>>137
I wanna have a threesome with you and Cudderspace
Name: Anonymous 2012-07-30 18:27
>>172 Optimisation should be integrated into design, thought of as "what's the best way to do this" throughout so that the final product doesn't need much more done to it.
"The best way to do this" is the most readable way to do it. If the optimized way is the most readable then do that; if the naive way is the most readable then do that. If it profiles badly, THEN rewrite it in the optimized way and leave a small novel in the comments on why you made every choice you made and what it's actually accomplishing.
I'm not saying that your way of coding is wrong or bad or anything, I'm just saying that it makes people not want to work with you.
>>180
Sometimes it's needed. Sometimes.
But you're likely to spend much more time optimising ahead and then spend even more fixing bugs. See this: http://tibleiz.net/asm-xml/
3 in 4 releases are bugfixes.
>>179
OR I could just write it correctly the first time. Code readability is over-rated. As long as it was not intentionally obfuscated, a programmer worth his salt will be able to figure out what it does (because it's right there in front of his face).
>>183
You're now blurring "optimizing ahead" and "writing shitty code" into one thing. You can optimize your design ahead of time and still not have bug-riddled code.
Of course, this partially does depend on your language. If you're gonna be writing a decently sizeable system in something like C, you will indeed need to do a fair degree of ``premature optimization'', simply because changing large chunks later takes so much time and will end up introducing more bugs in the process.
>>186
Nice attempt to avoid my argument. Have fun with your slow programs!
Name: Anonymous 2012-07-30 22:18
>>185
The whole point of planning ahead is when you haven't coded yet, such as the asm-xml project at its beginning. From the ground up it was designed to be optimised, but it still had bugs and new feature requests. Maybe a new feature can disrupt your clever design, and you'll be walking on eggshells to implement it. But now that the design is clever and everything is coded in asm, it's a PitA.
When it comes to performance, no amount of tweaking can compensate for an inherently slow design.
Name: 188 2012-07-30 22:40
But you check the "design" and see that it doesn't have to involve rewriting your app in asm just because your compiler doesn't output optimal code.
Name: Anonymous 2012-07-30 22:48
>>184 OR I could just write it correctly the first time. Code readability is over-rated. As long as it was not intentionally obfuscated, a programmer worth his salt will be able to figure out what it does (because it's right there in front of his face).
What if you fucked up everything and are too stupid to realize it, and the superior programmer is reading your fucked up code, trying to infer what the fuck you were actually trying to do from your endless chain of tacit false assumptions?
>>172 That quote may be true for those who don't understand fundamentals like binary and machine instructions
That quote may be Brian W. Kernighan
Name: Anonymous 2012-07-31 2:12
>>172 I don't understand why more people aren't concerned with improving the tools they use every day Because they think compilers are somehow magical and mysterious entities.
People are probably just either satisfied with suboptimal machine code generation or they don't know any better. One of my (many) somewhat-sluggish projects has been to write a better C compiler, one that generates code closer to how an assembly language programmer would write it---by necessity, not by pattern.
This would be very nifty. The only implementation strategy for machine code generation I've seen so far is to simply convert syntactical constructs to a stream of equivalent instructions and apply a series of optimizations that maintain equivalence throughout the process. Admittedly, there can be other things going on, like using a minimum number of registers. In the whole process, the compiler doesn't really know what you are trying to do. It's just spitting out machine code it knows is equivalent to the higher-level source code you gave it, and then simplifying it a bit. An optimizing compiler tries to find the highest-performing program within the equivalence class of the program the programmer submitted. So it applies all these operations that it knows preserve equivalence, but will likely improve performance.
It seems like in order for the compiler to truly generate assembly with necessity, it would need to have a complete high level understanding of what you are trying to do. But as soon as I start to think about whether this is possible, or already implemented, it feels like it comes down to a philosophical question. If I turn the steering wheel of my car to the left, does my car realize that I want to go left, and turn its wheels by necessity? Or is it just doing what the mechanical engineer designed it to do, which was to allow rotation of the steering wheel to directly affect the wheels?
>>177 Truth: asm is less readable and thus less maintainable.
Readability is subjective (size or speed is NOT), but to someone experienced, it's more readable. Even the format is consistent. The only thing is, it also tends to be more voluminous.
>>183
Look at the bugtrackers of other XML libraries and you'll find much worse.
When I was talking about design, I meant decisions that are not easy to reverse, and that can have huge consequences for efficiency. For example, instead of the obvious (and inefficient) "make objects out of everything and copy data into them" parsing algorithm, do it in-place so there's no unnecessary copying of data. Simplify the design to make it more "direct". I'll use the web browser as an example here since we're all familiar with them: how long is the path (in terms of # of instructions or call depth) between Firefox getting a WM_PAINT and its first API call into the OS to draw the page content into the window? In the browser design I'm planning, it's < 5. I'm not planning to do any memory allocation/deallocation there either.
Maybe "optimisation" is the wrong word for this, but what else do you call making something more efficient? If you plan your design like this and have it carefully thought out before writing a single line of code, the chances are high that you'll be ahead of those who "optimise later", without even needing to go to inline Asm. Everyone is using the same (stupid) compiler, but your design has already been optimised at a higher level, so even the compiler's inefficient output is still going to be more efficient than the output from an inefficient design.
>>193 It seems like in order for the compiler to truly generate assembly with necessity, it would need to have a complete high level understanding of what you are trying to do.
That's not necessary, all it needs to understand is the semantics of the source language. What I'm after is not statement-by-statement translation with optimisation applied afterwards (there's that anti-premature-optimisation again!); my idea is to generate code "backwards", working from the desired result. E.g. in
int foo() {
int i = f();
int j = k();
int l = 2 + j;
... /* code that does a lot with l, but never has effect on i, j, nor other global variables */
return i + 2*j;
}
a "dumb" or "traditional" compiler would emit code for all that stuff in the middle, and optimisation might have a chance at removing (some of) it and moving variables into registers. An "intelligent" compiler could work backwards and "think" "This function's result depends on i and j. What do i and j depend on? f() and k(). What do f() and k() depend on? ... " Eventually it might determine that f() and k() are actually constant, and substitute that in, propagate the changes down, and reduce foo() itself to a constant. And of course, all of this might not even be done if foo() can never get called from any entry point. At every reference to a nonlocal variable, this process would need to be performed as their results are "visible" to other functions.
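For illustration, if f() and k() turned out to fold to the hypothetical constants 3 and 4, the backwards pass would leave nothing of foo() but:

```c
/* What foo() might reduce to, assuming f() and k() fold to the
 * hypothetical constants 3 and 4: the dead l computation and the code
 * in the middle vanish, and the return value is folded at compile time. */
int foo(void)
{
    return 11;  /* i + 2*j = 3 + 2*4 */
}
```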
For things like register allocation (done after the above), the compiler could track how many variables are actually needed, and then choose how many extra "slots" are required in memory. It can alter their allocation to registers depending on their usage and loop nesting level. In size/balanced optimisation mode, if it sees certain variables having LIFO-like usage patterns, it can emit push/pop (single byte instructions) instead of explicitly doing a stack allocate and moves. If there's a long-running inner loop with frequently accessed variables, push the less frequently used variables on the stack. This is how an Asm programmer does register allocation by necessity.
Due to the halting problem it's not possible to prove that f() or k() terminate even if they do not depend on nonlocal variables, but the compiler can offer an option to simulate their execution for a limited number of cycles. No deep understanding (other than language semantics) required by the compiler, just a different way of approaching the problem.
Follow this process to its logical conclusion, even into library functions and such, and your printf("%d\n", fib(5)); becomes a fputs("2178309\n", stdio);. In the output binary too, there will be nothing more than what the compiler found was needed. That is the ultimate goal.
>>194
If you plan your design like this and have it carefully thought out before writing a single line of code, the chances are high that you'll be ahead of those who "optimise later", without even needing to go to inline Asm.
well the issue isn't that you're making things more efficient, efficiency is generally good, the issue is that it's your only priority.
take a contrived Fibonacci example. you don't want to use the recursive O(2^n) algorithm...
but you probably also don't want to use the O(1) algorithm unless you have a REALLY good reason to:
>>197 but you also probably also don't want to use the O(1) algorithm unless you have a REALLY good reason to
Why? The meaning of the function is captured by int Fibonacci(int n). How it is implemented is none of anyone's business.
Name: Anonymous 2012-07-31 12:42
>>197
what is the value of (int)std::numeric_limits<double>::infinity(), anyway?
>>199 How it is implemented is none of anyone's business.
Also you make a lot of statements like this, leading me to believe that you've never actually worked with other people before. Am I correct?
That's not necessary, all it needs to understand is the semantics of the source language. What I'm after is not statement-by-statement translation with optimisation applied afterwards (there's that anti-premature-optimisation again!); my idea is to generate code "backwards", working from the desired result. E.g. in
int foo() {
int i = f();
int j = k();
int l = 2 + j;
... /* code that does a lot with l, but never has effect on i, j, nor other global variables */
return i + 2*j;
}
a "dumb" or "traditional" compiler would emit code for all that stuff in the middle, and optimisation might have a chance at removing (some of) it and moving variables into registers. An "intelligent" compiler could work backwards and "think" "This function's result depends on i and j. What do i and j depend on? f() and k(). What do f() and k() depend on? ... " Eventually it might determine that f() and k() are actually constant, and substitute that in, propagate the changes down, and reduce foo() itself to a constant.
Yeah, this can actually be formulated as a simple graph algorithm. Every intermediate expression is a node. An expression X has a directed edge to an expression Y if and only if the evaluation of X depends on the result of the evaluation of Y. We then look at the node for the expression being returned from the subroutine. Perform a depth-first traversal. In the post-order phase, emit assembly instructions for evaluating the current node. This will ensure the dependencies for each expression are evaluated in time. Furthermore, expressions whose values do not contribute to the return value will not be reachable from the return expression node, so the depth-first traversal will never reach them and never emit assembly for these unused expressions. And furthermore, the expression dependency graph also provides a scheme for evaluating dependencies in parallel. The disadvantage of trying to take advantage of that is the cost of synchronization. The time it takes to evaluate an expression node may vary very much with different inputs. Effective synchronization would likely need to be dynamic, and the inherent cost of managing this would only make it advantageous when the expressions have intensive evaluation procedures.
And of course, all of this might not even be done if foo() can never get called from any entry point. At every reference to a nonlocal variable, this process would need to be performed as their results are "visible" to other functions.
Yeap.
For things like register allocation (done after the above), the compiler could track how many variables are actually needed, and then choose how many extra "slots" are required in memory. It can alter their allocation to registers depending on their usage and loop nesting level.
Yeah, register allocation can be stated as a graph coloring problem.
http://en.wikipedia.org/wiki/Graph_coloring#Register_allocation
The textbook approach to this problem is to model it as a graph coloring problem. The compiler constructs an interference graph, where vertices are symbolic registers and an edge connects two nodes if they are needed at the same time. If the graph can be colored with k colors then the variables can be stored in k registers.
For general graphs, finding an optimal coloring is NP-hard, and finding a coloring using less than k colors is NP-complete. But you can get your optimal register allocation if you are willing to wait. If the variable interference graph has certain characteristics, it could be possible to color it more easily.
In size/balanced optimisation mode, if it sees certain variables having LIFO-like usage patterns, it can emit push/pop (single byte instructions) instead of explicitly doing a stack allocate and moves. If there's a long-running inner loop with frequently accessed variables, push the less frequently used variables on the stack. This is how an Asm programmer does register allocation by necessity.
Cool stuff. This could be stated as an optimization problem.
Due to the halting problem it's not possible to prove that f() or k() terminate even if they do not depend on nonlocal variables, but the compiler can offer an option to simulate their execution for a limited number of cycles. No deep understanding (other than language semantics) required by the compiler, just a different way of approaching the problem.
I saw a language that offered something similar. At function definitions you could specify a maximum time out for the function. This might be useful in some real time applications, but I could see it getting in the way otherwise.
Follow this process to its logical conclusion, even into library functions and such, and your printf("%d\n", fib(5)); becomes a fputs("2178309\n", stdio);. In the output binary too, there will be nothing more than what the compiler found was needed. That is the ultimate goal.
Yeap, that's all possible with function inlining and constant folding. If the printf implementation prepares the "2178309\n" string by writing into a buffer, the compiler would need to understand this and evaluate the resulting state of the buffer at compile time.
The disadvantage of trying to take advantage of that is the cost of synchronization.
???
As I mentioned above, all recent CPUs automatically parallelise when there are no dependencies between instructions.
For general graphs, finding an optimal coloring is NP hard, and finding a coloring using less than k colors is NP complete.
I've never really understood this obsession with graph colouring for RA. Perhaps it's too general for register allocation, or register-memory/register-immediate/memory-immediate architectures like x86 can avoid this issue. In the flow graph, think of a wire connecting two nodes for each variable's value that needs to be preserved. Then the "thickness" of the "bundle" between each node is how many units (bytes e.g. if each wire represents 8 bits) of data need to be preserved between them. Assign the ones used in the innermost loops/most often to registers first. The rest can go in memory, or swapped with another if there's a change in use (x86 conveniently provides an xchg instruction for this.)
This algorithm is quite different from any I've seen before (except perhaps linear scan, although that one is not particularly efficient) and has some traits of the optimise-first approach: instead of attempting to allocate everything to registers and "spill" afterwards, determine the maximum storage needed first, then allocate to both memory and registers. For example you have 80 bits of "real" local variables (i.e. values that actually get used and need preservation, which may not correspond to the number of declarations in the source code) and 56 bits of registers. A dumb graph-colouring algorithm is NOT going to see the 24 bits until it's too late. Then it goes back, "spills", tries again, etc. The algorithm I described above is going to allocate the 24 bits somehow (pre-allocation via stack, push/pop, etc.), and fit everything in naturally.
At function definitions you could specify a maximum time out for the function.
That's not what I meant. It's a compile-time optimisation constraint to prevent compilation from taking too long, at the expense of possible lost pre-computation. E.g. you write a program that computes 64K digits of pi. This is self-contained and should terminate, so theoretically could be computed by the compiler. With a low precomputation limit, it doesn't finish so at best the compiler can save the partial state and you end up with e.g. a program that contains 4K digits already computed and another 60K to go. Set it high enough and you get a program that only contains the 64K digits you want because the computation terminated in the compiler.
If the printf implementation prepares the "2178309\n" string by writing into a buffer, the compiler would need to understand this and evaluate the resulting state of the buffer at compile time.
It's just the collection of visible state changes induced by the program. You could just collect the library function calls it made, along with their associated data, and emit code to do only that, if the program depended on no input. Programs are deterministic. If they have no input they will always produce the same output (or not halt). If they halt with no input then capture their output and that is the embodiment of the process it performs.
Name: Anonymous 2012-08-02 0:59
As I mentioned above, all recent CPUs automatically parallelise when there are no dependencies between instructions.
Yes, so it would be silly to explicitly use threads to evaluate expressions whose evaluations are only a few primitive opcodes long. But if the values are the return values of procedures that take inputs, take a while, and have execution times that vary with their inputs, spawning child threads to compute the values and then joining them for the calculation of the common dependency will allow the dependencies to be calculated in parallel.
I've never really understood this obsession with graph colouring for RA. Perhaps it's too general for register allocation,
To answer this question, one must ask if it is possible to create a chunk of code that will yield a variable interference graph isomorphic to an arbitrary graph.
or register-memory/register-immediate/memory-immediate architectures like x86 can avoid this issue.
Only if the number of registers is two.
In the flow graph, think of a wire connecting two nodes for each variable's value that needs to be preserved. Then the "thickness" of the "bundle" between each node is how many units (bytes e.g. if each wire represents 8 bits) of data need to be preserved between them.
Larger data types can be represented as a series of independent variables, where each variable represents a fixed-size portion of the data type. If allocated in memory, an extra restriction that each portion must be in successive memory addresses can be added.
Assign the ones used in the innermost loops/most often to registers first.
why? And what's a loop in this context? what code representation are you operating on?
The rest can go in memory, or swapped with another if there's a change in use (x86 conveniently provides an xchg instruction for this.)
Yes, if there is spillage, there will need to be registers reserved for swapping values between the swapping register and the variable's permanent memory address.
This algorithm is quite different from any I've seen before (except perhaps linear scan, although that one is not particularly efficient) and has some traits of the optimise-first approach:
Without a proof of optimality, it's crap.
instead of attempting to allocate everything to registers and "spill" afterwards, determine the maximum storage needed first, then allocate to both memory and registers. For example you have 80 bits of "real" local variables (i.e. values that actually get used and need preservation, which may not correspond to the number of declarations in the source code) and 56 bits of registers.
Multiple variables can still be allocated in a single memory address, much how multiple variables can be allocated in a single register. An optimal graph coloring will give you a smallest number of slots needed to allocate your variables. Some of these slots will be in registers and if there are too many slots, they will be in memory as well. But in any case, a minimal amount of slots and memory will be used.
A dumb graph-colouring algorithm is NOT going to see the 24 bits until it's too late. Then it goes back, "spills", tries again, etc.
The graph coloring algorithm doesn't know about registers or spills. It just gives you colors. How you interpret and use these colors is a separate issue.
The algorithm I described above is going to allocate the 24 bits somehow (pre-allocation via stack, push/pop, etc.), and fit everything in naturally.
"Naturally." I wish this word were stricken from anything technical. If it ain't optimal it's crap.
That's not what I meant. It's a compile-time optimisation constraint to prevent compilation from taking too long, at the expense of possible lost pre-computation. E.g. you write a program that computes 64K digits of pi. This is self-contained and should terminate, so theoretically could be computed by the compiler. With a low precomputation limit, it doesn't finish so at best the compiler can save the partial state and you end up with e.g. a program that contains 4K digits already computed and another 60K to go. Set it high enough and you get a program that only contains the 64K digits you want because the computation terminated in the compiler.
my bad. That's a good point to be aware of if pushing those computations to compile time.
It's just the collection of visible state changes induced by the program. You could just collect the library function calls it made, along with their associated data, and emit code to do only that, if the program depended on no input. Programs are deterministic. If they have no input they will always produce the same output (or not halt). If they halt with no input then capture their output and that is the embodiment of the process it performs.
Yeah, constant folding on function calls can go a long way. It would be most powerful if it could adapt to almost any user function, as well as library functions.
Anyways, I'm going to try to get optimal register allocation in polynomial time. Wish me luck!
why? And what's a loop in this context? what code representation are you operating on?
Obviously because they will be read/written the most and thus deserve to be in registers, at least for the duration of that loop. It should also be obvious what a loop is, it's whatever corresponds to a loop in the source language. A loop within a loop is also similarly self-explanatory. The representation doesn't matter.
Yes, if there is spillage, there will need to be registers reserved for swapping values between the swapping register and the variable's permanent memory address.
This is where it seems you're still stuck with "the old way": a variable does not ever need to be at a "permanent memory address". There doesn't even need to be any correlation between how variables are defined and used in the source and the output, as long as they are semantically equivalent. Think of a variable as just a label for a "flow" of data between two or more operations. That flow might not exist if the operations that it connects aren't important to the final result of the function.
Without a proof of optimality, it's crap.
That's what academics think. The real world says otherwise.
Multiple variables can still be allocated in a single memory address, much how multiple variables can be allocated in a single register.
But multiple registers OR memory addresses may be allocated to one "variable" too.
The graph coloring algorithm doesn't know about registers or spills. It just gives you colors. How you interpret and use these colors is a separate issue.
That's exactly the point I'm making: it's too general.
Anyways, I'm going to try to get optimal register allocation in polynomial time. Wish me luck!
According to the RA article on Wikipedia this is possible with SSA, and there's nothing about anything being NP-complete on its page. Unfortunately they're both full of academic verbiage so it's difficult to determine what they're really talking about.
Another two examples of why more "targeted" algorithms can work better: instructions that require operands in particular registers, and those that prefer particular registers. With traditional graph colouring, you can force allocation of a certain register, but there's no way to specify a preference. An Asm programmer can think "there's a divide coming up: let's arrange things so that its input will be in the right registers when the time comes" and override that choice with something more important when necessary. Traditional RA algorithms can't do that.
tl;dr: Register allocation is one area where compilers could improve significantly, but due to a reluctance to challenge existing norms, has not. It's only recently that stuff like linear-scan (closer to what someone with common sense, but not exposure to previous biases, would come up with) and SSA started getting visibility. These basics should've been where we started.
To whoever started the insistence that register allocation is as general a problem as graph colouring and propagated it: FUCK YOU.
Obviously because they will be read/written the most and thus deserve to be in registers, at least for the duration of that loop. It should also be obvious what a loop is, it's whatever corresponds to a loop in the source language. A loop within a loop is also similarly self-explanatory. The representation doesn't matter.
It sounds like you are operating on the source language directly. In the version I was thinking of, it was to take abstract assembly that uses an unbounded amount of variables, and then output the same assembly but using as few variables as possible. When operating at the assembly level, loop constructs can become very complex and general. I'm not sure which approach would be better. It feels like it would be better to do it after the compiler has done its other optimizations, but I could be wrong.
This is where it seems you're still stuck with "the old way": a variable does not ever need to be at a "permanent memory address". There doesn't even need to be any correlation between how variables are defined and used in the source and the output, as long as they are semantically equivalent. Think of a variable as just a label for a "flow" of data between two or more operations. That flow might not exist if the operations that it connects aren't important to the final result of the function.
Well that's interesting. Is there an advantage to allocating a single variable in multiple locations? Each location is essentially a variable. So this could be modeled as variable assignment. I could see this getting very interesting if each storage location had a different cost of access. But assuming each register is equally cheap to read and write to, all you have to worry about is swapping values in and out of memory and the behavior of the special purpose registers, in which case, you might not need to go there.
That's what academics think. The real world says otherwise.
The real world is satisfied with crap.
But multiple registers OR memory addresses may be allocated to one "variable" too.
I'm not sure if there is an advantage in doing that. The algorithm would be creating a new variable that is equivalent to another previously defined one. If the goal is to create an equivalent program using the fewest amount of variables, and hence, needing the fewest amount of storage locations, then creating new variables doesn't seem like a good idea. But if the goal is to increase performance, then maybe it's possible in some circumstances to use more storage locations. At that point you'd need to work with a model closely related to the processor. It wouldn't be enough to just assume that using the minimum amount of storage locations will give you the most speed, although you'd think that would be true if there was no spillage as a result.
That's exactly the point I'm making: it's too general.
Whether or not graph coloring is too general for register allocation depends on the class of graphs created in the construction of the variable interference graph. If it can form an arbitrary graph, then optimal register allocation is just as hard as graph coloring. But if the class of graphs is actually more limited, then it can be possible to use a special-purpose graph coloring algorithm that takes advantage of properties present in the restricted class of graphs.
According to the RA article on Wikipedia this is possible with SSA, and there's nothing about anything being NP-complete on its page. Unfortunately they're both full of academic verbiage so it's difficult to determine what they're really talking about.
Good news! In the version I am working with, I am getting lines in the assembly where each variable must be holding its current value. If in a single line there are multiple variables that must hold their values, then these variables cannot be allocated in the same place. This yields a set of cliques. Now finding a maximum clique in a graph is NP-complete, and the encoding we get for the variable interference graph is exactly its set of cliques. So if it is possible to optimally color a graph using knowledge of all its cliques, then that could be used to optimally color the variable interference graph. I got something working for one example, but once I apply it to general graphs I get lost. Maybe there's a structure to the variable interference graph that I'm unknowingly taking advantage of.
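If it helps make the clique encoding concrete, here's a minimal Python sketch (the variable names and per-line live sets are invented): each line's live set forms a clique, the cliques are unioned into an interference graph, and the graph is coloured greedily.

```python
# Toy sketch: per-line live sets -> interference graph -> greedy colouring.
from itertools import count

def interference_from_live_sets(live_sets):
    """Union of the cliques given by per-line live sets -> adjacency dict."""
    adj = {}
    for live in live_sets:
        for v in live:
            adj.setdefault(v, set()).update(live - {v})
    return adj

def greedy_colour(adj):
    """Give each variable the lowest colour unused by its neighbours."""
    colour = {}
    for v in sorted(adj):            # any order works; the order is the whole game
        used = {colour[n] for n in adj[v] if n in colour}
        colour[v] = next(c for c in count() if c not in used)
    return colour

live_sets = [{"a"}, {"a", "b"}, {"b", "c"}, {"c"}]   # a dies before c is born
colours = greedy_colour(interference_from_live_sets(live_sets))
print(colours)   # a and c can share a register; b interferes with both
```

Greedy colouring is not optimal on arbitrary graphs, but on restricted classes (e.g. graphs with a perfect elimination ordering) the right visit order makes it optimal, which is the structural question raised above.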
Another two examples of why more "targeted" algorithms can work better: instructions that require operands in particular registers, and those that prefer particular registers. With traditional graph colouring, you can force allocation of a certain register, but there's no way to specify a preference.
The problem becomes much more complex, but you can rank the optimality of colorings with a utility function. The catch is that you then have to actually compute that utility function.
An Asm programmer can think "there's a divide coming up: let's arrange things so that its input will be in the right registers when the time comes" and override that choice with something more important when necessary. Traditional RA algorithms can't do that.
They can understand that the registers used by the divide instruction must be used to store the variables resulting from the division. This can be represented by allocating variables for the result of the division and forcing their colors to represent the registers they are put in by the instruction.
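As a hedged illustration of forcing colours, here is the same greedy idea with precoloured ("pinned") nodes, e.g. a division's quotient/remainder pinned to fixed registers. The register numbering and the EAX/EDX mapping are made up for the example, and no pin-conflict checking is attempted.

```python
# Sketch: greedy colouring where some nodes are pinned to a colour in advance
# (e.g. DIV outputs pinned to EAX/EDX).  Illustrative only, not a real backend.
from itertools import count

def colour_with_pins(adj, pinned):
    colour = dict(pinned)                      # pins are honoured verbatim
    for v in sorted(adj):
        if v in colour:
            continue
        used = {colour[n] for n in adj[v] if n in colour}
        colour[v] = next(c for c in count() if c not in used)
    return colour

adj = {"q": {"r", "t"}, "r": {"q", "t"}, "t": {"q", "r"}}
pins = {"q": 0, "r": 2}                        # say 0 = EAX, 2 = EDX
print(colour_with_pins(adj, pins))             # t gets the lowest free colour, 1
```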
tl;dr: Register allocation is one area where compilers could improve significantly, but due to a reluctance to challenge existing norms, has not. It's only recently that stuff like linear-scan (closer to what someone with common sense, but not exposure to previous biases, would come up with) and SSA started getting visibility. These basics should've been where we started.
Indeed. But that makes it more exciting, doesn't it? The late start makes it more likely that you might discover something new.
To whoever started the insistence that register allocation is as general a problem as graph colouring and propagated it: FUCK YOU.
So sorry! But really, graph coloring is just solving the same problem in a more abstract setting. If you want to build more restrictions from the platform into the algorithm, just add more information to your definition of the graph to reflect the extra restrictions. These additions are usually small, and the advantage of using graphs is that all the research and development that has gone into graph algorithms becomes easy to leverage. But there is the problem of graphs being such general objects. It could be that your domain is much less general than graph coloring, in which case it may be more useful to operate within the domain itself and never create a graph.
In the version I was thinking of, it was to take abstract assembly that uses an unbounded amount of variables, and then output the same assembly but using as few variables as possible.
That's basically SSA. Determining loop nesting level in Asm isn't difficult either: count the maximum number of complete cycles in the flow control graph enclosing a use of a given "variable".
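A rough sketch of counting enclosing cycles, assuming a toy CFG given as a dict. This uses the standard back-edge/natural-loop construction, which may differ from exactly what the poster had in mind.

```python
# Sketch: estimate loop-nesting depth of each basic block in a flow graph
# by finding DFS back edges and their natural loops.  Toy CFG, not tied to
# any real assembler.
def back_edges(cfg, entry):
    edges, state = [], {}                     # state: 1 = on stack, 2 = done
    def dfs(n):
        state[n] = 1
        for s in cfg.get(n, []):
            if state.get(s) == 1:
                edges.append((n, s))          # edge to an ancestor: back edge
            elif s not in state:
                dfs(s)
        state[n] = 2
    dfs(entry)
    return edges

def natural_loop(cfg, tail, head):
    """All nodes that reach `tail` without passing through `head`."""
    preds = {n: [] for n in cfg}
    for n, ss in cfg.items():
        for s in ss:
            preds[s].append(n)
    body, work = {head, tail}, [tail]
    while work:
        n = work.pop()
        for p in preds.get(n, []):
            if p not in body:
                body.add(p)
                work.append(p)
    return body

def nesting_depth(cfg, entry):
    loops = [natural_loop(cfg, t, h) for t, h in back_edges(cfg, entry)]
    return {n: sum(n in l for l in loops) for n in cfg}

# entry -> outer loop headed at 1 -> inner loop 2<->3 -> back to 1 -> exit x
cfg = {"e": ["1"], "1": ["2", "x"], "2": ["3"], "3": ["2", "1"], "x": []}
print(nesting_depth(cfg, "e"))   # inner blocks at depth 2, outer head at 1
```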
Is there an advantage to allocating a single variable in multiple locations?
As I said in what you quoted, at this phase of compilation you should forget about what variables were present in the source code, and think of them as only labels for a flow of data between two computational nodes. Registers and memory serve as temporary storage locations for this flow, and thus it may occupy different locations at different points in the code. The advantage is additional freedom in code generation.
But assuming each register is equally cheap to read and write to
They're not. For example, the accumulator is preferred for many instructions. (e)DX must hold the remainder of a division. If a division is coming up but all the registers are already used for values that will be used afterwards, it makes sense to move whatever DX is storing into memory (possibly a push on the stack) and pull it back into a (possibly different) register sometime later when the compiler decides that it'll be used enough that it should get a register.
The real world is satisfied with crap.
Whatever you call it, that's the way things are.
I lost you at the graph colouring stuff, so I can't say much there.
The late start makes it more likely that you might discover something new.
It's only new to those who have buried their heads in the academic "literature" (propaganda?) all along. Try asking any Asm programmers if they colour graphs to allocate registers when they write code. I sure don't.
But really graph coloring is just solving the same problem in a more abstract setting.
Too abstract. If I were forced to choose an NP-complete problem to represent register allocation, it would be bin packing. At least that intuitively feels like the process of allocating items (values) to different bins (registers).
But there is the problem of graphs being such general objects. It could be that your domain is much less general than graph coloring.
I have a very strong feeling that it is: If all programs can be transformed into SSA form and doing so isn't NP-complete, and by doing so you can allocate registers in polynomial time, whereas doing it for the same program not in SSA is supposed to be NP-complete, then isn't that like saying P=NP? Following this to its conclusion, take any NP-complete problem, reduce it to graph colouring, reduce graph colouring to RA, derive a program from it, convert the program to SSA, RA that, and you've solved the NP-complete problem in polynomial time.
The other big drawback of thinking so abstractly is that only those who know much about graph theory (unlike me) can talk about RA like that. It creates a rather closed-minded environment and propagates the "thou shalt use graph colouring" myth. In contrast, the way I describe and think about it, it wouldn't be difficult at all for even a 10-year-old to understand and relate to the underlying problem. Getting rid of all that academic cruft and theory is a great way to get others thinking about these sorts of problems, and as a result drive innovation and progress through greater proliferation of practical, readily applicable ideas.
"Farmers don't want to learn about spherical cows. They just want the milk."
Name: Anonymous 2012-08-04 3:43
>>223 That's basically SSA. Determining loop nesting level in Asm isn't difficult either: count the maximum number of complete cycles in the flow control graph enclosing a use of a given "variable".
Cool stuff.
As I said in what you quoted, at this phase of compilation you should forget about what variables were present in the source code, and think of them as only labels for a flow of data between two computational nodes. Registers and memory serve as temporary storage locations for this flow, and thus it may occupy different locations at different points in the code. The advantage is additional freedom in code generation.
A limitation with the typical graph coloring approach is that assignments to registers are static and never change during the variable's lifetime. I think it could be developed to account for this though. At each time (at each line in the assembly) the live variables must be allocated in some known assignment to registers and memory. But this assignment is subject to change at any point in the program. So rather than assigning a variable v to a register r, a variable v together with a line in the assembly l will be assigned to a register. So rather than calculating a function mapping variable -> storage, it would be calculating a function (variable, time) -> storage. Although then you'd have to account for the extra instructions required for moving the values around. It's not quite the same as the register allocation problem I'm used to, which was to just relabel the registers to use as few as possible.
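For comparison, here is a minimal linear-scan sketch (Poletto/Sarkar style) that still computes a static variable -> register map; splitting live ranges on top of this would give the (variable, line) -> storage function described above. The intervals, register count, and spill heuristic are invented for illustration.

```python
# Sketch of linear-scan allocation over live intervals.
def linear_scan(intervals, nregs):
    """intervals: {var: (start, end)}.  Returns {var: register or 'spill'}."""
    alloc, active, free = {}, [], list(range(nregs))
    for v, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # expire intervals that ended before this one starts
        for w, wend in [t for t in active if t[1] < start]:
            active.remove((w, wend))
            free.append(alloc[w])
        if free:
            alloc[v] = free.pop()
            active.append((v, end))
        else:
            # spill whichever live interval ends last (simplest heuristic)
            active.sort(key=lambda t: t[1])
            w, wend = active[-1]
            if wend > end:
                active.pop()
                alloc[v] = alloc[w]
                alloc[w] = "spill"
                active.append((v, end))
            else:
                alloc[v] = "spill"
    return alloc

ivs = {"a": (0, 4), "b": (1, 3), "c": (2, 6), "d": (5, 7)}
print(linear_scan(ivs, 2))   # with 2 registers, c's long interval is spilled
```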
They're not. For example, the accumulator is preferred for many instructions. (e)DX must hold the remainder of a division. If a division is coming up but all the registers are already used for values that will be used afterwards, it makes sense to move whatever DX is storing into memory (possibly a push on the stack) and pull it back into a (possibly different) register sometime later when the compiler decides that it'll be used enough that it should get a register.
The limitation with DX can be expressed in graph coloring by assigning the variable representing the output remainder from the division instruction the color representing the DX register, and then forcing the colorer to deal with the predefined constraint. I'm not familiar enough with x86 to respond specifically about the accumulator, but if there are registers with faster access for certain instructions, variables used by those instructions should get priority for allocation in those registers. To calculate this, one can consider the instructions applied to a variable during the variable's lifetime, and store the instructions in an array. Depending on the assignment to register for the variable, each instruction will have a certain cost. The algorithm could then try to put the variable in a register that minimizes the sum cost. Now this is where the ability to move variables between registers comes in handy. If the first ten instructions would perform best if X was in R1, and then the next twenty performed best if X was in R8, it could be worth the extra mov instruction to move X from R1 to R8. This gets really complicated, especially when there are other live variables that must also share the registers. But it can be stated as an optimization problem, and then solved in a brute force manner or guessed well with heuristics.
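The "move X from R1 to R8 partway through its lifetime" idea can be stated as a small dynamic programme over (instruction, register) states. This is only a sketch for a single variable with invented costs; a real allocator must handle all live variables at once.

```python
# Sketch: pick a register for one variable at each instruction, paying a
# per-instruction access cost plus a fixed cost for each inter-register MOV.
def best_placement(costs, move_cost):
    """costs[i][r] = cost of instruction i if the variable sits in register r."""
    nregs = len(costs[0])
    best = list(costs[0])                      # best[r]: min cost ending in r
    choice = []
    for row in costs[1:]:
        prev, best, step = best, [], []
        for r in range(nregs):
            # either stay in r, or move in from some other register
            opts = [prev[p] + (0 if p == r else move_cost) for p in range(nregs)]
            p = min(range(nregs), key=opts.__getitem__)
            step.append(p)
            best.append(opts[p] + row[r])
        choice.append(step)
    # walk backwards to recover the register chosen at each instruction
    r = min(range(nregs), key=best.__getitem__)
    total, path = best[r], [r]
    for step in reversed(choice):
        r = step[r]
        path.append(r)
    return total, path[::-1]

# 3 instructions, 2 registers: first two prefer R0, the last strongly prefers R1
cost, path = best_placement([[1, 5], [1, 5], [9, 1]], move_cost=2)
print(cost, path)   # cheaper to pay one move than to leave the variable in R0
```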
Whatever you call it, that's the way things are.
No, I understand. People are interested in what is practical. It isn't practical to wait 2 months for your program to compile if the output program is 4% faster than a build that took 4 minutes to compile.
I lost you at the graph colouring stuff, so I can't say much there.
Sorry. The variable interference graph is a graph where the nodes represent variables and there is an undirected edge connecting two variables if the variables are live at the same moment, at some point in the program. A clique is a set of vertices that are all mutually adjacent. In the context of register allocation, it is a set of variables that all cannot share a register in their register allocation. In the input to the register allocation algorithm I was working with, the algorithm would scan the assembly and calculate which variables were live and which were not, for each line in the assembly. So at each line, there is a set of live variables. All variables that are live in the same line cannot share any registers in their allocation (assuming the allocation is static). So at each line of the assembly, I get a clique in the variable interference graph. Normally one gets encodings of graphs in the form of an adjacency list, which is just a list of all the nodes, where each node contains a list of all adjacent nodes. Since in this register allocation problem you get a list of cliques instead, doing graph coloring might be easier.
It's only new to those who have buried their heads in the academic "literature" (propaganda?) all along. Try asking any Asm programmers if they colour graphs to allocate registers when they write code. I sure don't.
Graph coloring is sort of similar to sudoku, cross word puzzles, etc. It's an abstract representation of some of the basic constraint satisfaction problems people solve themselves every day. But people usually use creative methods to solve the problems. Computers are more brute force, although heuristics can be used as well.
Too abstract. If I were forced to choose an NP-complete problem to represent register allocation, it would be bin packing. At least that intuitively feels like the process of allocating items (values) to different bins (registers).
Yeah, maybe. Graph coloring is useful for a lot of constraint satisfaction problems. The register allocation problem is to allocate variables to registers with the constraint that variables live in the same moment are not allocated to the same register. So it translates well for that.
I have a very strong feeling that it is: If all programs can be transformed into SSA form and doing so isn't NP-complete, and by doing so you can allocate registers in polynomial time, whereas doing it for the same program not in SSA is supposed to be NP-complete, then isn't that like saying P=NP? Following this to its conclusion, take any NP-complete problem, reduce it to graph colouring, reduce graph colouring to RA, derive a program from it, convert the program to SSA, RA that, and you've solved the NP-complete problem in polynomial time.
I agree. And that's a nice proof right there. I was very confused when I read about that, and wondered why no one had written something to bring more consistency to this.
The other big drawback of thinking so abstractly is that only those who know much about graph theory (unlike me) can talk about RA like that. It creates a rather closed-minded environment and propagates the "thou shalt use graph colouring" myth. In contrast, the way I describe and think about it, it wouldn't be difficult at all for even a 10-year-old to understand and relate to the underlying problem. Getting rid of all that academic cruft and theory is a great way to get others thinking about these sorts of problems, and as a result drive innovation and progress through greater proliferation of practical, readily applicable ideas.
It's true. Abstraction in general is this way. The further you get from the original problem, the less intuitive it will appear. But there is the possibility that there are other problems out there with very different details, environments, and jargon, but are actually the same problem. By abstracting away the details and preserving only the concept of the problem, you can unify all these applications. And then development and solutions for the abstraction will apply to all of its applications. This can be useful sometimes, but it always creates a barrier to understanding. Especially when the abstract terminology begins to stack on top of itself and no one other than a researcher would be interested in learning it all. The abstraction can also be lossy. There might be details in the original problem that are lost in the abstraction. In this case, solutions to the abstraction will need to be extended to account for the new details before they can be applied.
"Farmers don't want to learn about spherical cows. They just want the milk."
Indeed.
Personal: C, Brainfuck, KSH93, AWK, and RISC's ASM
Name: Anonymous 2012-08-04 7:51
OCaml actually has a second implementation: F#. Considering it's Microsoft's, it really isn't that much better than the INRIA one but it does have some pros for real world use. Like Unicode support.
Otherwise, polymorphic, strongly typed, higher-order, functional, right-to-left call-by-value is the best a "modern GP language" has to offer now.
At each time (at each line in the assembly) the live variables must be allocated in some known assignment to registers and memory. But this assignment is subject to change at any point in the program.
From what I've read, that's part of the principle behind SSA: to split up the idea of a "variable" into multiple ones, and each is only used once, so they correspond exactly to labelling the flow of data between operations. However that still clings to the notion of a "variable", which should be considered source-code-only. When generating code, it's better to think of how data flows between different operations in order to finally arrive at one or more final results.
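A toy illustration of that renaming step, for straight-line code only (no branches, hence no phi nodes; the instruction format here is made up):

```python
# Sketch: SSA-style renaming for straight-line code.  Each assignment mints
# a fresh name, so every "variable" becomes exactly one def -> uses flow.
from collections import defaultdict

def to_ssa(code):
    """code: list of (dest, op, args).  Returns the renamed instruction list."""
    version = defaultdict(int)
    current = {}                     # latest SSA name for each source variable
    out = []
    for dest, op, args in code:
        args = tuple(current.get(a, a) for a in args)   # use latest versions
        version[dest] += 1
        new = f"{dest}{version[dest]}"
        current[dest] = new
        out.append((new, op, args))
    return out

prog = [("x", "add", ("a", "b")),
        ("x", "mul", ("x", "c")),    # redefines x: a new flow, not the same slot
        ("y", "add", ("x", "x"))]
for line in to_ssa(prog):
    print(line)
```

After renaming, the two definitions of `x` are independent flows (`x1`, `x2`) that an allocator is free to place in different locations.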
Although then you'd have to account for the extra instructions required for moving the values around. It's not quite the same as the register allocation problem I'm used to, which was to just relabel the registers to use as few as possible.
A move is not necessary in many cases when the operation does not increase the total storage needed, especially with more flexible architectures. This is better illustrated with an example:
reduce total storage; one less slot needed:
(C = A + B, A and B not used afterwards)

  A       B        AX      BX          AX      BX
  |       |         |       |           |       |
  +---*---+   ->    *-------+    or     +-------*
      |             |    <free>     <free>      |
      C             AX                          BX

  (generates ADD AX, BX        (generates ADD BX, AX
   and frees BX)                and frees AX)

unchanged total storage:
(C = A + B, B is unused afterwards)

  A       B        AX      BX
  |       |         |       |
  +---*---+   ->    +-------*
  |       |         |       |
  A       C         AX      BX

  (generates ADD BX, AX;
   AX continues, BX is the new flow)

increase total storage:
(C = A + B, both A and B are used afterwards)

  A       B        AX      BX
  |       |         |       |
  +---*---+   ->    +   *---+       (MOV CX, BX)
  |   |   |         |   |   |
  A   C   B         AX  CX  BX
                    |   |   |
                    +---*   |       (ADD CX, AX)
                    |   |   |
Because of the flexibility of x86, CX above could be replaced with a memory location (e.g. [BP+4]) if the compiler determined that CX (and likewise the other registers) was still in use and should stay put, or CX's contents could be moved into memory before the MOV if circumstances favour it being used for the result of the addition.
Using the fewest registers is also not necessarily the best policy; processors can parallelise better when multiple registers are used and the instructions are not dependent. For example, although the minimum number of registers for memory-to-memory copies is 1, when there are several of these to be done, you'll see e.g.
mov eax, [esp+8]
mov ecx, [esp+16]
mov edx, [esp+64]
mov ebx, [esp+96]
mov [esi+4], eax
mov [esi+8], ecx
mov [esi+12], edx
mov [esi+16], ebx
instead of repeatedly using eax. Thus the algorithm should, when the minimum number of registers used is below the amount available, shuffle the order of operations (while maintaining their dependency ordering) to increase it. The optimal solution isn't the minimum number of registers, but the exact amount available.
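A sketch of that shuffling for the memory-to-memory copy case: cycle through the available registers so consecutive loads and stores don't serialise on one register. The offsets and register names are just the ones from the listing above; a real scheduler would also chunk batches larger than the register set.

```python
# Sketch: emit independent mem-to-mem copies round-robin across registers,
# all loads first and then all stores, so nothing waits on a single register.
def emit_copies(src_offsets, dst_offsets, regs):
    # with more copies than registers, a second load would clobber a pending
    # store's register, so this toy version only handles one batch
    assert len(src_offsets) <= len(regs)
    loads, stores = [], []
    for i, (s, d) in enumerate(zip(src_offsets, dst_offsets)):
        r = regs[i % len(regs)]                # round-robin register choice
        loads.append(f"mov {r}, [esp+{s}]")
        stores.append(f"mov [esi+{d}], {r}")
    return loads + stores

for line in emit_copies([8, 16, 64, 96], [4, 8, 12, 16],
                        ["eax", "ecx", "edx", "ebx"]):
    print(line)
```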
Graph coloring is sort of similar to sudoku, cross word puzzles, etc. It's an abstract representation of some of the basic constraint satisfaction problems people solve themselves every day. But people usually use creative methods to solve the problems. Computers are more brute force, although heuristics can be used as well.
Thus we should examine those "creative methods" and quantify the exact steps and decisions they take, in order to implement an algorithm that can behave like a programmer thinks, with the added advantage of computer speed.
I agree. And that's a nice proof right there. I was very confused when I read about that and I was wondering why someone hadn't written something that got some more consistency on this.
Because of the "accepted fact" that graph colouring is the way to do register allocation, and that it's proved to be NP-complete, few academics have challenged those norms and entertained the possibility that RA might not be. It's necessary to "think outside the box" here.
There might be details in the original problem that are lost in the abstraction.
By using abstraction to transform RA to graph colouring you have transformed a very specific problem into a very general one, and increased its complexity. Because I am quite convinced that RA is not NP-complete I do not believe in using this abstraction.
In this case, solutions to the abstraction will need to be extended to account for the new details before they can be applied.
Why not forget about it completely and think of the solution to the specific problem from the ground up, using first principles? It seems people are more interested in building abstractions on top of abstractions than solving the actual problem, and I do not think that is a good use of resources. By indoctrinating others in this way, it becomes harder for them to think freely and innovate. The fact that linear-scan and the interest in SSA stuff is so recent clearly illustrates this.
In summary, it's an issue of perspective. Asm programmers do not think of variables much, but more about results of computation being held in registers/memory, and probably wouldn't care about assigning names to them. It's not "eax is currently holding j", it's "eax is holding the result of ...", and they're free to move that data around without having to think about what it's called. The flow model I talk about is a lot closer to this, and SSA is a step in the right direction. It's a little sad to see all this research being spent on these traditional directions, and not gaining much, when we could've had much better compilers back then, if they had chosen to think about these things "bottom-up".
>>223 They're not. For example, the accumulator is preferred for many instructions.
There hasn't been a real "accumulator" in Intel CPUs since the Pentium Pro. x86's non-uniform accumulator architecture was obsolete before it was designed (compare PDP-11, 68K, VAX and every RISC). It helped make the 8086 cheaper than 68K back in 1978 when the accumulator was attached to the ALU but now it just makes things harder for programmers and hardware designers. Learn an assembly language other than x86, and wonder whether anyone would willingly use Intel crap (especially the ugly 32-bit and 64-bit hacks) if IBM hadn't chosen it for the original PC (for its cheap price, not for technical merit) and the Wintel monopoly hadn't shoved it down OEMs' throats.
>>228 It's a little sad to see all this research being spent on these traditional directions, and not gaining much, when we could've had much better compilers back then, if they had chosen to think about these things "bottom-up".
Or if IBM had chosen a CPU with a uniform register set for their PC.
From what I've read, that's part of the principle behind SSA: to split up the idea of a "variable" into multiple ones, and each is only used once, so they correspond exactly to labelling the flow of data between operations. However that still clings to the notion of a "variable", which should be considered source-code-only. When generating code, it's better to think of how data flows between different operations in order to finally arrive at one or more final results.
The variables can always be closer to the assembly level than the source. For example, divisor and modulus result variables can be introduced for every divide instruction.
In response to the in-place add example:
I think the best way to take advantage of this would be to generate code in an abstract assembly where addition always created a new variable to follow SSA style. Then apply the optimal register allocation algorithm, which will collapse many of the variables together. Then expand the add instructions as:
add X, Y -> C becomes:
MOV C, X
ADD C, Y
add X, Y -> Y becomes:
ADD Y, X
add X, Y -> X becomes:
ADD X, Y
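Those three expansion rules can be written down directly. Here is a toy lowering function, assuming the allocator has already replaced variables with register names:

```python
# Sketch of the expansion rules above: lower an abstract three-address
# "add X, Y -> D" into two-address x86, reusing an operand's register
# when the destination was allocated on top of it.
def lower_add(x, y, dest):
    if dest == x:
        return [f"ADD {x}, {y}"]
    if dest == y:
        return [f"ADD {y}, {x}"]      # addition commutes, so swapping is fine
    return [f"MOV {dest}, {x}", f"ADD {dest}, {y}"]

print(lower_add("AX", "BX", "AX"))    # ['ADD AX, BX']
print(lower_add("AX", "BX", "BX"))    # ['ADD BX, AX']
print(lower_add("AX", "BX", "CX"))    # ['MOV CX, AX', 'ADD CX, BX']
```

Note the commuting trick in the second case would not be valid for SUB or DIV; non-commutative operations need the MOV form whenever the destination collapses onto the second operand.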
Using the fewest registers is also not necessarily the best policy; processors can parallelise better when multiple registers are used, and the instructions are not dependent. For example, although the minimum number of registers for memory-to-memory copies is 1, when there are several of these to be done, you'll see e.g.
mov eax, [esp+8]
mov ecx, [esp+16]
mov edx, [esp+64]
mov ebx, [esp+96]
mov [esi+4], eax
mov [esi+8], ecx
mov [esi+12], edx
mov [esi+16], ebx
instead of repeatedly using eax. Thus the algorithm should, when the minimum number of registers used is below the amount available, shuffle the order of operations (while maintaining their dependency ordering) to increase it. The optimal solution isn't the minimum number of registers, but the exact amount available.
That's a good point. The compiler will need to take this parallel execution into account as well. If you had an arbitrarily large amount of registers, you would never store two variables in the same register, to avoid locking up the parallelism. Sequential operations that use the same register will cost more than sequential operations that do not depend on each other. This cost can be incorporated by taking into account the previous instructions when calculating the cost of the current instruction. If the current instruction depends on the execution of close previous instructions, it should cost more.
Thus we should examine those "creative methods" and quantify the exact steps and decisions they take, in order to implement an algorithm that can behave like a programmer thinks, with the added advantage of computer speed.
The problem is that people rarely have a proof of optimality for their solution if the problem is very large. People instead invest effort into the solution until it is good enough. Once it is no longer worth continued effort, they stop and use what they have.
Because of the "accepted fact" that graph colouring is the way to do register allocation, and that it's proved to be NP-complete, few academics have challenged those norms and entertained the possibility that RA might not be. It's necessary to "think outside the box" here.
If RA is not NP-complete, then one of two things is probably true: the graph to color has a restrictive structure and can be optimally colored by a polynomial-time algorithm, or the graph is arbitrary but the assembly code encodes accessible information about the structure of the graph that can be used to get a coloring in polynomial time, and the coloring becomes difficult only if you ignore this information.
By using abstraction to transform RA to graph colouring you have transformed a very specific problem into a very general one, and increased its complexity. Because I am quite convinced that RA is not NP-complete I do not believe in using this abstraction.
Translating RA to graph coloring directly will be lossy. Kind of like taking a cryptographic hash of a string and then trying to recover the original string. But the information in RA that makes it easier than general graph coloring could also appear in other contexts. So it is worth abstracting. It's just that the conventional abstraction is misguided.
Why not forget about it completely and think of the solution to the specific problem from the ground up, using first principles? It seems people are more interested in building abstractions on top of abstractions than solving the actual problem, and I do not think that is a good use of resources. By indoctrinating others in this way, it becomes harder for them to think freely and innovate. The fact that linear-scan and the interest in SSA stuff is so recent clearly illustrates this.
The theoretical developments indirectly pay off to a lot of things. Just think, where did SSA come from?
In summary, it's an issue of perspective. Asm programmers do not think of variables much, but more about results of computation being held in registers/memory, and probably wouldn't care about assigning names to them. It's not "eax is currently holding j", it's "eax is holding the result of ...", and they're free to move that data around without having to think about what it's called. The flow model I talk about is a lot closer to this, and SSA is a step in the right direction. It's a little sad to see all this research being spent on these traditional directions, and not gaining much, when we could've had much better compilers back then, if they had chosen to think about these things "bottom-up".
A lot of things that appear to be very different can turn out to be equivalent. But graph coloring appears to have more research because it is a more general problem, and has a broader range of applications. But even then, I wouldn't consider it that popular. But then register allocation is a step further into the specific. No one really cares about compilers. Just look at databases, networking, and graphics. Then look at compilers.
It would probably be more productive to design a better architecture and use that, or to use a better-designed architecture already in existence; the complications in x86's design make for a more difficult code generation problem. If you are only interested in productivity, it is better if the problem to solve never exists. But it is OK if all you want to do is solve problems, even if the best solution is not to solve it.
>>237
It's already been solved. In fact, it's been solved in architectures created even before Intel licensed the 4004 from Busicom. x86 is the Malbolge of CPU architectures. It manages to be painful for both programmers and chip designers.
>>229,230
The accumulator is still preferred; while it may not be directly faster, many instructions are at least 1 byte shorter when operating on it, which translates into better code density and cache utilisation. I've worked with more than half a dozen different architectures and x86 is one of my favourites (among others such as the Z80 and 8051); it's quite "balanced" and has many opportunities for optimisation. I have less nice things to say about the 64-bit "extensions".
The theoretical developments indirectly pay off to a lot of things. Just think, where did SSA come from?
SSA is not new, it was already around at the time of the Dragon Book. And Asm programmers have been implicitly using that sort of thinking in allocating registers before that.
No one really cares about compilers. Just look at databases, networking, and graphics. Then look at compilers.
That is unfortunate, because unless written in Asm, all software depends on them and a highly optimising compiler has the potential to make almost all software more efficient. Then again, the "optimise later" or "forget optimisation" bullshit that CS students are brainwashed with doesn't help either.
>>236,237
You can't gain something without losing something else. What you gain in flexibility, you lose in instruction size and the delays introduced by additional datapath circuitry. Going back to the diagram above, how many of the operations in a program, on average, need to preserve both of the inputs? Or to put it another way, what is the average number of "outputs" needed for all these operations? This corresponds directly to the best architecture choice. If all 3 of them need to be preserved most of the time, then a 3-address architecture makes sense. If only 1 of them needs to be preserved, then 1-address architectures will be most suitable. From the general proliferation of 2-address architectures, it's probably the case that this number is around 2, and maybe slightly less. Either way it's a good middle-ground choice.
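A toy illustration of the 2-address trade-off (a made-up mini machine, not any real ISA): since a 2-address `add` overwrites its first operand, keeping both inputs live costs an extra `mov`, which is exactly the cost you pay for the shorter encoding:

```python
# Toy 2-address register machine: 'add' overwrites its first operand,
# so preserving an input requires an explicit copy first.
def run(prog, regs):
    for instr in prog:
        if instr[0] == "mov":
            regs[instr[1]] = regs[instr[2]]
        elif instr[0] == "add":
            regs[instr[1]] += regs[instr[2]]
    return regs

# Compute c = a + b while keeping a and b live: one mov + one add,
# where a 3-address machine would need only 'add c, a, b'.
regs = run([("mov", "c", "a"), ("add", "c", "b")], {"a": 3, "b": 4})
print(regs)  # {'a': 3, 'b': 4, 'c': 7}
```

If the destination may clobber an input most of the time anyway (the "around 2 outputs" observation above), the extra `mov` is rarely needed and the 2-address encoding wins on density.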
I will agree that x86 code generation is not as simple as it is with other architectures. But at the same time, it also presents more opportunities for optimisation so that the hardware can also run at its full capabilities. Consider the other extreme, a 3-address "pure RISC" with so many registers (let's say... 256) that "register allocation" becomes trivial and having to "spill" means there is something very unusual about your program. Instructions are going to have to be at least 24 bits just to hold the register specifications alone; round that up to 32, so an instruction is at least 32 bits. Forgetting for the moment how you can have that many registers at such clock speeds, consider running a typical application program on such a processor, where there are never more than ~10 live variables and the majority of the operations only need 2-address type instructions. Now each time it fetches an instruction it's wasting 8+ bits on a redundant register field, and 95%+ of the registers are going to be unused. Clearly this is a huge waste of bus bandwidth and die area.
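The instruction-width arithmetic above can be sketched in a couple of lines (the 8-bit opcode is an assumed round number for illustration):

```python
import math

# Minimum instruction bits for an n-address machine with r registers,
# given 'opbits' bits of opcode (8 is an assumed round number).
def instr_bits(n_addr, n_regs, opbits=8):
    return opbits + n_addr * math.ceil(math.log2(n_regs))

# 3-address, 256 registers: 3 * 8 = 24 bits of register fields alone,
# 32 bits once a byte of opcode is added.
print(instr_bits(3, 256, opbits=0))   # 24
print(instr_bits(3, 256))             # 32

# Compare a 2-address machine with 16 registers: 8 + 2*4 = 16 bits.
print(instr_bits(2, 16))              # 16
```

The point of the comparison: every doubling of the register file adds a bit to every register field of every instruction fetched, whether the program needed those registers or not.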
The advantage that CISCs like x86 have is exemplified by the systems of today, where memory is much slower than the core and there are multiple cores vying for its use. Here, you want the CPU to be able to run quickly inside the core while sipping instructions from memory (including caches) at a slower rate. This is where high code density is crucial. Expand instructions into RISC-like micro-ops inside, taking less bandwidth than pulling them over the memory bus.
Name:
Anonymous 2012-08-06 5:04
>>241
So the ideal architecture is MIPS64 with destination = first operand?
Name:
Anonymous 2012-08-06 6:19
If you want something done, you better do it yourself. Just start making and using your own language.
Name:
Anonymous 2012-08-06 6:36
This thread made me discover Go and I'm quite pleased. I'm not normally a fan of promoting Google or its products, but Go seems quite nice.
>>242
No, more like IA32 with a 64-bit extension closer to how x86 was extended to 32 bits. MIPS is optimised for silicon area and not much else. Things like branch delay slots(!) and fixed-width instructions make the ISA very rigid and there isn't a whole lot the software can do. In other words the ISA is so tightly coupled to the original hardware implementation that there is little opportunity for optimisation. In contrast, with x86 where many instructions perform multiple basic operations, parallelism almost suggests itself. All these improvements can happen inside the core and immediately benefit existing applications, without changing anything else. If anyone here remembers the days of Socket 370, that's exactly what happened.
The RISCs that have huge instructions that only do 1 thing will require more memory bandwidth to do superscalar/OOE, because by necessity they will need to fetch instructions faster. A 32-bit wide core-memory bus will either need to be widened to 64-bit or have its speed doubled. Not needed with x86: 32 bits is already enough to hold up to 4 instructions and they just need to change the core itself to execute them in parallel.
>>245
Cutting off half the opcode space to do that is hardly desirable, not to mention 256 registers is still way too much for the majority of use cases.
x86 has several 3-address instructions (some of the multiplies and divides), and they were likely added because an analysis of existing code showed that those specific instructions were much more likely to benefit with them present.
Name:
Anonymous 2012-08-07 5:20
>>246
What exactly is wrong with AMD64? The retarded space-eating REX prefixes?
>>246
We knew you're shit at programming, and now we know you're shit at computer architecture. Are there more things you're shit at that we should know?
>>247
That's part of it. They got rid of a whole row of instructions when one of the existing unused ones could be repurposed for a size override/high-bank register select, like they did when going from 16 to 32 bits. It was overall too disruptive of a change.
>>246
>The RISCs that have huge instructions that only do 1 thing will require more memory bandwidth to do superscalar/OOE, because by necessity they will need to fetch instructions faster. A 32-bit wide core-memory bus will either need to be widened to 64-bit or have its speed doubled. Not needed with x86: 32 bits is already enough to hold up to 4 instructions and they just need to change the core itself to execute them in parallel.
Are you seriously brain-dead enough to rehash arguments that have been dead for 20 years out of your ignorance? PROTIP, you moronic piece of shit, modern x86 can only function by breaking down the complex instructions to a reduced set for the hardware first.
It's really sad that in the time you spent showing the world how pathetically uneducated you were, you could have maybe read and learned something. Well, maybe a word or two of it with your displayed intellectual capacity so far.
>>246 Cutting off half the opcode space to do that is hardly desirable,
I don't hear you bitch about the ?ax-specialized opcodes in x86, though. Or the multiple encodings for the same instruction. Or the 66h/67h prefixes.
>not to mention 256 registers is still way too much for the majority of use cases.
Make them 16, 4 bits for each register: add r1 r2 r3 is now 20 bits, add r1 r2 is now 16 bits. There, saved your space again.
Name:
Anonymous 2012-08-08 13:27
C:\TEXT\KOPIPE.TXT
God says...
For years, people have been predicting--hoping for--the demise of assembly
language, claiming that the world is ready to move on to less primitive approaches to
programming...and for years, the best programs around have been written in assembly language.
Why is this? Simply because assembly language is hard to work with, but--properly used--
produces programs of unparalleled performance. Mediocre programmers have a terrible time
working with assembly language; on the other hand, assembly language is, without fail, the
language that PC gurus use when they need the best possible code.
Seriously, ASM fucking sucks (x86 at least) and using ASM in this age for anything really serious is fucking stupid.
It's not the point that ASM is or was powerful. ASM was and is shit; it's just that the programmers of old were fucking demi-gods, the likes of which few of us could imagine to be real.
Some DOS games back then were far more innovative and creative than most games of today. Plus they had much lower budgets, less computing power, and less help, both personnel-wise and tooling-wise.
And lots of them were written in ASM and/or some obscure shitty language.
>>246 http://bitsavers.org/pdf/datapoint/2200/2200_Programmers_Man_Aug71.pdf
Why is the original description of the embedded terminal architecture that Intel licensed from Datapoint to make the 8008/8080 so much more readable than Intel's description? And why are we still using extensions to extensions to extensions to an embedded CPU architecture designed to be what could cheaply fit on one chip in the 70's?
>>250 modern x86 can only function by breaking down the complex instructions to a reduced set for the hardware first.
And that happens in the core, where clock speeds can be pushed much higher. Here's an analogy that makes the situation clearer: suppose you can choose two types of slaves (this is 4chan after all), one can do everything really quickly but also only knows really simple commands, and another that isn't that quick but you can tell them to do something in one word what would take a few dozen with the other, and they'll take their time doing it. You can only talk to one slave at a time. Which type will you choose a few dozen of if you're the master and want to get them to do something like build a house?
With the first type, all your time is going to be spent telling them one-by-one what to do in excruciating detail, while all the others are just waiting for their next little command since they finished their last one almost immediately after you told them. Not a good use of resources and overall throughput is going to suck. With the second, you'll spend much less time telling them what to do, and they'll spend much more time actually doing something rather than waiting. You might even have enough time to go spend some time with your friends. You are the memory. The slaves are cores of a multicore processor. Your friends are peripheral devices. The first type is RISC. The second is CISC. Get it now?
I'm not keen on turning this into personal insults but it appears that your definition of "educated" is more akin to "brainwashed with the beliefs of the academics", and it really shows the lack of actual thinking going on in your head when you can't even seem to understand my argument enough to formulate a defense. I hope the analogy helps.
>>252
The accumulator opcodes are designed for use in calculations involving a long chain of accumulated results, which is relatively frequent. Function return values go there too. It's quite clear that a long-lived variable or set of variables should almost always stay in (e)ax. Multiple encodings are always an issue, they're hard to get rid of but relatively speaking x86 has a lot less of them (and empty space) than some other architectures.
No idea what you're getting at about the Osize/Asize prefixes, but that's much better than x86-64 replacing an entire row of register increment/decrement instructions (one of the most common operations in loops) with 64-bit-only prefixes. I'm almost willing to bet Intel was NOT pleased with AMD doing that.
>>257
It's probably just you. I have the 8008 manual and it's not bad. Maybe you like the octal notation for opcodes.
We're still using x86 because it's what works the best for general purpose CPUs at the moment. The i432 failed to unseat it, Itanium was pretty dismal, ARM and MIPS are filling the low end of the spectrum where die size/transistor count and power consumption matter more than anything else, POWER, SPARC and the "big RISCs" are in the massively parallel TOP500 supercomputers where they belong (and in applications that can actually make use of all those registers, with specialised hardware/software that ensures all the cores don't fight each other for memory), which leaves x86 to fill in the huge general-purpose/value-performance market.
>>258
>"educated" is more akin to "brainwashed with the beliefs of the academics"
Hahahahah oh wow. Sure, reality is brainwashing, and all the processors operate otherwise because all the engineers Intel and AMD employ are unaware of the scientific breakthroughs a special brain dead piece of shit made in his basement. Clearly no one was aware that one CISC operation could do more in a slower run on a system that couldn't scale, and no one analysed that it ended up being overall much much slower than running multiple commands on a fast architecture, they must all have put CISC->RISC subsystems on modern x86 processors out of sheer ignorance! If only they had employed a clinically retarded turd nugget who thought understanding one CISC operation doing more was an expression of genius and relevant, one that couldn't imagine these concepts are really simple to non-retarded folk and his intellectual humiliation came from not understanding the facts! All these people brainwashed by "logic" and "mathematics" will never have your vision, will they, retard? Oh, and these aren't insults, I am making factual statements as I always do.
>>260
Firstly, learn about the processor-main memory bottleneck. Your notions of algorithmic complexity and such are useless at such a small level. Secondly, go back to the imageboards, ``please''!
Name:
Anonymous 2012-08-09 6:39
>>260
Well, your attack is not credible since you don't provide any argument yourself; you only echo other people's beliefs like a sheep.
Name:
Anonymous 2012-08-09 7:14
>>261
Irrelevant to the topic, and the fact that algorithmic complexity doesn't come into play is exactly the point of my argument, though I didn't even bother touching that subject as it has a conclusive answer. Are you sure you read the right post, or do you just have some pre-made answers to chug forward in case of butthurt, regardless of the content? Sorry for educating you so far.
>>262
So saying 2+2=4, and that it is recognized as such by all experts and all hardware at the modern date is implemented as such puts the burden of proof on me, and "BAWWWW ACADEMICS ARE BRAINWASHED!!11 ONE INSTRUCTION CAN DO TWO THINGS MEANS IT WILL BE FASTER EVEN IF THE OTHER PROCESSOR CAN DO A HUNDRED INSTRUCTIONS IN THE SAME TIME" is a credible counterargument worth discussing? Are you sure you aren't a certain brain-dead excrement leftover tripfag samefagging who would be uneducated enough to take this seriously?
(function() {
  // Remove every post whose tripcode span matches one of the known trips.
  // Iterate backwards so removals don't shift the indices still to visit.
  var posts = document.getElementsByClassName('post');
  for (var i = posts.length - 1; i >= 0; --i) {
    var trip = posts[i].getElementsByTagName('span')[4];
    if (trip && (trip.textContent == '!MhMRSATORI!FBeUS42x4uM+kgp' ||
                 trip.textContent == '!MhMRSATORI!fR8duoqGZdD/iE5' ||
                 trip.textContent == '!MhMRSATORI!vzR1SbE7g/KHqrb'))
      posts[i].parentNode.removeChild(posts[i]);
  }
})();
He changes secure trips occasionally,† so expect to adjust accordingly.
________________ †sqlite> select count(*), trip from posts where author = 'Cudder ' group by trip order by count(*) desc;
204|!MhMRSATORI!FBeUS42x4uM+kgp
61|!MhMRSATORI!fR8duoqGZdD/iE5
31|!MhMRSATORI!vzR1SbE7g/KHqrb
9|!MhMRSATORI
1|!MhMRSATORI!cbCOMSATORIM/pr
>>267
So? If he were forced into anonymity, he'd certainly post less, because he craves attention, but it won't even come to that, because most people wouldn't be using >>266-kun's script.
Why do Americans always think that if a measure doesn't solve 100% of the problem it addresses, it's not worth doing at all?
>>260 that it ended up being overall much much slower than running multiple commands on a fast architecture
Look at when those studies were published. Back then, they thought they could keep raising clock speeds indefinitely, and memory bandwidth was plentiful. The technology of the time was such that the disparity between CPU core speeds and memory speeds was much smaller. A lot has changed since then.
>CISC->RISC subsystems on modern x86 processors
I already answered that in the first part of >>258. I wouldn't call it "CISC->RISC" though, it's just a parallel microsequencer.
>All these people brainwashed by "logic" and "mathematics" will never have your vision
The problem is the academics parroting information from out-of-date or flawed studies and failing to actually look at the reality and think. We saw this from the discussion of register allocation earlier.
From the RISC wiki page:
"The goal was to make instructions so simple that they could easily be pipelined, in order to achieve a single clock throughput at high frequencies."
Anyone reading that should automatically think "what about the rest of the system? Can the memory keep up? Can the clock frequency be raised indefinitely?" The huge assumption they made there was that memory would always be fast enough --- and that assumption, while it may have been true in the 80s and early 90s, is not anymore. The laws of physics also weigh in on this, limiting clock frequencies. Someone who accepts that statement without questioning its assumptions is like one who believes in a religion; as that quote from one of your "idols" himself says, it's "unscientific and ultimately destructive".
When your opponent starts becoming half-unintelligible it's a good sign that he's unable to think anymore. P.S. calling someone a "retard" is far from "factual" :)
>>263 EVEN IF THE OTHER PROCESSOR CAN DO A HUNDRED INSTRUCTIONS IN THE SAME TIME
But it can't, given the same memory bandwidth. That's the whole point. A simple calculation shows it even more clearly: Suppose you have 10GB/s of memory bandwidth. On a classic RISC with 4-byte instructions and 1 instruction per cycle (e.g. MIPS), that corresponds to 2.5GIPS. That is the maximum speed you can execute instructions at, and would be equivalent to a clock speed of 2.5GHz. You cannot raise the clock speed more without needing to insert wait states, even if the silicon allowed going up to 4GHz. 100 instructions is going to take 40ns, no matter what. Now consider a typical modern x86, up to 4 instructions per cycle, as low as 1 byte per instruction, and a clock speed of 2.5GHz. Assuming that 4 1-byte instructions can do as much work as the 100 RISC instructions, and run in 100 clock cycles, they can be fetched in 0.4ns, leaving the memory bus idle for the remaining 39.6ns while the instructions execute in the core for 99 more clock cycles. Even ignoring the 4x superscalar speedup, here is where x86 has the advantage: the silicon can do more than 2.5GHz, so let's increase it to e.g. 4GHz. You can't do this for the RISC because the memory can't go any faster. The x86 is now fetching in 0.4ns and takes 24.75ns to execute, a total of 25.15ns for the four instructions of 100 clock cycles each, running in parallel. The core can be run at the highest frequency it can, not limited by how fast instructions can be fed into it. In practice, memory is a bit slower (10GB/s quoted above is a peak burst rate), so the difference is even bigger.
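For anyone who wants to check the arithmetic, here is the same back-of-the-envelope calculation in runnable form. All rates are the assumed round numbers from the post (10GB/s, 4GHz), not measurements of any real hardware:

```python
# Back-of-the-envelope fetch-bandwidth calculation from the post above.
BW = 10e9                 # assumed memory bandwidth, bytes/sec

# Classic RISC: 4-byte instructions, 1 per cycle -> fetch-limited rate.
risc_ips = BW / 4
assert risc_ips == 2.5e9                  # 2.5 GIPS ceiling
risc_100 = 100 / risc_ips                 # time for 100 instructions
print(round(risc_100 * 1e9, 2))           # 40.0 ns, regardless of core speed

# x86-ish: 4 one-byte instructions doing the same work, 100 cycles each,
# core clocked at 4 GHz since fetch no longer limits it.
fetch = 4 * 1 / BW                        # 0.4 ns to fetch all four
execute = 99 / 4e9                        # remaining 99 cycles in the core
print(round((fetch + execute) * 1e9, 2))  # 25.15 ns total
```

The numbers only make the qualitative point the post is arguing: when fetch is the bottleneck, denser encodings let the core clock decouple from the memory clock.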
>>265
If by "killing" you mean repealing old misconceptions and beliefs and encouraging more critical thinking, I'm in complete agreement. /prog/ isn't a church. Fuck this pseudo-religion.
>>273 We saw this from the discussion of register allocation earlier.
Welp, it surely was someone with no background whatsoever in graph theory that noted that all the graphs generated by SSA are chordal and can be colored in polytime, right?
Name:
Anonymous 2012-08-10 12:36
>>273
>"HURRR SUPPOSE 2=3, THEN 2+2=6!!!11 LOOK I AM NOT RETARDED NO MORE, WHERE IS YOUR 4 NOW!1! RELIGION OF SCIENCE!!111 I HAVE ANSWERED YOUR ARGUMENT ALREADY EVEN THOUGH I NEVER DID, AND NEVER COULD UNSURPRISINGLY BECAUSE WHATEVER I CRY DOESN'T CHANGE THE FACT THAT CISC INSTRUCTIONS ARE STILL CONVERTED TO RISC IN MODERN PROCESSORS"
Well, you being clinically retarded is not up for discussion; you prove it with every sentence you manage to shit out. Notice that you are so incredibly brain-dead that you are still talking about imaginary old "published studies" while the cold hard facts I've brought to the table are the implementations by profit-seeking corporations that are bound to the x86 CISC architecture. Sure, all the top-level engineers at Intel, everyone working on alternatives in CISC- and RISC-based devices, along with everyone in academia and anyone knowing mathematics and logic, is unaware of your scientific breakthrough that makes CISC magically relevant after the 80s; perhaps you should name your theory after yourself, "Retard Processing" (well, you actually haven't managed to provide content for it other than spouting baseless numbers that don't relate to reality, but I'm sure even a retard such as yourself has backed your notions despite not telling us, right? right?)...
Ah, also, it's funny that you are talking about "an opponent" of yours in "an argument". I am merely putting some brainless piece of shit in his place by attempting to grind facts into his vacuous mind; there is nothing to argue here for anyone who knows basic mathematics or logic. You might even be crying "2+2=5" while bleeding out of your anus like this. You should read more instead of crying; you won't learn anything with your level of retardation, but you'll get points for effort.
If you seriously want to make the claim that RISC is superior, you'd better have the fucking willpower to optimize the living fuck out of your code or be able to shell out for a proprietary compiler that can.
x86 was shitty as fuck when it came out, but our requirements have changed: things that impacted performance back in the day don't impact it now, and being able to code in a way that has hardware-level optimizations for certain operations is a useful innovation.
``i used the superior speed of assembly to program a lifetime supply of semen for all my holes!'' -- cudder
Seriously why is this discussion still going on, it's less relevant to programming than which 2hu I'd fuk. At the very least please sage consistently so you stop bumping this crap.
>>275
It doesn't take any knowledge of graph theory to see that register allocation just ain't that hard. The academics took this huge long detour into graph theory, SSA, etc. before finally beginning to realise that.
>>276 CISC INSTRUCTIONS ARE STILL CONVERTED TO RISC IN MODERN PROCESSORS
That's one way to think of it, but the "RISC inside" is more of a marketing strategy than a description of what really happens, as uops are a lot more elementary than a typical RISC instruction. If you consider uops "RISC", then even the original 8086 would be doing that "conversion"; the only difference being the microsequencers back then were sequential-only, whereas they're more parallel now. In case it wasn't clear from the beginning, we're arguing about RISC vs CISC externally, i.e. instruction sets like MIPS, ARM, etc. vs x86. Whether you consider a processor to be RISC or CISC internally (and you seem to be saying that they're all RISC, fair enough) isn't the point.
>imaginary old "published studies"
I'm talking about the original papers in the 80s and early 90s. Possibly the only thing "imaginary" was you at that time.
>You might even be crying "2+2=5"
Are you going to argue against maths now? Do I have to teach you basic arithmetic first?
>baseless numbers that don't relate to reality
Then how about YOU try to come up with a counterexample, or at the very least point out exactly what you have issue with? If you can't do that, then it means you do not even have sufficient familiarity with your side to defend it!
Your repeated incessant personal attacks only show that you cannot come up with any better argument. You have lost.
(This is not unlike arguing with a religious fundamentalist about evolution. Seemingly educated, mild-mannered people quickly turn into surprisingly bad-tempered screaming insult-flinging children.)
>>279
Or at least provide some nice convincing numbers to inspect.
I'll conclude this with a few words from our sponsors supporters:
>>291
ARM has low transistor count and power consumption, and is easily embedded as an IP core into an SoC. Memory bandwidth doesn't matter as much (they run the core at a low enough speed, for power consumption reasons.) These characteristics are more suited to a mobile application.
Name:
Anonymous 2012-08-12 13:01
>>292
Care to qualify your overly broad statement about x86 "beating" everything else then?