Surely you could have taken the five minutes to actually benchmark that code. I suspect you just wanted to shit up /prog/ with your puerility.
Ah, well. With a bit of luck, you would find that in your example it doesn't make a lick of a difference, due to one of two factors:
1. Any modern CPU keeps a history buffer with taken branches, so no matter how you organize your code, it will predict a branch to the common case for frequently executed branches.
2. If you don't have a modern CPU, i.e., one with OoOE, your example should compile to a conditional move, eliminating the branch altogether.
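To make point 2 concrete, here is a minimal sketch of the same selection written once with a branch and once without. The function names are made up for illustration and this is not the code from >>1. Whether the first form compiles to a conditional move depends on the compiler, flags and target, so check the generated assembly rather than assuming.

```c
#include <stdint.h>

/* Branchy form: a compiler is free to emit either a conditional jump
 * or a conditional move for this, depending on target and flags. */
int32_t select_branch(int cond, int32_t a, int32_t b) {
    if (cond) {
        return a;
    }
    return b;
}

/* Branch-free form: builds an all-ones or all-zeros mask from the
 * condition and blends the two values with it.
 * Assumption: converting the blended uint32_t back to int32_t wraps
 * as on two's-complement targets (implementation-defined in ISO C). */
int32_t select_mask(int cond, int32_t a, int32_t b) {
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}
```

Both functions return the same value for every input; only the instruction stream may differ, which is the part worth benchmarking.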
To give some actual numbers: the branch misprediction penalty on a Cortex-A8 is 13 cycles, and on a Cortex-A9 it's 8 cycles.
I found some interesting data on other processors at http://www.7-cpu.com/ , though I can't vouch for its accuracy.
Name:
Anonymous2011-09-26 18:57
>>1
Theoretically, branching is prejudicial to performance in general, because it tends to force pipeline flushes or stalls. The prediction circuitry avoids a large part of the penalty. Thus, a branch miss hurts performance fairly badly at the machine level, but not as badly as a cache miss (except, evidently, if the branch miss also causes an instruction cache miss).
This means that you'll have visible performance differences in certain situations; for example, if you're branching tens of thousands of times in a tight loop. Otherwise, it won't even contribute to the execution time noise; such latencies are typically in the nanosecond range. Also, depending on the final code layout, the static predictor always hits if certain conditions are met (for example, conditional backwards jumps are always seen as taken). In C, you can't decide the code layout, except by hinting the compiler with builtin directives (which it'll probably ignore anyway, since programmers are very bad at predicting bottlenecks).
Thus, "branch misses" are a concern far, far, far away from the average application code, especially in the kind of conditional you've suggested in the snippet. In Java (and probably every language which is not C), it's even more true: Java has an ocean of low-level overhead between source code and the final instruction stream, making any attempt to justify some "optimized" construct look just plain ridiculous.
Also, x86 processors have an optimization circuit called the "loop stream detector", which can heavily optimize the execution of small loops up to 16 bytes in code length. (Remember that a loop always has a conditional branch associated with it.)
Ultimately, people who're paranoid about such issues rarely understand anything about the subject. They typically have bad programming practices (like littering the code with __builtin_expect() directives), kludge data structures with "optimized" expressions, and over-inline function calls, causing an even greater performance loss (due to the aforementioned instruction cache misses, made imminent by big chunks of linear code). People don't profile, because profiling makes you look uncool, since it always proves that all of "those highly and expertly optimized sites in your C code" are just girlyish bullshit.
IOW: don't waste energy "optimizing" branch sites in Java. In C, it'll make sense only in very specific scenarios, which you'll probably never encounter.
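For reference, the "builtin directives" mentioned above look roughly like this with GCC or Clang. This is only a sketch: the LIKELY/UNLIKELY macros and the sum_nonnegative function are hypothetical names, and the premise that negative entries are rare is an assumption baked into the example, not something measured.

```c
#include <stddef.h>

/* __builtin_expect is a GCC/Clang extension; fall back to a no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Sums the non-negative entries of v. Negative entries are assumed to be
 * rare, so that branch is hinted as unlikely; they are counted in *errors
 * and skipped. The hint affects code layout only, never the result. */
long sum_nonnegative(const int *v, size_t n, size_t *errors) {
    long total = 0;
    *errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0)) {
            (*errors)++;
            continue;
        }
        total += v[i];
    }
    return total;
}
```

The hint is just advice about which path to lay out as the fall-through. If the assumption about the data is wrong, it can make things slower, which is why profiling first is the saner order.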
Name:
Anonymous2011-09-26 19:14
why dont you compile each and run them 9 billion times each and compare
ONE WORD: THE FORCED PROFILE-GUIDED OPTIMIZATION OF THE CODE
THREAD OVER
>>3 http://www.7-cpu.com/
Cortex-A9
Whoah, I didn't expect toy CPUs to perform that well. OTOH at these low frequencies relative RAM latency is much lower and this benchmark is highly dependent on that, but still it's kinda impressive to have a CPU which can compete with x86 in IPC for once. Now they just need to put 12 cores in and bump the clock threefold.
Name:
Anonymous2011-09-26 19:21
>>3 Surely you could have taken the five minutes to actually benchmark that code. I suspect you just wanted to shit up /prog/ with your puerility.
what a faggot you sound like
>>4
Not OP, but this post was highly informative. If you could give a good source in which I can read about the subject to help me with my girlish optimization fantasies I would be thankful.
Name:
Anonymous2011-09-26 19:44
>>8
There's only one reliable source regarding assembly-level code optimization: the processor manuals. Everything else is either bullshit, or just restating what the manuals already say.
But it'll probably be of more use if you look for profiling techniques. Search for oprofile and gprof for a starting point. These tools are typically enough to detect real bottlenecks on profilable code.
>>6
According to ARM's figures, the Cortex-A15 will have increased IPC, goes up to 2.5 GHz, and is designed for quad-core.
So far, so good, but if they want the desktop and server market they'll have to introduce a 64-bit architecture. LPAE is just a stopgap measure.
Name:
Anonymous2011-09-26 21:31
>>4
Okay, I can't tell if you're just plain fucking stupid, a jew, trolling, or any combination of the three. Anyways, from the view of the C abstract machine, there is no difference in the code.
BTW faggot, you're still confusing the abstract machine with the implementation itself. Now go scrub another toilet.
Name:
Anonymous2011-09-26 21:33
>>4
And your description really breaks down for Java. I'm suspecting this is because you don't know the difference between the Java language itself and the JVM.
>>20
Listen little girl, there is a reason why those of us who program for a living preach the standards. This is because some of us have to compile the same fucking thing on 8 different platforms. If you have non-standard/non-conforming code, then the work becomes a lot more difficult.
But you wouldn't know anything about this. Now go scrub another toilet and keep studying SICP you fucking nowhere bitch.
Name:
Anonymous2011-09-26 22:00
it's nothing compared to memory allocation, deallocation, and cache coherency.
>>15
You're visibly a complete illiterate, in the most obscene basic level. What the fuck are you talking about? Have you understood or even read anything that's been said?
Seriously, I fear you're retarded to a point way past of even minor recovery.
Name:
Anonymous2011-09-26 23:22
>>11
This is indeed an interesting source of information. I disagree on a couple of points described by the author, however; in some of them it seems to me that he's completely wrong.
For example, the author states that some common boolean arithmetic is typically implemented with branch trees. This seems rather absurd to me. Also, dynamic_cast is -extremely- costly if a type analysis is actually performed; in particular, much more costly than a virtual call -- maybe he should have stressed that. And, while pointer dereferencing is typically fast, sometimes it's slower than direct access by a considerable factor (pointer load, LEA and friends). The situation is much worse if pointer aliasing exists.
The material is nonetheless really valuable on the whole. Almost everything described is natural knowledge to the assembly programmer, but he presents it in more tractable language. Despite that, some of his tips are really infeasible in my opinion (messing with code legibility or semantics to 'hint' an optimization to the compiler is particularly out of the question for me).
if (val) is much faster than if (val != 0). I benchmarked the old code with the faster if test and it resulted in a 230% speedup.
Name:
FrozenVoid2011-09-27 12:06
>>39
If your code is so inefficient that one 'if' results in 230% speedups, post it here; I could save you at least 50% more speed.
Name:
Anonymous2011-09-27 12:46
>>40
The code is in >>1; all I did was to tightly loop it and then replace if (val != 0) with if (val).
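For anyone who wants to try this, a rough sketch of such a tight-loop comparison follows. It is not the code from >>1 (which isn't reproduced here); the function names and the timing helper are invented for illustration. In C, if (val) is defined as testing whether val compares unequal to 0, so the two loops mean exactly the same thing and an optimizing compiler would be expected to emit the same code for both.

```c
#include <stddef.h>
#include <time.h>

/* Counts entries using the explicit comparison. */
size_t count_explicit(const int *v, size_t n) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] != 0) {
            hits++;
        }
    }
    return hits;
}

/* Counts entries using the implicit truth test; same semantics in C. */
size_t count_implicit(const int *v, size_t n) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i]) {
            hits++;
        }
    }
    return hits;
}

/* Hypothetical helper: runs fn `reps` times and returns the CPU time in
 * seconds as reported by clock(). The accumulated count is written to
 * *sink so the calls have an observable result. */
double time_counter(size_t (*fn)(const int *, size_t),
                    const int *v, size_t n, int reps, size_t *sink) {
    size_t total = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        total += fn(v, n);
    }
    clock_t end = clock();
    *sink = total;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Time both functions over the same large array, repeat the runs, and compare the generated assembly before believing any difference; clock() is coarse, so small inputs will mostly measure noise.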
Name:
Anonymous2011-09-27 13:31
>>41
Could you post the complete code, with the loop?
Name:
Anonymous2011-09-27 13:35
>>26
I was pointing out that what you're talking about is non-standard. You're more than welcome to try and cite any passage in one of the various ANSI/ISO C standards or JSP that supports anything that is being talked about in this thread.
So to put this in terms your jewish ass can understand, branch misses are platform specific, and hence, not covered by any of the standard. Get it hourly worker boy?
>>43
Most code you write for actual devices that are not personal computers is written for a specific platform. Get back to your enterprise-grade cubicle. I'll be here with my robots.
>>43
Why do you keep trying to sell us this "C standards" nonsense? We know it's snake oil, completely useless for real programming. Nobody here is buying any, so you should go spam somewhere else.
Name:
Anonymous2011-09-27 15:03
>>46
Have you ever written a single line of production level code for any firm?
>>43
You don't need to point out the grotesquely obvious, you retarded midget. There's nothing in this thread that even marginally touches portability issues. There's nothing in this thread that could suggest, even in a fanciful gay dream your feeble brain could produce, that anything so far discussed could ever be portable, or standardized, or even preferred over portable code. It is horribly clear from the context that everything cited so far is non-standard, but you seem to lack very basic text comprehension skills.
Also, don't worry about citing standards. I know more, much more about standards than they actually deserve. In fact, I probably know more about every programming subject than you could ever hope to understand, and it's likely that I'm not alone. After all, you surely haven't correctly grasped a single bit of information from anything you've ever read in your life -- considering you have actually read anything --, since you just haven't been properly alphabetized.
Name:
Anonymous2011-09-27 19:47
>>50 You don't need to point out the grotesquely obvious, you retarded midget.
What do you do again for a living? What's your job title and company you work for? How many millions of people use the software that you write?
There's nothing in this thread that even marginally touches portability issues.
The context of the question revolved around C and Java. Ya know you fucking nigger, most people who ask questions not related to any standard would word the question differently.
I know more, much more about standards than they actually deserve. In fact, I probably know more about every programming subject than you could ever hope to understand
Oh really? So have you actually served on a committee? If so, can you please share the committee name and dates.
After all, you surely haven't correctly grasped a single bit of information from anything you've ever read in your life
You're projecting. Go scrub another toilet you fucking jew.
Name:
Anonymous2011-09-27 22:20
>>50
I remember one time he entered a discussion about assembly and argued that it was outside the scope of standard C and consequently threw an autistic fit when he was informed that the discussion was not about C at all.
He uses the standard like his bible and is completely unable to understand anything beyond its scope, like basic reading comprehension. I guess you could say that he's a, Creationist.
You are the slightly more intellectual equivalent of 'my dad works at Nintendo'. What is this wondrous software you write, and can you answer your own requirements of providing the name, product, and proof that you worked on it?
But I suspect you cannot, because none of your answers actually made sense at any point in the thread. Resorting to ad hominems and spouting inappropriate jargon that has nothing to do with the subject (USE MAKEFILES!) just marks you as either a delusional skiddie or a troll which I am currently feeding.
Have you ever met a musician in a band that is ripping-off the sound of some trend? They are totally clueless that unoriginal is worthless and they are delusionally proud because they think they're good.
The secret to my success is compatibility. The fewer the possible sources of incompatibility, the better. Interrupts are not as robust as polling.
I tried my OS on about 4 machines over the years and it didn't work on any one without modification! Just like "it's the economy stupid", "it's compatibility stupid."
God says...
C:\TEXT\Brief\AUGUST.TXT
be hidden from it, it wills not. But the contrary is requited
it, that itself should not be hidden from the Truth; but the Truth
is hid from it. Yet even thus miserable, it had rather joy in truths
than in falsehoods. Happy then will it be, when, no distraction
interposing, it shall joy in that only Truth, by Whom all things are
true.
See what a space I have gone over in my memory seeking Thee, O Lord;
and I have not found Thee, without it. Nor have I found any thing
concerning Thee, but what
Name:
Anonymous2011-09-28 12:02
Not working on any one begs the question, "Does it work for anybody." I get no emails but I'm #1 google rank. That's a little odd, don't ya think?
God says...
C:\TEXT\BIBLE.TXT
And when they had sung an hymn, they went out into the mount of
Olives.
26:31 Then saith Jesus unto them, All ye shall be offended because of
me this night: for it is written, I will smite the shepherd, and the
sheep of the flock shall be scattered abroad.
26:32 But after I am risen again, I will go before you into Galilee.
26:33 Peter answered and said unto him, Though all men shall be
offended because of thee, yet will I never be offended.
>>52
Yes, the guy's so inept he failed to see that there are about three or four different people arguing with him (or, more precisely, being heavily trolled by him). Yet he took all of them for a single one. I guess this is indeed a sign of autism.
>>56
Fuck off, misfit. You trolled us hard already.
>>64 >>65
There's a very important point in creating and conforming to standards. However, as far as compliance goes, there are very few important implementations that fully comply with a given document. This makes people inevitably depend on extensions (since they won't and shouldn't bother, for example, sprinkling #ifdefs to detect particular differences between C compilers), especially because most people don't even know about standards, or don't bother reading long, prolix documents that, in practice, only diminish their productivity.
Also, as per definition a standard encompasses the minimum common set of features for a given set of architectures, they always tend to deny support for newer technology (and by newer I actually mean "concurrent execution" and "networking" in pure C, as incredible as it seems), sticking only with long-dated primitivisms.
All of this means that one can expect much less than he wished for when dealing with standards, since they fail grossly on solving the single problem they were supposed to solve: portability.
For example, there is a ridiculously small number of compilers that -fully- comply with C99: GCC and MSVC ain't two of them. (C99 is a ten-year-old document already, and C1X is already being baked.) The situation is much worse for C++98.
I was using the CPUID to check for 64-bit capable and print a message if not. I was reading an OSDev story and they had code that checked first if the bit to check for 64-bit was present. I was like, "Oh, I guess I could add that." Next thing I know, they're wanting credit--like my operating system is written by me and John Smith. I said "fuck that". I removed the extra test for ancient processors.
To this day, it doesn't have it. Fortunately OSDev has been pretty useless. I joke -- "Whatever you do, don't ever read GPL code or you cannot ever work for a real company cause you will have seen things first, there, and cannot use them!"
The bigger question is, "How the fuck have people been watching me and what an annoyance that it's not the same people who would know that I wrote everything and they're pestering me!"
I guess it's Heaven--Monty Python argument clinic.
>>69 Also, as per definition a standard encompasses the minimum common set of features for a given set of architectures, they always tend to deny support for newer technology (and by newer I actually mean "concurrent execution" and "networking" in pure C, as incredible as it seems),
Do you realize how much legacy code would break if you would attempt to incorporate "concurrent execution" and "networking" into the standard? I bet you don't.
>>75
Legacy code would still be single-threaded, suffering no harm from such situations. (I'm not talking about automatic parallelism.) For new code, reentrant versions of classic routines would exist (as in POSIX).
There would be new semantic requirements for memory fencing, atomic operations and guarantee of execution progress, similar to those documented by POSIX and C++11, but still legacy code wouldn't be affected, except by potential symbol name collision.
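For what it's worth, the standard that was still the "C1X" draft at this point was published as C11 and does define atomics and memory ordering in <stdatomic.h>. A minimal sketch of the kind of semantics described above: one side publishes a value with a release store, the other picks it up with an acquire load. The names publish and try_consume are made up for the example, and it assumes one producer thread and one consumer thread (threads themselves are left out to keep it short).

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain data, handed over via the flag below */
static atomic_bool ready;  /* zero-initialized, i.e. false */

/* Producer side: write the data first, then set the flag with release
 * ordering so the payload write is visible to whoever sees ready == true. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer side: the acquire load pairs with the release store above.
 * Returns false if nothing has been published yet. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire)) {
        return false;
    }
    *out = payload;
    return true;
}
```

Single-threaded legacy code never touches any of this, which matches the argument that adding it to the standard does not have to break existing programs.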
probably only need to do this once you've -near- finished optimizing a performance-critical code =) and your if is sitting in a big loop (/gets used a lot)
but yeah, it is faster to skip a branch rather than take it i think...
>>97 and your if is sitting in a big loop
You mean a small loop - if it's a big loop, a smaller percentage of the time inside the loop is spent on the conditional.
>>39
This is not related to branch misses at all. And seriously, which compiler are you using? Not optimizing such a trivial case?
>>4 loop stream detector
sounds interesting. but 16 bytes? how many instructions is that?
_,. .--::::::::::::- .、
.,.':::::::::::::::::::::::::::::::::::\
.,':::::::::::,:::::::::::::::::::::::::::::::::::'.,
/::::::::/:::::::::::::::::::::::::::::::::::::::',
,':::::::/::、__/............λ 、::::ヽ:::i
i::::::,'::::::::::/-─-/ i:::/i_;::::::::i::::i
.i::::::::`iヽi.,.--- '、 レ' i::`イ/:::::|
.|:::::::::/|::::i "" '"ヽ/ヘ/::/ > tfw i was the original creator of this thread
|:::::::,'::::i....i '., 、_ ",'i:::|:;/
|::::::::::::::',::',/へ、 __,,.イ::|:::|
,':::::::,- '´ヽiヽ、 ~〈ヽ;;;;::|::,'::::| ( ::)
ノ:::::/ ヽ、`ヽy / ヽレ'|:::::| ( ::)
,':::::/、. ',/^ー:r ̄ ̄ ̄i:|
/::::', / ノ、___ノ 〉
.,'::::::::i,へ/ 「 ̄ヽrー´i l  ̄iイ::|
Name:
Anonymous2012-03-29 20:27
>>6
ARM Cortex-A9: 0.2875 MIPS/MHz/thread
SPARC Sun UltraSPARC II: 0.7 MIPS/MHz/thread
MIPS ICT Loongson 3A: 0.72 MIPS/MHz/thread
POWER IBM POWER7: 0.76 MIPS/MHz/thread
IA64 Itanium 2: 0.93 MIPS/MHz/thread
x86 AMD K5: 0.92 MIPS/MHz/thread
x86 Intel i5-2400: 1.24 MIPS/MHz/thread
WHERE IS YOUR RISC GOD NOW?
>>126 x86 Intel Pentium 4 0.447 MIPS/MHz/thread
When a company has as many orders of magnitude of dollars more than their competitors as Intel does, they could make any CPU architecture run fast. If /prog/ had 100 times more money than Intel, /prog/ could make a 10 THz Brainfuck CPU and then buy out all of the Intel fabs (like Intel did to DEC/COMPAQ and HP). Who needs multiplication instructions when you have 300 million transistors for ``BrainFuckFusion'' to analyze Brainfuck loops?
Name:
Anonymous2012-03-29 22:07
>brainfuck chip
>WANT
Name:
Anonymous2012-03-29 22:37
>>127
Did you notice the AMD chip there? AMD has nowhere near the amount of money Intel does. And yet... AMD Athlon 64 X2 (K8): 0.90 MIPS/MHz/thread
There's also this dark horse: MCST Elbrus-3S: 1.25 MIPS/MHz/thread
which is apparently "VLIW" but also "x86 compatible".
>>129
``MIPS/MHz'' is an absolutely meaningless statement. What processor are these ``MIPS'' measured for? What is considered an ``instruction''? Are all ``instructions'' weighted equally? Are page faults and cache misses taken into account? Is the CPU run in 32-bit or 64-bit mode (if applicable)? MCST Elbrus-3S (1891WM5AyA) CPU, 500 MHz, 218M transistors, 90 nm, 9-metal layer, 142 mm2. Power: 13-20W.
Power usage is also important. This CPU is much slower than a 1.6 GHz ARM Cortex which uses less than two watts.
>>129
This measurement would make sense only if the object code was identical on all processors. Otherwise your ``MIPS/MHz'' is saying more about the compiler's optimizations than it is about the CPUs it runs on.
>>130,131
It's a benchmark of total throughput per clock. "MIPS" is not a good term for this, they should call it 7marks or something like that. If you read the fine print at the bottom you'll see that the Intel Core 2 is taken as a reference at 1MIPS/MHz/thread. Then more efficient than it would be >1, less would be <1.
The best RISC one there is the POWER7 at 0.76 MIPS/MHz/thread, and it's not exactly power (lol) efficient either -- 200W TDP!
I've been saying this for years: CISC is going to have the IPC and total throughput advantage. You can pump the RISCs to go really fast but then you're limited by the less dense code (more cache misses) and physical limits like power dissipation. In comparison a CISC won't have as many instructions and this makes for fewer cache misses and lower memory bandwidth, sort of like having compressed instructions in memory and decompressing them in the on-die decoder. Intel still has a lot of improvement they can do with x86 microarchitecture optimization like moving more instructions out of microcode, they just haven't seen the need to do it yet. But as memory bandwidth becomes the limiting factor for throughput it is clear that CISC-like designs are coming back and going to be the future.
Thumb is a step in the right direction, but it's still more limited than x86. It's missing things like divide/multiply and generalised addressing modes.