make unnecessary memory writes.
If those writes go to the stack, the slots can be treated as registers; store-to-load forwarding through the store buffer makes them almost as fast.
It's computed with every single arithmetic instruction just to be thrown away.
It doesn't cost anything extra except <1K transistors on a chip with approximately 6 orders of magnitude more.
Why not add things to the CPU to help with languages instead of the other way around?
HLLs will never exploit the full functionality of the CPU.
Look at the source for GCC, the JVM, or any other portable compiler: they have to add kludges for x87 that none of their other supported targets need.
So what? Every architecture is unique, that's what makes them special. A bounded stack isn't any more difficult to generate code for, unless the compiler writers were completely idiotic in their approach and tried to think of all architectures as the same... which they aren't. It's very easy to write a dumb compiler (see otcc, tcc); it's hard to write one that makes use of the architecture effectively.
ADD and INC are different, SUB and DEC are different[...]just because that's how the 8008/8080/Z80 did it
They had a very good reason to make them different, and in a very specific way. Hint: ADC, SBB. INC and DEC don't touch the carry flag, so a multi-precision loop can DEC its counter without destroying the carry that ADC/SBB need.
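The point can be sketched in C (the function name bignum_add is my own, not from the thread): a multi-precision add carries out of each limb into the next, and on x86 the inner body maps onto ADC. If the loop counter were updated with an instruction that wrote CF, the carry would be lost between iterations; because DEC leaves CF alone, it can't clobber it.

```c
#include <stddef.h>
#include <stdint.h>

/* Multi-precision addition, least-significant limb first.
 * On x86 the body compiles down to ADC: the carry out of each
 * limb feeds the next one. The counter update (DEC/JNZ) sits
 * between two ADCs, which only works because INC/DEC were
 * defined to leave the carry flag untouched while ADD/SUB set it. */
static void bignum_add(uint32_t *dst, const uint32_t *a,
                       const uint32_t *b, size_t limbs)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < limbs; i++) {
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;
        dst[i] = (uint32_t)sum;   /* low 32 bits of the limb sum */
        carry = sum >> 32;        /* the ADC carry-out, 0 or 1 */
    }
}
```

For example, adding {0xFFFFFFFF, 0} and {1, 0} overflows the low limb and carries into the high one, giving {0, 1}.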
You get compilers that do add eax, 1 instead of inc eax to avoid a partial flags stall.
A few extra cycles on Nehalem, which is NOTHING compared to the HUNDRED or more of a cache miss caused by larger code when there are many of those. Those extra cycles don't matter most of the time even without a cache miss: these CPUs are superscalar and out-of-order, so they'll just find other instructions to execute in the meantime. The issue is pretty much gone in Sandy Bridge. Same for many other instructions that used to be slower; Intel have realised that fetch/decode bandwidth is important, so they're making the instructions faster, as they should. They probably delay disclosing this information while making the appropriate changes to their own compiler, so they can stay ahead.
Someone from Intel talks about the block move instructions here, 6 years ago:
http://software.intel.com/en-us/forums/topic/275765
and from what I've seen, they have managed to make MOVSD come out on top again starting with Nehalem, so now everything makes sense and there's no longer a need for bloated, unrolled, aligned memcpy implementations when 2 bytes suffice. Aligned or not, rounded sizes or not, it's all handled in hardware now. If my predictions are right, LODS/STOS may make a comeback too.
Intel/AMD dropped most of segmentation to phase out the "legacy" modes.
AMD did it; Intel had no intention to.
It's fast, but you're only using 1/1000 of your RAM and have no hardware support (floppy disk and ISA cards are gone).
Processing power and RAM are not necessarily correlated: some applications benefit greatly from raw execution speed but don't need more than a few K of RAM, and others are the opposite. If you don't have an FDC, then you bought the wrong mobo.
>>41
because it didn't have paging
Paging is useless for most embedded applications, which need guaranteed latency and may not even have mass storage to page to.
You may point to the 386EX as a success story, but every ODM that went with that chip got burned because Intel lost interest and provided no upgrade path for them.
Ditto for the i860 and i960... and yet MCS-51 lives on.
tl;dr: While you academics are intellectually masturbating over impractical "elegant" architectures, the real world has shown that a complex ISA is no obstacle to performance; quite the contrary, it allows for a lot of hardware optimisation. As memory bandwidth becomes the bottleneck, dense CISC is the future.