
Instruction Set Micro-Optimizations

Name: Anonymous 2012-11-28 18:01

I've always been interested in these, even if few people have a chance to experiment with them, so, yeah, let's do it.

Some 6502 derivatives feature dedicated store 0 and load 0 instructions (STZ on the 65C02, for example), which save a byte and (possibly, I don't remember) a cycle for a very common operation. Neat.

A lot of RISC architectures, on the other hand, feature a dedicated 0 register for this purpose -- iirc, in MIPS, register 0 is always 0. It seems at first glance that this would be more general and useful than load/store 0 instructions, but I'm having trouble thinking of a use for the register besides loading and storing 0 values. On second thought, maybe it's a waste of a register.

Name: Anonymous 2012-11-29 13:38

>>23
> I assume they're not so common because they add complexity to exception handling
Good point. I'm more familiar with microcontrollers that lack things like exceptions and virtual memory support, so I forgot all about that aspect of CPU architecture.

For what it's worth, the HuC6280 has several block move instructions that must run to completion before interrupts can be serviced. If you had something interrupt-critical going on in the background, you would need to break large transfers up into several small ones. Otherwise, say, a kilobyte block transfer could cause you to be over 6000 cycles late to servicing an interrupt! Ouch.

By comparison, I don't think waiting for 16 registers to finish being pushed to the stack would be so terrible for interrupt latency, or at least not for desktop/server/mobile workloads.
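To put numbers on the block transfer case above, here is a rough sketch. It assumes the commonly cited HuC6280 timing of 17 + 6n cycles for an n-byte block transfer; the function names and the 500-cycle latency budget are made up for illustration.

```python
# Assumed timing for HuC6280 block transfers (TII/TDD/TIN/TIA/TAI):
# 17 cycles of fixed overhead plus 6 cycles per byte moved.
SETUP_CYCLES = 17
CYCLES_PER_BYTE = 6


def transfer_cycles(n_bytes):
    """Cycles one uninterruptible block transfer of n_bytes would take."""
    return SETUP_CYCLES + CYCLES_PER_BYTE * n_bytes


def split_transfer(total_bytes, max_latency_cycles):
    """Split a transfer so no single chunk blocks interrupts for longer
    than max_latency_cycles.

    Returns (chunk_bytes, n_chunks, total_cycles). The chunk size is the
    largest one that fits the latency budget (hypothetical helper).
    """
    chunk = (max_latency_cycles - SETUP_CYCLES) // CYCLES_PER_BYTE
    if chunk < 1:
        raise ValueError("latency budget too small for even one byte")
    chunk = min(chunk, total_bytes)
    full, rest = divmod(total_bytes, chunk)
    total = full * transfer_cycles(chunk)
    if rest:
        total += transfer_cycles(rest)
    return chunk, full + (1 if rest else 0), total


print(transfer_cycles(1024))        # 6161: one big 1 KiB transfer
print(split_transfer(1024, 500))    # (80, 13, 6365)
```

Under those assumptions, splitting the kilobyte into 80-byte pieces costs about 200 extra cycles overall, but the worst-case interrupt delay drops from 6161 cycles to under 500.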

> In the original implementations, the zero register would have been implemented as a physically read-only register which doesn't need special handling.
I'm aware of that. What I'm saying is that in any processor with a pipeline (pretty much anything non-embedded since the Pentium), it's a waste of a valuable pipeline stage to actually add 0 to the value of some register, so you need a special case in the instruction decoder that treats "add destination, source, 0" as a plain "move destination, source" instruction. Implementing all the special cases like that in the instruction decoder is probably more expensive in the grand scheme of things than simply implementing a real move opcode.

This is probably negligible for hardware design in 2012 (given that Intel manages to beat everyone's pants off in raw performance with such a crufty ISA), but then again, this thread is about micro-optimizations.
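A software sketch of the kind of decoder special case described above, using MIPS since it came up earlier in the thread. It assumes the standard MIPS32 R-type layout (opcode 0, then rs, rt, rd, shamt, funct) and the usual funct codes for add/addu/or. The helper names are hypothetical, and this says nothing about how any real core implements it.

```python
# MIPS32 R-type funct codes for ops that act as a move when one source
# operand is register 0.
FUNCT_ADD, FUNCT_ADDU, FUNCT_OR = 0x20, 0x21, 0x25
MOVE_LIKE = (FUNCT_ADD, FUNCT_ADDU, FUNCT_OR)


def encode_rtype(rs, rt, rd, funct, shamt=0):
    """Pack an R-type instruction word (opcode field is 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_move(word):
    """Return (rd, src) if word is a move idiom through $zero, else None."""
    opcode = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    funct = word & 0x3F
    if opcode != 0 or funct not in MOVE_LIKE:
        return None
    if rt == 0:          # op rd, rs, $zero
        return (rd, rs)
    if rs == 0:          # op rd, $zero, rt
        return (rd, rt)
    return None


# addu $8, $9, $zero is how assemblers commonly expand "move $8, $9"
word = encode_rtype(rs=9, rt=0, rd=8, funct=FUNCT_ADDU)
print(hex(word), decode_move(word))   # 0x1204021 (8, 9)
```

Even in this toy form, the check has to cover several opcodes and both operand positions, which is the sort of special-casing the post is complaining about.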

>>20,21
For an approach from the opposite angle, you can try programming for simple 6502 systems and reading about how they work.
Once you've achieved some level of familiarity, Agner Fog's manuals about the microarchitecture of various x86 processors get very interesting.
http://www.agner.org/optimize/
http://www.agner.org/optimize/blog/
