/prog/, I'm working on a RISC instruction set that is more RISC than any RISC out there. I've devised an instruction format, plus fifteen core instructions that should suffice for any programming out there. The instruction format, instruction forms, and instructions can be found here:
Seems fine, although you're wasting 4 bits on the opcode there. They're probably well wasted, though, since it makes tools (assembler, disassembler, ...) easier to write.
Fucking terrible. Have you even attempted to implement any nontrivial algorithm in it? Have you simulated it? Implementing a synthesizable Verilog/VHDL simulation would be educational for you.
Name:
Anonymous2011-09-22 9:54
>>3
I'm pretty sure that I'll be missing some important instructions, so a few more bits of opcode headroom is a good idea. This is my first venture into instruction set design, after briefly studying the x86 and PowerPC instruction sets. Do you know of any 'holes' in my instruction set? For example, I saw a 'system call' instruction, mnemonic sc, in PowerPC. I don't understand how that works at all, but I can gather that it's fairly important.
> Fucking terrible.
As I would expect, as this is my first design, and I know next to nothing about how things really work.
> Have you even attempted to implement any nontrivial algorithm in it? Have you simulated it? Implementing a synthesizable Verilog/VHDL simulation would be educational for you.
That is a good idea.
Name:
Anonymous2011-09-22 10:00
>>4
Care to explain why it's ``fucking terrible''? I don't disagree with you, but more detail and elaboration would be appreciated.
>>5
Depends on what you plan on using the instruction set for. syscall instructions are usually used for dealing with privilege levels (security) and providing a simple way of calling kernel or hypervisor code.
I would also like an indirect jump (to a register) instruction.
Name:
Anonymous2011-09-22 10:13
>>8
The j instruction is an 'indirect' jump, to a register, as you describe. The address field in that instruction is a register reference, from which the address value is pulled.
Linux on x86 uses a software interrupt with vector 0x80, I think. It gives user programs a way to invoke operating system services such as opening, reading, and writing files. The user program triggers interrupt 0x80; the processor then stops what it is running, saves its state, looks up the handler in a table of code pointers indexed by interrupt vector, and runs entry 0x80, which is the system call handler. The handler looks at the values in the registers and performs the requested service on behalf of the operating system. When it finishes, the interrupt returns and the processor state is restored. I think return values are passed in registers, but I would have to double check. It has been a while.
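The flow described above can be modelled in plain C. This is only a user-space sketch: `struct cpu`, `ivt`, `sys_table`, `raise_interrupt` and the two toy services are hypothetical names invented for the illustration, and a real kernel would also switch privilege level and stacks, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy CPU state: just four general-purpose registers. */
struct cpu { uint32_t r[4]; };

typedef void (*handler_fn)(struct cpu *);

/* Two made-up services, selected by the number the caller puts in r[0]. */
static void sys_add(struct cpu *c) { c->r[0] = c->r[1] + c->r[2]; }
static void sys_neg(struct cpu *c) { c->r[0] = 0u - c->r[1]; }

static const handler_fn sys_table[] = { sys_add, sys_neg };

/* Handler installed at vector 0x80: dispatch on r[0], result goes to r[0]. */
static void syscall_handler(struct cpu *c)
{
    uint32_t nr = c->r[0];
    if (nr < sizeof sys_table / sizeof sys_table[0])
        sys_table[nr](c);
    else
        c->r[0] = UINT32_MAX;   /* unknown service: return an error value */
}

/* Interrupt vector table; a NULL entry means "no handler installed". */
static handler_fn ivt[256];

/* Model of a software interrupt: save state, run the handler, then
 * restore everything except the return-value register r[0]. */
static void raise_interrupt(struct cpu *c, uint8_t vector)
{
    struct cpu saved = *c;
    if (ivt[vector] != NULL)
        ivt[vector](c);
    uint32_t ret = c->r[0];
    *c = saved;
    c->r[0] = ret;
}
```

A caller would install the handler with `ivt[0x80] = syscall_handler;`, put the service number in `r[0]` and the arguments in `r[1]` and `r[2]`, and then call `raise_interrupt(&c, 0x80)`.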
So I write s 0, 0, 255 to write out 2^255 bytes to memory. The processor takes an exception halfway through. How does it resume execution after the exception handler completes?
> write out 2^255 bytes to memory. The processor takes an exception halfway through.
Considering how "halfway through" would occur long after the heat death of the universe, I doubt it will make any difference.
Could you compress instructions?
Eg while app X is running, cpu frequently gets instructions A followed by B, C, etc... so instruct 'a' -> A + B + C...?
Name:
Anonymous2011-09-23 0:19
...might make little difference?
Programmable instruction sets then?? ...build your own SSE-n?
>>22
There are CPUs that incorporate a degree of configurability, and there's always FPGAs, but those will always be slower and use more power than their hard-wired counterparts.
Name:
Anonymous2011-09-23 8:01
Hello again, /prog/.
After some rethinking, I've redesigned the instruction set architecture, this time with various changes, including
- instructions are now 16 bits long
- the opcode field is six bits long
- the register reference fields are five bits long
- for simplicity, ops like `A = B op C' are now `A = A op B'
- there is now a status register, currently only used for c/j
- none of that `[size]' bullshit anymore in the load/store/move ops
Overall, a hopefully cleaner and better designed instruction set.
>>25
Adding SIMD instructions, let alone any instruction that can be completed equally with a combination of other instructions, will defeat the idea of this architecture being RISC.
Name:
Anonymous2011-09-23 8:44
And now for an (untested) emulator in ~50 lines.
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#define MEM 67108864
#define BITS 64
#define uw uint64_t
#define sw int64_t
int main(void) {
    uint8_t *mem = calloc(MEM, 1);
    uint16_t *memi = (uint16_t *) mem, ins, insd, insa, insb;
    uint32_t stat = 0;
    uw *reg = calloc(32, BITS / 8), cia = 0, nia;
    if (!mem || !reg)
        return 1;
    while (1) {
        nia = cia + 1;
        ins = memi[cia];
        insd = ins & 0x3ff;   /* low 10 bits: two 5-bit fields */
        insa = insd >> 5;
        insb = insd & 0x1f;
        switch (ins >> 10) {  /* top 6 bits: opcode */
        case 0: break;
        case 1: mem[reg[insa]] = mem[reg[insb]]; break;
        case 2: reg[insa] = ((reg[insa] >> 8) << 8) | mem[reg[insb]]; break;
        case 3: mem[reg[insa]] = reg[insb]; break;
        case 4: /* load a 4-bit immediate into one half of the lowest byte */
            if (insb & 1)
                reg[insa] = (reg[insa] & ~(uw) 0xf0) | ((uw) (insb >> 1) << 4);
            else
                reg[insa] = (reg[insa] & ~(uw) 0x0f) | (uw) (insb >> 1);
            break;
        case 5: reg[insa] = ~reg[insa]; break;
        case 6: reg[insa] &= reg[insb]; break;
        case 7: reg[insa] |= reg[insb]; break;
        case 8: reg[insa] ^= reg[insb]; break;
        case 9: reg[insa] += reg[insb]; break;
        case 10: reg[insa] -= reg[insb]; break;
        case 11: reg[insa] <<= reg[insb]; break;
        case 12: reg[insa] >>= reg[insb]; break;
        case 13: reg[insa] = (uw) (((sw) (reg[insa])) >> reg[insb]); break;
        case 14: /* compare: set exactly one of the three low status bits */
            stat = (stat >> 3) << 3;
            if (reg[insa] == reg[insb])
                stat |= 1;
            else if (reg[insa] < reg[insb])
                stat |= 2;
            else
                stat |= 4;
            break;
        case 15: /* conditional jump, absolute or relative to cia */
            if (((insb >> 1) & 7) & (stat & 7))
                nia = reg[insa] + ((insb & 1) ? cia : 0);
            break;
        }
        cia = nia;
        if (cia > MEM / 2 - 1)
            break;
    }
    free(mem);
    free(reg);
    return 0;
}
It takes 12 instructions to load 0x12345678 into the first register.
Process
0. load the value 0x8 into the second register
1. load the value 0x1 into the high half of the lowest byte of the first register
2. load the value 0x2 into the low half of the lowest byte of the first register
3. shift the first register left by the second register (8)
4. load the value 0x3 into the high half of the lowest byte of the first register
5. load the value 0x4 into the low half of the lowest byte of the first register
6. shift the first register left by the second register (8)
7. load the value 0x5 into the high half of the lowest byte of the first register
8. load the value 0x6 into the low half of the lowest byte of the first register
9. shift the first register left by the second register (8)
10. load the value 0x7 into the high half of the lowest byte of the first register
11. load the value 0x8 into the low half of the lowest byte of the first register
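The twelve steps above can be checked with a small model. This is a sketch under assumptions: `load_nibble` and `shift_left` are hypothetical helpers that stand in for the nibble-load and shift-left instructions as the steps describe them (a nibble load replaces only one half of the lowest byte), and the instruction count is simply tallied in software.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed nibble-load semantics: replace one half of the lowest byte
 * and leave every other bit of the register alone. */
static uint64_t load_nibble(uint64_t reg, unsigned value, int high)
{
    unsigned shift = high ? 4 : 0;
    return (reg & ~((uint64_t)0xf << shift))
         | ((uint64_t)(value & 0xfu) << shift);
}

/* Shift left by a register-held count (counts of 64 or more give 0). */
static uint64_t shift_left(uint64_t reg, uint64_t count)
{
    return count < 64 ? reg << count : 0;
}

/* Build a 32-bit constant in r1 the way the numbered steps do,
 * counting one "instruction" per helper call. */
static uint64_t load_const32(uint32_t value, int *steps)
{
    uint64_t r1 = 0, r2 = 0;
    int n = 0;

    r2 = load_nibble(r2, 8, 0); n++;             /* step 0: shift count */
    for (int byte = 3; byte >= 0; byte--) {
        unsigned b = (value >> (8 * byte)) & 0xffu;
        r1 = load_nibble(r1, b >> 4, 1); n++;    /* high half */
        r1 = load_nibble(r1, b & 0xfu, 0); n++;  /* low half */
        if (byte > 0) {
            r1 = shift_left(r1, r2); n++;        /* make room for next byte */
        }
    }
    *steps = n;
    return r1;
}
```

Calling `load_const32(0x12345678, &n)` walks through the same sequence: one load for the shift count, eight nibble loads and three shifts.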
You can't even copy the value of one register into another. Instead, you have to write the first register out to memory one byte at a time, read the data back into the register one byte at a time and then read the data into the target register one byte at a time.
Also, no way of detecting overflow/underflow in arithmetic operations, or perform bit rotation etc. etc.
>>31
Except Thumb was designed by people who had a fucking clue.
> You can't even copy the value of one register into another. Instead, you have to write the first register out to memory one byte at a time, read the data back into the register one byte at a time and then read the data into the target register one byte at a time.
Oh, shit. That's a gaping implementation hole.
> Also, no way of detecting overflow/underflow in arithmetic operations, or perform bit rotation etc. etc.
Those are good ideas. How common is their use? Should they be included?
Name:
Anonymous2011-09-23 22:21
>>34
Go download and read through a bunch of CPU datasheets for as many architectures as you can find. That should give you an idea of what practical instruction sets look like.
Name:
Anonymous2011-09-23 22:30
>>35
Wouldn't I just end up creating a large instruction set like the rest? Even PowerPC's ``RISC'' ISA seems very large.
>>34
As I said earlier, try implementing some nontrivial algorithms and you'll notice what's missing and what's just badly designed.
Also, the goal of RISC is to have simple instructions that can be executed quickly and without microcode. The idea that RISC means having a small number of instructions is a fallacy.
In the 32-bit version you could just OR or AND the register with itself into a new register, but since you decided to switch to 16-bit two-operand instructions, a register-register move should be added so you can fit the equivalent to a three-operand operation into 32 bits.
NOP could be removed because you can OR a register with itself for the same effect. On MIPS, NOP is sll $0, $0, 0 (4 bytes). On x86, it's XCHG AX, AX (1 byte). Some other architectures use "branch never" or a "zero register" for this. You'd only need a dedicated NOP if you're using variable-length instructions and it's impossible to make an instruction that does nothing in the minimal instruction length.
NOT could be replaced by a two-operand "move complemented" or a NOR or NAND (NOT is just doing them on a register with itself), or you can make the immediate specify one of 32 possible one-operand ALU functions or be some sort of mask for altering individual bytes in the register. There are all sorts of ideas, but it's pretty inefficient to just leave 5 unused bits.
Another issue I see is immediate values. Using a 16-bit instruction to load a 4-bit immediate is terribly inefficient. It wouldn't be so bad if you were trying to make an esoteric language that's designed to be hard to program, but for a practical CPU, you need better ways to load immediates.
Since all jumps are based on registers, the system of loading immediates would affect jumps too. You should also have a "jump and link" or "call" that saves the return address in a register before making a jump in order to call subroutines. You could even use the spare bit in j and use a special register for return (like $31 in MIPS). But if you don't care about practicality or position-independent code, you could save the return address manually by loading immediate values into a register.
Name:
Anonymous2011-09-24 1:36
>>42
Thanks for the insightful advice! Do you have any ideas for loading immediates with 16-bit ops?
>>44
fuck, why didn't I think of using that ASCII back when tdavis was around,
Name:
Anonymous2011-09-24 1:55
>>43
You could use three opcode bits to make an 8-bit immediate and have the instruction shift the register left by 8 bits before loading the immediate into the lower 8 bits. That way 8 instructions is the maximum needed to load any 64-bit value instead of 25 (3 per byte+1 for shift count) like before.
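A quick model of the shift-and-load idea, to confirm the instruction count. `load_imm8` and `load_const64` are hypothetical names for this sketch; the only assumption is the semantics stated above (shift the register left by 8, then place the immediate in the low 8 bits).

```c
#include <assert.h>
#include <stdint.h>

/* One shift-and-load instruction: reg = (reg << 8) | imm. */
static uint64_t load_imm8(uint64_t reg, uint8_t imm)
{
    return (reg << 8) | imm;
}

/* Load an arbitrary 64-bit constant, most significant byte first.
 * The old register contents are shifted out completely after eight
 * steps, so the register does not need to be cleared beforehand. */
static uint64_t load_const64(uint64_t reg, uint64_t value, int *steps)
{
    int n = 0;
    for (int byte = 7; byte >= 0; byte--) {
        reg = load_imm8(reg, (uint8_t)(value >> (8 * byte)));
        n++;
    }
    *steps = n;
    return reg;
}
```

Values that fit in fewer bytes need fewer instructions only if the register is already known to be zero; otherwise all eight are needed to flush the stale upper bytes.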
Name:
Anonymous2011-09-24 1:58
>>47
Sounds good. However, wouldn't that force me to a maximum of eight instructions, as my opcode field is now 3 bits long? Or are you suggesting I use some sort of context-dependent/variable-length opcode field?
Name:
Anonymous2011-09-24 2:00
>>47
(On second thought, I could split the six-bit opcode field into two three-bit opcode fields, and sometimes only use the first opcode field, and sometimes use both fields)
Name:
Anonymous2011-09-24 2:08
>>49
Yes, I meant something like this. For immediate (and whatever others), 3 bits from the opcode field are combined with the other immediate field to form an 8-bit immediate. For the other instructions, the whole 6 bits is used as the opcode.
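The split opcode field might be packed and unpacked roughly as follows. The exact bit positions are an assumption (they match the layout the later emulator in this thread uses: bits 15-13 first opcode, then either an 8-bit immediate in bits 12-5 or a second opcode in bits 12-10 plus a register in bits 9-5, with the last register in bits 4-0). `encode_imm`, `encode_reg`, `struct fields` and `decode` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Immediate form: | op_a:3 | imm8:8 | reg:5 | */
static uint16_t encode_imm(unsigned op_a, unsigned imm8, unsigned reg)
{
    return (uint16_t)(((op_a & 7u) << 13) | ((imm8 & 0xffu) << 5) | (reg & 31u));
}

/* Register form: | op_a:3 | op_b:3 | ra:5 | rb:5 | */
static uint16_t encode_reg(unsigned op_a, unsigned op_b, unsigned ra, unsigned rb)
{
    return (uint16_t)(((op_a & 7u) << 13) | ((op_b & 7u) << 10)
                    | ((ra & 31u) << 5) | (rb & 31u));
}

/* Every field is extracted unconditionally; the first opcode decides
 * which of the overlapping fields (imm8 vs. op_b and ra) is meaningful. */
struct fields { unsigned op_a, op_b, imm8, ra, rb; };

static struct fields decode(uint16_t ins)
{
    struct fields f;
    f.op_a = ins >> 13;
    f.op_b = (ins >> 10) & 7u;
    f.imm8 = (ins >> 5) & 0xffu;
    f.ra   = (ins >> 5) & 31u;
    f.rb   = ins & 31u;
    return f;
}
```

Because the immediate overlaps the second opcode and the first register field, a decoder has to look at the first three bits before it can tell which interpretation applies.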
>>42
Tricks like using instructions that happen to have no side-effects are not a good idea in the long run. When you're trying to get more performance later on, these things invariably cause problems. For instance MIPS32 added a dedicated NOP instruction.
>>43
Fixed-width instruction sets usually allow loading only small immediates, and use PC-relative addressing and constant pools for larger values. See SuperH for one 32-bit instruction set using 16-bit fixed-width instructions.
Name:
Anonymous2011-09-24 2:46
ekuwap/2 has a five bit address field & is using 8 bit numbers, so is limited to 32 Bytes of memory?
Name:
Anonymous2011-09-24 2:48
>>53
The address field is a register reference. There are 32 registers (which is why each register reference takes 5 bits). Each register can hold 32, 64, 128, ... bits depending on the CPU's bit mode.
Name:
Anonymous2011-09-24 3:07
>>52
MIPS manual: The NOP instruction is actually encoded as an all-zero instruction. MIPS processors special-case this encoding as performing no operation, and optimize execution of the instruction. In addition, the SSNOP instruction takes up one issue cycle on any processor, including super-scalar implementations of the architecture.
ALPHA reference manual: Implementations are free to optimize these into no action and zero execution cycles.
MIPS's dedicated NOP (SSNOP) is for filling coprocessor or FPU delay slots. Nearly all RISCs and some CISCs use NOP as a synonym for some other do-nothing instruction and then special-case it in the hardware since they know there is no other reason for a programmer to use that instruction.
The all-zero MIPS NOP is actually sll $0, $0, 0. PowerPC (ori r0,r0,0), ALPHA (LDQ_U R31,0(Rx) for "UNOP", BIS R31,R31,R31 for "NOP", and CPYS F31,F31,F31 for "FNOP"), SPARC (sethi 0,%g0), ARM (MOV r0,r0) and S/360 (BC 0, "branch never") are other architectures that do similar things as MIPS and x86 regarding NOPs.
RISC design includes "synthetic instructions" which are practical because of the fixed-length instructions. In something like 68k there is both CLR.L and MOVE.L because instructions are variable length. In RISCs, there's no point in making a separate CLR instruction because it would be the same size and speed as XORing the register with itself or loading immediate 0 and would just complicate decoding and waste opcode space.
With only a maximum of 64 opcodes, explicit compare, and no mention of any delay slots or coprocessors, I don't think a dedicated NOP would be necessary for this particular CPU. Even if there was an FPU with exposed pipeline, you could special-case OR R0, R0 for the no cycle NOP and use OR with any other registers for the one-cycle NOP, so there's still no need to waste opcodes for a dedicated NOP.
Being a virtual machine, you could simulate L1/L2 cache & etc?
Name:
Anonymous2011-09-24 3:28
>>58
Yes. Well, it is currently implemented only as a very, very basic emulator with direct, physical memory and 32 registers, but I'm sure it could be improved. It could also, theoretically, be physically manufactured, but only if I improve this ISA significantly.
Name:
Anonymous2011-09-24 3:41
The super-H // 32-bit over a fixed 16-bit instruction set just uses the space of two instructions for some/all instructions, yeah?
So you could call it an 'Aligned Variable-width' instruction set?
Is there much benefit in squeezing more instructions into a bit less space like this?
>>55
68k has redundancies because orthogonality was one of the design goals. For XORing a register with itself to be fast it needs to be special-cased in the implementation and handled as a CLR internally, otherwise you're just adding pipeline stalls.
>>60
All SuperH instructions are 16 bits. Small instructions can reduce code size, and perform better on a narrow data bus. There's plenty of material analyzing ARM vs. Thumb from different perspectives. Thumb-2 tries to pack the most used instructions into 16 bits.
...a 32 bit instruction set where all instructions are 16 bit..?
so just the numbers / registers are 32 bit?
Name:
Anonymous2011-09-24 4:28
>>62
The instructions are always 16 bits long.
The CPU architecture is not fixed to any word length; it can run in 16-bit, 32-bit, 64-bit, 128-bit, ...
Name:
Anonymous2011-09-24 4:50
@ OP
Why don't you try out a no-instruction-set computer (NISC)?
- the opcode field is split into two halves; instructions that want 8-bit immediates can use a three-bit opcode
- the `i' instruction now shifts a register 8 bits to the left, and loads an 8-bit immediate into it
- `j' has been renamed to `jc' and a new, unconditional `j' is born because there is no way to do an unconditional jump with the old `j'
- a new `r' instruction is added to move among registers
- the comment column is cleaned up with clearer meanings
Name:
Anonymous2011-09-24 5:33
Example: loading 0x12345678 into the first register takes only four instructions now.
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdlib.h>
#include <string.h>
#define MEM 134217728
#define BITS 64
#define uw uint64_t
#define sw int64_t
#define DPRINT printf
int main(void) {
    uint8_t *mem = calloc(MEM, 1);
    uint16_t *memi = (uint16_t *) mem, ins;
    uint32_t stat = 0;
    uw cia = 0, nia, insoa, insob, insi, insa, insb;
    uw *reg = calloc(32, BITS / 8);
    if (!mem || !reg)
        return 1;
    /* four `i' instructions: shift r0 left by 8, then load the next byte */
    memi[0] = (1 << 13) | (0x12 << 5) | (0);
    memi[1] = (1 << 13) | (0x34 << 5) | (0);
    memi[2] = (1 << 13) | (0x56 << 5) | (0);
    memi[3] = (1 << 13) | (0x78 << 5) | (0);
    while (cia < MEM / 2) {
        nia = cia + 1;
        ins = memi[cia];
        insoa = ins >> 13;          /* first opcode field */
        insob = (ins >> 10) & 7;    /* second opcode field */
        insi = (ins >> 5) & 255;    /* 8-bit immediate (overlaps insob/insa) */
        insa = (ins >> 5) & 31;
        insb = ins & 31;
        if (insoa || insob)
            DPRINT("0x%04x:", (unsigned) ins);
        switch (insoa) {
        case 0: switch (insob) {
            case 0: break;
            case 1: reg[insa] = ~reg[insa]; break;
            case 2: reg[insa] &= reg[insb]; break;
            case 3: reg[insa] |= reg[insb]; break;
            case 4: reg[insa] ^= reg[insb]; break;
            case 5: reg[insa] <<= reg[insb]; break;
            case 6: reg[insa] >>= reg[insb]; break;
            case 7: reg[insa] = (uw)(((sw)(reg[insa])) >>
                reg[insb]); break;
            }
            break;
        case 1: reg[insb] = (reg[insb] << 8) | insi; break;
        case 2: switch (insob) {
            case 0: reg[insa] = ((reg[insa] >> 8) << 8) |
                mem[reg[insb]]; break;
            case 1: mem[reg[insa]] = reg[insb]; break;
            case 2: mem[reg[insa]] = mem[reg[insb]]; break;
            case 3: reg[insa] = reg[insb]; break;
            }
            break;
        case 3: switch (insob) {
            case 0: reg[insa] += reg[insb]; break;
            case 1: reg[insa] -= reg[insb]; break;
            }
            break;
        case 4: switch (insob) {
            case 0: stat = ((stat >> 3) << 3) |
                ((reg[insa] == reg[insb]) ? 1 :
                (reg[insa] < reg[insb]) ? 2 : 4); break;
            case 1: nia = ((insb & 1) ? cia : 0) + reg[insa]; break;
            case 2: if ((stat & 7) & ((insb >> 1) & 7))
                    nia = ((insb & 1) ? cia : 0) + reg[insa];
                break;
            }
            break;
        case 5: break;
        case 6: break;
        case 7: break;
        }
        if (insoa || insob) {
            int i;
            for (i = 0; i < 32; i++)
                DPRINT(" %" PRIx64, reg[i]);
            DPRINT("\n");
        }
        cia = nia;
    }
    free(mem);
    free(reg);
    return 0;
}
>>43
16-bit opcode followed by 8, 16, or 32-bit immediate value. 16 is already too large for most common operations.
RISC architecture was initially designed to allow higher clockspeeds, but we now know that the laws of physics don't let us go much faster than what we have today. Thus we're shifting back to more powerful instructions that can execute multiple operations and enhance parallelism. It's not about single-cycle instructions and boosting clock frequency anymore --- it's about decoding and executing more instructions per clock. RISC design has fared better in embedded systems, where a simple CPU core that can be integrated into a SoC with low cost is more important than absolute execution speed.
>>42,55
If you have register-register move there is already plenty of opportunities for NOPs (one per register). One way to alleviate this redundancy is to not allow moves between two same registers by either putting other instructions in those places or splitting the registers into two blocks so a move must go from one block to the other.
20 years ago, when the RISC fad was just starting, I predicted it would end eventually and architectures would start moving in the other direction again. I also foresaw the use of RISC in embedded applications.
m would seem to work given that the two register references are the same, and thus their register values are equal, and thus the effective addresses are equal. However, what if the register value represents an effective address that is outside the available memory? It would have undefined behaviour or crash.
and's `src' is a register reference, not an immediate value, so you can't just `and r*, 0b11111'. Same goes for or.
Therefore, it's 1120, with nop, r and jc.
Name:
Anonymous2011-09-24 6:24
Logarithmic step rotations ?
for bits used as step// rot max
1 bit -> (Rot 0?? // nop?) & Rot 1
2 bit -> (Rot 2) & Rot 4
3rd bit -> (Rot 8) & Rot 16
...
>>75
From the instruction set, your CPU has absolutely no memory protection/paging/etc. so I'm assuming it's a simple "open" type like a Z80 or 6502. If you access memory that doesn't exist you would just read the value from a floating databus (FFs if there's termination/pullups, other values are possible but it doesn't matter here) and try to write it to the same nonexistent location, so nothing actually happened.
> and's `src' is a register reference, not an immediate value, so you can't just `and r*, 0b11111'. Same goes for or.
Read up on boolean algebra identities. Specifically idempotence.
Name:
Anonymous2011-09-24 6:34
>>75
Both the src and the dest for AND and OR are registers, right? As I'm sure you know, ANDing or ORing a number with itself produces that same number, which is why so many RISCs use them as NOPs or (three-operand versions) as register-register moves.
Name:
Anonymous2011-09-24 6:36
>>78
Whoops, I had a major blank of the mind there. Sorry about that.
Of course, `a = a & a' has no effect. Neither does `a = a | a'. Therefore, `and r*, r*' and `or r*, r*' are no-ops when the registers are the same.
Plenty of room!
make sl and sr Opcode 101 and 110 respectively, and you can use
opcode#2 as the log-step variable
can do a Rot 127 (if possible) in 7 instructions using just one... and in 3 instructions using both? shiftRight:64 + shiftRight:64 + shiftLeft:1...
Name:
Anonymous2011-09-24 6:45
>>78 If you access memory that doesn't exist you would just read the value from a floating databus (FFs if there's termination/pullups, other values are possible but it doesn't matter here) and try to write it to the same nonexistent location, so nothing actually happened.
This is usually but isn't always true because of memory-mapped I/O. Sometimes reading from a memory location and writing a value back (even the same value) has side-effects. Especially since that's probably how this CPU will do I/O since there are no IN/OUT port instructions. I know that on the SNES there are some memory-mapped I/O ports that are "open bus" on read but valid on write or that have destructive reads. An m instruction with the same register on those is definitely not a NOP!
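A toy bus model makes the difference concrete. Everything here is made up for the illustration: `struct bus`, the `FIFO_PORT` address, the 16-byte RAM and the `m_same` helper do not describe any real machine, and the open-bus value 0xff assumes pull-ups.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_SIZE  16u
#define FIFO_PORT 0x20u   /* hypothetical I/O address with a destructive read */

struct bus {
    uint8_t  ram[RAM_SIZE];
    uint8_t  fifo[4];       /* bytes waiting in an input device */
    unsigned fifo_head;     /* next byte the device will hand out */
    uint8_t  last_written;  /* last byte written to the device */
};

static uint8_t bus_read(struct bus *b, uint32_t addr)
{
    if (addr < RAM_SIZE)
        return b->ram[addr];
    if (addr == FIFO_PORT && b->fifo_head < 4)
        return b->fifo[b->fifo_head++];   /* reading consumes a byte */
    return 0xff;                          /* nothing responds: open bus */
}

static void bus_write(struct bus *b, uint32_t addr, uint8_t value)
{
    if (addr < RAM_SIZE)
        b->ram[addr] = value;
    else if (addr == FIFO_PORT)
        b->last_written = value;
    /* writes to unmapped addresses are silently dropped */
}

/* Memory-to-memory move with source and destination at the same address. */
static void m_same(struct bus *b, uint32_t addr)
{
    bus_write(b, addr, bus_read(b, addr));
}
```

On RAM and on unmapped addresses `m_same` leaves the machine state untouched, but on `FIFO_PORT` it eats one byte from the device and writes it back out, which is exactly the kind of side effect that rules it out as a NOP.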
Name:
Anonymous2011-09-24 7:10
>>83
If this CPU didn't have multiple-bit shifts, this would be a good idea (the IBM 5110 can do any 8-bit rotate in 3 instructions by doing this), but it can already shift by any number of bits in one instruction, so a hardware implementation would use a barrel shifter. In that case rotate can use the same format as shifts and then it could rotate by any number of bits in one instruction. It wouldn't make sense to use multiple rotate instructions in an architecture that already needs a barrel shifter and can already do them in one cycle.
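For comparison, here is a sketch of the log-step scheme next to a direct rotate on a 64-bit word. `rotl_pow2`, `rotl_logstep` and `rotl_direct` are hypothetical names; the log-step version issues one fixed power-of-two rotate per set bit of the amount, so at most six steps for 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* One fixed-distance rotate left by 2^k bits, k in 0..5. */
static uint64_t rotl_pow2(uint64_t x, unsigned k)
{
    unsigned n = 1u << k;               /* 1..32, so 64 - n is 32..63 */
    return (x << n) | (x >> (64 - n));
}

/* Rotate by any amount using only power-of-two rotates. */
static uint64_t rotl_logstep(uint64_t x, unsigned amount, int *steps)
{
    int n = 0;
    amount &= 63u;
    for (unsigned k = 0; k < 6; k++) {
        if (amount & (1u << k)) {
            x = rotl_pow2(x, k);
            n++;
        }
    }
    if (steps)
        *steps = n;
    return x;
}

/* What a barrel shifter does in a single operation. */
static uint64_t rotl_direct(uint64_t x, unsigned amount)
{
    amount &= 63u;
    return amount ? (x << amount) | (x >> (64 - amount)) : x;
}
```

Both give the same result; the difference is that the log-step version costs up to six instructions where the barrel-shifter version costs one.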
>>84
When I wrote "memory that doesn't exist" I was referring to the truly nonexistent parts of the memory address space, which have no hardware on the bus to respond to.
>>85
To be complete, the whole family of shift/rotates is
* left shift 0-pad
* right shift 0-pad
* left shift 1-pad (not as useful but included for completeness)
* right shift 1-pad (matches the "arithmetic"/sign-preserving shift when the sign bit is set)
* left rotate
* right rotate
* left rotate through carry
* right rotate through carry
...which is conveniently encoded in 3 bits.
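One way the eight variants could be selected by a 3-bit field, modelled on an 8-bit value and a single-bit step for brevity. The numbering of the `enum shift_op` values and the choice to have every variant update the carry are assumptions of this sketch, not something the thread specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 3-bit encoding of the shift/rotate family. */
enum shift_op {
    SHL0, SHR0,   /* shift, pad with 0 */
    SHL1, SHR1,   /* shift, pad with 1 */
    ROL,  ROR,    /* rotate */
    RCL,  RCR     /* rotate through carry */
};

/* Apply one single-bit step; *carry receives the bit shifted out. */
static uint8_t shift1(enum shift_op op, uint8_t v, unsigned *carry)
{
    unsigned out_left  = v >> 7;        /* bit leaving on a left step */
    unsigned out_right = v & 1u;        /* bit leaving on a right step */
    unsigned cin = *carry & 1u;

    switch (op) {
    case SHL0: *carry = out_left;  return (uint8_t)(v << 1);
    case SHR0: *carry = out_right; return (uint8_t)(v >> 1);
    case SHL1: *carry = out_left;  return (uint8_t)((v << 1) | 1u);
    case SHR1: *carry = out_right; return (uint8_t)((v >> 1) | 0x80u);
    case ROL:  *carry = out_left;  return (uint8_t)((v << 1) | out_left);
    case ROR:  *carry = out_right; return (uint8_t)((v >> 1) | (out_right << 7));
    case RCL:  *carry = out_left;  return (uint8_t)((v << 1) | cin);
    case RCR:  *carry = out_right; return (uint8_t)((v >> 1) | (cin << 7));
    }
    return v;
}
```

A multi-bit version would take the shift count from a register, as the existing shift instructions do.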
That also raises another point: Your CPU is missing multiple-precision arithmetic (ADC/SBC) instructions.
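To show what add-with-carry buys, here is a sketch of a 128-bit addition built from two 64-bit halves. `adc64`, `struct add_result` and `add128` are hypothetical names, and since plain C has no carry flag the carry-out is recovered by comparison, which is roughly the extra work an ISA without ADC would need as well.

```c
#include <assert.h>
#include <stdint.h>

struct add_result { uint64_t sum; unsigned carry; };

/* Add with carry-in; report the carry-out of the 64-bit addition. */
static struct add_result adc64(uint64_t a, uint64_t b, unsigned carry_in)
{
    struct add_result r;
    uint64_t partial = a + b;
    unsigned c1 = partial < a;          /* a + b wrapped around */
    r.sum = partial + (carry_in & 1u);
    unsigned c2 = r.sum < partial;      /* adding the carry-in wrapped */
    r.carry = c1 | c2;                  /* at most one of them can be set */
    return r;
}

/* 128-bit add; index 0 holds the low word, index 1 the high word. */
static void add128(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
    struct add_result lo = adc64(a[0], b[0], 0);
    struct add_result hi = adc64(a[1], b[1], lo.carry);
    out[0] = lo.sum;
    out[1] = hi.sum;
}
```

With a hardware carry flag and an ADC instruction the same operation is two instructions: an add for the low words followed by an add-with-carry for the high words.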
Name:
Anonymous2011-09-24 7:42
>>86
I only quoted part of your post. You also counted m instructions using the same source and destination registers as NOPs (by assuming it's either RAM or nothing). If the register points to an I/O address, it isn't a NOP. That sort of thing is why C has the volatile keyword.
Name:
Anonymous2011-09-24 7:53
>>82
could use something like this if you ran out of op-codes
i think most of the instructions so far would fit in just five bits of opcode, bar that 'i, so what op/? could do is use the very first bit as a 16-bit / 24-bit length instruction signal?
...full new 23 bit space to use//larger address spaces + more opcodes?
...it's a single instruction stack? (It only holds the current instruction..?)
Reg Address compressed instructions?
eg using some signal/marker to specify re-using the last used register? / L1 / etc? [special cased repeats? reuse entire last instruct?] /&/ SIMD-like instructions (single op-code with many source/dest's)?
Name:
Anonymous2011-09-25 3:19
...Also, there are no base-level [array / memory in general]-specific instructions..?
stack push / pop..?
...mnemonic i could be a push (stack size is unknown // first in last out access order?) and takes 8 * >>1 left rot's to return a val? // can only push 8-bits at once/ and pop one bit per instruct?
Barrel rotate (?) means speed is probably not an issue..? But you still use 16 bytes of code for a rotate?
...special pre-emptive source/destination range instructions?
the idea being to cut out a bunch of fairly plain source/dest fields in the usual instructs, instead using a bit larger preempt code followed by x micro 8bit-all-opcode(?) instructions..
>>100
Not really. That's hardly "complicated", and multibyte instructions/multiple single-byte ones could be easily decoded in a single clock if the databus is wide enough.
Look at how the x86 decoder works. It can determine the instruction length in 1 cycle, and do it for multiple instructions at once. i7s are not "slow as fuck" either.
>>102
If you can determine the total number of instruction bytes from the first byte then sure. This was one of the limiting factors of the VAX. Intel also spends a lot of resources hand-optimizing transistors. It's not really babby's first CPU material.
you could also build, then use, a single instruction to specify the length of the next n instructions? /depending on how much they vary might be as little as one bit per instrct.
can always compress the most common/smallest? length(s) with "shannon's entropy"(?) i think its called, right down to one bit regardless of the number of items, as long as you don't mind a bit of expansion..
1, 01, 001, ... // 01, 10, 001, 110, ... <<this
reminds me of an old run-code compression i tried to invent after reading once =) never got that to work..
Name:
Anonymous2011-09-25 8:49
...think i was doing it backwards?
2 bit -> 1 / 01 / 001 / 000 (typically expands... But because it is complete in both directions, can be used in either direction eg [2 bit] <---> [1 / 01 / 001 / 000]
for comparison a block of data can be broken up into 2 bit pieces and represented as [1 / 01 / 001 or 0001] but not the other way round..
same for [01 / 10 / 001 / 110 / 000 / 111] -> 3 bit but 3bit -/-> [01 / 10 / 001 / 110 / 000 / 111]
a good three bit
[3bits] <--> 01 / 10 / 001 / 110 / 0001 / 1110 / 0000 / 1111
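The two-bit mapping from a few lines up (1 / 01 / 001 / 000) is a prefix code: no codeword is the start of another, so a bit stream can be split back into symbols without separators. A sketch of that round trip follows, with bits held as '0'/'1' characters for readability; `codeword`, `prefix_encode` and `prefix_decode` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Codeword for each 2-bit symbol value 0..3. */
static const char *const codeword[4] = { "1", "01", "001", "000" };

/* Encode n symbols into a NUL-terminated bit string.
 * Returns the number of bits written, or -1 if out is too small. */
static int prefix_encode(const unsigned char *sym, size_t n, char *out, size_t cap)
{
    size_t used = 0;
    if (cap == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        const char *cw = codeword[sym[i] & 3u];
        size_t len = strlen(cw);
        if (used + len + 1 > cap)
            return -1;
        memcpy(out + used, cw, len);
        used += len;
    }
    out[used] = '\0';
    return (int)used;
}

/* Decode a bit string back into symbols.
 * Returns the symbol count, or -1 on bad input or lack of space. */
static int prefix_decode(const char *bits, unsigned char *sym, size_t cap)
{
    size_t n = 0;
    unsigned zeros = 0;                 /* zeros seen in the current codeword */
    for (; *bits != '\0'; bits++) {
        int done = -1;
        if (*bits == '1')
            done = (int)zeros;          /* "1", "01", "001" -> 0, 1, 2 */
        else if (*bits == '0') {
            if (++zeros == 3)
                done = 3;               /* "000" -> 3 */
        } else
            return -1;
        if (done >= 0) {
            if (n == cap)
                return -1;
            sym[n++] = (unsigned char)done;
            zeros = 0;
        }
    }
    return zeros == 0 ? (int)n : -1;    /* leftover zeros: truncated input */
}
```

Each symbol costs one to three bits instead of a fixed two, so the encoding only shrinks the data when symbol 0 is much more common than the others; on evenly distributed input it expands, as noted above.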
>>102
i7 still uses RISC representations internally, right?
To counter an earlier argument you made, the RISC philosophy is still very applicable when it comes to doing more per clock cycle.
Fancy superscalar tricks like out-of-order execution require very efficient decoding, and IIRC Intel's processor does all of that after the conversion stage.
>>108
Micro-ops, which are even simpler representations than RISC instructions. The "RISC philosophy" you're referring to is different from what I was referring to; the latter would be the single-cycle-instruction pipelined CPU model that they still teach in CS classes (unfortunately), often along with now-outdated points like "we can increase clock speed if we make instructions simpler". Superscalar and OOE require multiple decodes per clock too, which is facilitated by short/variable-length instructions. [Assuming a 32-bit databus, a fixed-length 32-bit instruction RISC would be able to decode one instruction per fetch, while e.g. x86 could decode 4 1-byte instructions --- and execute them in parallel if they're things like 4 independent register increments.]