
RISC

Name: Anonymous 2011-09-22 9:29

/prog/, I'm working on a RISC instruction set that is more RISC than any RISC out there. I've devised an instruction format, plus fifteen core instructions that should suffice for any programming out there. The instruction format, instruction forms, and instructions can be found here:

http://jsbin.com/ekuwap

Any comments or suggestions?

Name: Anonymous 2011-09-24 0:05

>>39
>>40
Fuck off, nenti.

Name: Anonymous 2011-09-24 1:11

In the 32-bit version you could just OR or AND the register with itself into a new register, but since you decided to switch to 16-bit two-operand instructions, a register-register move should be added so you can fit the equivalent of a three-operand operation into 32 bits.

NOP could be removed because you can OR a register with itself for the same effect. On MIPS, NOP is sll $0, $0, 0 (4 bytes). On x86, it's XCHG AX, AX (1 byte). Some other architectures use "branch never" or a "zero register" for this. You'd only need a dedicated NOP if you're using variable-length instructions and it's impossible to make an instruction that does nothing in the minimal instruction length.

NOT could be replaced by a two-operand "move complemented" or a NOR or NAND (NOT is just doing them on a register with itself), or you can make the immediate specify one of 32 possible one-operand ALU functions or be some sort of mask for altering individual bytes in the register. There are all sorts of ideas, but it's pretty inefficient to just leave 5 unused bits.
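A quick sketch of the NOR/NAND idea in C, using the 64-bit register type from the emulator later in the thread. The helper names are hypothetical, not opcodes from the OP's ISA; the point is only that NOT falls out of each of them when both operands are the same register.

```c
#include <stdint.h>

typedef uint64_t word; /* assumed 64-bit register width */

/* Hypothetical two-operand NOR: dst = ~(dst | src). */
static word op_nor(word dst, word src) { return ~(dst | src); }

/* Hypothetical two-operand NAND: dst = ~(dst & src). */
static word op_nand(word dst, word src) { return ~(dst & src); }

/* Hypothetical "move complemented": dst = ~src; the old dst is ignored. */
static word op_movc(word dst, word src) { (void)dst; return ~src; }

/* NOT is just any of them applied to a register with itself. */
static word not_via_nor(word r)  { return op_nor(r, r); }
static word not_via_nand(word r) { return op_nand(r, r); }
static word not_via_movc(word r) { return op_movc(r, r); }
```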

Another issue I see is immediate values. Using a 16-bit instruction to load a 4-bit immediate is terribly inefficient. It wouldn't be so bad if you were trying to make an esoteric language that's designed to be hard to program, but for a practical CPU, you need better ways to load immediates.

Since all jumps are based on registers, the system of loading immediates would affect jumps too. You should also have a "jump and link" or "call" that saves the return address in a register before making a jump in order to call subroutines. You could even use the spare bit in j and use a special register for return (like $31 in MIPS). But if you don't care about practicality or position-independent code, you could save the return address manually by loading immediate values into a register.
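A minimal sketch of a jump-and-link, assuming register 31 as the link register (as on MIPS) and a program counter counted in 16-bit instruction slots; `struct cpu`, `jal` and `ret` are hypothetical names, not part of the posted ISA.

```c
#include <stdint.h>

#define NREGS 32
#define LINK_REG 31 /* assumption: fixed link register, like $31 on MIPS */

typedef uint64_t word;

/* Minimal machine state: PC in instruction slots plus 32 registers. */
struct cpu {
    word pc;
    word reg[NREGS];
};

/* Hypothetical jump-and-link: save the address of the next instruction,
   then jump to the address held in register `target`. */
static void jal(struct cpu *c, unsigned target)
{
    word dest = c->reg[target];   /* read first in case target == LINK_REG */
    c->reg[LINK_REG] = c->pc + 1;
    c->pc = dest;
}

/* Returning is an ordinary register jump through the link register. */
static void ret(struct cpu *c)
{
    c->pc = c->reg[LINK_REG];
}
```

Reading the target before writing the link register keeps a jump-and-link through r31 itself well defined.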

Name: Anonymous 2011-09-24 1:36

>>42
Thanks for the insightful advice! Do you have any ideas for loading immediates with 16-bit ops?

Name: Anonymous 2011-09-24 1:38

>>39,40

    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
    █     ___                        █
    █    //  7                       █
    █   (_,_/\       WORSHIP THIS    █
    █    \    \   YOUR THROBBING GOD █
    █     \    \                     █
    █     _\    \__                  █
    █    (   \     )                 █
    █     \___\___/                  █
    █                                █
    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

Name: n3n7i 2011-09-24 1:39

please play with my balls

Name: Anonymous 2011-09-24 1:48

>>44
fuck, why didn't I think of using that ASCII back when tdavis was around.

Name: Anonymous 2011-09-24 1:55

>>43
You could use three opcode bits to make an 8-bit immediate and have the instruction shift the register left by 8 bits before loading the immediate into the lower 8 bits. That way, at most 8 instructions are needed to load any 64-bit value, instead of 25 (3 per byte plus 1 for the shift count) as before.
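To make the arithmetic concrete, here is a sketch of the proposed shift-and-load immediate in C. It assumes the destination register starts at zero (as it does in the calloc'd emulator further down), so leading zero bytes can be skipped; `load_imm8` and `build_constant` are illustrative names.

```c
#include <stdint.h>

typedef uint64_t word;

/* Proposed immediate instruction: shift left 8, put imm in the low byte. */
static word load_imm8(word reg, uint8_t imm)
{
    return (reg << 8) | imm;
}

/* Build a 64-bit constant with at most 8 such instructions, feeding bytes
   most-significant first. Assumes the register starts at 0, so leading
   zero bytes cost nothing. Returns the number of instructions used. */
static int build_constant(word value, word *out)
{
    word reg = 0;
    int count = 0;
    int started = 0;
    for (int shift = 56; shift >= 0; shift -= 8) {
        uint8_t byte = (uint8_t)(value >> shift);
        if (!started && byte == 0 && shift != 0)
            continue;            /* register is already 0 here */
        started = 1;
        reg = load_imm8(reg, byte);
        count++;
    }
    *out = reg;
    return count;
}
```

With this scheme 0x12345678 takes four loads and a full 64-bit constant takes eight.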

Name: Anonymous 2011-09-24 1:58

>>47
Sounds good. However, wouldn't that limit me to a maximum of eight instructions, as my opcode field is now 3 bits long? Or are you suggesting I use some sort of context-dependent/variable-length opcode field?

Name: Anonymous 2011-09-24 2:00

>>47
(On second thought, I could split the six-bit opcode field into two three-bit opcode fields, and sometimes only use the first opcode field, and sometimes use both fields)

Name: Anonymous 2011-09-24 2:08

>>49
Yes, I meant something like this. For the immediate instruction (and any others that need it), 3 bits from the opcode field are combined with the other immediate field to form an 8-bit immediate. For the other instructions, all 6 bits are used as the opcode.
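Here is a decoding sketch for the split opcode field. The bit positions are an assumption taken from the emulator code posted further down (opcode A in bits 15-13, opcode B in bits 12-10, register fields in bits 9-5 and 4-0), with opcode A = 1 as the immediate form.

```c
#include <stdint.h>

/* Assumed layout (from the emulator later in the thread):
     bits 15..13  opcode A (3 bits)
     bits 12..10  opcode B (3 bits), or the top of an 8-bit immediate
     bits  9..5   register a, or the bottom of the immediate
     bits  4..0   register b                                        */
struct decoded {
    unsigned op_a, op_b, ra, rb, imm8;
};

#define OP_A_IMMEDIATE 1 /* opcode A value of the `i' instruction */

static struct decoded decode(uint16_t ins)
{
    struct decoded d = {0};
    d.op_a = ins >> 13;
    d.rb = ins & 31;
    if (d.op_a == OP_A_IMMEDIATE) {
        /* 3-bit opcode: the next 8 bits are all immediate */
        d.imm8 = (ins >> 5) & 0xFF;
    } else {
        /* full 6-bit opcode (A:B) followed by two register fields */
        d.op_b = (ins >> 10) & 7;
        d.ra = (ins >> 5) & 31;
    }
    return d;
}
```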

Name: Anonymous 2011-09-24 2:13

>>50
Thanks a lot!

Name: Anonymous 2011-09-24 2:18

>>42
Tricks like using instructions that happen to have no side effects are not a good idea in the long run. When you're trying to get more performance later on, these things invariably cause problems. For instance, MIPS32 added a dedicated NOP instruction.

>>43
Fixed-width instruction sets usually allow loading only small immediates, and use PC-relative addressing and constant pools for larger values. See SuperH for one 32-bit instruction set using 16-bit fixed-width instructions.
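As a rough illustration of a PC-relative constant-pool load, here is a C model of a 16-bit instruction with an 8-bit displacement. It is loosely modeled on SuperH's PC-relative MOV.L, but the address formula and the name `load_literal` are assumptions for the sketch, not quotes from the SuperH manual.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical literal load: Rn = 32-bit word at
   (pc & ~3) + 4 + disp * 4, where pc is a byte address and disp is an
   unsigned 8-bit field, so the pool can sit up to 1 KiB ahead. */
static uint32_t load_literal(const uint8_t *mem, uint32_t pc, uint8_t disp)
{
    uint32_t addr = (pc & ~(uint32_t)3) + 4 + (uint32_t)disp * 4;
    uint32_t value;
    memcpy(&value, mem + addr, sizeof value); /* host byte order, for brevity */
    return value;
}
```

The assembler places large constants in a pool after the code and emits the displacement, so one 16-bit instruction loads a full 32-bit value at the cost of a memory read.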

Name: Anonymous 2011-09-24 2:46

ekuwap/2 has a five-bit address field and uses 8-bit numbers, so is it limited to 32 bytes of memory?

Name: Anonymous 2011-09-24 2:48

>>53
The address field is a register reference. There are 32 registers (which is why each register reference takes 5 bits). Each register can hold 32, 64, 128, ... bits depending on the CPU's bit mode.

Name: Anonymous 2011-09-24 3:07

>>52
MIPS manual:
The NOP instruction is actually encoded as an all-zero instruction. MIPS processors special-case this encoding as performing no operation, and optimize execution of the instruction. In addition, the SSNOP instruction takes up one issue cycle on any processor, including super-scalar implementations of the architecture.
ALPHA reference manual:
Implementations are free to optimize these into no action and zero execution cycles.

MIPS's dedicated NOP (SSNOP) is for filling coprocessor or FPU delay slots. Nearly all RISCs and some CISCs use NOP as a synonym for some other do-nothing instruction and then special-case it in the hardware since they know there is no other reason for a programmer to use that instruction.

The all-zero MIPS NOP is actually sll $0, $0, 0. PowerPC (ori r0,r0,0), ALPHA (LDQ_U R31,0(Rx) for "UNOP", BIS R31,R31,R31 for "NOP", and CPYS F31,F31,F31 for "FNOP"), SPARC (sethi 0,%g0), ARM (MOV r0,r0) and S/360 (BC 0, "branch never") are other architectures that do similar things as MIPS and x86 regarding NOPs.
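To check two of the encodings mentioned above, this sketch packs MIPS `sll $0, $0, 0` and ARM `MOV r0, r0` from their field layouts (MIPS R-type; ARM data-processing with condition AL). The function names are made up for the example.

```c
#include <stdint.h>

/* MIPS R-type layout: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6). */
static uint32_t mips_rtype(unsigned op, unsigned rs, unsigned rt,
                           unsigned rd, unsigned shamt, unsigned funct)
{
    return (uint32_t)op << 26 | (uint32_t)rs << 21 | (uint32_t)rt << 16 |
           (uint32_t)rd << 11 | (uint32_t)shamt << 6 | funct;
}

#define MIPS_FUNCT_SLL 0x00

/* sll $0, $0, 0: every field is zero, hence the all-zero NOP. */
static uint32_t mips_nop(void)
{
    return mips_rtype(0, 0, 0, 0, 0, MIPS_FUNCT_SLL);
}

/* ARM data-processing layout: cond(4) 00 I(1) opcode(4) S(1) Rn(4) Rd(4)
   operand2(12). MOV r0, r0 with cond = AL (always). */
static uint32_t arm_mov_r0_r0(void)
{
    const uint32_t cond_al = 0xE, opc_mov = 0xD;
    return cond_al << 28 | opc_mov << 21; /* I = S = 0, Rn = Rd = Rm = 0 */
}
```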

RISC design includes "synthetic instructions", which are practical because of the fixed-length instructions. In something like 68k there are both CLR.L and MOVE.L because instructions are variable length. In RISCs, there's no point in making a separate CLR instruction because it would be the same size and speed as XORing the register with itself or loading immediate 0, and would just complicate decoding and waste opcode space.
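A minimal sketch of how an assembler might handle synthetic instructions: the pseudo-op is rewritten into a real instruction, so it costs no opcode space. The mnemonics, register syntax and `expand_synthetic` are hypothetical, chosen to match the XOR and OR tricks discussed in this thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical assembler pass: rewrite a synthetic instruction into a real
   two-operand one. Returns the length written, or -1 if `mnemonic` is not
   a synthetic instruction. */
static int expand_synthetic(const char *mnemonic, int reg,
                            char *out, size_t outlen)
{
    if (strcmp(mnemonic, "clr") == 0)      /* x ^ x == 0 */
        return snprintf(out, outlen, "xor r%d, r%d", reg, reg);
    if (strcmp(mnemonic, "nop") == 0)      /* r0 | r0 == r0 */
        return snprintf(out, outlen, "or r0, r0");
    return -1;
}
```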

With only a maximum of 64 opcodes, explicit compare, and no mention of any delay slots or coprocessors, I don't think a dedicated NOP would be necessary for this particular CPU. Even if there were an FPU with an exposed pipeline, you could special-case OR R0, R0 for the zero-cycle NOP and use OR with any other register for the one-cycle NOP, so there's still no need to waste opcodes on a dedicated NOP.

Name: Anonymous 2011-09-24 3:14

>>54 Ah

Also, do you pad out the 'Immediate operand' with zeros, or does the next instruction directly follow a nop instruction's opcode?

Name: Anonymous 2011-09-24 3:15

>>56
Could you elaborate on your question?

Name: Anonymous 2011-09-24 3:23

>>57 Actually, never mind.
...Fixed-width instructions, obviously.

Since it's a virtual machine, could you simulate L1/L2 caches, etc.?

Name: Anonymous 2011-09-24 3:28

>>58
Yes. Well, it is currently implemented only as a very, very basic emulator with direct, physical memory and 32 registers, but I'm sure it could be improved. It could also, theoretically, be physically manufactured, but only if I improve this ISA significantly.
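One possible improvement along those lines, sketched under assumptions: a direct-mapped cache model (64 lines of 16 bytes, sizes chosen arbitrarily) that only counts hits and misses, while data still comes from the emulator's flat memory array. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define LINE_BITS 4              /* 16-byte lines (assumed) */
#define INDEX_BITS 6             /* 64 lines (assumed) */
#define NLINES (1u << INDEX_BITS)

/* Hit/miss bookkeeping only; no data is stored in the cache model. */
struct cache {
    bool valid[NLINES];
    uint64_t tag[NLINES];
    unsigned long hits, misses;
};

/* Record an access to byte address `addr`; returns true on a hit. */
static bool cache_access(struct cache *c, uint64_t addr)
{
    uint64_t block = addr >> LINE_BITS;
    unsigned index = (unsigned)(block & (NLINES - 1));
    uint64_t tag = block >> INDEX_BITS;
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true;      /* fill (or replace) the line on a miss */
    c->tag[index] = tag;
    c->misses++;
    return false;
}
```

Calling `cache_access` from the load and store cases of the emulator loop would give hit/miss statistics without changing program behavior.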

Name: Anonymous 2011-09-24 3:41

SuperH, a 32-bit architecture with a fixed 16-bit instruction set, just uses the space of two instructions for some or all instructions, yeah?

So you could call it an 'Aligned Variable-width' instruction set?

Is there much benefit in squeezing more instructions into a bit less space like this?

Name: Anonymous 2011-09-24 3:54

>>55
68k has redundancies because orthogonality was one of the design goals. For XORing a register with itself to be fast, it needs to be special-cased in the implementation and handled as a CLR internally; otherwise you're just adding pipeline stalls.

>>60
All SuperH instructions are 16 bits. Small instructions can reduce code size and perform better on a narrow data bus. There's plenty of material analyzing ARM vs. Thumb from different perspectives. Thumb-2 tries to pack the most-used instructions into 16 bits.

Name: Anonymous 2011-09-24 4:10

...a 32-bit instruction set where all instructions are 16 bits?
So just the numbers/registers are 32-bit?

Name: Anonymous 2011-09-24 4:28

>>62
The instructions are always 16 bits long.
The CPU architecture is not fixed to any word length; it can run in 16-bit, 32-bit, 64-bit, 128-bit, ...
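A sketch of how an emulator could support several word lengths with one code path: keep registers in a 64-bit host type and mask every result to the current mode's width. This is an assumption about implementation, not the OP's code; `width_mask` and `alu_add` are illustrative names, and modes wider than 64 bits would need a wider host type.

```c
#include <stdint.h>

/* All-ones mask for a `bits`-wide word (bits in 1..64). */
static uint64_t width_mask(unsigned bits)
{
    return bits >= 64 ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
}

/* Add, then truncate to the current mode's word size. */
static uint64_t alu_add(uint64_t a, uint64_t b, unsigned bits)
{
    return (a + b) & width_mask(bits);
}
```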

Name: Anonymous 2011-09-24 4:50

@ OP

Why don't you try out a no-instruction-set computer (NISC)?

http://www.ics.uci.edu/~jelenat/pubs/TR05-09.pdf

This baby allows you to make the best use of your fukken transistors, if, that is, you can write a compiler good enough for it.

Name: Anonymous 2011-09-24 5:11

Presenting the third version of my RISC ISA:

http://jsbin.com/ekuwap/3

Changes:

- the opcode field is split into two halves; instructions that want 8-bit immediates can use a three-bit opcode
- the `i' instruction now shifts a register 8 bits to the left, and loads an 8-bit immediate into it
- `j' has been renamed to `jc' and a new, unconditional `j' is born because there is no way to do an unconditional jump with the old `j'
- a new `r' instruction is added to move among registers
- the comment column is cleaned up with clearer meanings

Name: Anonymous 2011-09-24 5:33

Example: loading 0x12345678 into the first register takes only four instructions now.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#define MEM 134217728
#define BITS 64
#define uw uint64_t
#define sw int64_t
#define DPRINT printf
int main() {
    uint8_t *mem = calloc(MEM, 1);
    uint16_t *memi = (uint16_t *) mem, ins;
    uint32_t stat = 0;
    uw cia = 0, nia, insoa, insob, insi, insa, insb; /* execution starts at 0 */
    uw *reg = calloc(32, BITS / 8);
    memi[0] = (1 << 13) | (0x12 << 5) | (0);
    memi[1] = (1 << 13) | (0x34 << 5) | (0);
    memi[2] = (1 << 13) | (0x56 << 5) | (0);
    memi[3] = (1 << 13) | (0x78 << 5) | (0);
    while (cia < MEM / 2) {
        nia = cia + 1;
        ins = memi[cia];
        insoa = ins >> 13;
        insob = (ins >> 10) & 7;
        insi = (ins >> 5) & 255;
        insa = (ins >> 5) & 31;
        insb = ins & 31;
        if (insoa && insob)
            DPRINT("0x%016x:", ins);
        switch (insoa) {
            case 0: switch (insob) {
                case 0: break;
                case 1: reg[insa] = ~reg[insa]; break;
                case 2: reg[insa] &= reg[insb]; break;
                case 3: reg[insa] |= reg[insb]; break;
                case 4: reg[insa] ^= reg[insb]; break;
                case 5: reg[insa] <<= reg[insb]; break;
                case 6: reg[insa] >>= reg[insb]; break;
                case 7: reg[insa] = (uw)(((sw)(reg[insa])) >>
                    reg[insb]); break;
            }
            case 1: reg[insb] = (reg[insb] << 8) | insi; break;
            case 2: switch (insob) {
                case 0: reg[insa] = ((reg[insa] >> 8) << 8) |
                    mem[reg[insb]]; break;
                case 1: mem[reg[insa]] = reg[insb]; break;
                case 2: mem[reg[insa]] = mem[reg[insb]]; break;
                case 3: reg[insa] = reg[insb]; break;
            }
            case 3: switch (insob) {
                case 0: reg[insa] += reg[insb]; break;
                case 1: reg[insa] -= reg[insb]; break;
            }
            case 4: switch (insob) {
                case 0: stat = ((stat >> 3) << 3) |
                    ((reg[insa] == reg[insb]) ? 1 :
                    (reg[insa] < reg[insb]) ? 2 : 4); break;
                case 1: nia = ((insb & 1) ? cia : 0) + reg[insa]; break;
                case 2: if ((stat & 7) & ((insb >> 1) & 7))
                    nia = ((insb & 1) ? cia : 0) + reg[insa]; break;
            }
            case 5: break;
            case 6: break;
            case 7: break;
        }
        if (insoa && insob) {
            int i;
            for (i = 0; i < 31; i++)
                DPRINT(" %lx", reg[i]);
            DPRINT("\n");
        }
        cia = nia;
    }
    free(mem);
    free(reg);
    return 0;
}


Output:

0x0000000000002680: 1234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0x0000000000002ac0: 123456 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0x0000000000002f00: 12345678 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2011-09-24 5:42

>>43
A 16-bit opcode followed by an 8-, 16-, or 32-bit immediate value. 16 bits is already more than most common operations need.
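To illustrate that format, here is a decoder for a 16-bit opcode followed by an optional 8-, 16- or 32-bit immediate. Using the opcode's low two bits as the size field, and little-endian byte order, are assumptions made for the sketch; `imm_bytes` and `decode_varlen` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Low two opcode bits: 0 = no immediate, 1 = 8, 2 = 16, 3 = 32 bits. */
static size_t imm_bytes(uint16_t opcode)
{
    static const size_t sizes[4] = {0, 1, 2, 4};
    return sizes[opcode & 3];
}

/* Decode one instruction at `code`; returns its total length in bytes and
   stores the zero-extended immediate in *imm. */
static size_t decode_varlen(const uint8_t *code, uint16_t *opcode, uint32_t *imm)
{
    size_t n;
    *opcode = (uint16_t)(code[0] | code[1] << 8);
    n = imm_bytes(*opcode);
    *imm = 0;
    for (size_t i = 0; i < n; i++)
        *imm |= (uint32_t)code[2 + i] << (8 * i);
    return 2 + n;
}
```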

RISC architecture was initially designed to allow higher clock speeds, but we now know that the laws of physics don't let us go much faster than what we have today. Thus we're shifting back to more powerful instructions that can execute multiple operations and enhance parallelism. It's not about single-cycle instructions and boosting clock frequency anymore; it's about decoding and executing more instructions per clock. RISC design has fared better in embedded systems, where a simple CPU core that can be integrated into a SoC at low cost is more important than absolute execution speed.

>>42,55
If you have a register-register move, there are already plenty of opportunities for NOPs (one per register). One way to alleviate this redundancy is to disallow moves between the same register, either by putting other instructions in those encodings or by splitting the registers into two blocks so a move must go from one block to the other.
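A sketch of the second option, assuming 32 registers split into r0-r15 and r16-r31 and a 9-bit operand field (one direction bit plus two 4-bit indices). Every encoding moves between blocks, so none of them is a self-move; the layout and `decode_xmove` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical cross-block move: bit 8 picks the direction, bits 7..4 and
   3..0 index into the two 16-register blocks. */
static void decode_xmove(unsigned field, unsigned *dst, unsigned *src)
{
    unsigned dir = (field >> 8) & 1;   /* 0: low -> high, 1: high -> low */
    unsigned a = (field >> 4) & 15;
    unsigned b = field & 15;
    if (dir == 0) {
        *src = a;                      /* low block */
        *dst = 16 + b;                 /* high block */
    } else {
        *src = 16 + a;
        *dst = b;
    }
}
```

As a side effect the operand field shrinks from 10 bits to 9, at the cost of not being able to move within a block.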

20 years ago, when the RISC fad was just starting, I predicted it would end eventually and architectures would start moving in the other direction again. I also foresaw the use of RISC in embedded applications.

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2011-09-24 5:50

>>65
1280 of your instructions are NOPs (find them all!)

Name: Anonymous 2011-09-24 5:56

>>68
nop (reg0=idgaf, reg1=idgaf) = 1024 instructions
r (reg0=idgaf, reg1=idgaf, reg0=reg1) = 32 instructions
jc (reg0=idgaf, mask=0b000, r=idgaf) = 64 instructions

I've found 1120 of the 1280 instructions.

Name: Anonymous 2011-09-24 5:58

>>68
Care to share the other 160 of them? I can't find any others that work regardless of register state.

Name: Anonymous 2011-09-24 6:09

The mnemonic 'i' could use a single 'on' bit as its opcode,
and nop could be made a half-length instruction.
Whether that would be useful, though...?

Name: Anonymous 2011-09-24 6:10

>>71
Get out, n3n7i.

Name: Anonymous 2011-09-24 6:13

Fixed instruction jump addressing and incorrect fall-throughs on switches:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#define MEM 134217728
#define BITS 64
#define uw uint64_t
#define sw int64_t
#define DPRINT printf
int main() {
    uint8_t *mem = calloc(MEM, 1);
    uint16_t *memi = (uint16_t *) mem, ins;
    uint32_t stat = 0;
    uw cia = 0, nia, inso, insoa, insob, insi, insa, insb; /* execution starts at 0 */
    uw *reg = calloc(32, BITS / 8);
    while (cia < MEM / 2) {
        nia = cia + 1;
        ins = memi[cia];
        inso = ins >> 10 & 63;
        insoa = ins >> 13;
        insob = (ins >> 10) & 7;
        insi = (ins >> 5) & 255;
        insa = (ins >> 5) & 31;
        insb = ins & 31;
        if (inso) {
            int i = 16;
            DPRINT("%016lx: ", cia);
            while (i--)
                DPRINT("%c", ((ins >> i) & 1) ? '1' : '0');
            DPRINT(":");
        }
        switch (insoa) {
            case 0: switch (insob) {
                case 0: break;
                case 1: reg[insa] = ~reg[insa]; break;
                case 2: reg[insa] &= reg[insb]; break;
                case 3: reg[insa] |= reg[insb]; break;
                case 4: reg[insa] ^= reg[insb]; break;
                case 5: reg[insa] <<= reg[insb]; break;
                case 6: reg[insa] >>= reg[insb]; break;
                case 7: reg[insa] = (uw)(((sw)(reg[insa])) >>
                    reg[insb]); break;
            } break;
            case 1: reg[insb] = (reg[insb] << 8) | insi; break;
            case 2: switch (insob) {
                case 0: reg[insa] = ((reg[insa] >> 8) << 8) |
                    mem[reg[insb]]; break;
                case 1: mem[reg[insa]] = reg[insb]; break;
                case 2: mem[reg[insa]] = mem[reg[insb]]; break;
                case 3: reg[insa] = reg[insb]; break;
            } break;
            case 3: switch (insob) {
                case 0: reg[insa] += reg[insb]; break;
                case 1: reg[insa] -= reg[insb]; break;
            } break;
            case 4: switch (insob) {
                case 0: stat = ((stat >> 3) << 3) |
                    ((reg[insa] == reg[insb]) ? 1 :
                    (reg[insa] < reg[insb]) ? 2 : 4); break;
                /* cia counts 16-bit slots but reg[insa] is a byte value, so scale cia to bytes */
                case 1: nia = (((insb & 1) ? cia * 2 : 0) + reg[insa]) >> 1; break;
                case 2: if ((stat & 7) & ((insb >> 1) & 7))
                    nia = (((insb & 1) ? cia * 2 : 0) + reg[insa]) >> 1; break;
            } break;
            case 5: break;
            case 6: break;
            case 7: break;
        }
        if (inso) {
            int i;
            for (i = 0; i < 32; i++)
                DPRINT(" %lx", reg[i]);
            DPRINT("\n");
        }
        cia = nia;
    }
    free(mem);
    free(reg);
    return 0;
}

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2011-09-24 6:16

>>70
Never mind, I double-counted some.

nop 1024
and 32
or 32
m 32
r 32
jc 64

Total is 1216.
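The tally can be derived from the field widths. A sketch as C constants, using the counting rules stated in the thread: 5-bit register fields, and jc with its 3-bit mask fixed at zero and the register and relative bit left free.

```c
/* An instruction that ignores both 5-bit register fields has 32 * 32
   encodings; one that is a no-op only when both fields name the same
   register has 32; jc with mask 0 leaves a register and 1 bit free. */
enum {
    REG_FIELD  = 32,
    NOP_FORMS  = REG_FIELD * REG_FIELD,   /* 1024 */
    SELF_FORMS = REG_FIELD,               /* and, or, m, r: 32 each */
    JC_FORMS   = REG_FIELD * 2,           /* 64 */
    TOTAL      = NOP_FORMS + 4 * SELF_FORMS + JC_FORMS
};
```

Dropping and, or and m leaves nop, r and jc, which gives the earlier figure of 1120.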

Name: Anonymous 2011-09-24 6:19

>>74

m would seem to work given that the two register references are the same, and thus their register values are equal, and thus the effective addresses are equal. However, what if the register value represents an effective address that is outside the available memory? It would have undefined behaviour or crash.

and's `src' is a register reference, not an immediate value, so you can't just `and r*, 0b11111'. Same goes for or.

Therefore, it's 1120, with nop, r and jc.

Name: Anonymous 2011-09-24 6:24

Logarithmic-step rotations?

For each bit used as a step // rotation max:

1 bit -> (Rot 0?? // nop?) & Rot 1
2 bit -> (Rot 2) & Rot 4
3rd bit -> (Rot 8) & Rot 16
...

Name: Anonymous 2011-09-24 6:26

Bunged that up hey

3rd bit -> Rot 8 / 16 / 32 / 64 ..?
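One reading of the idea, as a sketch: treat bit k of the rotate amount as enabling a rotation by 2^k, so six stages (1, 2, 4, 8, 16, 32) cover every amount on a 64-bit register, which is how logarithmic barrel shifters are built. The function names are illustrative.

```c
#include <stdint.h>

/* Rotate left by a fixed amount; n must be in 1..63. */
static uint64_t rotl_fixed(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64 - n));
}

/* Logarithmic-step rotate: bit k of `amount` enables a rotation by 2^k.
   Amounts are effectively taken modulo 64. */
static uint64_t rotl_log(uint64_t x, unsigned amount)
{
    for (unsigned k = 0; k < 6; k++)
        if ((amount >> k) & 1)
            x = rotl_fixed(x, 1u << k);
    return x;
}
```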

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2011-09-24 6:28

>>75
From the instruction set, your CPU has absolutely no memory protection/paging/etc. so I'm assuming it's a simple "open" type like a Z80 or 6502. If you access memory that doesn't exist you would just read the value from a floating databus (FFs if there's termination/pullups, other values are possible but it doesn't matter here) and try to write it to the same nonexistent location, so nothing actually happened.

> and's `src' is a register reference, not an immediate value, so you can't just `and r*, 0b11111'. Same goes for or.
Read up on Boolean algebra identities, specifically idempotence.

Name: Anonymous 2011-09-24 6:34

>>75
Both the src and the dest for AND and OR are registers, right? As I'm sure you know, ANDing or ORing a number with itself produces that same number, which is why so many RISCs use them as NOPs or (three-operand versions) as register-register moves.
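The identities in question, as a small C check: AND and OR of a value with itself return it unchanged, while XOR with itself clears it, which is why XOR r, r works as a clear rather than a NOP. The function names are illustrative.

```c
#include <stdint.h>
#include <stdbool.h>

/* Idempotence: x & x == x and x | x == x, so these leave x unchanged. */
static bool self_and_is_nop(uint64_t x) { return (x & x) == x; }
static bool self_or_is_nop(uint64_t x)  { return (x | x) == x; }

/* x ^ x == 0, so this only "does nothing" when x is already 0. */
static bool self_xor_is_nop(uint64_t x) { return (x ^ x) == x; }
```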

Name: Anonymous 2011-09-24 6:36

>>78
Whoops, I had a major blank of the mind there. Sorry about that.

Of course, `a = a & a' has no effect. Neither does `a = a | a'. Therefore, `and r*, r*' and `or r*, r*' are no-ops when the registers are the same.
