Make your own instruction set for your own made-up processor. How many bits can it access? What kind of tasks will it be used for? Bonus points for making an implementation in sepples, FIOC, lisp, or other GENERAL PURPOSE SCALABLE COST-EFFECTIVE ENTERPRISE QUALITY PROGRAMMING LANGUAGES (such as java)
a set of 256 stacks, shared between 16 cores.
each stack element is 128 bits, and can be used as the real and imaginary parts of a complex number (64 bits each), a single 128-bit signed integer or floating point value, or a machine instruction. each core pops and executes the instruction on top of its instruction stack, which is stack 0 for core 0, stack 1 for core 1, etc. the instruction set is as follows:

0x00000000 000000xx 00000000 00000000
clear x
removes all elements from x (stack).

0x00000001 000000xx yyyyyyyy yyyyyyyy
drop x y
removes the top y (64-bit unsigned integer) elements from x (stack).

0x00000002 00xx00yy zzzzzzzz zzzzzzzz
cdrop x y z
drops the top z (64-bit unsigned integer) values from x (stack) if the top two values on y (stack) are equal, otherwise does nothing. the top two values on y are popped and discarded.

0x00000003 00xx00yy zzzzzzzz zzzzzzzz
move x y z
moves z (64-bit unsigned integer) elements from x (stack) to y (stack).

0x00000004 00xx00yy zzzzzzzz zzzzzzzz
copy x y z
copies z (64-bit unsigned integer) elements from x (stack) to y (stack).

0x00000005 00xx00yy 00000000 00000000
swap x y
swaps x (stack) and y (stack).

0x00000006 000000xx yyyyyyyy yyyyyyyy
dup x y
duplicates the top y (64-bit unsigned integer) elements on x (stack).

0x00000007 000000xx yyyyyyyy yyyyyyyy
rot x y
rotates the top y (64-bit unsigned integer) elements on x (stack).

0x00000008 00xx00yy zzzzzzzz zzzzzzzz
uadd x y z
pops and sums (as unsigned integers) the top z (number) elements from x (stack) and pushes the result onto x. if an overflow occurs, pushes the value 1 onto y (stack), otherwise pushes the value 0 onto y.

0x00000009 00xx00yy zzzzzzzz zzzzzzzz
usub x y z
like uadd, but with subtraction instead of addition, and underflow instead of overflow.

0x0000000A 00xx00yy zzzzzzzz zzzzzzzz
umul x y z
like uadd, but with multiplication.

0x0000000B 000000xx yyyyyyyy yyyyyyyy
udiv x y
like uadd, but with division, and no possibility of underflow or overflow.

0x0000000C 00xx00yy zzzzzzzz zzzzzzzz
sadd x y z
like uadd, but with signed integers instead of unsigned. if an underflow occurs, -1 is pushed onto y.

0x0000000D 00xx00yy zzzzzzzz zzzzzzzz
ssub x y z
like usub, but with signed integers. if an overflow occurs, -1 is pushed onto y.

0x0000000E 00xx00yy zzzzzzzz zzzzzzzz
smul x y z
like sadd, but with multiplication.

0x0000000F 000000xx yyyyyyyy yyyyyyyy
sdiv x y
like udiv, but with signed integers.

0x00000010 000000xx yyyyyyyy yyyyyyyy
fadd x y
like sadd, but with floating point numbers instead of signed integers and no possibility of overflow.

0x00000011 000000xx yyyyyyyy yyyyyyyy
fsub x y
like ssub, but with floating point numbers and no possibility of overflow.

0x00000012 000000xx yyyyyyyy yyyyyyyy
fmul x y
like smul, but with floating point numbers and no possibility of overflow.

0x00000013 000000xx yyyyyyyy yyyyyyyy
fdiv x y
like sdiv, but with floating point numbers.

0x00000014 000000xx yyyyyyyy yyyyyyyy
cadd x y
like fadd, but with complex numbers instead of floating point numbers.

0x00000015 000000xx yyyyyyyy yyyyyyyy
csub x y
like fsub, but with complex numbers.

0x00000016 000000xx yyyyyyyy yyyyyyyy
cmul x y
like fmul, but with complex numbers.

0x00000017 000000xx yyyyyyyy yyyyyyyy
cdiv x y
like fdiv, but with complex numbers.

0x00000018 000000xx yyyyyyyy yyyyyyyy
and x y
pops the top y (64-bit unsigned integer) values from x (stack), performs a bitwise AND on them, and pushes the result onto x.

0x00000019 000000xx yyyyyyyy yyyyyyyy
or x y
like and, but with OR instead of AND.

0x0000001A 000000xx yyyyyyyy yyyyyyyy
xor x y
like and, but with XOR.

0x0000001B 00xx00yy zzzzzzzz zzzzzzzz
pushm x y z
pops a memory location from x (stack), and pushes z (64-bit unsigned integer) values at that location onto y (stack).

0x0000001C 00xx00yy zzzzzzzz zzzzzzzz
popm x y z
pops a memory location from x (stack), then pops z (64-bit unsigned integer) values from y (stack) and stores them at that memory location.
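
for anyone who wants to poke at the encoding, here is a rough Python sketch of a decoder and four of the instructions (clear, drop, dup, uadd). it is not a full implementation. the names encode2, encode3 and step are made up for this sketch, each stack is a plain Python list with the top at the end, and a few things the spec leaves open are assumed: dup pushes a copy of the top y elements in their existing order, and uadd pushes the sum wrapped to 128 bits when it overflows.

```python
MASK128 = (1 << 128) - 1
MASK64 = (1 << 64) - 1

def encode2(opcode, x, y=0):
    """Pack the 'opcode 000000xx yyyyyyyy yyyyyyyy' layout into one element."""
    return (opcode << 96) | (x << 64) | (y & MASK64)

def encode3(opcode, x, y, z):
    """Pack the 'opcode 00xx00yy zzzzzzzz zzzzzzzz' layout into one element."""
    return (opcode << 96) | (x << 80) | (y << 64) | (z & MASK64)

def step(stacks, core):
    """Pop one instruction from the core's own stack and execute it."""
    word = stacks[core].pop()
    opcode = word >> 96
    field = (word >> 64) & 0xFFFFFFFF   # the 000000xx / 00xx00yy group
    imm = word & MASK64                 # the trailing 64-bit operand
    if opcode == 0x00:                  # clear x
        stacks[field & 0xFF].clear()
    elif opcode == 0x01:                # drop x y
        s = stacks[field & 0xFF]
        del s[max(0, len(s) - imm):]
    elif opcode == 0x06:                # dup x y (assumed: order preserved)
        s = stacks[field & 0xFF]
        s.extend(s[max(0, len(s) - imm):])
    elif opcode == 0x08:                # uadd x y z
        x, y = (field >> 16) & 0xFF, field & 0xFF
        total = sum(stacks[x].pop() for _ in range(imm))
        stacks[x].append(total & MASK128)              # assumed: wrapped sum
        stacks[y].append(1 if total > MASK128 else 0)  # overflow flag
    else:
        raise NotImplementedError(hex(opcode))

# usage: core 0 sums the three values on stack 16, flag goes to stack 17
stacks = [[] for _ in range(256)]
stacks[16] = [1, 2, 3]
stacks[0].append(encode3(0x08, 16, 17, 3))
step(stacks, 0)
```

after the step, stack 16 holds the single sum and stack 17 holds the overflow flag.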
Name:
Anonymous2008-12-01 3:07
>>43
Did you give any thought to synchronization at all? How are you supposed to do process management and content switches when your code's on a stack?
Name:
Anonymous2008-12-01 4:38
>>43
i'm not sure what you mean by "content switches"... and how would having the code on a stack possibly make it any more difficult? are you one of those people who doesn't understand stacks?
Name:
Anonymous2008-12-01 5:17
>>45
*context switches, but never mind. I was thinking the stacks would use memory at a fixed location, which doesn't need to be the case. Though then it'd be harder to cache the top of each stack in core registers, without which it'd run like a slow ass.
And while you're wasting bits on stack element count, literals and offsets seem ignored to such a degree that even basic memory access requires elaborate trickery. Slow, slow, slow.
To further assert my belief that the ISA is far too stacky, I shall now commit the fallacy of guilt by association;
You know who else uses stacks? That's right, Java does! And if he were alive, why, Hitler would as well. Nothing like pushing on a couple of jews with your good friend Qosling and popping them into the ovens, is there, you fucking Nazi‽
> Though then it'd be harder to cache the top of each stack in core registers, without which it'd run like a slow ass.
it makes a lot more sense to have a large (maybe about 64MB) cpu cache to hold all of the stacks. 16384 128-bit stack elements per stack should be more than enough for most purposes. and you'd only need 28 128-bit registers to hold all the stack element counts. since we're talking about made up processors, why not have, say, 512 registers? that'd be plenty to cache the top few elements from stacks that are used a lot, and even elements that aren't cached in registers would be pretty fast to access since they're always in the cpu cache. certainly a lot faster than a machine with fewer than 32 registers.
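
the numbers above can be checked with a few lines of Python. this is only a sanity check of the arithmetic, with one assumption flagged: the 28-register figure works out if each element count is stored in 14 bits (values 0 to 16383), so a completely full stack would need a 15th bit or a cap of 16383 elements.

```python
STACKS = 256
ELEMENTS_PER_STACK = 16384
ELEMENT_BITS = 128

# total cache needed to hold every stack at full depth
cache_bytes = STACKS * ELEMENTS_PER_STACK * ELEMENT_BITS // 8
cache_mib = cache_bytes // 2**20

# assumption: 14-bit counters, enough for 0..16383
count_bits = (ELEMENTS_PER_STACK - 1).bit_length()

# 128-bit registers needed to pack all 256 counters (rounded up)
total_count_bits = STACKS * count_bits
count_registers = -(-total_count_bits // ELEMENT_BITS)

print(cache_mib, count_bits, count_registers)
```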
also, read this: http://en.wikipedia.org/wiki/Burroughs_large_systems#Stack_speed_and_performance
> literals and offsets seem ignored to such a degree that even basic memory access requires elaborate trickery.
yeah, sure, adding numbers is "elaborate trickery". and literals aren't ignored. literals can be handled like so (puts four literal values onto stack 16):
move s0 s16 4
.data ( 0x48000000650000006c0000006c
0x6f0000002c0000002000000057
0x6f000000720000006c00000064
0x21000000000000000000000000 )
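
a small Python simulation of that trick, as a sketch only. the move instruction is popped off core 0's instruction stack, and the four elements that were sitting under it get moved to stack 16. assumptions not stated in the spec: move transfers the elements as a block without reversing them, the first .data word sits directly under the instruction, and the instruction is modelled as a tuple instead of an encoded 128-bit word. the helper names move and decode_utf32 are made up here. read as four 32-bit code points per element, the data spells out a familiar string.

```python
DATA = [
    0x48000000650000006c0000006c,
    0x6f0000002c0000002000000057,
    0x6f000000720000006c00000064,
    0x21000000000000000000000000,
]

def move(stacks, src, dst, count):
    """Move the top `count` elements from src to dst as one block."""
    cut = max(0, len(stacks[src]) - count)
    block = stacks[src][cut:]
    del stacks[src][cut:]
    stacks[dst].extend(block)

def decode_utf32(word):
    """Split one 128-bit element into four 32-bit code points."""
    return "".join(chr((word >> shift) & 0xFFFFFFFF) for shift in (96, 64, 32, 0))

stacks = [[] for _ in range(256)]
# lists keep the top of the stack at the end, so the data goes in reversed
# and the instruction goes on last (on top)
stacks[0] = list(reversed(DATA)) + [("move", 0, 16, 4)]

op, src, dst, count = stacks[0].pop()   # core 0 fetches its instruction
move(stacks, src, dst, count)           # the literals leave the code stack

# read stack 16 from the top down and drop the NUL padding
text = "".join(decode_utf32(w) for w in reversed(stacks[16])).rstrip("\x00")
print(text)
```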
>>11
Dynamically sized registers are tricky to implement in hardware: each bit in a register corresponds to a digital circuit, so you could say that all resources in a CPU are preallocated (once you decide what they are, they become a fixed number of cells when synthesized). So effectively you'll have to settle on a maximum register size such as 32, 64, 128, ... If you want more than that, you'll need to emulate the operations (for example with some specific microcode and some RAM), which will probably end up incredibly slow, and the additional circuitry would probably lower the speed. That said, most of Intel's CPUs (including the Core 2 line) are quite the custom design where everything is hand-optimized to achieve the needed performance, so it's quite unlikely (if not almost impossible) that you could build something to your specs, similar to the complex x86 CPU, and get better speeds than Intel on the same manufacturing tech. You could achieve better speeds with a much simplified RISC architecture with many optimizations, a full custom design, and a good compiler to generate fast code.
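
To make the "emulate the operations" point concrete, here is a rough Python sketch of adding numbers wider than the register size by chaining fixed-width adds with a carry, which is roughly what microcode or a software routine would have to do. The 64-bit word size, the little-endian limb order, and the name add_wide are assumptions for illustration, not part of anyone's design above.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1

def add_wide(a_limbs, b_limbs):
    """Add two equal-length little-endian limb lists.

    Returns (result_limbs, carry_out). Each loop iteration stands in for
    one fixed-width add-with-carry, so an N-limb value costs N steps.
    """
    result, carry = [], 0
    for a, b in zip(a_limbs, b_limbs):
        total = a + b + carry
        result.append(total & WORD_MASK)   # keep the low 64 bits
        carry = total >> WORD_BITS         # 0 or 1 goes to the next limb
    return result, carry

# 128-bit example: (2**64 - 1) + 1 carries into the second limb
limbs, carry = add_wide([WORD_MASK, 0], [1, 0])
```

The cost grows with the number of limbs, which is the slowdown the paragraph above is pointing at.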
Name:
Anonymous2009-02-26 22:49
>>59
Yep. And my magic CPU would have a functional assembly (as hinted by somebody else above) with hardware garbage collection.
Name:
Anonymous2009-02-26 23:55
>>15
He meant variable bit-length instructions, not variable bit-length registers. Sort of like how offsets/lengths in plain LZ12/4 compression are always 12 and 4 bits, but the actual data is bit-level packed.
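
As an illustration of the fixed 12-bit/4-bit fields mentioned above, here is a minimal Python sketch that packs an offset and a length into one 16-bit token. Putting the offset in the high bits is an assumption (real formats differ on field order), and pack_token and unpack_token are hypothetical names.

```python
OFFSET_BITS = 12
LENGTH_BITS = 4

def pack_token(offset, length):
    """Pack a 12-bit offset and a 4-bit length into a 16-bit token."""
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in 12 bits")
    if not 0 <= length < (1 << LENGTH_BITS):
        raise ValueError("length does not fit in 4 bits")
    return (offset << LENGTH_BITS) | length

def unpack_token(token):
    """Split a 16-bit token back into (offset, length)."""
    return token >> LENGTH_BITS, token & ((1 << LENGTH_BITS) - 1)

token = pack_token(0xABC, 0x5)
```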
Name:
Anonymous2009-03-06 11:53
The Desert of Indentation Wars between the Guidans and THE peculiarities of each architecture down to a grinding halt?