>>25
Unfortunately, async logic is another of those concepts that sounds really nice in theory but in reality is very difficult to make work correctly and verify, especially for circuits the size of modern CPUs. I read this quite recently:
http://www.amazon.com/Principles-Asynchronous-Circuit-Design-Perspective/dp/0792376137
Basically, the issue is that while things look very nice and workable from a digital perspective (e.g. C-gates), real circuits are analogue and signals don't just sharply transition between 0 and 1. In synchronous circuits the clock keeps this analogue-ness from having too much of an effect on a signal as it propagates through the circuit (e.g. bursts of noise between clock edges simply disappear), but an async circuit can easily get into a state where small amounts of injected noise, or variation in signal levels due to process variation, gradually accumulate and break everything. In synchronous designs special care is taken to ensure the clock signal is glitch-free and routed for delays as equal as possible; in asynchronous designs,
every signal is a clock.
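For anyone who hasn't met them: the C-gate (Muller C-element) mentioned above behaves like this in the idealized digital model. A toy Python sketch of just that abstraction, not real hardware, and exactly the level at which async design looks deceptively easy:

```python
# Toy model of a Muller C-element: output follows the inputs
# only when both agree, and holds its previous value otherwise.
# This is the clean digital abstraction; the analogue effects
# discussed above are precisely what it hides.
class CElement:
    def __init__(self):
        self.out = 0  # state-holding: remembers the last agreed value

    def step(self, a, b):
        if a == b:
            self.out = a
        return self.out
```

E.g. with inputs going 1,1 -> 1,0 -> 0,0 the output goes 1 -> 1 (held) -> 0; in a handshake circuit that "hold" is what synchronizes request/acknowledge wavefronts.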
>>27
Amusing, as x86 has been leading innovation for the past 3 decades or so.
>>28
That's several exceptions. See just about every smartphone/tablet/router/etc. produced. You think Broadcom makes chips only for the Pi?
CPU innovation will be about adding more cores, adding more opcodes or making opcodes faster
Precisely. And x86 has a LOT of "room for improvement", particularly for the latter two. CISC is inherently more efficient when memory bandwidth is the bottleneck; early (pre-P5) x86 was slow because of its microarchitecture, not its instruction set. E.g. the 808{6,8} had only a single-ported register file and ALU, hence 3 cycles for alu reg, reg: read source to ALU, read dest to ALU, write result to dest. One shared databus for everything also meant a lot of "siding" values in temporary registers. But with more modern microarchs like Nehalem there are multiple read/write ports and internal databuses, so they're no longer slow.
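The port argument above can be sketched with some toy arithmetic. This is purely illustrative (the only figure taken from the post is the 3 register-file accesses of alu reg, reg; the port counts are assumptions, not datasheet values):

```python
# Toy cycle count for "alu reg, reg": read source, read dest, write result
# = 3 register-file accesses. With a single-ported register file each
# access needs its own cycle; with enough ports (and internal buses)
# they can all happen in one cycle.
def alu_reg_reg_cycles(register_file_ports):
    accesses = 3                              # read src, read dst, write dst
    return -(-accesses // register_file_ports)  # ceiling division

print(alu_reg_reg_cycles(1))  # 808x-style single-ported file: 3 cycles
print(alu_reg_reg_cycles(3))  # multi-ported (Nehalem-style): 1 cycle
```

Same instruction set, 3x difference in throughput, all from the register file's port count: the slowness was microarchitecture, not CISC.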