Looking into the future, Dave Moon says: “The illusion of random access memory is becoming increasingly unconvincing on modern hardware. Although dereferencing a pointer takes only one instruction, when the target of the pointer is not cached in the CPU that instruction can take as long to execute as 1000 ordinary instructions executed at peak speed.
...
the advantage of C++ and other conventional programming languages is being eroded in the same way. It is not unreasonable to predict that we will see widespread abandonment of the illusion of random access memory in the next two decades. The IBM Cell processor used in video games is the first crack in the dam.”
1000 instructions? Yeah, on an obscenely overclocked 1.8GHz Netburst, perhaps. There, peak performance would be 3 single-µop instructions per cycle for the 333-ish cycles a slow-ass memory access takes on a 200MHz single-channel DDR bus.
Well, you might get there more easily if the 4-byte access straddles a cache-line boundary and both cache lines are cold. There could also be a 3-level page-table walk first, thanks to TLB misses, with the page-table entries cold too. The word could even straddle a page boundary, but then the second page's TLB walk would hit cache as soon as the first page's walk was done. So yeah, a pessimal cache situation could produce 6 misses (1 instruction + 2 data + 3 TLB), 5 of which would have to be resolved serially -- at 250-ish cycles each on a 2.4GHz Netburst P4, that works out to some 3700-odd instructions at peak throughput, not accounting for the "supercharged" speculative add/sub units.
But hey, we already know about this. Hot/cold separation is the way to fly. Also splitting code up into multiple threads results in better saturation of the memory bus.
So yeah. Using lots of memory will be slow. What else is new? (garbage collectors that don't re-scan unmodified memory are. oh wait, those don't exist! no one wants to pass page table info back to userspace. let them eat cake!)
The real kicker with the Netburst calculation above is that Intel's implementation of hyperthreading wouldn't help -- the operation waiting for the data would be replayed through the pipeline every time it reached a stage where its operands were required, which will be like _forever_. So it and its dependent instructions will eat up 50 to 100% of the available issue bandwidth, even though they don't accomplish a fucking thing!
EXCEPT WHEN YOU CACHE MISS, THEN ALL CACHE MISSES COST ABOUT THE SAME
Name: Anonymous 2008-01-06 4:25
>>20
You mean it is random access, except when you cache miss. Making it not random access at all, in the long run.
Name: Anonymous 2008-01-06 7:39
>>21
All cache misses are served equally (except on NUMA systems). Therefore it's random access. Contrast with disks, where you'd have to wait for the seek turnaround and so forth.