>>58
>You must be kidding. "massively parallel" means that machine has billions of simple processing elements. Basically, every arithmetic operator in your program gets its own core.
Wow, you sure are fucking clueless. I said 512-1024 CPU cores per compute node/rack. It just goes to show that you don't even FUCKING understand the general topology of modern supercomputers. In essence, they are massive clusters of compute nodes networked together over Ethernet or a faster custom interconnect. Each compute node consists of multiple blades or racks, each carrying multiple CPUs in an SMP configuration, and each CPU may itself have multiple cores. Sometimes there are even GPUs on the racks. A single compute node may have only 512-1024 CPU cores sharing memory over a custom interconnect, but hundreds or perhaps thousands of such nodes make up a single supercomputer, and thus tens or hundreds of thousands of cores total!
>>61
>What prevents a garbage-collected language from having a GC process per thread, and using a single global free list? Just the way all other per-thread heap languages do it.
Umm, no. That's not how you do it in C/C++. You do not use a single global free list. Generally, you keep allocations AND deallocations inside a thread, so there is no synchronization or write-sharing, both of which kill performance. That's not possible with garbage collectors. Why? Because there may be cross-thread object dependencies: if an object in one thread holds a reference to an object in another thread, you have to freeze the entire process to analyze the dependency graph and make sure you collect only those objects which are truly garbage. In real-time garbage collectors, an incremental algorithm collects only as many objects as it can within a fixed time slice, under strong time constraints, and the GC runs more or less often depending on memory pressure.
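To make the "keep allocations AND deallocations inside a thread" point concrete, here is a rough C++ sketch of a per-thread free list. Every name in it (`ThreadLocalPool`, `local_pool`, `run_workers`, the 64-byte block size) is made up for illustration, and it only handles one fixed block size. The hot path never takes a lock; only the very first allocation on each thread falls back to the global heap.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <thread>
#include <vector>

// Hypothetical fixed-size pool. Each thread owns one, so the free list is
// never shared and needs no synchronization.
class ThreadLocalPool {
public:
    static constexpr std::size_t kBlockSize = 64;  // assumed block size

    void* allocate() {
        if (free_list_ != nullptr) {       // fast path: reuse a local block
            Node* n = free_list_;
            free_list_ = n->next;
            return n;
        }
        ++fresh_blocks_;                   // slow path: hit the global heap
        return ::operator new(kBlockSize);
    }

    // Must be called by the same thread that allocated p.
    void deallocate(void* p) {
        free_list_ = new (p) Node{free_list_};  // push block onto local list
    }

    std::size_t fresh_blocks() const { return fresh_blocks_; }

    ~ThreadLocalPool() {
        while (free_list_ != nullptr) {    // hand cached blocks back on exit
            Node* n = free_list_;
            free_list_ = n->next;
            ::operator delete(n);
        }
    }

private:
    struct Node { Node* next; };
    Node* free_list_ = nullptr;
    std::size_t fresh_blocks_ = 0;
};

inline ThreadLocalPool& local_pool() {
    thread_local ThreadLocalPool pool;     // one pool per thread
    return pool;
}

// Runs `rounds` alloc/free pairs on each of `threads` threads and reports
// how many blocks each thread had to request from the global heap.
std::vector<std::size_t> run_workers(int threads, int rounds) {
    std::vector<std::size_t> fresh(threads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&fresh, t, rounds] {
            for (int i = 0; i < rounds; ++i) {
                void* p = local_pool().allocate();
                local_pool().deallocate(p);
            }
            fresh[t] = local_pool().fresh_blocks();
        });
    }
    for (auto& w : workers) w.join();
    return fresh;
}
```

With this pattern, each worker touches the global heap once and recycles its own block after that. The catch is the rule in the `deallocate` comment: a block freed on a different thread than the one that allocated it would need a hand-off mechanism that this sketch leaves out.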
Per-thread GC heaps don't fix this. Yeah, you can allocate from per-thread heaps, but the dependency analysis during GC must be process-wide. So a per-thread GC process doesn't really make sense, although per-thread GC allocation heaps could work. You're still fucked.
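A toy C++ model of why the analysis has to be process-wide. This is not how any particular collector is implemented; `Obj`, `mark`, and `cross_thread_demo` are hypothetical names, and "heaps" are just integer labels. An object allocated in thread 0's heap is kept alive only by a reference from thread 1, so a mark phase that sees only thread 0's roots would wrongly call it garbage.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

struct Obj {
    int owner_heap;           // which per-thread heap allocated it (label only)
    std::vector<Obj*> refs;   // outgoing references, possibly cross-thread
};

// Plain mark phase: returns every object reachable from the given roots.
std::unordered_set<const Obj*> mark(const std::vector<const Obj*>& roots) {
    std::unordered_set<const Obj*> live;
    std::vector<const Obj*> stack(roots.begin(), roots.end());
    while (!stack.empty()) {
        const Obj* o = stack.back();
        stack.pop_back();
        if (!live.insert(o).second) continue;   // already visited
        for (const Obj* r : o->refs) stack.push_back(r);
    }
    return live;
}

struct Demo {
    bool live_thread_local_view;   // marking from thread 0's roots only
    bool live_process_wide_view;   // marking from every thread's roots
};

Demo cross_thread_demo() {
    Obj shared{0, {}};             // allocated in thread 0's heap
    Obj holder{1, {&shared}};      // thread 1's object pointing at it

    std::vector<const Obj*> roots_t0;           // thread 0 dropped its reference
    std::vector<const Obj*> roots_t1{&holder};  // thread 1 still holds `holder`

    std::vector<const Obj*> all_roots = roots_t0;
    all_roots.insert(all_roots.end(), roots_t1.begin(), roots_t1.end());

    return Demo{
        mark(roots_t0).count(&shared) > 0,
        mark(all_roots).count(&shared) > 0,
    };
}
```

The thread-local view reports `shared` as dead while the process-wide view reports it as live, which is the whole problem: a collector that trusted the local view would free memory thread 1 is still using.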
Also, enjoy your poor cache locality. Cache-aware allocators are far superior, and common garbage-collected languages just don't give the allocator enough information at object allocation time to support this.
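For what "information at allocation time" can look like, here is a minimal C++17 bump-arena sketch where the caller says whether an object should be packed next to the previous one or isolated on its own cache line. `Arena`, `alloc_packed`, `alloc_isolated`, and `same_cache_line` are hypothetical names, and the 64-byte line size is an assumption (typical for x86-64, not universal). It relies on `std::aligned_alloc`, which some toolchains (e.g. MSVC) don't provide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size

class Arena {
public:
    explicit Arena(std::size_t bytes)
        : size_(round_up(bytes, kCacheLine)),
          base_(static_cast<char*>(std::aligned_alloc(kCacheLine, size_))) {}
    ~Arena() { std::free(base_); }
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Hint: "used together with the previous object". Packs allocations
    // back to back so they tend to share cache lines.
    void* alloc_packed(std::size_t n,
                       std::size_t align = alignof(std::max_align_t)) {
        return bump(n, align);
    }

    // Hint: "written by a different thread". Starts on a fresh cache line
    // and pads to a full line so nothing else shares it (no false sharing).
    void* alloc_isolated(std::size_t n) {
        return bump(round_up(n, kCacheLine), kCacheLine);
    }

private:
    static std::size_t round_up(std::size_t n, std::size_t a) {
        return (n + a - 1) / a * a;
    }

    // Bump-pointer allocation; returns nullptr when the arena is exhausted.
    void* bump(std::size_t n, std::size_t align) {
        std::size_t start = round_up(used_, align);
        if (base_ == nullptr || start + n > size_) return nullptr;
        used_ = start + n;
        return base_ + start;
    }

    std::size_t size_;
    char* base_;
    std::size_t used_ = 0;
};

// True if both addresses fall on the same (assumed 64-byte) cache line.
inline bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kCacheLine ==
           reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
}
```

Two small `alloc_packed` calls in a row land on the same line, while an `alloc_isolated` call gets a line to itself. The point is that the placement decision comes from the caller; a bare `new Foo()` in a typical GC'd language carries no such hint.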