>>16
Great, now you're thrashing the cache by pulling different parts of that array into the other cores. You're also wasting time just to set up the thread pool. Unless this task is a choke point for your entire application, wouldn't it simply be more efficient to schedule *other* things on the other cores?
In other words, I don't get why it's a better idea to have multiple cores caress a single piece of data at a time rather than having each core caress its own piece of data.
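Rough sketch of what I mean by "each core gets its own piece of data", in C++. The name `sum_each_job` is made up, summing is only a stand-in for whatever the real work is, and spawning one thread per job is a simplification (a real scheduler would reuse threads). Whether this actually beats splitting one array depends on your workload, so measure it.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: one thread per independent job.
// Each thread reads only its own buffer and writes a single result slot,
// so the cores are not pulling pieces of the same array into their caches.
std::vector<long long> sum_each_job(const std::vector<std::vector<int>>& jobs) {
    std::vector<long long> results(jobs.size(), 0);
    std::vector<std::thread> workers;
    workers.reserve(jobs.size());

    for (std::size_t i = 0; i < jobs.size(); ++i) {
        workers.emplace_back([&jobs, &results, i] {
            // Accumulate into a thread-local variable, then publish once,
            // to keep writes to the shared results vector to a minimum.
            long long local = std::accumulate(jobs[i].begin(), jobs[i].end(), 0LL);
            results[i] = local;
        });
    }

    // Wait for every worker before reading the results.
    for (auto& w : workers) {
        w.join();
    }
    return results;
}
```

Call it with something like `sum_each_job({{1, 2, 3}, {10, 20}})` and you get one total per job. The point is only that no two threads ever touch the same input buffer.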