OP here.
>>11
Thanks. That makes a lot more sense now.
Do you think it would improve performance to do ray tracing in a breadth-first manner? You could process each level of recursion in turn (cast a primary ray for every pixel, then go back and trace all the reflected rays, and so on), so that the data sitting in the cache gets reused more often. That is, if I understand what you're saying correctly. Thanks for the recommended reading; it's next on my list.
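A minimal sketch of the breadth-first idea in Python, under some loud assumptions: a toy scene of grayscale spheres, a pinhole camera at the origin, no lights or shadows, and made-up names (`render_breadth_first`, `nearest_hit`, `Sphere`, `Ray`) that don't come from any real renderer. Instead of recursing per pixel, it keeps a list (a "wave") of every ray at the current bounce level and only moves on to the reflected rays once the whole level has been traced.

```python
import math
from dataclasses import dataclass

Vec = tuple  # (x, y, z)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

@dataclass
class Sphere:  # hypothetical toy primitive
    center: Vec
    radius: float
    color: float         # grayscale brightness (toy shading, no lights)
    reflectivity: float  # fraction of the result taken from the reflected ray

@dataclass
class Ray:
    origin: Vec
    direction: Vec  # assumed normalized
    pixel: int      # which pixel this ray contributes to
    weight: float   # product of reflectivities along the bounce chain

def nearest_hit(ray, spheres):
    """Return (t, sphere) for the closest hit in front of the ray, or None."""
    best = None
    for s in spheres:
        oc = sub(ray.origin, s.center)
        b = dot(oc, ray.direction)
        c = dot(oc, oc) - s.radius * s.radius
        disc = b * b - c
        if disc < 0:
            continue
        t = -b - math.sqrt(disc)  # nearer root only; fine for rays starting outside spheres
        if t > 1e-6 and (best is None or t < best[0]):
            best = (t, s)
    return best

def render_breadth_first(width, height, spheres, max_depth=3):
    image = [0.0] * (width * height)
    # Level 0: one primary ray per pixel, camera at the origin looking down -z.
    wave = []
    for y in range(height):
        for x in range(width):
            u = (x + 0.5) / width * 2 - 1
            v = 1 - (y + 0.5) / height * 2
            wave.append(Ray((0.0, 0.0, 0.0), normalize((u, v, -1.0)), y * width + x, 1.0))
    for _ in range(max_depth):
        next_wave = []
        # Every ray at this recursion level is traced before any ray of the next level.
        for ray in wave:
            hit = nearest_hit(ray, spheres)
            if hit is None:
                continue  # ray escapes; background contributes 0
            t, s = hit
            image[ray.pixel] += ray.weight * (1 - s.reflectivity) * s.color
            if s.reflectivity > 0:
                p = add(ray.origin, scale(ray.direction, t))
                n = normalize(sub(p, s.center))
                r = sub(ray.direction, scale(n, 2 * dot(ray.direction, n)))
                next_wave.append(Ray(p, r, ray.pixel, ray.weight * s.reflectivity))
        if not next_wave:
            break
        wave = next_wave
    return image
```

For example, `render_breadth_first(1, 1, [Sphere((0, 0, -3), 1, 1.0, 0.5), Sphere((0, 0, 3), 1, 1.0, 0.0)])` traces the primary ray in the first pass and its single reflection (back toward the sphere behind the camera) in the second pass. As far as I understand it, the level-by-level ordering alone isn't the whole story: the cache benefit is usually said to come from also grouping rays that touch the same part of the scene, which this sketch doesn't do.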
Also, even if it isn't a very elegant solution, suppose you had n identical machines, each with its own copy of the scene in memory. Wouldn't you get close to an n-fold speedup, as long as each machine worked on a different portion of the image?
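A rough Python sketch of splitting the image among n machines, assuming rows are the unit of work and that per-row costs are known up front (the cost numbers and the names `partition_rows` and `estimated_speedup` are hypothetical). It suggests the ideal speedup is bounded by n, reached only when every machine gets the same amount of work, and that how you split the image matters.

```python
def partition_rows(height, n, interleave=True):
    """Split image rows among n machines.

    interleave=True gives machine k the rows k, k+n, k+2n, ...;
    interleave=False gives each machine one contiguous band of rows.
    """
    if interleave:
        return [list(range(k, height, n)) for k in range(n)]
    band = -(-height // n)  # ceiling division
    return [list(range(k * band, min((k + 1) * band, height))) for k in range(n)]

def estimated_speedup(row_costs, parts):
    """Ideal speedup = total work / work of the busiest machine.

    Ignores the cost of copying the scene out and gathering the finished rows.
    """
    total = sum(row_costs)
    busiest = max(sum(row_costs[r] for r in rows) for rows in parts)
    return total / busiest

# Hypothetical per-row costs: the bottom half (say, a reflective floor) is 5x as expensive.
costs = [1, 1, 1, 1, 5, 5, 5, 5]
print(estimated_speedup(costs, partition_rows(8, 2, interleave=False)))  # 1.2
print(estimated_speedup(costs, partition_rows(8, 2, interleave=True)))   # 2.0
```

With two contiguous bands, one machine gets all the expensive rows and the speedup is only 1.2; interleaving the rows balances the load and reaches the full 2.0. Real overheads (sending the scene, collecting results) would pull it below that.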