Hello /prog/riders,
I've noticed, while developing a seihou clone in C, that replacing angles (a 4-byte float) with normalized vectors (two 4-byte floats) gives somewhat better performance, since I can access the sine and cosine values directly.
However, my "bullet" structure is already pretty big, and I wonder whether the cache penalty would offset the benefit of avoiding transcendental arithmetic under some conditions.
That's a problem you can worry about when you get to it. Unless you've profiled your program and have determined you have a performance problem, you don't have a performance problem.
Name: Anonymous 2011-06-09 7:47
IF YOU DON'T HAVE A PERFORMANCE PROBLEM,
WELL
YOU DON'T HAVE A PERFORMANCE PROBLEM.
>>2 >>3
I know, premature optimization is the root of all evil. But the changes are extremely localized and small and don't obscure the code at all.
Anyway, my question is theoretical: which is more costly, computing a sine and cosine, or taking a cache miss?
From my own research, it appears that vectors win unless more than about half of the accesses miss the L2 cache, which is quite unlikely.
>But the changes are extremely localized and small and don't obscure the code at all.
Precisely why you shouldn't worry about it now. If you later suspect it's a genuine problem and want to try the other approach, you can do it then without breaking anything.
If you really really really want to find out, code both versions and run them through measurement tools like cachegrind. Be aware that the results may vary from one chip architecture to the next.