I'm trying to write a program in C that spits out data that I can use to find the size of my L1 cache (theoretical experiment, I know I can just look it up). It isn't working, though.
Basically what I'm doing is I'm allocating arrays of incrementally larger sizes, filling them with random numbers, and then accessing each of the elements in the array, in order, 10,000 times in a row, and timing how long that takes. The concept is that once the array is larger than my L1 cache size, the average access time should suddenly jump higher. However, I'm not seeing this and am getting a constant increase in access time from array size 1K to 4MB. (my actual L1 cache size is 64K).
Anyone care to point out where I'm fucking up?
Name:
Anonymous2009-03-24 1:25
** obviously I'm not jumping from 1k to 4mb, I'm increasing the array size by a factor of 2 each time.
Name:
Anonymous2009-03-24 1:32
Perhaps the issue is that you're basically just cycling through 64 sections of memory. 64 cache misses may not be detectable. What happens if you access the elements out of order, causing a cache miss more often than once every 2^6 iterations?
Also I'm not an expert on this stuff so there could be some other effects like prediction or some shit doing this.
Name:
Anonymous2009-03-24 2:30
Your test depends on the cache using plain least-recently-used (LRU) replacement, which may not be the case depending on the hardware. Also, there is usually an L2 and an L3 cache, which are very fast as well. Do you see a jump when your array is bigger than all of the caches (when it goes to memory)? Make sure you're compiling w/o optimizations too; sometimes compilers will strip away code that does nothing. Could be something wrong with your code too.
Memory bandwidth is usually made just big enough to supply data for a simple sum-the-numbers loop. When you're accessing your data in an orderly manner (sequentially, or with a constant stride), the CPU is able to predict and prefetch it and utilize the whole bandwidth with no inefficiency.
So you should access the data pseudorandomly too. Use something like index = (index * 1782365 + 1232134) & mask.
Or use a loop that consumes bandwidth at a rate that only the L1 cache can supply. You'll need to use some variant of SSE for that though (needless to say, writing it in a high level language won't cut it).
Name:
Anonymous2009-03-24 7:55
>>8
It might turn out that L1 cache bandwidth is almost the same as memory bandwidth.
>>10
I just wanted to say, I thought the way you discreetly slipped 'aniki' in there was rather cool and hip. For distinctively marking yourself as part of your subculture without being too flashy and in your face about it, I love you almost as much as thallium and bunny slippers, >>10.
Name:
Anonymous2009-03-24 10:47
>>10
Yeah, sure, we know, x86 sucks, all modern software is bloated and sucks, not the least because it is written in languages that suck, bla bla bla, '78 LISP processor prototype was way cooler than this shit and had no fucking cache -- lrning2cache is hard, let's read SICP!
Too bad jews killed the project and suppressed all research in that direction ever since because it jeopardized the new consumerist world order.
>>12
No, PPC and ARM are examples of architectures that don't suck. The x86 was originally a nice little clay sculpture, and then they decided to make it "better" by mashing more clay onto it progressively. Nowadays it is a big ugly ball of clay.
Name:
Anonymous2009-03-24 11:31
>>13
But at least it is jews who keep PPCs and ARMs in various ghettoes, right?
Also it happens that PPC actually has a cache too, therefore your passive-aggressive remark about the inferiority of the x86 architecture is invalid. I mean, it's all very interesting with your childhood traumas and stuff, but clearly you've chosen an absolutely inappropriate occasion to relate them.
>>14
I never said anything about cache, Dr. Freud. Please eat my feces.
Name:
Anonymous2009-03-24 12:00
If you are not >>10 then just disregard that part. If you fail to distinguish your part in a discussion, you have to expend some mental effort to understand which parts of a discussion are related to you.
If, however, you are >>10 then plz point out where I said anything not about cache, that could provoke a justified remark about x86 architecture in general.
Name:
Anonymous2009-03-24 12:00
>>14
What the hell are you babbling about? x86 has a lot of tacked on crap to keep backwards compatibility, like register renaming.
>>23
the worst part about x86 is the variable sized instructions
register renaming? lol, come back when you've actually programmed x86 assembly for a living
Clay figurines are judged by their aesthetic appeal; if you insist on judging CPU architectures the same way, then yes, x86 is ugly, has all those arcane, inelegant cbw/cwd/wtf instructions, so the '78 LISP machine prototype is undoubtedly prettier.
If, however, you are a sane person and use more realistic metrics, such as execution speed, transistor count, and ultimately cost-effectiveness, then you have to provide some expert data showing that x86 not only looks ugly because of the accumulated sedimentary layers of crap, but is actually significantly worse with respect to these practical metrics because of it.
>>27
Do you also want a five-hundred page essay describing evidence that the archetypal human hand has five fingers?
Name:
Anonymous2009-03-24 12:44
>>28
And on the last page it would mention, in one paragraph, the phenomenon of polydactyl hands
Name:
Anonymous2009-03-24 13:18
>>28
Did you miss the "significantly" part? Yeah, it's obvious that backwards compatibility has a price, but I'd like to see at least an attempt at justification that the price is too high in the case of x86.
There is much inaccurate and half-true information in this thread. I advise /prog/ to stick with the lofty abstracted toy languages and let real men deal with hardware.