
cache size

Name: Anonymous 2009-03-24 1:24

I'm trying to write a program in C that spits out data that I can use to find the size of my L1 cache (theoretical experiment, I know I can just look it up). It isn't working, though.

Basically what I'm doing is I'm allocating arrays of incrementally larger sizes, filling them with random numbers, and then accessing each of the elements in the array, in order, 10,000 times in a row, and timing how long that takes. The concept is that once the array is larger than my L1 cache size, the average access time should suddenly jump higher. However, I'm not seeing this and am getting a constant increase in access time from array size 1K to 4MB. (my actual L1 cache size is 64K).

Anyone care to point out where I'm fucking up?

Name: Anonymous 2009-03-24 1:25

** obviously I'm not jumping from 1k to 4mb, I'm increasing the array size by a factor of 2 each time.

Name: Anonymous 2009-03-24 1:32

Perhaps the issue is that you're basically just cycling through 64 sections of memory. 64 cache misses may not be detectable. What happens if you access the elements out of order, causing a cache miss more often than on every 2^6th iteration?

Also I'm not an expert on this stuff so there could be some other effects like prediction or some shit doing this.

Name: Anonymous 2009-03-24 2:30

Your test depends on the cache using vanilla least-recently-used replacement, which may not be the case depending on the hardware.  Also, there is usually an L2 and an L3 cache, which are very fast as well.  Do you see a jump when your array is bigger than all of the caches (when it goes to memory)?  Make sure you're compiling w/o optimizations too, sometimes compilers will strip away code that does nothing.  Could be something wrong with your code too.

Name: Anonymous 2009-03-24 3:04

>>1
Cache doesn't work that way. In any case, either look it up (fast), or use the CPUID instructions (slow if you don't know ASM)

Name: Facepalm 2009-03-24 3:39

CPU-Z

Name: Anonymous 2009-03-24 7:26

>>1
One word for you: prefetch.

Memory bandwidth is usually made just big enough to supply data for a simple sum-the-numbers loop. When you're accessing your data in an orderly manner (sequentially, or with a constant stride), the CPU is able to predict and prefetch it and utilize the whole bandwidth with no inefficiency.

So you should access the data pseudorandomly too. Use something like index = (index * 1782365 + 1232134) & mask.

Name: Anonymous 2009-03-24 7:49

Or use a loop that consumes bandwidth at a rate that only the L1 cache can supply. You'll need to use some variant of SSE for that though (needless to say, writing it in a high level language won't cut it).

Name: Anonymous 2009-03-24 7:55

>>8
It might turn out that L1 cache bandwidth is almost the same as memory bandwidth.

Name: Anonymous 2009-03-24 8:38

>>9
Nice hardware architecture, aniki. Let's assume something not as broken, like all x86 CPUs released in the last 10+ years.

Name: Anonymous 2009-03-24 9:01

>>10
I just wanted to say, I thought the way you discreetly slipped 'aniki' in there was rather cool and hip. For distinctively marking yourself as part of your subculture without being too flashy and in your face about it, I love you almost as much as thallium and bunny slippers, >>10.

Name: Anonymous 2009-03-24 10:47

>>10
Yeah, sure, we know, x86 sucks, all modern software is bloated and sucks, not the least because it is written in languages that suck, bla bla bla, '78 LISP processor prototype was way cooler than this shit and had no fucking cache -- lrning2cache is hard, let's read SICP!

Too bad jews killed the project and suppressed all research in that direction ever since because it jeopardized the new consumerist world order.

Name: Anonymous 2009-03-24 10:58

>>12
No, PPC and ARM are examples of architectures that don't suck. The x86 was originally a nice little clay sculpture, and then they decided to make it "better" by mashing more clay onto it progressively. Nowadays it is a big ugly ball of clay.

Name: Anonymous 2009-03-24 11:31

>>13
But at least it is jews who keep PPCs and ARMs in various ghettoes, right?

Also it happens that PPC actually has cache too, therefore your passive-aggressive remark about the inferiority of the x86 architecture is invalid. I mean, it's all very interesting with your childhood traumas and stuff, but clearly you've chosen an absolutely inappropriate occasion to relate it.

Name: Anonymous 2009-03-24 11:46

CPUID MOTHER FUCKER

Name: Anonymous 2009-03-24 11:47

>>14
I never said anything about cache, Dr. Freud. Please eat my feces.

Name: Anonymous 2009-03-24 12:00

If you are not >>10 then just disregard that part. If you fail to distinguish your part in a discussion, you have to expend some mental effort to understand which parts of a discussion are related to you.

If, however, you are >>10 then plz point out where I said anything not about cache, that could provoke a justified remark about x86 architecture in general.

Name: Anonymous 2009-03-24 12:00

>>14
What the hell are you babbling about? x86 has a lot of tacked on crap to keep backwards compatibility, like register renaming.

Name: Anonymous 2009-03-24 12:02

>>17
`Plz' fuck off.

Name: Anonymous 2009-03-24 12:03

>>18
So?

Name: Anonymous 2009-03-24 12:08

>>20
So the lumpy ball of clay analogy was valid.

Name: Anonymous 2009-03-24 12:09

>>18
yeah like that really matters

Name: Anonymous 2009-03-24 12:12

>>22
EXPERT DEBATER. back to /b/, please.

Name: Anonymous 2009-03-24 12:29

>>23
the worst part about x86 is the variable sized instructions
register renaming? lol, come back when you've actually programmed x86 assembly for a living

Name: Anonymous 2009-03-24 12:32

>>24
Is your inability to follow a simple discussion something pathological or are you just an idiot?

Name: Anonymous 2009-03-24 12:33

>>24
Actually I have, but that was way back in the pre-Pentium days. Coding for DSP chips was much simpler.

Name: Anonymous 2009-03-24 12:34

>>21
That analogy is flawed.

Clay figurines are judged by their aesthetic appeal; if you insist on judging CPU architectures the same way then yes, x86 is ugly, has all those arcane, inelegant cbw/cwd/wtf instructions, so the '78 LISP machine prototype is undoubtedly prettier.

If however you are a sane person and use more realistic metrics, such as execution speed, transistor count, and ultimately cost-effectiveness, then you have to provide some expert data that would show that x86 not only looks ugly because of the accumulated sedimentary layers of crap, but is actually significantly worse with respect to these practical metrics because of it.

Name: Anonymous 2009-03-24 12:39

>>27
Do you also want a five-hundred page essay describing evidence that the archetypal human hand has five fingers?

Name: Anonymous 2009-03-24 12:44

>>28
And on the last page it would mention in one paragraph the phenomenon of polydactyl hands

Name: Anonymous 2009-03-24 13:18

>>28
Did you miss the "significantly" part? Yeah, it's obvious that backwards compatibility has a price, but I'd like to see at least an attempt at justification that the price is too high in the case of x86.

Name: Anonymous 2009-03-24 14:10

>>2-30
YHBT

Name: Anonymous 2009-03-24 14:17

>>31
LOLO I TROL U

Name: Anonymous 2009-03-24 14:56

There is much inaccurate and half-true information in this thread.  I advise /prog/ to stick with the lofty abstracted toy languages and let real men deal with hardware.

Name: Anonymous 2009-03-24 15:19

>>33
I advise you to get the fuck out of /prog/.

Name: Anonymous 2009-03-24 15:50

>>34
As your lawyer, I advise you to take two hits of LSD and a shot of tequila.

Name: Anonymous 2009-03-24 15:54

>>33
I agree with this guy

Name: Anonymous 2009-03-24 15:57

>>35
I agree with this guy

Name: Anonymous 2009-03-24 16:19

>>35! I LOVE YOU! I LOVE YOUR POST! I READ IT 5 TIMES! KEEP POSTING!
