AMD optimising against itself

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2013-07-07 8:32

From http://support.amd.com/us/Processor_TechDocs/47414_15h_sw_opt_guide.pdf p.126:
Avoid using the LOOP instruction.
The LOOP instruction has a latency of 7 cycles in 32-bit protected mode and 8 cycles in 64-bit
protected mode.

Agner's instruction tables list Bulldozer as having a latency of only 1-2 cycles for this instruction, vs. 3 for K10, 3-4 for K8 and K7, 4 for Bobcat, 4 for Nehalem, and 5 for Sandy Bridge. The alternative, a fused ALU+jcc, has exactly the same timing on Bulldozer, but takes only 2 cycles on Nehalem and 1-2 on Sandy Bridge. Interestingly enough, on VIA Nanos LOOP is also faster.

AMD's document then contradicts itself on p.246, where it lists LOOP as a FastPath Single with a latency of 1. In other words, if you want to make Bulldozer look faster than Intel's chips, you SHOULD use LOOP, and it's also a significant improvement over the previous generations of AMD. I have no idea why they didn't take this opportunity, despite likely having spent effort on that improvement. Then again, what'd you expect from the company that came up with AMD64...

Name: Anonymous 2013-07-07 9:33

Do you know the rationale for this or are you just bitching about how stupid you feel AMD to be?

Name: Anonymous 2013-07-07 10:49

>LOOP vs DEC/JNZ
>unoptimized bloated microcode vs pure 1uops code
>trusting the books for benchmarks

Name: Anonymous 2013-07-07 18:30

Cudder, apologize for what you said about Lisp a few threads ago!!

Name: Anonymous 2013-07-07 19:40

As always, Cudder defends Intel and attack AMD, because Intel 1/2 Israeli and Cudder is a Jew.

Name: Anonymous 2013-07-07 19:41

Name: Anonymous 2013-07-07 21:28

This is on Nehalem

  mov ecx, 1000000
 tohere:
  loop tohere

3818322 cycles or around 3.8 cycles/iteration

  mov ecx, 1000000
 tohere:
  dec ecx
  jnz tohere

1909224 cycles or around 1.9 cycles/iteration

  mov ecx, 1000000
 tohere:
  sub ecx, 1
  jnz tohere

1909218 cycles or around 1.9 cycles/iteration

So an empty LOOP runs at half the speed of DEC/JNZ or SUB/JNZ, which matches the timings in the OP. What about putting something in the loop, like this?


  mov ecx, 1000000
 tohere:
  mov eax, [var_x]
  xor eax, 12345678
  add eax, 87654321
  mov [var_x], eax
  loop tohere

Now LOOP takes 4772868 cycles, or 4.8 cycles/iteration, while DEC/JNZ and SUB/JNZ take 6681900 cycles, or 6.7 cycles/iteration, about 40% longer! What's going on here?

Can anyone with a Bulldozer test this? According to the manual, LOOP should be the same speed as DEC/JNZ or SUB/JNZ.

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2013-07-08 2:07

>>2
Rationale? If I were to guess, they're keeping it to themselves and seeing what Intel will do, because if they advise using LOOP then you can bet Intel's next generation is going to make LOOP go even faster. But clearly AMD has lost its ability to improve performance much (unlike in the Athlon days).

>>3
There's no reason why LOOP can't be decoded immediately into the corresponding dec/jnz uops (with the small but important difference that flags are NOT affected), and as I mentioned in the OP, there's evidence that Bulldozer does do so, from both AMD's official manual and Agner's independent tests.

>>5
This has nothing to do with race. I couldn't care less who -- or what -- designs these things.

>>7
You've just discovered that accessing memory is slow no matter what, and the reason LOOP is faster now might be contention-related. I don't have a FailBulldozer to test this though. (Maybe it's not quite a fail after all...?)

Name: Anonymous 2013-07-08 2:57

I prefer calling the instruction DJNZ because that was its name before the Heebs shamelessly ripped it from the Z80.

Name: Anonymous 2013-07-08 4:15

>>9
Muh Djinnz!

Name: Anonymous 2013-07-08 5:43

If dubs der kikerspace leaves fucking prog and fucks the FUCK OFF back to /a/ forever

Name: Anonymous 2013-07-08 13:49

Why do we care about this for programming? Are we writing compilers here, or what?

Name: Anonymous 2013-07-08 14:17

>>12
LE MFW U SHOULD LE GO BACK 2 LE FUCKING REBGIT
LE FACEBOOK xD

Name: Anonymous 2013-07-08 14:52

>>13
Go fuck yourself, you asshole

Name: Anonymous 2013-07-10 2:52

Name: Anonymous 2013-07-10 2:54

