>>55
I think you're confusing the bitness of the instruction with that of the operands. I was referring to 32-bit add instructions whose operand values happen not to carry from bit 15 into bit 16. Compare:
add eax, 1 with eax=00000000 completes in 1 cycle
add eax, 1 with eax=0000FFFF takes 2 cycles
add eax, 65535 with eax=00000001 also takes 2 cycles
They probably thought "why wait for an add that's already done", which would make sense for things like loop counters. Then again, the single-byte inc r/dec r on the P4 always seems to act as if the carry were needed. Compared to the microarchitectures before and after it, the P4 was just weird.
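
To spell out the condition in the three examples above, here's a rough C sketch. The function name is made up for illustration, and it only models whether the low 16 bits carry into the high 16 bits; it says nothing about how the hardware actually decides, and the cycle counts are just the ones quoted above.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not from any Intel doc):
   returns 1 if a + b produces a carry out of bit 15 into bit 16, else 0. */
int carries_into_high_half(uint32_t a, uint32_t b)
{
    /* Add only the low 16-bit halves; the sum is at most 0x1FFFE. */
    uint32_t low_sum = (a & 0xFFFFu) + (b & 0xFFFFu);

    /* Bit 16 of that partial sum is the carry into the high half. */
    return (int)(low_sum >> 16);
}
```

With eax=00000000 and an immediate of 1 this returns 0 (the 1-cycle case), while eax=0000FFFF plus 1 and eax=00000001 plus 65535 both return 1 (the 2-cycle cases).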