>>12
NetBurst P4s have this "feature" because of the xbox-hueg long pipeline, where one branch misprediction costs you about twenty-five cycles. That's on a Prescott, mind; it's more like eighteen on Northwoods and Willamettes.
Unsurprisingly, then, the main speed boost from HyperThreading comes from being able to run the other thread while one is stalled on a branch misprediction. Goodbye mainstream code performance.
NetBurst was pretty cool, though, if you first spent about a week hand-optimizing a routine for whatever you wanted it to do. Kind of like an extremely well-hidden DSP mode, only you have to do 128-bit-wide integer SIMD (SSE2) with no branching to get at it.
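
To give a rough idea of what "no branching" means in practice, here's a minimal sketch in plain C. The function names (clamp_branchy, clamp_branchless) are made up for illustration, and it's scalar rather than real SSE2, but the mask-and-select trick is the same idea you'd apply across a whole 128-bit register at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: one data-dependent compare-and-jump per element.
 * On unpredictable data, this is the kind of branch that can mispredict. */
void clamp_branchy(int32_t *v, size_t n, int32_t hi) {
    for (size_t i = 0; i < n; i++) {
        if (v[i] > hi) {
            v[i] = hi;
        }
    }
}

/* Branchless version: turn the comparison into an all-ones or all-zeros
 * mask, then select between hi and the original value with AND/OR.
 * No data-dependent jump inside the loop body. */
void clamp_branchless(int32_t *v, size_t n, int32_t hi) {
    for (size_t i = 0; i < n; i++) {
        int32_t x = v[i];
        int32_t mask = -(int32_t)(x > hi); /* -1 if x > hi, else 0 */
        v[i] = (hi & mask) | (x & ~mask);
    }
}
```

A modern compiler may well turn the first version into a conditional move on its own, so treat this as an illustration of the technique and not a benchmark. The loop counter branch is still there in both, but that one is easy to predict.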