Developer time is the most valuable resource. Writing in Python may produce slower code, but in practice it is much more efficient because you get many times more work done!
>>32
Fun facts: P4 processors did not have a barrel shifter, so their shifts and rotates were slow. Even the ALU was pipelined, so operations on <= 16 bits of data were around twice as fast as those needing 32 bits. The ALU ran at double the clock of the rest of the logic and could beat Core/Core 2 in straight-line execution in very specific pieces of code (basically integer adds only). It was also very sensitive to instruction and data alignment, not unlike most RISCs.