All the added complexity is very expensive to maintain and test.
It's nothing compared to everything else on the die. The 16-bit subset has been around for so long, and is so well characterised, that they probably don't have to test it that much.
HW development is also very different from SW: the emphasis isn't on testing, it's on verification, i.e. proving that your design is correct. Give this a read:
http://www-wjp.cs.uni-saarland.de/publikationen/UD11.pdf
>>18
Curious fact: ARM as implemented on the various iDevices does not have a hardware divide instruction. Even MIPS has one.
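For anyone wondering what "no hardware divide" means in practice: the compiler has to call a software routine instead. Below is a rough sketch of the classic shift-and-subtract (restoring) approach in C. The names soft_udiv32 and soft_umod32 are made up for illustration, and this is not the routine any real runtime ships. Those are typically hand-tuned assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (illustrative name, not a real runtime routine):
   unsigned 32-bit division by shift-and-subtract, one quotient bit
   per loop iteration, using only shifts, compares and subtracts. */
uint32_t soft_udiv32(uint32_t n, uint32_t d)
{
    assert(d != 0);               /* dividing by zero is left undefined here */

    uint32_t q = 0;               /* quotient, built from the top bit down */
    uint64_t r = 0;               /* running remainder; 64-bit so r << 1 cannot overflow */

    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next bit of n */
        if (r >= d) {                     /* divisor fits: subtract, set quotient bit */
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }
    return q;
}

/* Remainder derived from the quotient: n - q * d. */
uint32_t soft_umod32(uint32_t n, uint32_t d)
{
    return n - soft_udiv32(n, d) * d;
}
```

That is 32 iterations for a single division, which is why a missing divide instruction is noticeable in division-heavy code.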