>>7 The "size is no concern" attitude doesn't help either.
In either case you should've tried with -Os: if they've somehow managed to implement such an optimisation, I'd expect it to kick in only under -Os. I think C++ compilers should at least let you view the "intermediate" C output, so you can spot such inefficiencies quickly.
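Closest thing I know of in practice, as a rough sketch assuming GCC: you don't get C back, but you can dump the assembly or the optimised GIMPLE (which reads a lot like C) and compare -Os against -O2. The file name size_demo.cpp and the functions below are made up for illustration; they're not the code from >>7.

```cpp
// size_demo.cpp (hypothetical example, not the code discussed above)
//
// Compare what the compiler emits at different optimisation levels:
//   g++ -O2 -S size_demo.cpp -o size_demo_O2.s
//   g++ -Os -S size_demo.cpp -o size_demo_Os.s
// Dump the optimised GIMPLE (C-like); GCC writes a file whose name
// ends in ".optimized" next to the output:
//   g++ -Os -c -fdump-tree-optimized size_demo.cpp
#include <cstddef>

// One template, instantiated twice below. Each instantiation gets
// its own copy of the loop unless the toolchain merges or shrinks them.
template <typename T>
T sum(const T* data, std::size_t n) {
    T total = T();
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Non-template wrappers so both instantiations show up in the output
// under easy-to-find names.
int sum_ints(const int* data, std::size_t n) { return sum(data, n); }
long sum_longs(const long* data, std::size_t n) { return sum(data, n); }
```

Diff the two .s files (or run `size` on the object files) and you can see fairly quickly whether -Os actually changes anything for the code you care about.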