Show me your CFLAGS.
These are mine:
-march=native -O1 -ffast-math -funroll-loops -fpeel-loops -funswitch-loops -fpredictive-commoning -finline-functions -findirect-inlining -foptimize-sibling-calls -fmove-loop-invariants -s -Wall
Minor observations:
-O1 is fastest for most programs on newer GCCs. If the code uses a lot of indirection/branching/trickery, -O2/-O3 can help, but the speed change is negligible (+15%/-20%) and it takes much longer to compile.
-ffast-math gains ~10% speed.
-funroll-loops -fpeel-loops -funswitch-loops add 5-10%, and 10% more if the code is unoptimized/naive loops.
-mfpmath=sse should be used IMHO with all numerical code. GCC has some great SSE transforms which can add 70%-100% speed.
However, some code is better optimized for x87 floating point and runs faster that way than with SSE (SSE has huge latency).
I usually just use -Wall -Wextra -Werror -pedantic. All of this zoomj-fast Gentooing is just trying to get the compiler to correct for the fact that you suck at C anyway.
Name: FrozenVoid 2010-07-09 15:21
>>16
For a change, try timing (either RDTSC or clock() will do) your "Wall" build against the switches in >>11.
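A rough sketch of the clock() approach, in case anyone wants to try it. The workload (sum_of_squares) and the helper name time_sum_of_squares are made up for illustration and are not from any post here. The idea is to build the same file once with plain -Wall and once with the switches in >>11, then compare the reported seconds.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical workload: a naive floating-point reduction, the kind of
   loop that unrolling and fast-math switches are meant to affect. */
double sum_of_squares(const double *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += v[i] * v[i];
    return acc;
}

/* Runs the workload `reps` times and returns the CPU time in seconds.
   v[0] is overwritten on every repetition so the compiler cannot hoist
   the computation out of the timing loop; the last result goes to *out. */
double time_sum_of_squares(double *v, size_t n, int reps, double *out)
{
    volatile double sink = 0.0;

    if (n == 0 || reps <= 0) {
        *out = 0.0;
        return 0.0;
    }

    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        v[0] = (double)r;              /* perturb the input each pass */
        sink = sum_of_squares(v, n);
    }
    clock_t end = clock();

    *out = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call it from main with a large array and enough repetitions that the run takes at least a second or so; clock() resolution is coarse, so very short runs mostly measure noise.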
>>17
What the fuck did you just say, and does it matter? These kids program in Haskell, it's not like they really care about speed
Name: FrozenVoid 2010-07-09 15:36
>>18 RDTSC -> performance measurements for loops/functions.
A program compiled with GCC optimizations like in >>11 can be 3-6 times faster (if not more) than one compiled without optimizations (like in TCC).
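For the RDTSC route, something like the following might do as a starting point. It assumes GCC or Clang on x86, where x86intrin.h provides __rdtsc(); the fallback to clock() on other targets, and the names read_ticks, ticks_for and count_set_bits, are inventions of this sketch.

```c
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* GCC/Clang header that declares __rdtsc() */
#endif

/* Reads a tick counter: the time-stamp counter on x86, clock() ticks
   elsewhere (the fallback is an assumption made for portability). */
uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return (uint64_t)clock();
#endif
}

/* Hypothetical function under test: counts the set bits in x. */
uint64_t count_set_bits(uint64_t x)
{
    uint64_t n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Times a single call of fn(arg) and returns the elapsed ticks;
   the function's return value is stored in *result. */
uint64_t ticks_for(uint64_t (*fn)(uint64_t), uint64_t arg, uint64_t *result)
{
    uint64_t start = read_ticks();
    *result = fn(arg);
    uint64_t end = read_ticks();
    return end - start;
}
```

A single call is usually too short to measure reliably, so in practice you would repeat the call many times and take the minimum or the average. The tick counts are only comparable between builds run on the same machine.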