Fucking MingW

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2012-11-04 9:11

#include <stdio.h>

int main() {
 printf("Hello world!\n");
 return 0;
}


Default MingW compilation+link size: 47KB
Best MingW compilation+link size: 8KB

Default MSVC compilation+link size: 40KB
Best MSVC compilation+link size: 1KB

After postprocessing:
MingW (using MS's linker and libs): 1594 bytes
MSVC: 624 bytes

What the fuck? Am I missing something here?

MingW optimised command line (compile only):
gcc -nostdlib -Os -c -s -o hello.obj hello.c -Wl,--gc-sections,--section-alignment,4096,--file-alignment,512

MingW link command line:
link hello.obj msvcrt.lib mainstub.obj /align:4096 /filealign:512 /entry:main /merge:.rdata=.text /merge:.eh_fram=.text /merge:.text.st=.text /section:.text,EWR /stub:mzstub64.exe

mainstub.obj is a dummy __main, because libmingw32.a, which is supposed to contain it, also contains __CTOR_LIST__ and some other C++ shit. I'm compiling a C program, with gcc, and they force you to link with a bunch of C++ shit? Are you kidding me?
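
For reference, a stub like that could look roughly like the sketch below. This is an assumption about what mainstub.obj contains, not the actual source (which isn't posted here); the file name mainstub.c is made up. It relies on gcc for this target inserting a call to __main at the top of main(), so an empty definition is enough to satisfy the linker.

```c
/* mainstub.c (hypothetical file name): an empty stand-in for __main.
 * Assumption: the only thing the program needs from __main is for the
 * symbol to exist; a plain C program has no constructors to run. */
void __main(void)
{
    /* intentionally empty */
}
```

Presumably it would be built with something like gcc -Os -c -o mainstub.obj mainstub.c and then passed to the linker next to hello.obj.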

(Why won't it merge the bloody .eh_fram and .text.st sections?!?! Maybe this is a bug of MS's linker since it merges fine with its own compiler output, but the compiler shouldn't be generating .eh_fram and .text.st anyway!)

Executables for your inspecting:
MingW: http://pastebin.com/vZn5WtMz
MSVC: http://pastebin.com/AV63Hr5x

Therefore, I challenge anyone to come up with a smaller Hello World using MingW, and post the commands you used to do it.

Name: Anonymous 2012-11-07 1:42

>>63
push is not gonna be faster than mov because it does put down a dependency on the esp register, meaning it CAN'T execute the next push in parallel in any order it wants.
This is also why the xor reg, reg to zero out a register is not really the way to go anymore, a mov dword 0 is better because the xor will generate a dependency (i.e a strict ordering) on the next instruction using that register.
Read some of the latest optimization manuals from Intel regarding newer cpus.

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2012-11-07 2:21

>>64
How about you read it?

"2.2.2.5 Stack Pointer Tracker

The Intel 64 and IA-32 architectures have several commonly used instructions for
parameter passing and procedure entry and exit: PUSH, POP, CALL, LEAVE and RET.
These instructions implicitly update the stack pointer register (RSP), maintaining a
combined control and parameter stack without software intervention. These instructions
are typically implemented by several μops in previous microarchitectures.

The Stack Pointer Tracker moves all these implicit RSP updates to logic contained in
the decoders themselves. The feature provides the following benefits:
• Improves decode bandwidth, as PUSH, POP and RET are single μop instructions
in Intel Core microarchitecture.
• Conserves execution bandwidth as the RSP updates do not compete for execution
resources.
• Improves parallelism in the out of order execution engine as the implicit serial
dependencies between μops are removed.
• Improves power efficiency as the RSP updates are carried out on small, dedicated
hardware.
"
...
"
• ESP folding — This eliminates the ESP manipulation μops in stack-related
instructions such as PUSH, POP, CALL and RET. It increases decode rename and
retirement throughput. ESP folding also increases execution bandwidth by
eliminating µops which would have required execution resources.
"

Also, Intel's own compiler uses push/pop. That should be enough evidence.

> This is also why the xor reg, reg to zero out a register is not really the way to go anymore, a mov dword 0 is better because the xor will generate a dependency (i.e a strict ordering) on the next instruction using that register.

"Dependency Breaking Idioms

Instruction parallelism can be improved by using common instructions to clear
register contents to zero. The renamer can detect them on the zero evaluation of the
destination register.
Use one of these dependency breaking idioms to clear a register when possible.
• XOR REG,REG
• SUB REG,REG
...
"
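
A minimal sketch of the XOR idiom from the quote above, assuming GCC-style inline assembly on x86. The function name zero_via_xor is hypothetical, and on other compilers or architectures the sketch just returns 0.

```c
#include <stdint.h>

/* Hypothetical helper: clear a register with XOR REG,REG. */
static inline uint32_t zero_via_xor(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t r;
    /* xor of a register with itself is 0 whatever the old value was,
     * so an output-only operand is enough; flags are clobbered. */
    __asm__ ("xorl %0, %0" : "=r"(r) : : "cc");
    return r;
#else
    return 0; /* fallback for non-x86 or non-GCC builds */
#endif
}
```

In practice there is no need to write this by hand: an optimizing compiler will normally emit the same xor for a plain return 0.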

I've read a lot of C compiler output in my life, and gcc's is the only one that "sticks out like a sore ARM".
