Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon.

Pages: 1-

Bask the glory of optimized C code

Name: FV waifu 2011-12-10 23:03

Name: Anonymous 2011-12-10 23:15

Implying FrozenVoid knows how to write good code that's portable

Name: Anonymous 2011-12-10 23:17

>>1
optimized

all you did was remove a horrid recursion function and do it iteratively and you removed the loop function and just put it in the main

Name: Anonymous 2011-12-10 23:20

http://www.reddit.com/r/ExpertProgrammers/

why am i not surprised that FrozenVoid made his own board on reddit called Expert Programmers.

Must be fun talking to himself with shitty code

Name: Anonymous 2011-12-10 23:20

changed 100000 to 10000000

with gcc -O2:
$ time ./naive
result:2190727593 cycles:0

real    0m4.897s
user    0m4.893s
sys     0m0.000s
$ time ./opt
result:8400511 cycles:0

real    0m4.158s
user    0m4.150s
sys     0m0.003s


with gcc -O3:
$ time ./naive
result:4197044044 cycles:0

real    0m3.956s
user    0m3.950s
sys     0m0.003s
$ time ./opt
result:8400511 cycles:0

real    0m2.294s
user    0m2.290s
sys     0m0.000s

Name: Anonymous 2011-12-10 23:22

recursion is slow

Name: Anonymous 2011-12-11 0:04

>>6

It isn't the recursion that is slowing it down (in theory). An optimizing compiler can convert any program routine using tail recursion to iteration. Tail calls to other functions could be tricky, you would need to get into the details of the calling conventions used on the target platform. But with tail recursion alone, you don't need to get there. Transforming the routine syntactically is sufficient:


int collatzLen(int c, long long n) {
          return (n==1 ? c : collatzLen(c+1, (n%2==0 ? n/2 : 3*n+1)));
}


-->>>


int collatzLen(int c, long long n) {
    if(n==1) {
        return c;
    } else {
        if(n%2==0) {
            return collatzLen(c+1, n/2);
        } else {
            return collatzLen(c+1, 3*n+1);
        }
    }
}


-->>>


int collatzLen(int c, long long n) {
start:
    if(n==1) {
        return c;
    } else {
        if(n%2==0) {
            int t1 = c+1;
            long long t2 = n/2;
            c = t1;
            n = t2;
            goto start;
        } else {
            int t1 = c+1;
            long long t2 = 3*n+1;
            c = t1;
            n = t2;
            goto start;
        }
    }
}


And you can factor out the common subexpression, with t1 and c+1:


int collatzLen(int c, long long n) {
start:
    if(n==1) {
        return c;
    } else {
        int t1 = c+1;
        c = t1;
        if(n%2==0) {
            long long t2 = n/2;
            n = t2;
            goto start;
        } else {
            long long t2 = 3*n+1;
            n = t2;
            goto start;
        }
    }
}


and because the new values of c and n are calculated independantly, both variables can be changed inplace. Temporay values would be needed if it was something like c = n + c; n = 5*c + 9*n;


int collatzLen(int c, long long n) {
start:
    if(n==1) {
        return c;
    } else {
        c = c+1;
        if(n%2==0) {
            n = n/2;
            goto start;
        } else {
            n = 3*n+1;
            goto start;
        }
    }
}


And then the compiler could do the bit wise operations:


int collatzLen(int c, long long n) {
start:
    if(n==1) {
        return c;
    } else {
        c = c+1;
        if(n&1) {
            n = 3*n+1;
            goto start;
        } else {
            n <<= 1;
            goto start;
        }
    }
}


and 3*n+1 = (1+2)*n+1 = n + 2*n + 1 = n + (n<<1) + 1 = and_with_carry(n, (n<<1)). A compentent compiler should be able to detect this. This optimization is always possible when you multiply by a constant.

Although, one thing in the naive version that will slow it down in comparison to the optimized version is that it calls the collatzLen function from main inside a loop. That will take more time. Because the function is called only once, there is no danger of inlining the function causing duplicate code everywhere to clog the instruction cache. So this could be inlined. I don't know if compilers auto inline though. If the compiler was unaware of how post linking was going to be done, and if the colletzLen function was not static, a non inlined version of the function would need to remain. That wouldn't stop the compiler from inlining when it was called in this file though.

Name: Anonymous 2011-12-11 0:27

>>7
(Ow shit codepad doesn't handle asm, sorry for this long post)

As you said, gcc -O2 seems to optimize naive.c with bit operations, inlining of collatzLen and tail-call elimination.
loop:
.LFB13:
    .cfi_startproc
    cmpq    $1, %rdi
    movl    $1, %eax
    movl    $1, %r9d
    movl    $2, %r8d
    jle    .L22
    .p2align 4,,10
    .p2align 3
.L18:
    movq    %r8, %rdx
    movl    $1, %ecx
    jmp    .L16
    .p2align 4,,10
    .p2align 3
.L24:
    movq    %rdx, %rsi
    addl    $1, %ecx
    shrq    $63, %rsi
    addq    %rsi, %rdx
    sarq    %rdx
    cmpq    $1, %rdx
    je    .L23
.L16:
    testb    $1, %dl
    je    .L24
    leaq    1(%rdx,%rdx,2), %rdx
    addl    $1, %ecx
    cmpq    $1, %rdx
    jne    .L16
.L23:
    cmpl    %r9d, %ecx
    jle    .L17
    movq    %r8, %rax
    movl    %ecx, %r9d
.L17:
    addq    $1, %r8
    cmpq    %r8, %rdi
    jge    .L18
    rep
    ret
.L22:
    rep
    ret
    .cfi_endproc

Name: Anonymous 2011-12-11 0:39

>>8

Yay! My faith in GCC has been restored!

Name: Anonymous 2011-12-11 0:41

>>9
How could you be even remotely surprised like this?

Name: hachi-go 2011-12-11 0:49

>>10
Until 4.2 (4.3 maybe) it seemed very broken.

Name: Anonymous 2011-12-11 0:51

>>10

because GCC has a reputation for being shit in comparison to enterprise proprietary c and seeples compilers written by a super secret team of expert programmers and ide designers that know more about expert compilation and class diagrams than silly non enterprise employees like ourselves could ever hope to know.

And also people can get lazy, and say, if it works, then it is good enough, and not bother with thorough optimization.

Name: Anonymous 2011-12-11 1:11

>>11-12
Right, but inlining and tail call elimination are on the short list of optimizations. It's a bit like gasping in awe when it converts divides/multiplies by constant powers of 2 to shifts.

Name: FV waifu 2011-12-11 2:17

On a Windows XP 32-bits @ VirtualBox @ linux 3.1(x86_64)
I've changed the sources a bit to work with RDTSC:
http://codepad.org/xr6li2hN
http://codepad.org/diJQW72g

$ "opt dmd-ff-gg-Jm-Nc-o.exe"
result:77031 cycles:124459394
$ "naive gcc-O2.exe"
result:77031 cycles:135778047
$ "naive gcc-O3.exe"
result:77031 cycles:206945482
$ "naive cl-Ox.exe"
result:77031 cycles:1322975052


I don't know why MSVC was an order of magnitude slower. But I don't care, sorry.

Name: Anonymous 2011-12-11 2:20

>>14
dmc 8.52c
gcc 4.6.1
MSVC 2008
Everything on a AMD Phenom II X3 720 @ 2.8GHz and 2x2GB 1333MHz 9-9-9-24 DDR3.

Name: Anonymous 2011-12-11 2:27

>>14
Wow, dmc seems almost bad as MSVC.

$ "naive dmc-ff-gg-Jm-Nc-o.exe"
result:77031 cycles:963305828

Name: Anonymous 2011-12-11 2:37

>>13

Yeah, but you know, it generated assembly that's about as good as I could probably do by hand. I would need to learn a lot more about INSERT_USED_ARCHITECTURE_HERE before I could improve it.

Name: F r o z e n V o i d !!mJCwdV5J0Xy2A21 2011-12-11 4:04

Just for the record, I was showcasing why recursion is bad, not the level of my programming skills.

Name: FV waifu 2011-12-11 4:32

>>18
No way, you're awesome. Even better than GCC.

Name: F r o z e n V o i d !!mJCwdV5J0Xy2A21 2011-12-11 4:51

GGC inserts megabytes of forced Sepples bloat into pure C libraries. see http://dis.4chan.org/read/prog/1321851666/8-

Name: Anonymous 2011-12-11 5:26

>>20
Not so much, no.

Name: Anonymous 2011-12-11 5:43

Cow beard dubs

Name: Anonymous 2011-12-11 13:34

naive

Name: F r o z e n V o i d !!mJCwdV5J0Xy2A21 2011-12-12 5:39

In theory this should produce faster code by removing the IF branching, though its harder to optimize(so compiler produces worse code):
#include <stdio.h>
    long long rdtsc(){__asm{RDTSC}}

void main() {
    long long b,a = rdtsc();
    int nlen=1,n=1,counter,i,m=100000,w;
    long long number;
    for (i=2; i<=m; ++i) {
        for (number=i,counter=1; number!=1; ++counter){
w=number&1;number=((w)+(number+(number<<(w))))>>((((w)^1)<<1));

        }
        if (counter > nlen){    nlen = counter;    n = i;}
    }
    b = rdtsc();
    printf("result:%i cycles:%lld\n",n,b-a);
}

Name: Anonymous 2011-12-12 5:53

IFs ARE BLOATED AND INEFFICIENT

Name: Anonymous 2011-12-12 6:32

Too bad that no real program will be maintainable when all your code is crammed in main().

Name: Anonymous 2011-12-12 7:27

>>24
KEKEKEKEKEKEKEKEKEKEKEKEKEKEKEKEKE

Name: Anonymous 2011-12-12 8:59

>>26
Expecting something maintainable and portable from FrozenVoid

Name: Anonymous 2011-12-12 9:50

Oh shit it really is frozenfriend at reddit...

What a horrible way to start a Monday.

Name: Anonymous 2011-12-12 10:11

>>28

laughingwhores.jpg

Name: Anonymous 2011-12-12 16:30

>>30

,,.,:,..,,,............,..7?II887$I?+...,.$ZZ7?===?+?I:?,=$=I,,,,..,,,.,....,..O
,...:,..,,,..,........,=I7$NOI=~DZNZ$$?+77I===~~77Z ?=:.+++I+7~,....,.....,.,.,?
,...,,,.,,............+$Z$Z++?+?=.$NDZI++Z?+?,~=~$?=.,...=$:$$Z+,..,,.,,......,?
.,,,,,,,,,............N8NI7I?$+=?=I.I=I7II7+:~~.++:,.,.::.~7+ZII?...,..,,.,,..,I
....,,,..,.,........~=:~??7=+:+:~:~+?$=~=??:,+= I=~,,:,.=.=???ZZ7?.,,..,,.,....?
....,,,..,.........~:,:~~??+=:?:...I=+I+?==,7+:.?I:.,,~=~~,?+=$I?7I.,...,,....,?
,,..,,,,.,........:=:::~:,,:,,... 7I$II$Z?=?=?+,.+?~..~=~,?=7+II+$O7,...,.,...,,
.,..,,.,.,,........~?=:~:,,,,,... +=$$$7?::,.:~:,:=+,,,$=~,?~+:??+7Z7,.,,,.,...,
.,.,,,,,.,:.......,~I~::,......   I7~I:~:~:..,=,,,::,+.=~+:~~:+~?+II$Z:,,..,....
.,...,.,,,~.......~?$~::,... .   .=+~=~+,,..,::,,,,,::?:+I==~~~7=??I$ZII,,.....,
.,...,..,,:.......+++=:.,7+,..,...~~?$,,.:=:=:~:+:::,:+++=$=+?==~?+77$$$$:.,..,,
.,,,.:....:.......=?++~:?I:=:.....:?,:7+?:=?::,=~~:~,,,:=~==++?~+??7I7?$7+:,,..:
.,...,..,,,.....,,~~:~,~.,.:,....,7~Z~?ZI7:,..,,:,~::,=::=:=+I+=??I+777I7$7$...,
.:..,,,.,.,......,,,::.....,:..,,.~$IO77::,,,,:,,:::~:,,=:+7=+?++III?I$$$77Z:..,
...,.,.,,.,.....,,..,:,,. .,:.,,..+?ZO..:::::::::::~~~=:.:~::,II=?$?77?I7Z$?I+D~
.,,,.,....,,..,.,,. .~..,,::,..~,I$$$.?,.:::~~::,,,~~,,=$?+,===?$777$III777I7$$=
..,..:...,,.....,.,. . .:+:$~,:7O,:+~7..:. .=78::,~,..:~=7Z$+~+=+7777?$I$?77$$ZZ
..,,,,.,,,,,....,, .. .Z..=:,OO.,,:~~:~.:I,. .:$:...,,~===?$?=?+~+?77$7$7$$I+$$Z
..,,.,,,.,..,..,.. ...8. ,:+.....:::,,?~.:.7,. ..D:...,+Z7$O7+=?7=$$IZ7Z7Z$$ZI?7
.....,,..,,,....,, .  . .:7~:,..,=,,,:~I=....=...  .,.,,OZO87+?IOI+7?I$?I77II?O,
.....,,...,,....7?.=. =. ~?::~~:,,..,=7+?O? ,.,:.,,.,,,:,788~=77?ZI??77?++77?,,,
..,,..,,,.,,..,.:,. .... .~,,.....,,+OO+?$$Z,I:~~,,,,:,::=O$7?+$?$Z=??+++III=,,.
..,,.,,:,,,......=:.?..:  .~,,...,~$ZO8Z+=?D8I~=~~:,,..,..DI??I=I?I+?7Z$?Z?+,,,,
.,,,,++=..,.,....,I~:~,,,.  ,Z7I7OZO$ZD$Z~,Z8ZDZ7$~=~:,....+=+?ZDD8$Z$$777~:,,,,
,.,==~,.,..,......$7:+~~,.   ,ZOZZZ$7ZD8?~,:O$$N7NNO$:,,...~====Z8ZZZ7I??~::,,,:
,..?~.,...:.......~OO+I=::.  .~ZZZOIIZ=7+=,,+?ZMD7DDN~~~,..,+===+$II~?=+~::,:,::
.,I+:,..=...Z...~D,,ON=?::,...:Z$77?I$+I+~,.~:+IN$DZO,~::,.:7=~===~:~+=+:::::,::
,.7+7,.=:..~...Z$$..DMZ=~~,....+++=II+?Z+:,.:~~I=ZZ$$7:~~:.,~7~=~~~~=~+=:::::::~
..I=::~,.:=.,~I+~~.:O7Z7=:,. .,==?III+77+:..:=,I=~~=+~?~=:,,:??=~~,==~+~:,::::~~
,~I==:~,:.,,7?~=?$,~7,=??:.. .,++~?=~=+?+,...,,=:~=~~?~~~~:.:=$~~,+==~=::,,,:~~~
,+?~~,=:,:,Z::~~+I7.~,,==~. ..,7~=::~==I?:...:.?=+~~~:===~~:,:+~~~=~==~:,,,::~~=
,7I=~,.?::=,,::+:~~I:,,?=:....~7=:~===??I~...:.~++~~+~=++=~:,,~+=~==?~:,,,:::~=+
,+I=~:::=:,,,,:==~=,,,,+~,...,=7I~=?==7+++.:::.+++~+=:?~++~::,:=$:=?+~:::,::~=++
,+Z+~:::,,,,..,=~==,,.7+:,..,,777?===~7=~+:~~,:~+~=~:=:I+==~:,,~??~~=~~:,:::==??
,?I+=~::,,,,,.=,:=+~:,+=:....,$I7?=~~:I~=~~~,=+=~~+=+~::=?+=~:::~+?I+~~:::~~=?77
,7$Z+~~::,,.,:.,:===~II:,...,,II7==~~:+~7~,,~.~~~+~+=~?==?++~~:,:~??==~::~~~+II$
,=7$?=~::,,,,..,,~++=+=,,..,,:=II=~~::~~:::~~~?+=+==+:+~~=++==~:::~=I==~~~~=I7$$
,~77$==~::,,::.,,~~~II:,..,,,~?I?~~~::::=.,~:++=+++==?~+==+++=~~:::~=?=:~==I77$O
,,7I7I+=~:,:,,,,,:~?+=,...,,:=$I?:::::~~:,,I$$I?++=+==~:+==I?+=I~:~~=?$~==?I$ZZO
,:?777I+~~:,:,,,::~I=~,.,,,,:=II+::,::::,.,~+N$I??==~=I:,III?++==~~==?Z=+?7$$OOZ
,,=II$7?=::::,,,,:7?~:..,,,,~?77+:,,:,~:,,,..NZI+I?+?=?~+==+??++==+++I?+?7ZODOZZ
.:=I+77?===:::,,,7?=:,.,,,,:~7ZI~,,:~~~:.,,.=ZD7?=?????~~=+II??++=++I?I7$O8DOZZZ
.,:=?$ZI++=~:,,,7++~,,,,,:::=7I:,::=~::~?.,=788II??++=+==I~+=I7+==++?77Z$OZZZZZZ
..:+I7ZZI+=~:~~$==~,,,,,::::=$?====~~:+:=,+=ZDD?II+?+?++=?+=+?+7+===+$$IOZZZ$Z$$
,,::??Z7$I+=:,7+=~:,:,,,:::~?I7+:I~~~~==,?=:$OOII=?=+=+~+?===?I??+++IZ7ZOZ77Z$I?
,,,~++$78$I=::+=~~:,:,::::~=I???=I=::~~=+?~78$7~I?==~++?I?=++~?I+?77Z$8OOZ7$$++$

Don't change these.
Name: Email:
Entire Thread Thread List