Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

C programming tricks

Name: Anonymous 2007-09-15 22:06 ID:WCtjqT0N

While programming optimization that makes ones' code completely unreadable is often a bad thing, there are places for it, like that one inner loop of your code that takes up 98% of the program's running time.

What are you tricks for improving performance in C, other than the obvious inline assembly or the like?

Name: Anonymous 2007-09-16 10:09 ID:Heaven

>>38
shorter:
for(;*s=(random()%2?toupper:tolower)(*s++););

>>38 would be nicer if you could just use a literal array instead of having to make a pointer to an array of function pointers. but you can't do that in c.

Name: Anonymous 2007-09-16 10:35 ID:Heaven

>>40
low bits of values returned by rand() aren't very random so if you're going to use rand() instead you shouldn't use just the lowest bit.
& is uglier than %, and just as slow.
if you really want speed, rand()/(INT_MAX/2) would be faster and would help with the problem of the low bits not being very random.
improved version:
for(;*s=(rand()/(INT_MAX/2)?toupper:tolower)(*s++););

Name: Anonymous 2007-09-16 11:14 ID:Heaven

Um, & is in no way shape or form as slow as modulus.  Modulus puts quite a bit more work on the cpu, comparatively, I also don't see where you're getting your speed gains either.  

Name: Anonymous 2007-09-16 11:28 ID:Heaven

>>42
division is pretty slow, while & is as fast as it can get
you fail

Name: Anonymous 2007-09-16 11:36 ID:Rc6EEZiP

>>43-44
BENCHMARKS MOTHERFUCKERS, DO YOU USE THEM?

Name: Anonymous 2007-09-16 11:55 ID:H9kKB81D

>>44
protip: let the compiler do the work for you, -funroll-loops

Name: Anonymous 2007-09-16 15:23 ID:xXOet/pd

>>44
integer division is pretty damn fast.
rand()&1 is a lot less random than rand()/(INT_MAX/2).

>>43
$ cat test1.c
#include <stdio.h>
#include <stdlib.h>

int main(){
 for(int i=0;++i<1000000;printf("%d\n",rand()%2));
 return 0;
}
$ cat test2.c
#include <stdio.h>
#include <stdlib.h>

int main(){
 for(int i=0;++i<1000000;printf("%d\n",rand()&1));
 return 0;
}
$ cat test3.c
$ cat test3.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(){
 for(int i=0;++i<1000000;printf("%d\n",rand()/(INT_MAX/2)));
 return 0;
}
$ gcc -std=c99 test1.c -o test1
$ gcc -std=c99 test2.c -o test2
$ gcc -O2 -std=c99 test3.c -o test3
$ time -h ./test1 > /dev/null
        0.73s real              0.63s user              0.00s sys
$ time -h ./test2 > /dev/null
        0.72s real              0.64s user              0.00s sys
$ time -h ./test3 > /dev/null
        0.72s real              0.63s user              0.00s sys

Name: Anonymous 2007-09-16 15:26 ID:Heaven

>>47
summary:
rand()%2 1000000 times: 1.36s
rand()&1 1000000 times: 1.36s
rand()/(INT_MAX/2) 1000000 times: 1.35s

Name: Anonymous 2007-09-16 15:54 ID:CAdtAOry

>>47
Nice benchmark for the printf function.

Name: Anonymous 2007-09-16 16:03 ID:Heaven

>>49
ok let's see you write a benchmark that shows how much slower %2 is than &1

Name: Anonymous 2007-09-16 16:04 ID:SDT0AFSf

fucking just download the ebook called "deep C secrets" i'm sure it'll help u in whatever C related programming course (besides the introductory bullshit) you're doing.

Name: Anonymous 2007-09-16 16:15 ID:SDT0AFSf

I think the codecomments forums have an optimization thread thats like 10 pages long at least - check that shit out as well.

Name: Anonymous 2007-09-16 16:19 ID:K21wDPMU

>>28
Oh... business. gb2/managing

Name: Anonymous 2007-09-16 17:08 ID:Krd3dptm

>>50
I'm not >>49, but I agree with him, although the results come out the same. printf() is extraordinarily slow, pushing any interesting results into the noise. Same can be said for rand(), although it's not as bad as printf(). Using time from the shell includes the startup time, so it reduces accuracy.

Here's the results I get with what I wrote:

C:\Devel\src>gcc t.c -std=c99 -o t.exe

C:\Devel\src>t
& 1: 15515 ticks
% 2: 25168 ticks

C:\Devel\src>gcc t.c -std=c99 -o t.exe -O2

C:\Devel\src>t
& 1: 2658 ticks
% 2: 2649 ticks


And here's the code. Note that it's still far from ideal, but at least the interesting bits are completely vanishing into the margin of error:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <limits.h>

#define ITER 10

int main(void)
{
  int tmp;     // prevent certain optimizers from eliminating the loops
  int stub[2];
  clock_t start_time, avg_time;

  // timing for & 1
  start_time = clock();
  for(int i=0; i < ITER; i++) {
    for(int j=0; j < INT_MAX; j++) {
      tmp = stub[j & 1];    // cheaper than zOMG rand()
    }
  }
  avg_time = (clock() - start_time) / ITER;

  printf("& 1: %i ticks\n", avg_time);
 
  // timing for % 2
  start_time = clock();
  for(int i=0; i < ITER; i++) {
    for(int j=0; j < INT_MAX; j++) {
      tmp = stub[j % 2];
    }
  }
  avg_time = (clock() - start_time) / ITER;

  printf("%% 2: %i ticks\n", avg_time);

  return 0;
}


So, it's clear the result is the same after optimization. Modulus is slower, but the optimizer changes %2 to & 1.

Name: Anonymous 2007-09-16 17:09 ID:Heaven

s/are completely/are not completely/

Name: Anonymous 2007-09-16 17:28 ID:CAdtAOry

>>54
That's better. Now also calculate the standard deviation for those 10 runs and use the t-test to determine if they are significantly different.

Name: Anonymous 2007-09-16 17:29 ID:9ndnYabT

>>54
C:\Devel\src>gcc t.c -std=c99 -o t.exe
C:\Devel\src>gcc
C:[b]Devel[/b]\
Devel

Name: Anonymous 2007-09-16 17:32 ID:C1D+ypEB

>>47

Integer division is 20-40 clocks on most CPUs, and up to 80 on the Pentium 4... I don't feel like getting out my technical manuals for the exact numbers.

The reason its "fast" is because when you divide by a constant value, the compiler tends to optimize the division into magic number multplication/bitshifting, to avoid an actual IDIV.

Name: Anonymous 2007-09-16 17:33 ID:Krd3dptm

Devel
eve
V

I put on my cape and smiley mask.

Name: Anonymous 2007-09-16 18:40 ID:qb4uIYsD

1. Use fprintf ("fast printf") instead of printf.
2. ++i is faster than both i++ and i = i + 1.
3. void main(void) is faster than int main(void) or int main(int, char **) since no value needs to be returned to the OS.
4. Swapping with exclusive-or (a^=b^=a^=b swaps a and b) is faster than using a temporary. This works for all types (including structures), but not on all compilers. Some compilers may also give you a harmless warning.
5. Static storage duration objects are faster than automatic storage duration objects because the CPU doesn't have to set aside storage on the stack every time a function is called. Make your loop indexes global so that you can use them everywhere:
int i;
void func(void) { for (i = 0; i < 10; i++) ; /* ... */ }
void func2(void) { for (i = 0; i < 20; i++) ; /* ... */ }
/* ... */

6. Compilers often give more memory to arrays than you asked for. Here's how to check how big an array actually is (memset returns a null pointer if the size you passed to it is bigger than the size of the array you passed to it):
int arr[256];
size_t realsize;
for (realsize = 0; realsize <= SIZE_MAX; ++realsize)
        if (!memset(arr, 0, realsize)) break;
/* now you know that arr actually has realsize / sizeof (int) elements */

If you combine this with #5, your program will be faster in the long run (but this usually doesn't work for short programs).

Name: Anonymous 2007-09-16 18:55 ID:Sy2QIfmT

>>60
EXPERT TROLL

Name: Anonymous 2007-09-16 18:55 ID:H9kKB81D

>>60
I fuckken LOLd my ass off @ 1.

Name: Anonymous 2007-09-16 19:10 ID:Heaven

>>60
Now this is a quality troll.

Name: Anonymous 2007-09-16 19:35 ID:xXOet/pd

>>60 is too obviously a troll.
>>42 is much better.

Name: Anonymous 2007-09-16 19:39 ID:C1D+ypEB

>>60

LOL.  That was amusing.

Name: Anonymous 2007-09-16 21:30 ID:M+Nq55G4

>>60 is EXPERT QUALITY, except one thing:
2. ++i is faster than both i++ and i = i + 1.
This is actually quite true sometimes. In certain circumstances, a loop in GCC using while (++i <= j) will produce incl, cmpl, and jle instructions; whereas while (i++ < j) will result in movl, incl, cmpl, and jl. In this case, preincrementing shaves one full CPU instruction per loop iteration.

Name: Anonymous 2007-09-16 21:57 ID:Heaven

>>66
-O3

Name: Anonymous 2007-09-16 22:11 ID:0cfV6Q0I

>>60

1. Use fprintf ("fast printf") instead of printf.

No. Here's an EXPERT PROGRAMMER QUALITY optimization :
fwrite("string",sizeof(char),sizeof("string"),stdout);

Name: Anonymous 2007-09-16 22:22 ID:Krd3dptm

>>66
Anybody who writes a while loop like that deserves to get beaten with a bat though.

Same with >>47. Ugh, what kind of for loops are those?

Name: Anonymous 2007-09-16 22:23 ID:0cfV6Q0I

ALSO,

4. Swapping with exclusive-or (a^=b^=a^=b swaps a and b) is faster than using a temporary. This works for all types (including structures), but not on all compilers. Some compilers may also give you a harmless warning.


http://en.wikipedia.org/wiki/Xor_swap#The_XCHG_instruction

Name: Anonymous 2007-09-16 22:59 ID:xXOet/pd

Ugh, what kind of for loops are those?
simple ones.

for(int c;c;printf(++c?" %02x ":"\n",c=getchar()));

for(char*r=s;*s;*r=!s++)if(*s!=c)*r++=*s;

Name: Anonymous 2007-09-16 23:18 ID:I0ZdD5JI

gets(buf);

Name: Anonymous 2007-09-16 23:58 ID:Krd3dptm

>>71
Absolutely disgusting.

Name: Anonymous 2007-09-17 4:40 ID:T7H3/2/F

>>71
that is not simple, a simple loop is
for(i = 0; i < range; i++)

or

for(;;)

Name: Anonymous 2007-09-17 4:43 ID:gf4kBNCr

PROTIP: BREAK IS CONSIDERED HARMFUL FOR NON-EXCEPTIONAL CIRCUMSTANCES

Name: Anonymous 2007-09-17 5:43 ID:p/yux4n8

>>75
Protip: no, it saves nested IF blocks and repeated while/for conditions.

Name: Anonymous 2007-09-17 5:45 ID:Heaven

>>75
you fail, i've heard that a lot from college/uni teachers/profs, what's with that? Fuck them. fucking noobs.

Name: Anonymous 2007-09-17 6:12 ID:Qi/+7bcj

>>75 yea use goto instead

Name: Anonymous 2007-09-17 7:47 ID:gf4kBNCr

>>76
What is the use of a loop that doesn't check when to finish?
while(!finished)
  finished = true;

for(;;)
  if(cond)
   break;

I've seen cases where usage of break is more elegant than using vars and while. Such cases are exceptional.

>>77
Guess where the theory behind computer science comes from? Oh, that's right, it doesn't come from the people that formalized cs theory, it comes from EXPERT PROGRAMMERS

>>78
I love GOTO roughly like a noodly spaghetti code.

Name: Anonymous 2007-09-17 8:35 ID:2gLnFHF5

>>68
Uh, sizeof("String") is four, or however big pointers are on your machine.

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List