While optimization that makes one's code completely unreadable is often a bad thing, there are places for it, like that one inner loop of your code that takes up 98% of the program's running time.
What are your tricks for improving performance in C, other than the obvious inline assembly or the like?
The problem with this is that faster hardware gives everyone an edge. When you're competing against other video encoders from other companies and organizations, what matters is your speed and/or quality edge over theirs. Since faster hardware speeds up everyone, it doesn't help one compete :)
Let the compiler do the good job. It's bad enough already that you're doing C; no need to make your life more miserable when there are several awesome OMG OPTIMIZED C compilers. Just compile with -momg-optimized.
Name: Anonymous 2007-09-16 5:15 ID:6UUG73P+
My trick is to express my intent to the C compiler in a way that also communicates it to any humans who might be reading (including myself in two weeks' time). Compilers are pretty fuckin' good these days; there's no need to do strength reduction by hand, since you can always compile with at least -O1 on today's hardware...
But anyway, yeah, if you want to know whether an integer is odd, use (x % 2) == 1 rather than the implicit boolean equivalent. It's more readable and the compiler produces an equivalent sequence of instructions anyway.
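One caveat on the odd test above: with a signed x, C99 and later truncate division toward zero, so -3 % 2 is -1 and (x % 2) == 1 reports negative odd numbers as even. A minimal sketch of a check that covers both signs (is_odd and is_odd_demo are made-up names, only for illustration):

```c
#include <assert.h>

/* Odd test that also works for negative values: in C99 and later,
   -3 % 2 evaluates to -1, so comparing against 1 would miss it. */
static int is_odd(int x)
{
    return x % 2 != 0;
}

/* Hypothetical self-check, only here to show the intended behaviour. */
static void is_odd_demo(void)
{
    assert(is_odd(3));
    assert(is_odd(-3));   /* (x % 2) == 1 would report 0 here */
    assert(!is_odd(0));
    assert(!is_odd(-4));
}
```

Comparing against 0 instead of 1 costs nothing in readability and the compiler should still reduce it to a bit test.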
>>31
FUCK YOU FUCKING FAG
XOR SWAP TWO SIGNED INTEGERS AND TELL ME WHAT HAPPENDS YOU FUCKING PIECE OF SHIT
UNDEFINED BEHAVIOR TOO
I KNOW ALL OF THIS
I LIVE INSIDE STANDARDS
Name: Anonymous 2007-09-16 5:35 ID:S8KoCl51
apply the ``const'' qualifier where appropriate -- the compiler will moan in satisfaction when it is able to engage more optimizations as it spurts out code
Name: Anonymous 2007-09-16 5:45 ID:M+Nq55G4
>>32 THIS IS /prog/ YOUR STANDARDS DO NOT APPLY HERE
Name: Anonymous 2007-09-16 6:01 ID:Sy2QIfmT
>>33
Yes, passing const references to objects really pleases the compiler like Lady Godiva.
Also, if you ever need to initialize an array to all zeros, use memset(array, 0, sizeof(array));
Some compilers can detect this call and replace it with a single rep stosd instruction (provided the array's size in bytes is a multiple of 4).
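A small sketch of that memset idiom, assuming a local array of a made-up size COUNT (all_zero and zero_demo are hypothetical helper names). The point to watch is that sizeof(array) only gives the full size while the name still refers to a real array; once it has decayed to a pointer parameter, it gives the pointer size instead.

```c
#include <assert.h>
#include <string.h>

#define COUNT 64  /* hypothetical size, for illustration only */

/* Returns 1 if every element of a[0..n-1] is zero. */
static int all_zero(const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0)
            return 0;
    return 1;
}

static void zero_demo(void)
{
    int table[COUNT];

    /* sizeof(table) is the whole array in bytes here because table is a
       real array in this scope; through a pointer parameter it would
       only be the size of the pointer. */
    memset(table, 0, sizeof(table));
    assert(all_zero(table, COUNT));

    /* An initializer zeroes the array without an explicit call. */
    int other[COUNT] = {0};
    assert(all_zero(other, COUNT));
}
```

Whether either form ends up as rep stosd, a vectorized store, or a library call is up to the compiler and its flags.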
The expression above contains no sequence points, yet its variables are modified more than once, so the behavior is undefined. The C standard does not allow an object to be modified more than once between sequence points.
Relevant C standard text (J.2, Undefined behavior): "Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored (6.5)."
So apparently you don't live inside standards. Just because it works on a few compilers doesn't mean it works on them all.
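For comparison, a sketch of an XOR swap split into one assignment per statement, so that no object is modified twice between sequence points. The names xor_swap, tmp_swap and swap_demo are invented for this example, and unsigned operands are an assumption made to keep the bit operations fully defined.

```c
#include <assert.h>

/* XOR swap with one assignment per statement, so no object is modified
   twice between sequence points. */
static void xor_swap(unsigned *a, unsigned *b)
{
    if (a == b)   /* x ^ x is 0, so swapping an object with itself would clear it */
        return;
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

/* Plain swap through a temporary, for comparison. */
static void tmp_swap(unsigned *a, unsigned *b)
{
    unsigned t = *a;
    *a = *b;
    *b = t;
}

static void swap_demo(void)
{
    unsigned x = 5, y = 9;

    xor_swap(&x, &y);
    assert(x == 9 && y == 5);
    tmp_swap(&x, &y);
    assert(x == 5 && y == 9);
    xor_swap(&x, &x);          /* aliasing case: value must survive */
    assert(x == 5);
}
```

The aliasing check is the part the one-liner has no room for, and it is one reason the temporary version is usually the safer default.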
>>34
This is about C, which is 100% defined by the standard. There are no other authoritative sources. Except maybe.. "Learning C for DUMMIES!"
>>38 would be nicer if you could just use a literal array instead of having to make a pointer to an array of function pointers. but you can't do that in c.
>>40
low bits of values returned by rand() aren't very random so if you're going to use rand() instead you shouldn't use just the lowest bit.
& is uglier than %, and just as slow.
if you really want speed, rand()/(INT_MAX/2) would be faster and would help with the problem of the low bits not being very random.
improved version: for(;*s=(rand()/(INT_MAX/2)?toupper:tolower)(*s++););
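A hedged sketch of the same idea with two changes that are my assumptions, not the poster's: the threshold uses RAND_MAX (the documented upper bound of rand(), which need not equal INT_MAX), and the pointer is advanced in the loop header, since *s = f(*s++) reads and modifies s without a sequence point in between. random_case and random_case_demo are made-up names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Randomly upper- or lower-cases each character of s in place.
   The decision uses the upper half of rand()'s range, and the pointer
   is advanced in the loop header so it is never read and modified in
   the same expression. */
static void random_case(char *s)
{
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        int upper = rand() > RAND_MAX / 2;
        *s = (char)(upper ? toupper(c) : tolower(c));
    }
}

/* Whatever rand() returns, only the case of each letter may change. */
static void random_case_demo(void)
{
    const char orig[] = "Optimized C";
    char buf[] = "Optimized C";

    random_case(buf);
    for (size_t i = 0; orig[i] != '\0'; i++)
        assert(tolower((unsigned char)buf[i]) == tolower((unsigned char)orig[i]));
}
```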
Um, & is in no way, shape, or form as slow as modulus. Modulus puts quite a bit more work on the CPU by comparison. I also don't see where you're getting your speed gains.
>>49
ok let's see you write a benchmark that shows how much slower %2 is than &1
Name: Anonymous 2007-09-16 16:04 ID:SDT0AFSf
fucking just download the ebook called "deep C secrets" i'm sure it'll help u in whatever C related programming course (besides the introductory bullshit) you're doing.
Name: Anonymous 2007-09-16 16:15 ID:SDT0AFSf
I think the codecomments forums have an optimization thread thats like 10 pages long at least - check that shit out as well.
>>50
I'm not >>49, but I agree with him, although the results come out the same. printf() is extraordinarily slow, pushing any interesting results into the noise. Same can be said for rand(), although it's not as bad as printf(). Using time from the shell includes the startup time, so it reduces accuracy.
Here are the results I get with what I wrote:
C:\Devel\src>gcc t.c -std=c99 -o t.exe
C:\Devel\src>t
& 1: 15515 ticks
% 2: 25168 ticks
C:\Devel\src>gcc t.c -std=c99 -o t.exe -O2
C:\Devel\src>t
& 1: 2658 ticks
% 2: 2649 ticks
And here's the code. Note that it's still far from ideal, but at least the interesting bits are completely vanishing into the margin of error:
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <limits.h>
#define ITER 10
int main(void)
{
int tmp; // prevent certain optimizers from eliminating the loops
int stub[2];
clock_t start_time, avg_time;
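The listing above is cut off, so what follows is a separate minimal sketch of the same kind of measurement, not a reconstruction of it. Every name in it (N, count_and, count_mod, ticks_for, bench_demo) is hypothetical. An optimizer is free to simplify both loops, so the tick counts may say more about the compiler than about & versus %.

```c
#include <assert.h>
#include <time.h>

#define N 1000000u  /* hypothetical workload size */

/* Counts odd numbers below n using a bit test. */
static unsigned count_and(unsigned n)
{
    unsigned hits = 0;
    for (unsigned i = 0; i < n; i++)
        hits += i & 1u;
    return hits;
}

/* Counts odd numbers below n using the remainder operator. */
static unsigned count_mod(unsigned n)
{
    unsigned hits = 0;
    for (unsigned i = 0; i < n; i++)
        hits += i % 2u;
    return hits;
}

/* Times one call of fn(N) in clock() ticks. The result is stored through
   a volatile pointer so the call itself cannot be thrown away. */
static clock_t ticks_for(unsigned (*fn)(unsigned), volatile unsigned *sink)
{
    clock_t start = clock();
    *sink = fn(N);
    return clock() - start;
}

static void bench_demo(void)
{
    volatile unsigned sink = 0;

    clock_t t_and = ticks_for(count_and, &sink);
    assert(sink == N / 2);
    clock_t t_mod = ticks_for(count_mod, &sink);
    assert(sink == N / 2);
    (void)t_and;  /* print these two with printf after timing to compare */
    (void)t_mod;
}
```

Keeping printf() and rand() out of the timed region is the main thing this takes from the post above.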
Integer division is 20-40 clocks on most CPUs, and up to 80 on the Pentium 4... I don't feel like getting out my technical manuals for the exact numbers.
The reason it's "fast" is that when you divide by a constant value, the compiler tends to optimize the division into magic number multiplication/bitshifting, to avoid an actual IDIV.
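To make the magic number trick concrete, here is a sketch for one case I am sure of: unsigned 32-bit division by 10, done as a 64-bit multiply by 0xCCCCCCCD followed by a shift of 35. The function names are invented, and what a given compiler actually emits depends on the target and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Division by 10 without a divide instruction: multiply by
   ceil(2^35 / 10) = 0xCCCCCCCD and shift right by 35.
   Exact for every 32-bit unsigned x. */
static uint32_t div10_magic(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Spot-check the trick against the real division operator. */
static void div10_demo(void)
{
    for (uint32_t x = 0; x < 4000000000u; x += 9973u)
        assert(div10_magic(x) == x / 10u);
    assert(div10_magic(UINT32_MAX) == UINT32_MAX / 10u);
}
```

For % 2 on unsigned values the story is simpler still, since the remainder is just the low bit.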
Name: Anonymous 2007-09-16 17:33 ID:Krd3dptm
Devel
eve V
I put on my cape and smiley mask.
Name: Anonymous 2007-09-16 18:40 ID:qb4uIYsD
1. Use fprintf ("fast printf") instead of printf.
2. ++i is faster than both i++ and i = i + 1.
3. void main(void) is faster than int main(void) or int main(int, char **) since no value needs to be returned to the OS.
4. Swapping with exclusive-or (a^=b^=a^=b swaps a and b) is faster than using a temporary. This works for all types (including structures), but not on all compilers. Some compilers may also give you a harmless warning.
5. Static storage duration objects are faster than automatic storage duration objects because the CPU doesn't have to set aside storage on the stack every time a function is called. Make your loop indexes global so that you can use them everywhere:
int i;
void func(void) { for (i = 0; i < 10; i++) ; /* ... */ }
void func2(void) { for (i = 0; i < 20; i++) ; /* ... */ }
/* ... */
6. Compilers often give more memory to arrays than you asked for. Here's how to check how big an array actually is (memset returns a null pointer if the size you passed to it is bigger than the size of the array you passed to it):
int arr[256];
size_t realsize;
for (realsize = 0; realsize <= SIZE_MAX; ++realsize)
if (!memset(arr, 0, realsize)) break;
/* now you know that arr actually has realsize / sizeof (int) elements */
If you combine this with #5, your program will be faster in the long run (but this usually doesn't work for short programs).
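For anyone tempted to try #6: as far as the standard is concerned, memset returns its first argument no matter what size it is given, and writing past the end of an array is undefined behavior, so that loop cannot measure anything. A sketch of the usual compile-time way to get an element count (ARRAY_LEN and array_len_demo are made-up names):

```c
#include <assert.h>
#include <string.h>

/* Element count of a true array (not a pointer), known at compile time. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static void array_len_demo(void)
{
    int arr[256];

    assert(ARRAY_LEN(arr) == 256);
    /* memset hands back its first argument; it reports nothing about size. */
    assert(memset(arr, 0, sizeof(arr)) == (void *)arr);
    assert(arr[0] == 0 && arr[255] == 0);
}
```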
>>60 is EXPERT QUALITY, except one thing: 2. ++i is faster than both i++ and i = i + 1.
This is actually quite true sometimes. In certain circumstances, a loop in GCC using while (++i <= j) will produce incl, cmpl, and jle instructions; whereas while (i++ < j) will result in movl, incl, cmpl, and jl. In this case, preincrementing shaves one full CPU instruction per loop iteration.
>>60 1. Use fprintf ("fast printf") instead of printf.
No. Here's an EXPERT PROGRAMMER QUALITY optimization:
fwrite("string",sizeof(char),sizeof("string"),stdout);
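A quick sketch of what sizeof yields for a string literal, since it decides how many bytes that fwrite sends: the literal is an array of seven chars (six characters plus the terminating NUL), so the call above writes the NUL as well. literal_size_demo and write_literal are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static void literal_size_demo(void)
{
    const char *p = "string";

    /* The literal is a char[7]: six characters plus the terminating NUL. */
    assert(sizeof("string") == 7);
    assert(strlen("string") == 6);
    /* Only after decaying to a pointer does sizeof give the pointer size. */
    assert(sizeof(p) == sizeof(char *));
}

/* Writes the literal without its terminator; returns the items written. */
static size_t write_literal(FILE *out)
{
    return fwrite("string", sizeof(char), sizeof("string") - 1, out);
}
```

Subtracting 1 keeps the stray NUL byte out of the output stream.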
Name: Anonymous 2007-09-16 22:22 ID:Krd3dptm
>>66
Anybody who writes a while loop like that deserves to get beaten with a bat though.
Same with >>47. Ugh, what kind of for loops are those?
Name: Anonymous 2007-09-16 22:23 ID:0cfV6Q0I
ALSO, 4. Swapping with exclusive-or (a^=b^=a^=b swaps a and b) is faster than using a temporary. This works for all types (including structures), but not on all compilers. Some compilers may also give you a harmless warning.
>>76
What is the use of a loop that doesn't check when to finish?
while(!finished)
    finished = true;
for(;;)
    if(cond)
        break;
I've seen cases where usage of break is more elegant than using vars and while. Such cases are exceptional.
>>77
Guess where the theory behind computer science comes from? Oh, that's right, it doesn't come from the people that formalized cs theory, it comes from EXPERT PROGRAMMERS
>>78
I love GOTO roughly like a noodly spaghetti code.
Name: Anonymous 2007-09-17 8:35 ID:2gLnFHF5
>>68
Uh, sizeof("String") is four, or however big pointers are on your machine.
Name: Anonymous 2007-09-17 8:42 ID:p/yux4n8
>>79
Moar like:
while (condition1) {
    ...code...
    if (OMG_DISASTROUS_CONDITION) {
        break;
    }
    ...code...
}
simpler, easier to read and less error-prone (due to no condition repeated) than:
while (condition1 && !OMG_DISASTROUS_CONDITION) {
    ...code...
    if (!OMG_DISASTROUS_CONDITION) {
        ...code that's actually at the same conceptual level as the other block but appears indented...
    }
} //}} is alright, but nicer if it can be avoided
>>80
incorrect.
"literal strings" are arrays of length N (the characters plus the terminating NUL), so sizeof gives the size of the array, not of a pointer.
Name: Anonymous 2007-09-17 9:06 ID:nNPi4cfO
clearly, the EXPERT PROGRAMMER wouldn't use c in the first place.
better to use java, to produce ENTERPRISE LEVEL, fully scalable, reductible, end-user optimized professional applications.
Why is that? Is it because adding 8 to (subtracting 8 from, depending on your architecture) your stack pointer is more expensive than adding/subtracting 4?
1. Use fprintf ("fast printf") instead of printf.
2. ++i is faster than both i++ and i = i + 1.
3. void main(void) is faster than int main(void) or int main(int, char **) since no value needs to be returned to the OS.
4. Swapping with exclusive-or (a^=b^=a^=b swaps a and b) is faster than using a temporary. This works for all types (including structures), but not on all compilers. Some compilers may also give you a harmless warning.
5. Static storage duration objects are faster than automatic storage duration objects because the CPU doesn't have to set aside storage on the stack every time a function is called. Make your loop indexes global so that you can use them everywhere:
int i;
void func(void) { for (i = 0; i < 10; i++) ; /* ... */ }
void func2(void) { for (i = 0; i < 20; i++) ; /* ... */ }
/* ... */
6. Compilers often give more memory to arrays than you asked for. Here's how to check how big an array actually is (memset returns a null pointer if the size you passed to it is bigger than the size of the array you passed to it):
int arr[256];
size_t realsize;
for (realsize = 0; realsize <= SIZE_MAX; ++realsize)
if (!memset(arr, 0, realsize)) break;
/* now you know that arr actually has realsize / sizeof (int) elements */
If you combine this with #5, your program will be faster in the long run (but this usually doesn't work for short programs).
Also remember the remark, that is "rm", tool in your UNIX shell.
Name: Anonymous 2007-12-30 6:59
are we talking compilation optimization or code execution? because i don't really care how fast the compile is, in my case i've so far only written applications that compile in a few seconds or less
so what tips do you have for optimizing executable code when you compile with gcc and link with ld to the bsd libc?
Name: Anonymous 2007-12-30 7:33
strip -R.data binary
Name: Anonymous 2007-12-30 13:50
>>102
Don't forget to use -fr, which means fold previous remarks you've made with the new ones. Otherwise you'll erase your old comments! For example:
$ rm -fr ~/ This is my home directory
... will attach the remark "This is my home directory" to ~/, leaving any previous remarks intact.
Also, to view remarks you've made previously, try the -v option:
>>100 6. Compilers often give more memory to arrays than you asked for. Here's how to check how big an array actually is (memset returns a null pointer if the size you passed to it is bigger than the size of the array you passed to it):
O HI I UPGRADED UR ALGORITHM!
#include <stdlib.h>
#include <stdio.h>
char* main() {
char* DATA = malloc(4097);
printf("%u\n", *(unsigned int*)(DATA - 4));
return DATA; /* ONLY PUSSEYS FREE THERE MEMORT; LET THE DOS DO IT!!! */
}
Name: Anonymous 2008-01-08 10:56
>>11
But which is faster? Three XOR instructions or three MOV's using a temporary register?
#define iswap(x,y) do { uintptr_t tmp_ = (x); (x) = (y); (y) = tmp_; } while (0)
#define pswap(p,s) do { void *tmp_ = (p); (p) = (s); (s) = tmp_; } while (0)
#define fswap(f,g) do { long double tmp_ = (f); (f) = (g); (g) = tmp_; } while (0)
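A sketch of one type-generic take on a temp-based swap macro, for anyone who wants something that compiles. SWAP, swap_tmp_ and swap_macro_demo are names I made up, and the caller has to pass the type by hand. The do { } while (0) wrapper is left unparenthesized so the macro expands to a single statement.

```c
#include <assert.h>

/* Swap two lvalues of type T through a temporary. The do { } while (0)
   wrapper makes the macro behave as one statement, e.g. inside an
   unbraced if/else. swap_tmp_ is a hypothetical name that must not
   collide with the arguments. */
#define SWAP(T, a, b) do { T swap_tmp_ = (a); (a) = (b); (b) = swap_tmp_; } while (0)

static void swap_macro_demo(void)
{
    int i = 1, j = 2;
    long double f = 1.5L, g = -2.0L;
    const char *p = "left", *q = "right";

    if (i < j)
        SWAP(int, i, j);      /* expands to a single statement */
    else
        SWAP(int, j, i);
    SWAP(long double, f, g);
    SWAP(const char *, p, q);

    assert(i == 2 && j == 1);
    assert(f == -2.0L && g == 1.5L);
    assert(p[0] == 'r' && q[0] == 'l');
}
```

As for which is faster, with optimization enabled a swap like this typically stays in registers, so it is worth checking the generated assembly before assuming either way.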
Name: Anonymous 2008-01-08 12:20
>>117
That looks NP-complete to me... or kind of like Perl... but I think parsing Perl is NP-complete.