While optimization that makes one's code completely unreadable is often a bad thing, there are places for it, like that one inner loop that takes up 98% of the program's running time.
What are your tricks for improving performance in C, other than the obvious inline assembly or the like?
AHAHAHAHAHAHAHAHAHAHAHAHAHAHA FUCKING CLUELESS CPU NOOBS
>>10 YOU BETTER BE JOKING NIGGER. THAT AIN'T OPTIMIZATION AT ALL.
IT IS SHIT CODE >>11
char **x, **y; YOU FAIL. FURTHERMORE, http://en.wikipedia.org/wiki/Xor_swap#The_XCHG_instruction says: "These compilers are more likely to recognize and optimize a conventional (temporary-based) swap than to recognize the high-level language statements that correspond to a XOR swap."
FUCKING NOOB
FUCK
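For the record, here is a sketch of the two swaps being argued about (function names are hypothetical). The temporary-based version works for any assignable type, including pointers like `char **`, to which `^` cannot be applied at all. The XOR version only works on integers, and it silently zeroes the value if both arguments point to the same object, which is why it needs the guard. Because each `^=` is written as a separate statement, neither version has the sequence-point problem of `a^=b^=a^=b`.

```c
/* Conventional swap through a temporary: works for any assignable type,
   and is the form compilers are most likely to recognize and optimize. */
void swap_tmp(char **x, char **y)
{
    char *t = *x;
    *x = *y;
    *y = t;
}

/* XOR swap: integers only. Without the guard, swapping an object with
   itself would zero it (v ^ v == 0). Each ^= is its own statement,
   so nothing is modified twice between sequence points. */
void swap_xor(unsigned *a, unsigned *b)
{
    if (a != b) {
        *a ^= *b;
        *b ^= *a;
        *a ^= *b;
    }
}
```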
Name: Anonymous 2007-09-16 0:14 ID:O8wP71a+
>>12 OKAY YOU FUQIN ANGERED AN EXPERT PROGRAMMER
GODFUCKIGNDAMN
FIRST OF ALL, YOU DONT FUQIN KNOW WHAT A MAN PAGE IS
SECONDLY, THIS IS /prog/ DO NOT DEMAND USEFUL ANSWERS THE WAY YOU WANT THEM TO BE
THIRDLY PROGRAMMING IS ALL ABOUT PHILOSOPHY AND ``ABSTRACT BULLSHITE'' THAT YOU WILL NEVER COMPREHEND
AND FUQIN LASTLY, FUCK OFF WITH YOUR BULLSHYT
EVERYTHING HAS ALREADY BEEN ANSWERED IN >>5,10,11
Name: Anonymous 2007-09-16 0:27 ID:Sy2QIfmT
Declare variables as static so the CPU doesn't have to set aside memory on the stack in each function call. Just be sure to reset them each time.
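int foo(int a)
{
    static int variable; /* one copy for the whole program, not one per call */
    // ....
    variable = 0;        /* "reset" it before leaving */
    return a;            /* an int function has to return something */
}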
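The stack argument above does not hold up well: space for automatic locals is typically reserved by the single stack-pointer adjustment a function already makes (or the locals just live in registers), while a static local gives every active call the same single copy, which makes the function non-reentrant. A small sketch of what goes wrong with recursion (function names are hypothetical):

```c
/* Sum 1..n recursively with an automatic local: every active call
   has its own copy of `partial`, so recursion works. */
int sum_auto(int n)
{
    int partial, rest;
    if (n == 0)
        return 0;
    partial = n;
    rest = sum_auto(n - 1);
    return partial + rest;
}

/* Same function with a static local: there is ONE `partial` shared by
   every active call, so the recursive call overwrites it. */
int sum_static(int n)
{
    static int partial;
    int rest;
    if (n == 0)
        return 0;
    partial = n;
    rest = sum_static(n - 1);
    return partial + rest;
}
```

sum_auto(3) gives 6. sum_static(3) gives 3, because every level ends up reading the value 1 written by the deepest call.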
Name: Anonymous 2007-09-16 0:29 ID:Sy2QIfmT
Another cool one I learnt in high school: if you're passing in aggregate types, it's better to pass them by pointer so they don't have to be copied.
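struct foobar { int big[256]; }; /* hypothetical large struct */

int foo(const struct foobar *myStruct) /* only the address gets copied */
{
    return 1;
}

int bar(void)
{
    struct foobar a = {{0}};
    return foo(&a);
}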
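A minimal sketch of the difference, assuming a deliberately large struct (the layout and function names are hypothetical). Passing by value copies all sizeof(struct foobar) bytes (1 KiB with a 4-byte int), and the callee works on its own copy. Passing by pointer copies only an address, and `const` documents that the callee only reads.

```c
/* A deliberately large aggregate (hypothetical). */
struct foobar { int big[256]; };

/* By value: the whole struct is copied, and changes made here
   are invisible to the caller. */
int touch_by_value(struct foobar s)
{
    s.big[0] = 42;
    return s.big[0];
}

/* By pointer to const: only the address is copied, and the
   function promises not to modify the caller's object. */
int read_by_pointer(const struct foobar *s)
{
    return s->big[0];
}

/* Non-const pointer when the callee is meant to modify the caller's object. */
void touch_by_pointer(struct foobar *s)
{
    s->big[0] = 42;
}
```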
Name: Anonymous 2007-09-16 0:31 ID:Sy2QIfmT
Of course, as >>12 was hinting at, if you have critical code in a tight loop, it's better to use the inline assembly, which is guaranteed to make your code run faster.
Noob? There are hundreds of kilobytes of assembly code in the program I'm working on, and it all has a very specific purpose (SIMD instructions for SAD/SATD/SSD/SSIM operations on large numbers of pixels). Obviously, assembly for something that can't be SIMD-optimized is *usually* a waste of time.
Thanks for the suggestions, for those of you that made them :)
>>17 and not knowing that that usage is undefined in the first place. YOU FUCKING WANKER I'LL JUST QUOTE ME: char **x, **y; YOU RETARDED MONKEYFUCK WANKER SAGE FOR GREATER JUSTICE
The problem with this is that faster hardware gives everyone an edge. When you're competing against other video encoders from other companies and organizations, what matters is your speed and/or quality edge over theirs. Since faster hardware speeds up everyone, it doesn't help one compete :)
Let the compiler do the good job. It's bad enough already that you're doing C; no need to make your life more miserable when there are several awesome OMG OPTIMIZED C compilers. Just compile with -momg-optimized.
Name: Anonymous 2007-09-16 5:15 ID:6UUG73P+
My trick is to express my intent to the C compiler in such a way that communicates my intent to any humans that might be reading (including myself in two weeks' time). Compilers are pretty fuckin' good these days, there's no need to do strength reduction by hand since you can always compile with at least -O1 on today's hardware...
But anyway, yeah, if you want to know whether an integer is odd, use (x % 2) != 0 rather than the implicit boolean equivalent (not (x % 2) == 1, which is false for negative odd x, since the remainder is then -1). It's more readable, and the compiler produces an equivalent sequence of instructions anyway.
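One caveat worth knowing about the odd test: since C99, integer division truncates toward zero, so for a negative odd x the remainder x % 2 is -1 and a comparison against 1 fails. Comparing against 0 works for both signs, and optimizing compilers typically reduce it to a single bit test anyway. A minimal sketch (function names are hypothetical):

```c
#include <stdbool.h>

/* In C99 and later, % truncates toward zero, so -3 % 2 == -1.
   Comparing against 0 handles negative operands correctly. */
bool is_odd(int x)
{
    return x % 2 != 0;
}

/* The version that only looks right: false for every negative odd number. */
bool is_odd_buggy(int x)
{
    return x % 2 == 1;
}
```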
>>31
FUCK YOU FUCKING FAG
XOR SWAP TWO SIGNED INTEGERS AND TELL ME WHAT HAPPENS YOU FUCKING PIECE OF SHIT
UNDEFINED BEHAVIOR TOO
I KNOW ALL OF THIS
I LIVE INSIDE STANDARDS
Name: Anonymous 2007-09-16 5:35 ID:S8KoCl51
apply the ``const'' qualifier where appropriate -- the compiler will moan in satisfaction when it is able to engage more optimizations as it spurts out code
Name: Anonymous 2007-09-16 5:45 ID:M+Nq55G4
>>32 THIS IS /prog/ YOUR STANDARDS DO NOT APPLY HERE
Name: Anonymous 2007-09-16 6:01 ID:Sy2QIfmT
>>33
Yes, passing const references to objects really pleases the compiler like Lady Godiva.
Also, if you ever need to initialize an array to all zeros, use memset(array, 0, sizeof(array));
Some compilers can detect this call and replace it with a single rep stosd instruction (given an array length that is a multiple of 4 bytes).
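Two details around memset(array, 0, sizeof(array)). For a freshly defined array, an initializer such as `int array[N] = {0};` already zeroes every element. And `sizeof(array)` only gives the array's size where `array` really is an array; inside a function taking `int a[]`, the parameter is a pointer and sizeof gives the pointer size. A sketch with a hypothetical length N and hypothetical helper names:

```c
#include <stddef.h>
#include <string.h>

#define N 64 /* hypothetical array length */

/* A parameter declared as int a[] is really int *, so sizeof(a) would be
   the size of a pointer. Pass the element count explicitly instead. */
void zero_ints(int *a, size_t count)
{
    /* All-bits-zero is 0 for integer types; the standard does not
       promise the same for pointers or floating point. */
    memset(a, 0, count * sizeof *a);
}

/* Returns 1 if the first `count` elements are all zero. */
int all_zero(const int *a, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        if (a[i] != 0)
            return 0;
    return 1;
}
```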
The above has no sequence points between the assignments, so the same variables get modified more than once, the result of which is undefined. Variables cannot be modified more than once between sequence points according to the C standard.
Relevant C standard text, J.2 Undefined behavior: "Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored (6.5)."
So apparently you don't live inside standards. Just because it works on a few compilers doesn't mean it works on them all.
>>34
This is about C, which is 100% defined by the standard. There are no other authoritative sources. Except maybe.. "Learning C for DUMMIES!"
>>38 would be nicer if you could just use a literal array instead of having to make a pointer to an array of function pointers. but you can't do that in c.
>>40
low bits of values returned by rand() aren't very random so if you're going to use rand() instead you shouldn't use just the lowest bit.
& is uglier than %, and just as slow.
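if you really want speed, rand()/(RAND_MAX/2) would be faster and would help with the problem of the low bits not being very random. (It has to be RAND_MAX: rand() never returns more than RAND_MAX, which can be as small as 32767, and then dividing by INT_MAX/2 always gives 0.)
improved version: for(;*s;s++)*s=(rand()/(RAND_MAX/2)?toupper:tolower)((unsigned char)*s);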
Um, & is in no way, shape or form as slow as modulus. Modulus puts quite a bit more work on the CPU, comparatively. I also don't see where you're getting your speed gains, either.
>>49
ok let's see you write a benchmark that shows how much slower %2 is than &1
Name: Anonymous 2007-09-16 16:04 ID:SDT0AFSf
fucking just download the ebook called "deep C secrets" i'm sure it'll help u in whatever C related programming course (besides the introductory bullshit) you're doing.
Name: Anonymous 2007-09-16 16:15 ID:SDT0AFSf
I think the codecomments forums have an optimization thread that's like 10 pages long at least - check that shit out as well.
>>50
I'm not >>49, but I agree with him, although the results come out the same. printf() is extraordinarily slow, pushing any interesting results into the noise. Same can be said for rand(), although it's not as bad as printf(). Using time from the shell includes the startup time, so it reduces accuracy.
Here's the results I get with what I wrote:
C:\Devel\src>gcc t.c -std=c99 -o t.exe
C:\Devel\src>t
& 1: 15515 ticks
% 2: 25168 ticks
C:\Devel\src>gcc t.c -std=c99 -o t.exe -O2
C:\Devel\src>t
& 1: 2658 ticks
% 2: 2649 ticks
And here's the code. Note that it's still far from ideal, but at least the interesting bits are completely vanishing into the margin of error:
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <limits.h>
#define ITER 10
int main(void)
{
int tmp; // prevent certain optimizers from eliminating the loops
int stub[2];
clock_t start_time, avg_time;
Integer division is 20-40 clocks on most CPUs, and up to 80 on the Pentium 4... I don't feel like getting out my technical manuals for the exact numbers.
The reason it's "fast" is that when you divide by a constant value, the compiler tends to optimize the division into magic-number multiplication/bitshifting, to avoid an actual IDIV.
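A sketch of the magic-number trick for one specific divisor, unsigned 32-bit division by 3 (the function name is hypothetical). The constant is ceil(2^33 / 3) = 0xAAAAAAAB, and keeping the top bits of the 64-bit product gives the exact quotient for every 32-bit input. Exactly which sequence a given compiler emits for x / 3 varies by target and version; this only shows the idea.

```c
#include <stdint.h>

/* x / 3 without a divide instruction: multiply by ceil(2^33 / 3)
   and shift the 64-bit product right by 33. */
uint32_t div3(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}
```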
Name: Anonymous 2007-09-16 17:33 ID:Krd3dptm
Devel
eve V
I put on my cape and smiley mask.
Name: Anonymous 2007-09-16 18:40 ID:qb4uIYsD
1. Use fprintf ("fast printf") instead of printf.
2. ++i is faster than both i++ and i = i + 1.
3. void main(void) is faster than int main(void) or int main(int, char **) since no value needs to be returned to the OS.
4. Swapping with exclusive-or (a^=b^=a^=b swaps a and b) is faster than using a temporary. This works for all types (including structures), but not on all compilers. Some compilers may also give you a harmless warning.
5. Static storage duration objects are faster than automatic storage duration objects because the CPU doesn't have to set aside storage on the stack every time a function is called. Make your loop indexes global so that you can use them everywhere: int i;
void func(void) { for (i = 0; i < 10; i++) ; /* ... */ }
void func2(void) { for (i = 0; i < 20; i++) ; /* ... */ }
/* ... */
6. Compilers often give more memory to arrays than you asked for. Here's how to check how big an array actually is (memset returns a null pointer if the size you passed to it is bigger than the size of the array you passed to it): int arr[256];
size_t realsize;
for (realsize = 0; realsize <= SIZE_MAX; ++realsize)
if (!memset(arr, 0, realsize)) break;
/* now you know that arr actually has realsize / sizeof (int) elements */
If you combine this with #5, your program will be faster in the long run (but this usually doesn't work for short programs).
>>60 is EXPERT QUALITY, except one thing: 2. ++i is faster than both i++ and i = i + 1.
This is actually quite true sometimes. In certain circumstances, a loop in GCC using while (++i <= j) will produce incl, cmpl, and jle instructions; whereas while (i++ < j) will result in movl, incl, cmpl, and jl. In this case, preincrementing shaves one full CPU instruction per loop iteration.
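Whatever instructions a particular GCC version picks, the two while loops above are semantically identical: assuming no overflow, the body sees the same values of i, so an optimizer is free to emit the same code for both. A quick check (function names are hypothetical):

```c
/* Sum the values of i seen inside the loop body for each form. */
int sum_pre(int i, int j)
{
    int sum = 0;
    while (++i <= j)   /* increment, then compare */
        sum += i;
    return sum;
}

int sum_post(int i, int j)
{
    int sum = 0;
    while (i++ < j)    /* compare old value, then increment */
        sum += i;
    return sum;
}
```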
>>60 1. Use fprintf ("fast printf") instead of printf.
No. Here's an EXPERT PROGRAMMER QUALITY optimization:
fwrite("string", sizeof(char), sizeof("string") - 1, stdout);
Name: Anonymous 2007-09-16 22:22 ID:Krd3dptm
>>66
Anybody who writes a while loop like that deserves to get beaten with a bat though.
Same with >>47. Ugh, what kind of for loops are those?
Name: Anonymous 2007-09-16 22:23 ID:0cfV6Q0I
ALSO, 4. Swapping with exclusive-or (a^=b^=a^=b swaps a and b) is faster than using a temporary. This works for all types (including structures), but not on all compilers. Some compilers may also give you a harmless warning.
>>76
What is the use of a loop that doesn't check when to finish?
while(!finished)
finished = true;
for(;;)
if(cond)
break;
I've seen cases where usage of break is more elegant than using vars and while. Such cases are exceptional.
>>77
Guess where the theory behind computer science comes from? Oh, that's right, it doesn't come from the people that formalized cs theory, it comes from EXPERT PROGRAMMERS
>>78
I love GOTO roughly like a noodly spaghetti code.
Name: Anonymous 2007-09-17 8:35 ID:2gLnFHF5
>>68
Uh, sizeof("String") is four, or however big pointers are on your machine.
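For reference, a string literal is an array of char that includes the terminating '\0', so sizeof("string") is 7, not the size of a pointer; only a pointer to the literal has pointer size. That also means fwrite(..., sizeof("string"), ...) writes the '\0' as well, so subtract 1 for just the visible characters. A small check (helper names are hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* sizeof a literal counts the terminating '\0'; strlen does not. */
size_t literal_size(void)   { return sizeof("string"); }   /* 7 */
size_t literal_length(void) { return strlen("string"); }   /* 6 */

/* Once the literal is accessed through a pointer, sizeof gives the pointer size. */
size_t pointer_size(void)
{
    const char *p = "string";
    return sizeof p;
}
```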
Name: Anonymous 2007-09-17 8:42 ID:p/yux4n8
>>79
Moar like: while (condition1) {
...code...
if (OMG_DISASTROUS_CONDITION) {
break;
}
...code...
}
simpler, easier to read and less error-prone (due to no condition repeated) than:
while (condition1 && !OMG_DISASTROUS_CONDITION) {
...code...
if (!OMG_DISASTROUS_CONDITION) {
...code that's actually at the same conceptual level as the other block but appears indented...
}
} //}} is alright, but nicer if it can be avoided
>>80
incorrect.
"literal strings" are arrays of N length.
Name: Anonymous 2007-09-17 9:06 ID:nNPi4cfO
clearly, the EXPERT PROGRAMMER wouldn't use c in the first place.
better to use java, to produce ENTERPRISE LEVEL, fully scalable, reductible, end-user optimized professional applications.
Why is that? Is it because adding 8 to (subtracting 8 from, depending on your architecture) your stack pointer is more expensive than adding/subtracting 4?
1. Use fprintf ("fast printf") instead of printf.
2. ++i is faster than both i++ and i = i + 1.
3. void main(void) is faster than int main(void) or int main(int, char **) since no value needs to be returned to the OS.
4. Swapping with exclusive-or (a^=b^=a^=b swaps a and b) is faster than using a temporary. This works for all types (including structures), but not on all compilers. Some compilers may also give you a harmless warning.
5. Static storage duration objects are faster than automatic storage duration objects because the CPU doesn't have to set aside storage on the stack every time a function is called. Make your loop indexes global so that you can use them everywhere: int i;
void func(void) { for (i = 0; i < 10; i++) ; /* ... */ }
void func2(void) { for (i = 0; i < 20; i++) ; /* ... */ }
/* ... */
6. Compilers often give more memory to arrays than you asked for. Here's how to check how big an array actually is (memset returns a null pointer if the size you passed to it is bigger than the size of the array you passed to it): int arr[256];
size_t realsize;
for (realsize = 0; realsize <= SIZE_MAX; ++realsize)
if (!memset(arr, 0, realsize)) break;
/* now you know that arr actually has realsize / sizeof (int) elements */
If you combine this with #5, your program will be faster in the long run (but this usually doesn't work for short programs).
Also remember the remark, that is "rm", tool in your UNIX shell.
Name: Anonymous 2007-12-30 6:59
are we talking compilation optimization or code execution? because i don't really care how fast the compile is, in my case i've so far only written applications that compile in a few seconds or less
so what tips do you have for optimizing executable code when you compile with gcc and link with ld to the bsd libc?
Name: Anonymous 2007-12-30 7:33
strip -R.data binary
Name: Anonymous 2007-12-30 13:50
>>102
Don't forget to use -fr, which means fold previous remarks you've made with the new ones. Otherwise you'll erase your old comments! For example:
$ rm -fr ~/ This is my home directory
... will attach the remark "This is my home directory" to ~/, leaving any previous remarks intact.
Also, to view remarks you've made previously, try the -v option:
>>100 6. Compilers often give more memory to arrays than you asked for. Here's how to check how big an array actually is (memset returns a null pointer if the size you passed to it is bigger than the size of the array you passed to it): O HI I UPGRADED UR ALGORITHM!
#include <stdlib.h>
#include <stdio.h>
char* main() {
char* DATA = malloc(4097);
printf("%u\n", *(unsigned int*)(DATA - 4));
return DATA; /* ONLY PUSSEYS FREE THERE MEMORT; LET THE DOS DO IT!!! */
}
Name: Anonymous 2008-01-08 10:56
>>11
But which is faster? Three XOR instructions or three MOVs using a temporary register?
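/* do { ... } while (0) makes each macro act as one statement; iswap needs <stdint.h> */
#define iswap(x,y) do { uintptr_t swap_tmp_ = (x); (x) = (y); (y) = swap_tmp_; } while (0)
#define pswap(p,s) do { void *swap_tmp_ = (p); (p) = (s); (s) = swap_tmp_; } while (0)
#define fswap(f,g) do { long double swap_tmp_ = (f); (f) = (g); (g) = swap_tmp_; } while (0)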
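A type-generic alternative to keeping one swap macro per kind of value, sketched under the assumption that the caller names the type (the macro and function names are hypothetical). Nothing gets squeezed through uintptr_t or long double, and the do { } while (0) wrapper makes it safe after an unbraced `if`. Each argument is evaluated more than once, so don't pass expressions with side effects, and don't name your own variable swap_tmp_.

```c
/* Swap two lvalues of the given type through a temporary. */
#define SWAP(type, a, b) do { type swap_tmp_ = (a); (a) = (b); (b) = swap_tmp_; } while (0)

/* Example use inside a function. */
void swap_demo(int *x, int *y, double *f, double *g)
{
    SWAP(int, *x, *y);
    SWAP(double, *f, *g);
}
```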
Name: Anonymous 2008-01-08 12:20
>>117
That looks NP-complete to me... or kind of like Perl... but I think parsing Perl is NP-complete.
>>124
Actually, it's invalid C.
It invokes undefined behavior in the assignment and calls a function that is not standard without providing declaration and definition. (the non-standard function is random btw)
let's see the assignment: *s=f[random()%2](*s++);
As you might know, in such assignment the compiler is free to evaluate the RHS or the LHS first.
If LHS is evaluated first and then RHS, what happens?
That's right motherfucker. Undefined behavior.
Fucking C fag, gtfo /prog/ and read SICP.
6.5p2
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
In the expression *s=f[random()%2](*s++), s has its value modified, and its prior value read to determine the new value, in the s++ inside the call; but its prior value is also read elsewhere, in the *s on the left-hand side of the assignment, which violates the second sentence of the quoted paragraph.
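A well-defined version of the same idea, keeping the table of function pointers from the quoted code (the function name is hypothetical). The pointer is incremented in the for header, so s is never modified and read without an intervening sequence point. It uses standard rand() instead of random(), and compares against RAND_MAX / 2 so the choice depends on the high half of the range rather than the low bit.

```c
#include <ctype.h>
#include <stdlib.h>

/* Randomly upper- or lower-case each character of s in place. */
void random_case(char *s)
{
    static int (*const f[2])(int) = { tolower, toupper };
    for (; *s != '\0'; s++)
        *s = (char)f[rand() > RAND_MAX / 2]((unsigned char)*s);
}
```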
Name: Anonymous 2008-01-09 0:15
>>127
the way you interpret that second sentence, if(i++) would be invalid because i is modified and its prior value is read for a purpose other than determining the value to be stored. this is obviously not what the authors of the standard intended.
Name: Anonymous 2008-01-09 3:33
>>125,128
It's random() anyway so the point is doug moot. But most compilers defer postfix ops until the whole expression has been evaluated (I only wish it was in the standard)
Here's a tricky bit of code that is valid but most people won't think so:
int i=0;
i++ && i++ || i++ && i++
What's the value of i after the above fragment executes?
>>128
A clearer statement of what I interpret the second sentence to mean is: if the prior value is read then this reading shall be used to determine the new value. For i++ I would say that the value of i is read once and used to determine both the new value of i and the value of the expression itself. This is OK with my interpretation.
&& has higher priority than ||, so it's either 2 or 2, so 2.
In lisp, on the other hand, everything is true as long as it's not '(), so it would be 1 (boolean truth).
Am I right? I don't program in C either.
Name: Anonymous 2008-01-09 8:28
>>145
Incorrect.
It's 3.
i = 0
i++ && i++ || i++ && i++
first i++: yields 0, i is 1
0 is false, so the second i++ is skipped
third i++: yields 1, i is 2
fourth i++: yields 2, i is 3
should be 3
New one:
i = 1
sizeof (i++) && i++ || i++
What is i?
Name: Anonymous 2008-01-09 8:33
The answer is: You can't know. Undefined. The C standard says nothing about what the result will be and the only way to know for sure what i will be after that line lies in how the specific compiler that compiles that piece of code handles it.
Furthermore, sizeof does not evaluate it's operand
Furthermore, sizeof does not evaluate it is operand
Syntax error >>149 (8): found verb, expected noun phrase
Name: Anonymous 2008-01-09 12:55
>>152
Holy fuck, which compiler parses English contractions?
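Both puzzles (i++ && i++ || i++ && i++ starting from i = 0, and sizeof (i++) && i++ || i++ starting from i = 1) are actually well defined: && and || introduce sequence points and short-circuit, and sizeof does not evaluate a non-VLA operand. A small check (function names are hypothetical):

```c
/* i starts at 0: the first i++ yields 0, so its && partner is skipped;
   then 1 && 2 is evaluated, leaving i == 3. */
int puzzle1(void)
{
    int i = 0;
    (void)((i++ && i++) || (i++ && i++));
    return i;
}

/* i starts at 1: sizeof (i++) is sizeof(int) and does not run i++;
   the next i++ yields 1 (true), so the || operand is skipped, leaving i == 2. */
int puzzle2(void)
{
    int i = 1;
    (void)((sizeof (i++) && i++) || i++);
    return i;
}
```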
int count = 0;
int m; /* Offset modifier index; mod[2k] and mod[2k+1] presumably point opposite ways. */
for (m = 0; count < 4 && m < 8; m++)
{
    int xo, yo, i; /* Offsets and incrementer. */
    if (m % 2 == 0) count = 1; /* New direction pair: count the piece at (x, y) once. */
    for (i = 1, xo = mod[m].x, yo = mod[m].y;  /* Start one step away from (x, y). */
         (x+xo >= 0) && (x+xo < WIDTH)         /* Make sure we've not */
         && (y+yo >= 0) && (y+yo < HEIGHT)     /* gone off the grid. */
         && grid[x+xo][y+yo] == pl;            /* Every single one needs to be ==pl. */
         i++, xo = i*mod[m].x, yo = i*mod[m].y) /* Update offsets. */
        count++;                                /* Increase count. */
}
>>157
Stuffing your FOR loop full of this shit isn't going to make your code run any faster.
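For comparison, a sketch of the same check (how many of pl's pieces are in a row through (x, y)) split into small functions instead of being packed into the for header. The board size, the DIRS table and the function names are hypothetical stand-ins for the unshown WIDTH, HEIGHT, grid and mod. Here it is easy to see that the piece at (x, y) is counted exactly once per direction.

```c
#include <stdbool.h>

#define WIDTH  7 /* hypothetical board size */
#define HEIGHT 6

/* The four line directions; each one is walked both ways from the piece. */
static const int DIRS[4][2] = { {1, 0}, {0, 1}, {1, 1}, {1, -1} };

/* Number of consecutive pl cells starting one step from (x, y) along (dx, dy). */
static int run_length(int grid[WIDTH][HEIGHT], int x, int y, int dx, int dy, int pl)
{
    int n = 0;
    for (x += dx, y += dy;
         x >= 0 && x < WIDTH && y >= 0 && y < HEIGHT && grid[x][y] == pl;
         x += dx, y += dy)
        n++;
    return n;
}

/* True if the piece at (x, y) is part of a line of at least four for pl. */
bool has_four(int grid[WIDTH][HEIGHT], int x, int y, int pl)
{
    int d;
    for (d = 0; d < 4; d++) {
        int dx = DIRS[d][0], dy = DIRS[d][1];
        int count = 1 /* the piece itself */
                  + run_length(grid, x, y, dx, dy, pl)
                  + run_length(grid, x, y, -dx, -dy, pl);
        if (count >= 4)
            return true;
    }
    return false;
}
```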
Name: Anonymous 2008-02-07 16:36
#include <stdlib.h>
#include <stdio.h>
char* main() {
char* DATA = malloc(4097);
printf("REAL `DATA' SIZE: %u\n", *(unsigned int*)(DATA - 4));
return DATA; /* ONLY PUSSEYS FREE THERE MEMORT; LET THE DOS DO IT!!! */
}
>>164 >>163
Actually, it originates from a program called QED, originally written in the late 1960s by Butler Lampson and Peter Deutsch. Their version had the command SUBSTITUTE /x/ FOR /y/. Ken Thompson (of Unix fame) then wrote a version for MIT's CTSS system, and shortened the substitution command to just s/x/y/ - and, crucially, added regular expression support to it. Thompson later wrote Unix's standard editor ed, taking a lot of features from QED, including s//. It was the massive spread of Unix that so popularised this idiom.
Name: Anonymous 2008-02-07 16:56
from sys/cdefs.h:
/*
* Compiler-dependent macros to help declare dead (non-returning) and
* pure (no side effects) functions, and unused variables. They are
* null except for versions of gcc that are known to support the features
* properly (old versions of gcc-2 supported the dead and pure features
* in a different (wrong) way). If we do not provide an implementation
* for a given compiler, let the compile fail if it is told to use
* a feature that we cannot live without.
*/
#ifdef lint
#define __dead2
#define __pure2
#define __unused
#define __packed
#define __aligned(x)
#define __section(x)
#else
#if !__GNUC_PREREQ__(2, 5) && !defined(__INTEL_COMPILER)
#define __dead2
#define __pure2
#define __unused
#endif
#if __GNUC__ == 2 && __GNUC_MINOR__ >= 5 && __GNUC_MINOR__ < 7 && !defined(__INTEL_COMPILER)
#define __dead2 __attribute__((__noreturn__))
#define __pure2 __attribute__((__const__))
#define __unused
/* XXX Find out what to do for __packed, __aligned and __section */
#endif
#if __GNUC_PREREQ__(2, 7)
#define __dead2 __attribute__((__noreturn__))
#define __pure2 __attribute__((__const__))
#define __unused __attribute__((__unused__))
#define __used __attribute__((__used__))
#define __packed __attribute__((__packed__))
#define __aligned(x) __attribute__((__aligned__(x)))
#define __section(x) __attribute__((__section__(x)))
#endif
#if defined(__INTEL_COMPILER)
#define __dead2 __attribute__((__noreturn__))
#define __pure2 __attribute__((__const__))
#define __unused __attribute__((__unused__))
#define __used __attribute__((__used__))
#define __packed __attribute__((__packed__))
#define __aligned(x) __attribute__((__aligned__(x)))
#define __section(x) __attribute__((__section__(x)))
#endif
#endif
/*
* GCC 2.95 provides `__restrict' as an extension to C90 to support the
* C99-specific `restrict' type qualifier. We happen to use `__restrict' as
* a way to define the `restrict' type qualifier without disturbing older
* software that is unaware of C99 keywords.
*/
#if !(__GNUC__ == 2 && __GNUC_MINOR__ == 95)
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901
#define __restrict
#else
#define __restrict restrict
#endif
#endif
/*
* GNU C version 2.96 adds explicit branch prediction so that
* the CPU back-end can hint the processor and also so that
* code blocks can be reordered such that the predicted path
* sees a more linear flow, thus improving cache behavior, etc.
*
* The following two macros provide us with a way to utilize this
* compiler feature. Use __predict_true() if you expect the expression
* to evaluate to true, and __predict_false() if you expect the
* expression to evaluate to false.
*
* A few notes about usage:
*
* * Generally, __predict_false() error condition checks (unless
* you have some _strong_ reason to do otherwise, in which case
* document it), and/or __predict_true() `no-error' condition
* checks, assuming you want to optimize for the no-error case.
*
* * Other than that, if you don't know the likelihood of a test
* succeeding from empirical or other `hard' evidence, don't
* make predictions.
*
* * These are meant to be used in places that are run `a lot'.
* It is wasteful to make predictions in code that is run
* seldomly (e.g. at subsystem initialization time) as the
* basic block reordering that this affects can often generate
* larger code.
*/
#if __GNUC_PREREQ__(2, 96)
#define __predict_true(exp) __builtin_expect((exp), 1)
#define __predict_false(exp) __builtin_expect((exp), 0)
#else
#define __predict_true(exp) (exp)
#define __predict_false(exp) (exp)
#endif
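A small usage sketch following the header comment's advice: hint the error check as unlikely, and only in code that runs a lot. The macro and function names are hypothetical; `__builtin_expect` is the GCC builtin the header wraps, which Clang also accepts. The `!!` normalizes the condition to 0 or 1, since the builtin compares the value against the expected constant. The hint only affects code layout, never the result.

```c
#include <stddef.h>

/* Use the builtin when available, otherwise expand to the plain expression. */
#if defined(__GNUC__)
#define predict_true(exp)  __builtin_expect(!!(exp), 1)
#define predict_false(exp) __builtin_expect(!!(exp), 0)
#else
#define predict_true(exp)  (exp)
#define predict_false(exp) (exp)
#endif

/* Sum an array, treating a NULL pointer as the rare error case. */
long sum_checked(const int *a, size_t n)
{
    long total = 0;
    size_t i;
    if (predict_false(a == NULL))
        return -1;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}
```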
>>167
Fuck off Guido, stop posting CPython source code
Name: Anonymous 2008-02-07 20:17
These are meant to be used in places that are run `a lot' that are run `a lot' run `a lot' `a lot'
Name: Anonymous 2008-03-18 8:15
Here's a little secret that allows you to save memory when allocating structures:
struct s { int a; char b; };
struct s *s1 = malloc(sizeof(struct s)); /* OH NO MASSIVE MEMORY WASTE */
struct s *s2 = malloc(offsetof(struct s, b) + sizeof(char)); /* YAY MEMORY SAVED */
If you don't have offsetof (stddef.h), it can be implemented liek this:
>>170
What, you save about 3 bytes in memory? Big deal. Unless you're programming an embedded system in which every byte is precious, you'd better stick with the more widely-used and more readable "malloc(sizeof(struct s))".
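To see what the trick actually trims, compare sizeof with the offset of the last member plus its size (helper names are hypothetical). On a typical platform with a 4-byte int the difference is the 3 bytes of trailing padding. Many malloc implementations round small requests up anyway, so the saving may never materialize, and accessing the whole struct through the shorter block (for example with struct assignment, which may copy padding) is not safe.

```c
#include <stddef.h>

struct s { int a; char b; };

/* Bytes the struct occupies, trailing padding included. */
size_t full_size(void)    { return sizeof(struct s); }

/* Bytes up to and including the last member: the "trimmed" size. */
size_t trimmed_size(void) { return offsetof(struct s, b) + sizeof(char); }
```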
>>181
Please read the rest of this thread before continuing. This thread is about secret tricks of the C language that work well (most of the time), but may not have totally defined behavior or be very portable.