I'm trying to figure out how to compute an integer power of a double efficiently with as little overhead as possible (trying to avoid loops and recursion to minimize the amount of calls and jumps). Calculation speed is the priority.
So far I got this trivial piece of code:
...which of course assumes the exponent is at most 10. The code can easily be expanded if higher exponents are needed.
I'm thinking this is probably optimal, but I know you /prog/riders have some magic tricks up your sleeve, so hit me with them.
Name:
Anonymous2012-04-15 6:08
Stop programming right now. Don't ever program again. Leave this board and never speak of what has occurred today.
f :: Num a => a -> Int -> a
f a b | b == 2 = a2
      | b == 3 = a3
      | b == 4 = a4
      | b == 5 = a4 * a
      | b == 6 = a3 * a3
      | b == 7 = a3 * a3 * a
      | b == 8 = a4 * a4
      | b == 9 = a4 * a4 * a
      | True   = a4 * a4 * a2
  where a2 = a * a
        a3 = a2 * a
        a4 = a2 * a2
>>16
It'll do O(log(n)) multiplication operations, but if you are doing arbitrary-precision arithmetic, you would also have to account for the time each multiplication takes as the numbers become larger and larger. But since it's just a double, multiplication time is bounded by a constant.
>>15
nope, just that there is a compiler C_1 running on platform P_1, and a compiler C_2 running on a platform P_2, such that there exists two non tail call optimized implementations, NTC1 and NTC2, and there exists a reasonably small epsilon, such that:
>>16
double g(double a, unsigned b)
{
    double c, d = 1;
    unsigned e;

    for (; b > 0; b -= e) {
        c = a;
        for (e = 1; (e << 1) <= b; e <<= 1) { c *= c; }
        d *= c;
    }
    return d;
}
My test indicates that your function gives imprecise results.
f: 0.207229: 9999990000004500129961189827839490010025633955437521662187637223456768.000000
g: 1.470572: 9999990000004500129961189827839490010025633955437521662187637223456768.000000
pow: 1.312694: 9999990000004498597465648961950631651678606805128338043448515039854592.000000
>>21
The function is kind of slow because you call pow(). Nice job. You're still below average. Now go scrub another toilet.
Name:
Anonymous2012-04-15 11:25
>>18,20
And what happens if the compiler does TCO on the non tail call optimized version (you know, like a modern version of GCC would do with optimization)?
And what happens if the compiler creates separate functions for each branch in >>1's function and chooses the corresponding function to call at compile time so the whole procedure becomes branch-less (you know, like a modern version of GCC would do with optimization)?
Your first inequality fails as long as the second optimization is performed; the second of course can never fail, since you haven't given any bounds for your epsilon (I presume it is at least greater than zero), which also renders the whole inequality totally worthless, since it teaches us nothing new beyond the fact that an absolute value is less than some other value.
Name:
Anonymous2012-04-15 11:42
double f(double a, char b) {
    while (1) {
        double try_result = rand() / 128.0;
        if (b == 2 && sqrt(try_result) == a)
            return try_result;
        else if (pow(a, b) == try_result)
            return try_result;
    }
}
Name:
Anonymous2012-04-15 12:16
>>26
The (maximum value of) epsilon for double is given in the C standard.
Name:
Anonymous2012-04-15 12:17
>And what happens if the compiler does TCO
You need to tell us what TCO stands for. Sorry, but I can't read the mind of a fucking idiot.
>>33
Not only have I read SICP, but I've also done almost all of the exercises in that book.
Name:
Anonymous2012-04-15 14:57
>>26 And what happens if the compiler does TCO on the non tail call optimized version (you know, like a modern version of GCC would do with optimization)?
Well, seeing how both posters were comparing >>1 to a non tail call optimized version of the program, I can only trust that they know for a fact that compiler C_i did not produce a tail call optimized version of implementation NTC_i, for platform P_i. We could account for unreliability there by assigning a probability to the event of NTC_i being tail call optimized.
And what happens if the compiler creates separate functions for each branch in >>1's function and chooses the corresponding function to call at compile time so the whole procedure becomes branch-less (you know, like a modern version of GCC would do with optimization)?
Then that would be a pretty bitchen compiler. Inlining and constant folding ma nigga. But even then that is still describable using this model. Because the compiler and source file are both present in the time function, the compiler is free to generate whatever binary for the platform it wants. The time function only measures the time taken for the generated binary.
Your first inequality fails as long as the second optimization is performed,
maybe, maybe not. You never know.
the second of course can never fail since you haven't given any bounds to your epsilon (I presume it is at least greater than zero), which also renders the whole inequality totally worthless since it teaches us nothing new beyond that an absolute value is less than some other value.
There exists a reasonably small epsilon such that the sentence holds true. The value of epsilon depends on >>5-kun's accuracy in perception of time, and is probably something like a tenth of a second, or maybe a half of a second. Come to think of it, it's probably linearly related to the amount of time ey waits, so it might be better to describe it like that in general.
Name:
Anonymous2012-04-15 14:59
>>36 maybe, maybe not. You never know.
Given the premises then it obviously will you stupid piece of shit.
There exists a reasonably small epsilon such that the sentence holds true.
This is not a mathematically precise statement you dumb shit, any estimate you give will be wrong though until you define a compiler.
Read the C standard.
Name:
Anonymous2012-04-15 15:01
>>28
Irrelevant, the epsilon we're referring to is a mathematically unbounded epsilon not a C double.
>>29
You must be mentally challenged, can't you read you stupid piece of shit?
>>35
So you can't read properly, get real eyes or a real brain you fucking mouth breather.
>>29
Actually now that I think about it this is actually kind of funny, we're talking about tail call optimization but this genius can't figure out what TCO means?
Given the premises then it obviously will you stupid piece of shit.
No really, the model doesn't tell you any information about how C_i does its compiling.
This is not a mathematically precise statement you dumb shit, any estimate you give will be wrong though until you define a compiler.
Exists(epsilon) with reasonably_small_amount_of_time_in_seconds_relative_to_human_perception_of_time(epsilon), such that P(epsilon)
And reasonably_small_amount_of_time_in_seconds_relative_to_human_perception_of_time(epsilon) might be defined as epsilon < .25 seconds, or something similar depending on what one finds appropriate for describing the sentence in >>7.
It's true that a compiler has not been defined yet, making these statements not very useful. But they are justifiable in the sense that they are simply collecting truths from the statements provided in >>5 and >>7.
Name:
Anonymous2012-04-15 15:16
>>41
Congratulations, you don't know basic things like TCO, I bet your mom is proud of you mastering how to shitpost you fucking moron.
>>42
You're retarded, I'm going to stop talking to you.
You may also want to have Mathematica or Maple handy, because you will need to generate polynomial coefficients.
Name:
Anonymous2012-04-16 0:37
TCO (only pending computation is in power for 1 call), algorithm optimized, could be made iterative.
double powerEven(double x, int exp, int limit) {
    if (exp != limit)
        return powerEven(x * x, exp * 2, limit);
    else
        return x;
}
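The iterative form is direct (a sketch under the same assumption as the recursive version: limit is exp times a power of two):

```c
/* Iterative rewrite of powerEven: repeatedly square until the target
   exponent is reached. Assumes limit = exp * 2^k for some k >= 0. */
double powerEvenIter(double x, int exp, int limit) {
    while (exp != limit) {
        x *= x;
        exp *= 2;
    }
    return x;
}
```

e.g. powerEvenIter(2.0, 1, 8) squares three times and returns 256.0.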
No, because most hardware doesn't have a built in ^ operator. You need to extract the mantissa and exponent bits directly from the IEEE754 encoding and do some bit twiddling and take the logarithm in base 2.
Name:
Anonymous2012-04-16 1:40
>>53
eh, I was messin with the exponent bits there.
double = (sign, mantissa, exponent)
x : double
y : int
pow(x, y) =
    match x with (sign, mantissa, exponent) in
        (sign, mantissa, exponent * y)
I think that's it, although one would need to handle extracting the exponent properly, converting from bias 1023 or whatever it is, and doing the multiplication. Overflows and underflows should take the number to +/- inf or zero, depending on which way it went. And then there are the cases when x is +/- inf, zero, or NaN.
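As a rough C sketch of that idea (an assumption-heavy toy: it is only exact when x is a power of two, i.e. the 52-bit mantissa field is zero, and it does none of the overflow/underflow/NaN handling just mentioned):

```c
#include <stdint.h>
#include <string.h>

/* Multiply the unbiased IEEE754 exponent of x by y. Only exact when the
   mantissa field is zero (x a power of two); no special-case handling. */
double exp_twiddle_pow(double x, int y) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);                      /* reinterpret the bits */
    int64_t e = (int64_t)((bits >> 52) & 0x7FF) - 1023;  /* remove the bias */
    e = e * y;                                           /* (2^e)^y = 2^(e*y) */
    bits &= ~(0x7FFULL << 52);                           /* clear the exponent field */
    bits |= (uint64_t)((e + 1023) & 0x7FF) << 52;        /* store the new exponent */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```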
why not just create n functions, one for every possible exponent your data type allows? Then you'll avoid the overhead of running through all those ifs
Don't forget to include your frozenvoid.h
double pow1(double a) { return a; }
double pow2(double a) { return a*a; }
double pow3(double a) { return a*a*a; }
//...
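If you generate those, one way to dispatch without a chain of ifs is a function-pointer table (a sketch; the bound of 4 and the table name are arbitrary):

```c
double pow1(double a) { return a; }
double pow2(double a) { return a * a; }
double pow3(double a) { return a * a * a; }
double pow4(double a) { double s = a * a; return s * s; }

/* One indexed load plus one indirect call; no comparisons.
   b must be in [1, 4] for this sketch -- no bounds checking. */
static double (*const pow_table[])(double) = { pow1, pow2, pow3, pow4 };

double dispatch_pow(double a, unsigned b) {
    return pow_table[b - 1](a);
}
```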
Name:
Anonymous2012-04-16 4:27
>>51
See >>23
In less than a minute, I can write my own pow function that calculates a non-integer power almost two times faster than the standard pow calculates an integer power.
Guess what niggers, because the standard C floating point arithmetic library is slow, narrow and inconsistent as fuck, I am now going to write my own fucking floating point library with operator overloading and everything. Maybe I will even make it arbitrary-precision.
Multi-valued functions? Superfunctions? Fuck you, I'll define everything however I want. I will make pi equal niggerfaggot if I must. I will implement superpositions and all sorts of crazy shit.
None of you care, but it felt good to say this in public. Why? Because I can pretend some of you care, even if no one does.
>>57
I care. I think it would be fun to overload every seeples operator to do something interesting for numerical stuff. But what's a superposition in this context?
Name:
Anonymous2012-04-16 4:52
TCO now stands for Total Cunt Off
This thread has become a TCO
Name:
Anonymous2012-04-16 5:00
>>58 I care. I think it would be fun to overload every seeples operator to do something interesting for numerical stuff.
I do C, not Sepples. But what's a superposition in this context?
I don't know yet, but I have a feeling I will need it when dealing with infinities, superfunctions, and, you know, the unknown. I am aware that math is inherently flawed (according to Gödel), so I will constantly be correcting errors but never quite get everything right. Then I will implement automatic error correction algorithms, algorithms that correct those algorithms, and so on.
I am so autistic by nature that I enjoy every second of doing all that shit with no goal, meaning, or purpose. As long as it doesn't require conscious effort, it is work that I will do for free. I define conscious effort to be any effort required to hinder oneself. Note that I consider self-destruction and self-domination to be two entirely distinct concepts.
Name:
Anonymous2012-04-16 5:09
>>60
Oh, and remember, kids: If the facts don't fit the theory, change the facts. -Albert Einstein
A Nehalem can do an F2XM1 and FYL2X together in less than 150 clock cycles. (They fucked it up with Sandy Bridge, which takes almost 800 clock cycles.) That's around the range you'll need to beat. A multiplication is about 5 cycles and add is 3 cycles. Exponentiation by squaring will probably do the trick.
If you use the exponentiation by squaring technique, and implement it using this:
#define NUM_BITS (sizeof(long long) * 8)
double powerlonglong(double x, long long n) {
    double powers[NUM_BITS]; // powers[i] = x^(2^i)
    double mul;
    int index;

    powers[0] = x; // x^(2^0) = x^1 = x
    for (index = 1; index < NUM_BITS; index++) {
        powers[index] = powers[index-1] * powers[index-1]; // x^(2^i) = (x^(2^(i-1)))^2
    }
    // x^n = x^(sum_i c_i*2^i) = prod_i x^(c_i*2^i): the ith factor is
    // powers[i] when bit c_i = 1 and 1 when c_i = 0
    mul = 1.0;
    for (index = 0; index < NUM_BITS; index++) {
        if (n & (1ULL << index)) {
            mul *= powers[index];
        }
    }
    return mul;
}
NUM_BITS is the number of bits taken by the exponent.
It will take NUM_BITS multiplications to generate the powers array. Then the needed elements of the powers array are multiplied together to form the final product. In the worst case (if the exponent is 0b1111111...), then every term in the powers array will be multiplied together again, giving NUM_BITS multiplications again.
So the whole thing can be done using a worst-case 2*NUM_BITS multiplications. Accounting for multiplication time alone, it would take 10*NUM_BITS clock cycles in the worst case, which would hypothetically beat 150 clock cycles when NUM_BITS < 15, and it could beat Sandy Bridge when NUM_BITS < 80. So Sandy Bridge seems like it would definitely be beatable, although you'd have to get the control flow to be efficient.
>>66
index should have started at zero there I think:
got rid of the index variable:
double power3longlong(double x, unsigned long long n) {
    double xs = x; // always x^(2^i) on the ith run of the loop, with i starting at zero
    double mul = 1.0;
    for (; n; xs = xs*xs, n >>= 1) {
        if (n & 1) {
            mul *= xs;
        }
    }
    return mul;
}
>>72
nothing to wwww about.
what that guy is not-so-subtly telling you is that it's impossible to tell if an algorithm is wrong until it actually goes wrong. to make matters worse, you yourself decide what is wrong. to make matters even worse, if you want to find out the answer from somewhere other than yourself, you will have to deconstruct yourself. to make matters still worse, all self-deconstruction really does is change the definition of you.
Why not just use a lookup table? This way you only need to add the value of the number to the pointer and dereference it. You don't even need to call a function. HIBT?
Name:
Anonymous2012-04-17 8:34
>>75
double is a 64-bit floating point type. That means there are 2^64 different double values.
The size of the lookup table would be 2^64 * 2^8 * 8 bytes. That's quite a lot. Time to buy more RAM.
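A table over all doubles is hopeless, but the idea works fine when the base is fixed and only the exponent varies (a sketch; the base and the range of 16 are arbitrary):

```c
#define TABLE_SIZE 16

/* Precompute base^0 .. base^(TABLE_SIZE-1); each lookup is then a single
   indexed load. Only viable because base is fixed and the range is tiny. */
static double pow_lut[TABLE_SIZE];

void init_pow_lut(double base) {
    pow_lut[0] = 1.0;
    for (int i = 1; i < TABLE_SIZE; i++)
        pow_lut[i] = pow_lut[i - 1] * base;
}

double lut_pow(unsigned e) { /* caller guarantees e < TABLE_SIZE */
    return pow_lut[e];
}
```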
I dunno. It's a lot more concise when I change the unsigned long long to unsigned int though.
c:
double power3int(double x, unsigned int n) {
    double xs = x; // always x^(2^i) on the ith run of the loop, with i starting at zero
    double mul = 1.0;
    for (; n; xs = xs*xs, n >>= 1) {
        if (n & 1) {
            mul *= xs;
        }
    }
    return mul;
}
Surprised nobody pulled out the good ol' Duff's device. It's not optimal though.
double h(double a, char b) {
    double acc = 1.0;
    while (b) {
        int m = 0;
        int d = b;
        double t = a;
        while (d >>= 1) {
            t *= t;
            m++;
        }
        acc *= t;
        b &= ~(1 << m);
    }
    return acc;
}
This is slow garbage, but 30% faster than pow() on average, probably slower on b=111111 etc.
There's a much faster version that is roughly O(log(n) + m) where n is the position of the high bit and m is the number of bits set. But I can't be arsed to work out the fiddly part.
Dunno how this compares to >>19, I haven't looked closely but >>19 looks very similar to this, or possibly the faster one I mentioned (but unlikely, the other one doesn't need nested loops.)