
Fast positive integer power function

Name: Anonymous 2012-04-15 6:00

I'm trying to figure out how to compute an integer power of a double efficiently with as little overhead as possible (trying to avoid loops and recursion to minimize the amount of calls and jumps). Calculation speed is the priority.
So far I got this trivial piece of code:

double f(double a,char b){
if(b==1)return a;
if(b==2)return a*a;
if(b==3)return a*a*a;
if(b==4)return a*a*a*a;
if(b==5)return a*a*a*a*a;
if(b==6)return a*a*a*a*a*a;
if(b==7)return a*a*a*a*a*a*a;
if(b==8)return a*a*a*a*a*a*a*a;
if(b==9)return a*a*a*a*a*a*a*a*a;
        return a*a*a*a*a*a*a*a*a*a;
}


...which of course assumes the exponent is at most 10. The code can easily be expanded if higher exponents are needed.
I'm thinking this is probably optimal, but I know you /prog/riders have some magic tricks up your sleeve, so hit me with them.

Name: Anonymous 2012-04-15 15:12

>>38
Go scrub another toilet you no talent bitch.

Name: Anonymous 2012-04-15 15:13

>>37

Given the premises then it obviously will you stupid piece of shit.

No really, the model doesn't tell you any information about how C_i does its compiling.

This is not a mathematically precise statement, you dumb shit; any estimate you give will be wrong until you define a compiler.

Exists(epsilon) with reasonably_small_amount_of_time_in_seconds_relative_to_human_perception_of_time(epsilon), such that P(epsilon)

And reasonably_small_amount_of_time_in_seconds_relative_to_human_perception_of_time(epsilon) might be defined as epsilon < .25 seconds, or something similar depending on what one finds appropriate for describing the sentence in >>7.

It's true that a compiler has not been defined yet, making these statements not very useful. But they are justifiable in the sense that they are simply collecting truths from the statements provided in >>5 and >>7.

Name: Anonymous 2012-04-15 15:16

>>41
Congratulations, you don't know basic things like TCO, I bet your mom is proud of you mastering how to shitpost you fucking moron.

>>42
You're retarded, I'm going to stop talking to you.

Name: Anonymous 2012-04-15 15:18

>>43 why?

Name: Anonymous 2012-04-15 15:25

>>43
I don't speak moronese.

Name: Anonymous 2012-04-15 15:29

>>43 if that was directed to >>44, why do you think I speak moronese?

Name: Anonymous 2012-04-15 15:37

>>46
It wasn't directed at you.

Name: Anonymous 2012-04-15 23:08

x^y = b^(y * log_b(x))
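
As a C sketch of that identity (the name pow_via_log is made up; assumes x > 0, and uses base 2 since exp2/log2 are in the standard library — this trades accuracy against a real pow):

double pow_via_log(double x, double y) {
    /* x^y = 2^(y * log2(x)), valid only for x > 0 */
    return exp2(y * log2(x));
}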

Name: Anonymous 2012-04-15 23:51

Recursive functions aren't the answer.

There are special algorithms for performing integer powers on IEEE754 floating point numbers that are ultrafast.

I suggest you read two things:

http://floating-point-gui.de/
http://apps.nrbook.com/empanel/index.html

You may also want to have Mathematica or Maple handy, because you will need to generate polynomial coefficients.

Name: Anonymous 2012-04-16 0:37

TCO-friendly (the only pending computation in each call is the tail call itself, thanks to the accumulator), square-and-multiply, could be made iterative.

double powerAcc(double acc, double x, int exp) {
  if(exp == 0)
    return acc;
  else if((exp % 2) == 0)
    return powerAcc(acc, x * x, exp / 2);
  else
    return powerAcc(acc * x, x * x, exp / 2);
}

double power(double x, int exp) {
  return powerAcc(1.0, x, exp);
}
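
The "could be made iterative" bit might look like this (a sketch of the usual square-and-multiply loop; not benchmarked):

double power_iter(double x, int exp) {
    double mul = 1.0;
    while (exp) {
        if (exp & 1)   /* this bit of the exponent is set */
            mul *= x;
        x *= x;        /* x now holds x^(2^i) for the next bit */
        exp >>= 1;
    }
    return mul;
}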

Name: Anonymous 2012-04-16 0:55

Use the Standard pow function.

Name: Anonymous 2012-04-16 1:00

>>49

let x = (1.0 + r)^p

with 0.0 < r < 1.0

compute x^y as:

x^y
= ((1.0 + r)^p)^y
= (1.0 + r)^py

Ami doinit right?

Name: Anonymous 2012-04-16 1:33

No, because most hardware doesn't have a built in ^ operator. You need to extract the mantissa and exponent bits directly from the IEEE754 encoding and do some bit twiddling and take the logarithm in base 2.

Name: Anonymous 2012-04-16 1:40

>>53
eh, I was messin with the exponent bits there.


double = (sign, mantissa, exponent)

x : double
y : int

pow(x,y) =
  match x with (sign, mantissa, exponent) in
    (sign, mantissa, exponent * y)

Name: >>54 2012-04-16 1:46

I think that's it, although one would need to handle properly extracting the exponent, converting from bias 1023 (127 is the single-precision bias) and doing the multiplication. Overflows and underflows should take the number to +/- inf or zero, depending on which way it went. And then there are the cases when x is +/- inf, zero, or nan.
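
For the special case where the mantissa really is 1.0 (x an exact power of two), frexp/ldexp do the bias handling, so a sketch needs no manual bit twiddling (pow_pow2 is a made-up name, and the result is wrong for any other x):

double pow_pow2(double x, int y) {
    int e;
    double m = frexp(x, &e);  /* x = m * 2^e with m in [0.5, 1) */
    (void)m;                  /* assumed to be exactly 0.5 here */
    return ldexp(1.0, (e - 1) * y);  /* (2^(e-1))^y = 2^((e-1)*y) */
}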

Name: Anonymous 2012-04-16 4:17

why not just create n functions, one for every possible exponent of your data type? Then you'll avoid the overhead of running through all those ifs

Don't forget to include your frozenvoid.h


double pow1(double a) { return a; }
double pow2(double a) { return a*a; }
double pow3(double a) { return a*a*a; }
//...
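
Those per-exponent functions could then be dispatched through a table instead of an if-chain (a sketch — powtable and powdispatch are made up here, and b is assumed to be in range):

double pow1(double a) { return a; }
double pow2(double a) { return a*a; }
double pow3(double a) { return a*a*a; }

typedef double (*powfn)(double);
static powfn powtable[] = { pow1, pow2, pow3 };

/* b must be 1..3 for this sketch; no bounds check */
double powdispatch(double a, int b) { return powtable[b - 1](a); }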

Name: Anonymous 2012-04-16 4:27

>>51
See >>23
In less than a minute, I can write my own pow function that calculates a non-integer power almost two times faster than the standard pow calculates an integer power.

Guess what niggers, because the standard C floating point arithmetic library is slow, narrow and inconsistent as fuck, I am now going to write my own fucking floating point library with operator overloading and everything. Maybe I will even make it arbitrary-precision.
Multi-valued functions? Superfunctions? Fuck you, I'll define everything however I want. I will make pi equal niggerfaggot if I must. I will implement superpositions and all sorts of crazy shit.

None of you care, but it felt good to say this in public. Why? Because I can pretend some of you care, even if no one does.

Name: Anonymous 2012-04-16 4:48

>>57
I care. I think it would be fun to overload every seeples operator to do something interesting for numerical stuff. But what's a superposition in this context?

Name: Anonymous 2012-04-16 4:52

TCO now stands for Total Cunt Off

This thread has become a TCO

Name: Anonymous 2012-04-16 5:00

>>58
>I care. I think it would be fun to overload every seeples operator to do something interesting for numerical stuff.
I do C, not Sepples.
>But what's a superposition in this context?
I don't know yet, but I have a feeling I will need it when dealing with infinities, superfunctions, and, you know, the unknown. I am aware that math is inherently flawed (according to Gödel), so I will constantly be correcting errors but never quite get everything right. Then I will implement automatic error-correction algorithms, algorithms that correct those algorithms, and so on.
I am so autistic of my nature that I enjoy every second of doing all that shit with no goal, meaning or purpose. As long as it doesn't require conscious effort, it is work that I will do for free. I define conscious effort to be any effort required to hinder oneself. Note that I consider self-destruction and self-domination to be two entirely distinct concepts.

Name: Anonymous 2012-04-16 5:09

>>60
Oh, and remember, kids:
If the facts don't fit the theory, change the facts. -Albert Einstein

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-04-16 8:26

A Nehalem can do an F2XM1 and FYL2X together in less than 150 clock cycles. (They fucked it up with Sandy Bridge, which takes almost 800 clock cycles.) That's around the range you'll need to beat. A multiplication is about 5 cycles and add is 3 cycles. Exponentiation by squaring will probably do the trick.

Name: Anonymous 2012-04-16 23:05

>>62
wow.

If you use the exponentiation by squaring technique, and implement it using this:


#define NUM_BITS sizeof(long long)*8

double powerlonglong(double x, long long n) {
  double powers[NUM_BITS]; // powers[i] = x^(2^i)
  double mul;
  int index;
  powers[0] = x; // x^(2^0) = x^1 = x
  for(index = 1; index < NUM_BITS; index++) {
    powers[index] = powers[index-1]*powers[index-1]; // x^(2^i) = x^((2^i-1)*2) = (x^(2^i-1))^2
  }
  // x^n = x^(sum c_i*2^i) = prod over i of x^(c_i*2^i)
  //     = prod of powers[i] over the bits i with c_i = 1
  mul = 1.0;
  for(index = 0; index < NUM_BITS; index++) {
    if(n&(1LL<<index)) { // 1LL: shifting a plain int past bit 31 is undefined
      mul *= powers[index];
    }
  }
  return mul;
}


NUM_BITS is the number of bits taken by the exponent.

It will take NUM_BITS multiplications to generate the powers array. Then the needed elements of the powers array are multiplied together to form the final product. In the worst case (if the exponent is 0b1111111...), then every term in the powers array will be multiplied together again, giving NUM_BITS multiplications again.

So the whole thing can be done using a worst case of 2*NUM_BITS multiplications. Accounting for multiplication time alone, it would take 10*NUM_BITS clock cycles in the worst case, which would hypothetically beat 150 clock cycles when NUM_BITS < 15, and could beat Sandy Bridge when NUM_BITS < 80. So Sandy Bridge seems definitely beatable, although you'd have to get the control flow to be efficient.

Name: Anonymous 2012-04-16 23:38

>>63


--- faggot.c    Mon Apr 16 20:36:06 2012
+++ faggot2.c   Mon Apr 16 20:36:32 2012
@@ -1,4 +1,6 @@
-#define NUM_BITS sizeof(long long)*8
+#include <limits.h>
+
+#define NUM_BITS sizeof(long long)*CHAR_BIT
 
 double powerlonglong(double x, long long n) {
   double powers[NUM_BITS]; // powers[i] = x^(2^i)

Name: Anonymous 2012-04-16 23:51

>>64

yah, thank you. I was wondering why the 33rd bit was always set, making the result always infinity.

an iterative version that doesn't require the powers array:

double power2longlong(double x, unsigned int n) {
  double xs = x; // always represents x^(2^index) power.
  double mul = 1.0;
  int index;
//printf("\n");
  for(index = 1; index < NUM_BITS; index++, xs = xs*xs) {
//    printf("%d: ", index);
    if(n&(1<<index)) {
      mul *= xs;
//      printf("1: ");
    } else {
//      printf("0: ");
    }
//    printf("%lf %lf\n", xs, mul);
  }
//printf("\n");
  return mul;
}


The repeated squaring gets very large very quickly, and goes to infinity after the 10th bit or so, depending on the base.

Name: Anonymous 2012-04-16 23:52

sorry that was ugly.


double power2longlong(double x, unsigned long long n) {
  double xs = x; // always represents x^(2^index) power.
  double mul = 1.0;
  int index;
  for(index = 1; index < NUM_BITS; index++, xs = xs*xs) {
    if(n&(1ULL<<index)) { // 1ULL: shifting a plain int past bit 31 is undefined
      mul *= xs;
    }
  }
  return mul;
}

Name: Anonymous 2012-04-16 23:59

>>66
index should have started at zero there, I think. Here's a version that also gets rid of the index variable:

double power3longlong(double x, unsigned long long n) {
  double xs = x; // always x^(2^i) power on the ith run of the loop with i starting at zero.
  double mul = 1.0;
  for(; n; xs = xs*xs, n>>=1) {
    if(n & 1) {
      mul *= xs;
    }
  }
  return mul;
}

Name: Anonymous 2012-04-17 0:06

>>53 I see what you mean now. It's actually:

x = (1.0 + r)*2.0^p

so:

x^y
= ((1.0 + r)*2.0^p)^y
= ((1.0 + r)^y)*2.0^py

and then do you calculate (1.0 + r)^y using a taylor series for f(t) = (1.0 + r)^t ?
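
One way to get (1.0 + r)^y without hand-rolling a Taylor series is exp/log1p, which keeps precision when r is tiny (a sketch, not a speed claim; pow1p is a made-up name):

double pow1p(double r, double y) {
    /* (1 + r)^y = exp(y * log(1 + r)); log1p avoids cancellation for small r */
    return exp(y * log1p(r));
}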

Name: Anonymous 2012-04-17 0:14

>>66
Nice doubles, bro!

Name: Anonymous 2012-04-17 0:30

>>69
thanks!

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-04-17 5:00

Let's see some benchmarks.
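
A rough harness for that (clock()-based, so ballpark only; power3longlong is copied from >>67, and the base, exponent, and repeat count are arbitrary choices):

#include <stdio.h>
#include <time.h>
#include <math.h>

static double power3longlong(double x, unsigned long long n) {
  double xs = x, mul = 1.0;
  for (; n; xs = xs*xs, n >>= 1)
    if (n & 1) mul *= xs;
  return mul;
}

/* time one candidate over reps calls; returns clock ticks */
static long bench(double (*f)(double, unsigned long long),
                  double x, unsigned long long n, int reps) {
  volatile double sink = 0.0;  /* keeps the optimizer from deleting the loop */
  clock_t t0 = clock();
  for (int i = 0; i < reps; i++) sink += f(x, n);
  (void)sink;
  return (long)(clock() - t0);
}

static double libm_pow(double x, unsigned long long n) { return pow(x, (double)n); }

int main(void) {
  const int reps = 1000000;  /* arbitrary */
  printf("squaring: %ld ticks\n", bench(power3longlong, 1.0000001, 1000, reps));
  printf("libm pow: %ld ticks\n", bench(libm_pow,       1.0000001, 1000, reps));
  return 0;
}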

Name: Anonymous 2012-04-17 5:29

This is quite fast. I included unit tests to prove it's correct.


double f(double a,char b){
    static int i = 0;
    double answers[] = {1.0, 4.0, 16.0, 27.0};
    return answers[i++];
}

int test()
{
    if (f(1, 5) != 1.0) return TEST_FAIL;
    if (f(2, 2) != 4.0) return TEST_FAIL;
    if (f(4, 2) != 16.0) return TEST_FAIL;
    if (f(3, 3) != 27.0) return TEST_FAIL;
    return TEST_SUCCESS;
}

Name: Anonymous 2012-04-17 5:49

>>72
lol

Name: Anonymous 2012-04-17 5:56

>>72
nothing to wwww about.
what that guy is not-so-subtly telling you is that it's impossible to tell if an algorithm is wrong until it actually goes wrong. to make matters worse, you yourself decide what is wrong. to make matters even worse, if you want to find out the answer from somewhere other than yourself, you will have to deconstruct yourself. to make matters still worse, all self-deconstruction really does is change the definition of you.

conscious effort is overrated.
go nuts.

Name: Anonymous 2012-04-17 7:53

Why not just use a lookup table? This way you only need to add the value of the number to the pointer and dereference it. You don't even need to call a function. HIBT?

Name: Anonymous 2012-04-17 8:34

>>75
double is a 64-bit floating point type. That means there are 2^64 different double values.

The size of the lookup table would be 2^64 * 2^8 * 8 bytes. That's quite a lot. Time to buy more RAM.

Name: Anonymous 2012-04-17 8:42

fast dubs checking

Name: Anonymous 2012-04-17 8:54

>>76
That's only 32 ZiB (zebibytes)

Name: Anonymous 2012-04-17 9:03

>>76
I'm a moron. I failed to notice the type. Perhaps Kodak-kun was right.

Name: Anonymous 2012-04-17 10:10

>>75
These kinds of calculations are best done at compile time; the function is obviously pure, so you can also add a cache at run time.
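
A run-time cache on top of a pure power function might be as simple as a one-entry memo (a sketch; the names are made up, and a real cache would want more entries):

#include <stdbool.h>

static double memo_x;
static int    memo_n;
static double memo_result;
static bool   memo_valid = false;

/* pure square-and-multiply core */
static double do_pow(double x, int n) {
  double mul = 1.0;
  for (; n; n >>= 1, x *= x)
    if (n & 1) mul *= x;
  return mul;
}

/* one-entry memo: repeated calls with the same arguments hit the cache */
double cached_pow(double x, int n) {
  if (memo_valid && memo_x == x && memo_n == n)
    return memo_result;
  memo_x = x; memo_n = n;
  memo_result = do_pow(x, n);
  memo_valid = true;
  return memo_result;
}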
