
Fast positive integer power function

Name: Anonymous 2012-04-15 6:00

I'm trying to figure out how to compute an integer power of a double efficiently, with as little overhead as possible (trying to avoid loops and recursion to minimize the number of calls and jumps). Calculation speed is the priority.
So far I got this trivial piece of code:

double f(double a,char b){
if(b==2)return a*a;
if(b==3)return a*a*a;
if(b==4)return a*a*a*a;
if(b==5)return a*a*a*a*a;
if(b==6)return a*a*a*a*a*a;
if(b==7)return a*a*a*a*a*a*a;
if(b==8)return a*a*a*a*a*a*a*a;
if(b==9)return a*a*a*a*a*a*a*a*a;
        return a*a*a*a*a*a*a*a*a*a;
}


...which of course assumes the exponent is at most 10. The code can easily be expanded if higher exponents are needed.
I'm thinking this is probably optimal, but I know you /prog/riders have some magic tricks up your sleeve, so hit me with them.

Name: Anonymous 2012-04-15 6:08

Stop programming right now. Don't ever program again. Leave this board and never speak of what has occurred today.

Name: Anonymous 2012-04-15 6:16

>>2
Why?

Name: Anonymous 2012-04-15 6:20

here you go op:


>>> def power(n,p):
...   return eval("n" + "*n"*(p-1))
...
>>> power(2,4)
16
>>> power(256,3)
16777216
>>>

Name: Anonymous 2012-04-15 6:33

>>1
By god, you've successfully made it slower than even a non-tail call optimized recursive function

Name: Anonymous 2012-04-15 6:41

>>5
That's incorrect, it's implementation dependent which is faster.

Name: Anonymous 2012-04-15 6:42

>>5
OP here. It is not slower than recursion. I just tested it. It is about two times faster.

Name: Anonymous 2012-04-15 6:45

>>7
Wrong again, implementation dependency, if you specify a compiler and a version you may say which is faster.

Name: Anonymous 2012-04-15 7:03

>>8
Well, if ey just tested it, then ey must have a compiler to test it with.

Name: Anonymous 2012-04-15 7:12

I think the fastest implementation involves decomposing the exponent into a sum of powers of two, and then performing repeated squaring of the base
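In C, that decomposition can be sketched as the usual binary exponentiation loop (a minimal sketch; the name ipow is made up):

```c
#include <assert.h>

/* Raise a to the b-th power by repeated squaring: decompose b into its
 * binary digits and multiply in a^(2^i) for every set bit i.
 * Does O(log b) multiplications. */
double ipow(double a, unsigned b) {
    double result = 1.0;
    while (b) {
        if (b & 1)       /* this bit of the exponent is set */
            result *= a;
        a *= a;          /* a now holds the next power a^(2^i) */
        b >>= 1;
    }
    return result;
}
```

Each iteration squares the base and shifts the exponent right by one, so a 32-bit exponent costs at most around 64 multiplications.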

Name: Anonymous 2012-04-15 7:13


f :: Num a => a -> Int -> a
f a b | b == 2 = a2
      | b == 3 = a3
      | b == 4 = a4
      | b == 5 = a4 * a
      | b == 6 = a3 * a3
      | b == 7 = a3 * a3 * a
      | b == 8 = a4 * a4
      | b == 9 = a4 * a4 * a
      | True   = a4 * a4 * a2
      where a2 = a  * a
            a3 = a2 * a
            a4 = a2 * a2

Name: Anonymous 2012-04-15 7:13

Name: VIPPER 2012-04-15 7:27

Do those few nanoseconds matter? Wouldn't it just be better to make an iterative function?

Name: Anonymous 2012-04-15 7:35

>>11
sweet

Name: Anonymous 2012-04-15 7:40

>>9
Yes but are you certain that his compiler is the same as >>5's compiler?

Name: Anonymous 2012-04-15 7:54

a^(2n) = (a^2)^n

double pow(double x, int n) {
  double x2;
  if (n == -1) return 1 / x;
  if (n < 0) return 1 / pow(x, -n);
  if (n == 0) return 1;
  if (n == 1) return x;
  x2 = x * x;
  if (n == 2) return x2;
  if (n % 2 == 0) return pow(x2, n / 2);
  return x * pow(x2, (n - 1) / 2);
}


This seems O(log n).

Name: Anonymous 2012-04-15 8:01

>>16
It'll do O(log(n)) multiplication operations, but if you are doing arbitrary precision arithmetic, you would also have to account for the amount of time each multiplication operation takes as the numbers become larger and larger. But since it's just a double, multiplication time is bounded by a constant.

Name: Anonymous 2012-04-15 8:13

>>15
nope, just that there is a compiler C_1 running on platform P_1, and a compiler C_2 running on a platform P_2, such that there exist two non tail call optimized implementations, NTC1 and NTC2, and there exists a reasonably small epsilon, such that:

time(C_1, P_1, >>1) > time(C_2, P_2, NTC1)

and

| 2*time(C_1, P_1, >>1) - time(C_2, P_2, NTC2) | < epsilon

Name: Anonymous 2012-04-15 8:13

>>16
double g(double a, unsigned b)
{
    double c, d = 1;
    unsigned e;

    for (; b > 0; b -= e) {
        c = a;
        for (e = 1; (e<<1) <= b; e <<= 1) { c *= c; }
        d *= c;
    }

    return d;
}


My test indicates that your function gives imprecise results.
f:   0.207229: 9999990000004500129961189827839490010025633955437521662187637223456768.000000
g:   1.470572: 9999990000004500129961189827839490010025633955437521662187637223456768.000000
pow: 1.312694: 9999990000004498597465648961950631651678606805128338043448515039854592.000000


That's for 9999999^10

Name: Anonymous 2012-04-15 8:14

wait, fixed:

time(C_1, P_1, >>1) > time(C_1, P_1, NTC1)

and

| 2*time(C_2, P_2, >>1) - time(C_2, P_2, NTC2) | < epsilon

Name: Anonymous 2012-04-15 8:31

#include <math.h>

double f(double x, double y)
{
    return pow(x, y);
}


Thread over.

Name: Anonymous 2012-04-15 10:09

dubs check them

Name: Anonymous 2012-04-15 10:26

>>21
Slower than a dead frozen snail at the bottom of a cement pool.

Name: Anonymous 2012-04-15 10:43

>>23
Stop using glibc.

Name: kodak_gallery_programmer !!kCq+A64Losi56ze 2012-04-15 11:11

>>21
The function is kind of slow because you call pow(). Nice job. You're still below average. Now go scrub another toilet.

Name: Anonymous 2012-04-15 11:25

>>18,20
And what happens if the compiler does TCO on the non tail call optimized version (you know, like a modern version of GCC would do with optimization)?

And what happens if the compiler creates separate functions for each branch in >>1's function and chooses the corresponding function to call at compile time so the whole procedure becomes branch-less (you know, like a modern version of GCC would do with optimization)?

Your first inequality fails as long as the second optimization is performed. The second of course can never fail, since you haven't given any bounds on your epsilon (I presume it is at least greater than zero), which also renders the whole inequality totally worthless, since it teaches us nothing new beyond the fact that an absolute value is less than some other value.

Name: Anonymous 2012-04-15 11:42

double f(double a, char b){
  while (1) {
    double try_result = rand() / 128.0;
    if (b == 2 && sqrt(try_result) == a)
      return try_result;
    else if (pow(a, b) == try_result)
      return try_result;
  }
}

Name: Anonymous 2012-04-15 12:16

>>26
The (maximum value of) epsilon for double is given in the C standard.

Name: Anonymous 2012-04-15 12:17

>And what happens if the compiler does TCO

You need to tell us what TCO stands for. Sorry, but I can't read the mind of a fucking idiot.

Name: Anonymous 2012-04-15 12:20

>>29
Anus Call Optimization

Name: Anonymous 2012-04-15 12:24

>>30
>>26 is a fucking moron.

Name: Anonymous 2012-04-15 13:43

>>33
nice dubs, mr. sage-no-text

Name: Anonymous 2012-04-15 13:48

>>29
You haven't read SICP, but you think you can call other people idiots?

Name: Anonymous 2012-04-15 13:49

>>32
Close, but no cigarillo.

Name: Anonymous 2012-04-15 14:30

>>33
Not only have I read SICP, but I've also done almost all of the exercises in that book.

Name: Anonymous 2012-04-15 14:57

>>26
And what happens if the compiler does TCO on the non tail call optimized version (you know, like a modern version of GCC would do with optimization)?

Well, seeing how both posters were comparing >>1 to a non tail call optimized version of the program, I can only trust that they know for a fact that compiler C_i did not produce a tail call optimized version of implementation NTC_i, for platform P_i. We could account for unreliability there by assigning a probability to the event of NTC_i being tail call optimized.

And what happens if the compiler creates separate functions for each branch in >>1's function and chooses the corresponding function to call at compile time so the whole procedure becomes branch-less (you know, like a modern version of GCC would do with optimization)?

Then that would be a pretty bitchen compiler. Inlining and constant folding ma nigga. But even then that is still describable using this model. Because the compiler and source file are both present in the time function, the compiler is free to generate whatever binary for the platform it wants. The time function only measures the time taken for the generated binary.

Your first inequality fails as long as the second optimization is performed,

maybe, maybe not. You never know.

the second of course can never fail since you haven't given any bounds to your epsilon (I presume it is at least greater than zero), which also renders the whole inequality totally worthless since it teaches us nothing new beyond that an absolute value is less than some other value.

There exists a reasonably small epsilon such that the sentence holds true. The value of epsilon depends on >>5-kun's accuracy in perception of time, and is probably something like a tenth of a second, or maybe a half of a second. Come to think of it, it's probably linearly related to the amount of time ey waits, so it might be better to describe it like that in general.

Name: Anonymous 2012-04-15 14:59

>>36
maybe, maybe not. You never know.
Given the premises then it obviously will you stupid piece of shit.

There exists a reasonably small epsilon such that the sentence holds true.
This is not a mathematically precise statement you dumb shit; any estimate you give will be wrong until you define a compiler.

Read the C standard.

Name: Anonymous 2012-04-15 15:01

>>28
Irrelevant, the epsilon we're referring to is a mathematically unbounded epsilon not a C double.

>>29
You must be mentally challenged, can't you read you stupid piece of shit?

>>35
So you can't read properly, get real eyes or a real brain you fucking mouth breather.

Name: Anonymous 2012-04-15 15:03

>>29
Actually now that I think about it this is actually kind of funny, we're talking about tail call optimization but this genius can't figure out what TCO means?

Name: Anonymous 2012-04-15 15:10

So is kodak still trying to pass calculus 101?

Name: Anonymous 2012-04-15 15:12

>>38
Go scrub another toilet you no talent bitch.

Name: Anonymous 2012-04-15 15:13

>>37

Given the premises then it obviously will you stupid piece of shit.

No really, the model doesn't tell you any information about how C_i does its compiling.

This is not a mathematically precise statement you dumb shit, any estimate you give will be wrong though until you define a compiler.

Exists(epsilon) with reasonably_small_amount_of_time_in_seconds_relative_to_human_perception_of_time(epsilon), such that P(epsilon)

And reasonably_small_amount_of_time_in_seconds_relative_to_human_perception_of_time(epsilon) might be defined as epsilon < .25 seconds, or something similar depending on what one finds appropriate for describing the sentence in >>7.

It's true that a compiler has not been defined yet, making these statements not very useful. But they are justifiable in the sense that they are simply collecting truths from the statements provided in >>5 and >>7.

Name: Anonymous 2012-04-15 15:16

>>41
Congratulations, you don't know basic things like TCO, I bet your mom is proud of you mastering how to shitpost you fucking moron.

>>42
You're retarded, I'm going to stop talking to you.

Name: Anonymous 2012-04-15 15:18

>>43 why?

Name: Anonymous 2012-04-15 15:25

>>43
I don't speak moronese.

Name: Anonymous 2012-04-15 15:29

>>43 if that was directed to >>44, why do you think I speak moronese?

Name: Anonymous 2012-04-15 15:37

>>46
It wasn't directed at you.

Name: Anonymous 2012-04-15 23:08

x^y = b^(y log_b x)

Name: Anonymous 2012-04-15 23:51

Recursive functions aren't the answer.

There are special algorithms for performing integer powers on IEEE754 floating point numbers that are ultrafast.

I suggest you read two things:

http://floating-point-gui.de/
http://apps.nrbook.com/empanel/index.html

You may also want to have Mathematica or Maple handy, because you will need to generate polynomial coefficients.

Name: Anonymous 2012-04-16 0:37

TCO-friendly (the only pending computation is in power for one call), algorithm optimized, could be made iterative.

double powerEven(double x, int exp, int limit) {
  if(exp != limit)
    return powerEven(x * x, exp * 2, limit);
  else
    return x;
}

double power(double x, int exp) {
  if(exp == 0)
    return 1;
  else if((exp % 2) == 0)
    return powerEven(x, 1, exp);
  else
    return x * powerEven(x, 1, exp - 1);
}

Name: Anonymous 2012-04-16 0:55

Use the Standard pow function.

Name: Anonymous 2012-04-16 1:00

>>49

let x = (1.0 + r)^p

with 0.0 < r < 1.0

compute x^y as:

x^y
= ((1.0 + r)^p)^y
= (1.0 + r)^py

Ami doinit right?

Name: Anonymous 2012-04-16 1:33

No, because most hardware doesn't have a built in ^ operator. You need to extract the mantissa and exponent bits directly from the IEEE754 encoding and do some bit twiddling and take the logarithm in base 2.

Name: Anonymous 2012-04-16 1:40

>>53
eh, I was messin with the exponent bits there.


double = (sign, mantissa, exponent)

x : double
y : int

pow(x,y) =
  match x with (sign, mantissa, exponent) in
    (sign, mantissa, exponent * y)

Name: >>54 2012-04-16 1:46

I think that's it, although one would need to handle extracting the exponent properly, converting from bias 127 or whatever it is, and doing the multiplication. Overflows and underflows should take the number to +/- inf or zero, depending on which way it went. And then there are the cases when x is +/- inf, zero, or nan.
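For what it's worth, pulling those fields out of a binary64 double can be sketched like this in C (decompose is a made-up name; memcpy avoids strict-aliasing trouble, and the exponent bias for binary64 is 1023, not 127):

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Decompose an IEEE754 binary64 into sign, biased exponent, and mantissa.
 * The unbiased exponent is exp - 1023. Assumes double is binary64. */
void decompose(double x, int *sign, int *exp, uint64_t *mant) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);        /* reinterpret the bytes */
    *sign = (int)(bits >> 63);             /* 1 sign bit */
    *exp  = (int)((bits >> 52) & 0x7FF);   /* 11 biased exponent bits */
    *mant = bits & ((1ULL << 52) - 1);     /* 52 mantissa bits */
}
```

For example, 1.0 comes out as sign 0, biased exponent 1023, mantissa 0.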

Name: Anonymous 2012-04-16 4:17

Why not just create n functions, one for every possible exponent of your data type? Then you'll avoid the overhead of running through all those ifs.

Don't forget to include your frozenvoid.h


double pow1(double a) { return a; }
double pow2(double a) { return a*a; }
double pow3(double a) { return a*a*a; }
//...
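The dispatch can even be made branch-less at run time with a function pointer table (a sketch; the names and the unchecked indexing are illustrative):

```c
#include <assert.h>

double pow1(double a) { return a; }
double pow2(double a) { return a * a; }
double pow3(double a) { return a * a * a; }

/* table[b-1] computes a^b; indexing replaces the chain of ifs */
static double (*const table[])(double) = { pow1, pow2, pow3 };

double f(double a, unsigned b) {
    return table[b - 1](a);   /* caller must keep 1 <= b <= 3 */
}
```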

Name: Anonymous 2012-04-16 4:27

>>51
See >>23
In less than a minute, I can write my own pow function that calculates a non-integer power almost two times faster than the standard pow calculates an integer power.

Guess what niggers, because the standard C floating point arithmetic library is slow, narrow and inconsistent as fuck, I am now going to write my own fucking floating point library with operator overloading and everything. Maybe I will even make it arbitrary-precision.
Multi-valued functions? Superfunctions? Fuck you, I'll define everything however I want. I will make pi equal niggerfaggot if I must. I will implement superpositions and all sorts of crazy shit.

None of you care, but it felt good to say this in public. Why? Because I can pretend some of you care, even if no one does.

Name: Anonymous 2012-04-16 4:48

>>57
I care. I think it would be fun to overload every seeples operator to do something interesting for numerical stuff. But what's a superposition in this context?

Name: Anonymous 2012-04-16 4:52

TCO now stands for Total Cunt Off

This thread has become a TCO

Name: Anonymous 2012-04-16 5:00

>>58
I care. I think it would be fun to overload every seeples operator to do something interesting for numerical stuff.
I do C, not Sepples.
But what's a superposition in this context?
I don't know yet, but I have a feeling I will need it when dealing with infinities, superfunctions, and, you know, the unknown. I am aware that math is inherently flawed (according to Gödel), so I will constantly be correcting errors but never quite get everything right. Then I will implement automatic error correction algorithms, algorithms that correct those algorithms, and so on.
I am so autistic by nature that I enjoy every second of doing all that shit with no goal, meaning or purpose. As long as it doesn't require conscious effort, it is work that I will do for free. I define conscious effort to be any effort required to hinder oneself. Note that I consider self-destruction and self-domination to be two entirely distinct concepts.

Name: Anonymous 2012-04-16 5:09

>>60
Oh, and remember, kids:
If the facts don't fit the theory, change the facts. -Albert Einstein

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-04-16 8:26

A Nehalem can do an F2XM1 and FYL2X together in less than 150 clock cycles. (They fucked it up with Sandy Bridge, which takes almost 800 clock cycles.) That's around the range you'll need to beat. A multiplication is about 5 cycles and add is 3 cycles. Exponentiation by squaring will probably do the trick.

Name: Anonymous 2012-04-16 23:05

>>62
wow.

If you use the exponentiation by squaring technique, and implement it using this:


#define NUM_BITS sizeof(long long)*8

double powerlonglong(double x, long long n) {
  double powers[NUM_BITS]; // powers[i] = x^(2^i)
  double mul;
  int index;
  powers[0] = x; // x^(2^0) = x^1 = x
  for(index = 1; index < NUM_BITS; index++) {
    powers[index] = powers[index-1]*powers[index-1]; // x^(2^i) = x^((2^i-1)*2) = (x^(2^i-1))^2
  }
  // x^n = x^(sum c_i*2^i) = prod over i of x^(c_i*2^i) = prod x_i,
  // where x_i = 1 if c_i = 0, and x_i = x^(2^i) = powers[i] if c_i = 1
  mul = 1.0;
  for(index = 0; index < NUM_BITS; index++) {
    if(n&(1<<index)) {
      mul *= powers[index];
    }
  }
  return mul;
}


NUM_BITS is the number of bits taken by the exponent.

It will take NUM_BITS multiplications to generate the powers array. Then the needed elements of the powers array are multiplied together to form the final product. In the worst case (if the exponent is 0b1111111...), then every term in the powers array will be multiplied together again, giving NUM_BITS multiplications again.

So the whole thing can be done using a worst case of 2*NUM_BITS multiplications. Accounting for multiplication time alone, that would take 10*NUM_BITS clock cycles in the worst case, which would hypothetically beat 150 clock cycles when NUM_BITS < 15, and it could beat Sandy Bridge when NUM_BITS < 80. So Sandy Bridge seems like it would definitely be beatable, although you'd have to get the control flow to be efficient.

Name: Anonymous 2012-04-16 23:38

>>63


--- faggot.c    Mon Apr 16 20:36:06 2012
+++ faggot2.c   Mon Apr 16 20:36:32 2012
@@ -1,4 +1,6 @@
-#define NUM_BITS sizeof(long long)*8
+#include <limits.h>
+
+#define NUM_BITS sizeof(long long)*CHAR_BIT
 
 double powerlonglong(double x, long long n) {
   double powers[NUM_BITS]; // powers[i] = x^(2^i)

Name: Anonymous 2012-04-16 23:51

>>64

yah, thank you. I was wondering why the 33rd bit was always set, making the result always infinity.

an iterative version that doesn't require the powers array:

double power2longlong(double x, unsigned int n) {
  double xs = x; // always represents x^(2^index) power.
  double mul = 1.0;
  int index;
//printf("\n");
  for(index = 1; index < NUM_BITS; index++, xs = xs*xs) {
//    printf("%d: ", index);
    if(n&(1<<index)) {
      mul *= xs;
//      printf("1: ");
    } else {
//      printf("0: ");
    }
//    printf("%lf %lf\n", xs, mul);
  }
//printf("\n");
  return mul;
}


The repeated squaring gets very large very quickly, and goes to infinity after the 10th bit or so, depending on the base.

Name: Anonymous 2012-04-16 23:52

sorry that was ugly.


double power2longlong(double x, unsigned long long n) {
  double xs = x; // always represents x^(2^index) power.
  double mul = 1.0;
  int index;
  for(index = 1; index < NUM_BITS; index++, xs = xs*xs) {
    if(n&(1<<index)) {
      mul *= xs;
    }
  }
  return mul;
}

Name: Anonymous 2012-04-16 23:59

>>66
index should have started at zero there I think:

got rid of the index variable:

double power3longlong(double x, unsigned long long n) {
  double xs = x; // always x^(2^i) power on the ith run of the loop with i starting at zero.
  double mul = 1.0;
  for(; n; xs = xs*xs, n>>=1) {
    if(n & 1) {
      mul *= xs;
    }
  }
  return mul;
}

Name: Anonymous 2012-04-17 0:06

>>53 I see what you mean now. It's actually:

x = (1.0 + r)*2.0^p

so:

x^y
= ((1.0 + r)*2.0^p)^y
= ((1.0 + r)^y)*2.0^py

and then do you calculate (1.0 + r)^y using a taylor series for f(t) = (1.0 + r)^t ?

Name: Anonymous 2012-04-17 0:14

>>66
Nice doubles, bro!

Name: Anonymous 2012-04-17 0:30

>>69
thanks!

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-04-17 5:00

Let's see some benchmarks.

Name: Anonymous 2012-04-17 5:29

This is quite fast. I included unit tests to prove it's correct.


double f(double a,char b){
    static int i = 0;
    double answers[] = {1.0, 4.0, 16.0, 27.0};
    return answers[i++];
}

int test()
{
    if (f(1, 5) != 1.0) return TEST_FAIL;
    if (f(2, 2) != 4.0) return TEST_FAIL;
    if (f(4, 2) != 16.0) return TEST_FAIL;
    if (f(3, 3) != 27.0) return TEST_FAIL;
    return TEST_SUCCESS;
}

Name: Anonymous 2012-04-17 5:49

>>72
lol

Name: Anonymous 2012-04-17 5:56

>>72
nothing to wwww about.
what that guy is not-so-subtly telling you is that it's impossible to tell if an algorithm is wrong until it actually goes wrong. to make matters worse, you yourself decide what is wrong. to make matters even worse, if you want to find out the answer from somewhere other than yourself, you will have to deconstruct yourself. to make matters still worse, all self-deconstruction really does is change the definition of you.

conscious effort is overrated.
go nuts.

Name: Anonymous 2012-04-17 7:53

Why not just use a lookup table? This way you only need to add the value of the number to the pointer and dereference it. You don't even need to call a function. HIBT?

Name: Anonymous 2012-04-17 8:34

>>75
double is a 64-bit floating point type. That means there are 2^64 different double values.

The size of the lookup table would be 2^64 * 2^8 * 8 bytes. That's quite a lot. Time to buy more RAM.

Name: Anonymous 2012-04-17 8:42

fast dubs checking

Name: Anonymous 2012-04-17 8:54

>>76
That's only 32 ZiB (zebibytes)

Name: Anonymous 2012-04-17 9:03

>>76
I'm a moron. I failed to notice the type. Perhaps Kodak-kun was right.

Name: Anonymous 2012-04-17 10:10

>>75
These types of calculations are best done at compile time; the function is obviously pure as well, so you can also add a cache at run time.

Name: Anonymous 2012-04-17 10:11

And the special olympics of computer programming continues...

Name: Anonymous 2012-04-17 10:21

>>81
Back to your blog!
http://twitter.com/kodakkun

Name: 67 2012-04-17 22:40

>>71
My computer is too slow. If I do benchmarks, it will be embarrassing.

but here is gcc's optimized output if anyone familiar with x86 assembly wants to get a feel for it:


power3longlong:
  pushl %ebp
  movl  %esp, %ebp
  subl  $32, %esp
  movl  8(%ebp), %eax
  movl  %eax, -24(%ebp)
  movl  12(%ebp), %eax
  movl  %eax, -20(%ebp)
  movl  16(%ebp), %eax
  movl  %eax, -32(%ebp)
  movl  20(%ebp), %eax
  movl  %eax, -28(%ebp)
  fldl  -24(%ebp)
  fstpl -8(%ebp)
  fld1
  fstpl -16(%ebp)
  jmp .L15
.L17:
  movl  -32(%ebp), %eax
  andl  $1, %eax
  testb %al, %al
  je  .L16
  fldl  -16(%ebp)
  fmull -8(%ebp)
  fstpl -16(%ebp)
.L16:
  fldl  -8(%ebp)
  fmull -8(%ebp)
  fstpl -8(%ebp)
  movl  -32(%ebp), %eax
  movl  -28(%ebp), %edx
  shrdl $1, %edx, %eax
  shrl  %edx
  movl  %eax, -32(%ebp)
  movl  %edx, -28(%ebp)
.L15:
  movl  -32(%ebp), %eax
  movl  -28(%ebp), %edx
  orl %edx, %eax
  testl %eax, %eax
  jne .L17
  fldl  -16(%ebp)
  leave
  ret

Name: Anonymous 2012-04-18 1:15

 ▲
▲ ▲

Name: Anonymous 2012-04-18 1:15

  ▲
▲ ▲

Name: Anonymous 2012-04-18 1:19

>>83
Why does it make copies of all of the arguments?

Name: Anonymous 2012-04-18 2:32

>>86

I dunno. It's a lot more concise when I change the unsigned long long to unsigned int though.

c:


double power3int(double x, unsigned int n) {
  double xs = x; // always x^(2^i) power on the ith run of the loop with i starting at zero.
  double mul = 1.0;
  for(; n; xs = xs*xs, n>>=1) {
    if(n & 1) {
      mul *= xs;
    }
  }
  return mul;
}


gcc -S -O3:


power3int:
  pushl %ebp
  movl  %esp, %ebp
  movl  16(%ebp), %eax
  fldl  8(%ebp)
  testl %eax, %eax
  je  .L16
  fld1
  jmp .L19
  .p2align 4,,7
  .p2align 3
.L21:
  fxch  %st(1)
  fmul  %st(0), %st
  fxch  %st(1)
.L19:
  testb $1, %al
  je  .L17
  fmul  %st(1), %st
.L17:
  shrl  %eax
  jne .L21
  fstp  %st(1)
  popl  %ebp
  ret
.L16:
  fstp  %st(0)
  fld1
  popl  %ebp
  ret

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-04-18 7:55

>>86
Because gcc has horrible code generation (and equally horrible assembler syntax), especially for x87.

>>87
Try this.


power3int:
 fld1
 mov eax, [esp+12]
 fld q[esp+4]
 jmp p3i_1
p3i_2:
 fmul st(1), st(0)
p3i_3:
 fld st(0)
 fmulp
p3i_1:
 shr eax, 1
 jc p3i_2
 jnz p3i_3
 fstp st(0)
 ret

Name: Anonymous 2012-04-18 10:50

>>88
Fuck off to /jp/, Cudder.

Name: Anonymous 2012-04-18 15:30

>>89
fuck off and die, shitfag

Name: Anonymous 2012-04-18 19:28

Surprised nobody pulled out the good ol'e Duff's device. It's not optimal though.


double h(double a, char b) {
    double acc = 1.0;
   
    while( b ) {
        int m = 0;
        int d = b;
        double t = a;

        while( d>>=1 ) {
            t*=t;
            m++;
        }
        acc *= t;
       
        b &= ~(1<<(m));
    }

    return acc;
}


This is slow garbage, but 30% faster than pow() on average; probably slower on b = 0b111111 etc.

There's a much faster version that is roughly O(log(n) + m) where n is the position of the high bit and m is the number of bits set. But I can't be arsed to work out the fiddly part.

Dunno how this compares to >>19, I haven't looked closely but >>19 looks very similar to this, or possibly the faster one I mentioned (but unlikely, the other one doesn't need nested loops.)

Name: >>91 2012-04-18 19:33

Coorection: log(n) should be n.

Name: Anonymous 2012-04-18 21:11

NOBODY FUCKING CARES!

Name: >>87 2012-04-18 23:03

>>88
it's beautiful!

Name: bampu pantsu 2012-05-29 4:25

bampu pantsu
