Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

Fast positive integer power function

Name: Anonymous 2012-04-15 6:00

I'm trying to figure out how to compute an integer power of a double efficiently with as little overhead as possible (trying to avoid loops and recursion to minimize the amount of calls and jumps). Calculation speed is the priority.
So far I got this trivial piece of code:

double f(double a,char b){
if(b==2)return a*a;
if(b==3)return a*a*a;
if(b==4)return a*a*a*a;
if(b==5)return a*a*a*a*a;
if(b==6)return a*a*a*a*a*a;
if(b==7)return a*a*a*a*a*a*a;
if(b==8)return a*a*a*a*a*a*a*a;
if(b==9)return a*a*a*a*a*a*a*a*a;
        return a*a*a*a*a*a*a*a*a*a;
}


...which of course assumes the exponent is at most 10. The code can easily be expanded if higher exponents are needed.
I'm thinking this is probably optimal, but I know you /prog/riders have some magic tricks up your sleeve, so hit me with them.

Name: Anonymous 2012-04-17 10:11

And the special olympics of computer programming continues...

Name: Anonymous 2012-04-17 10:21

>>81
Back to your blog!
http://twitter.com/kodakkun

Name: 67 2012-04-17 22:40

>>71
My computer is too slow. If I do benchmarks, it will be embarrassing.

but here is gcc's optimized output if anyone familiar with x86 assembly wants to get a feel for it:


power3longlong:
  pushl %ebp
  movl  %esp, %ebp
  subl  $32, %esp
  movl  8(%ebp), %eax
  movl  %eax, -24(%ebp)
  movl  12(%ebp), %eax
  movl  %eax, -20(%ebp)
  movl  16(%ebp), %eax
  movl  %eax, -32(%ebp)
  movl  20(%ebp), %eax
  movl  %eax, -28(%ebp)
  fldl  -24(%ebp)
  fstpl -8(%ebp)
  fld1
  fstpl -16(%ebp)
  jmp .L15
.L17:
  movl  -32(%ebp), %eax
  andl  $1, %eax
  testb %al, %al
  je  .L16
  fldl  -16(%ebp)
  fmull -8(%ebp)
  fstpl -16(%ebp)
.L16:
  fldl  -8(%ebp)
  fmull -8(%ebp)
  fstpl -8(%ebp)
  movl  -32(%ebp), %eax
  movl  -28(%ebp), %edx
  shrdl $1, %edx, %eax
  shrl  %edx
  movl  %eax, -32(%ebp)
  movl  %edx, -28(%ebp)
.L15:
  movl  -32(%ebp), %eax
  movl  -28(%ebp), %edx
  orl %edx, %eax
  testl %eax, %eax
  jne .L17
  fldl  -16(%ebp)
  leave
  ret

Name: Anonymous 2012-04-18 1:15

 ▲
▲ ▲

Name: Anonymous 2012-04-18 1:15

  ▲
▲ ▲

Name: Anonymous 2012-04-18 1:19

>>83
Why does it make copies of all of the arguments?

Name: Anonymous 2012-04-18 2:32

>>86

I dunno. It's a lot more concise when I change the unsigned long long to unsigned int though.

c:


double power3int(double x, unsigned int n) {
  double xs = x; // always x^(2^i) power on the ith run of the loop with i starting at zero.
  double mul = 1.0;
  for(; n; xs = xs*xs, n>>=1) {
    if(n & 1) {
      mul *= xs;
    }
  }
  return mul;
}


gcc -S -O3:


power3int:
  pushl %ebp
  movl  %esp, %ebp
  movl  16(%ebp), %eax
  fldl  8(%ebp)
  testl %eax, %eax
  je  .L16
  fld1
  jmp .L19
  .p2align 4,,7
  .p2align 3
.L21:
  fxch  %st(1)
  fmul  %st(0), %st
  fxch  %st(1)
.L19:
  testb $1, %al
  je  .L17
  fmul  %st(1), %st
.L17:
  shrl  %eax
  jne .L21
  fstp  %st(1)
  popl  %ebp
  ret
.L16:
  fstp  %st(0)
  fld1
  popl  %ebp
  ret

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-04-18 7:55

>>86
Because gcc has horrible code generation (and equally horrible assembler syntax), especially for x87.

>>87
Try this.


power3int:
 fld1
 mov eax, [esp+12]
 fld q[esp+4]
 jmp p3i_1
p3i_2:
 fmul st(1), st(0)
p3i_3:
 fld st(0)
 fmulp
p3i_1:
 shr eax, 1
 jc p3i_2
 jnz p3i_3
 fstp st(0)
 ret

Name: Anonymous 2012-04-18 10:50

>>88
Fuck off to /jp/, Cudder.

Name: Anonymous 2012-04-18 15:30

>>89
fuck off and die, shitfag

Name: Anonymous 2012-04-18 19:28

Surprised nobody pulled out the good ol'e Duff's device. It's not optimal though.


double h(double a, char b) {
    double acc = 1.0;
   
    while( b ) {
        int m = 0;
        int d = b;
        int t = a;

        while( d>>=1 ) {
            t*=t;
            m++;
        }
        acc *= t;
       
        b &= ~(1<<(m));
    }

    return acc;
}


This is slow garbage, but 30% faster than pow() on average, probably slower on b=111111 etc.

There's a much faster version that is roughly O(log(n) + m) where n is the position of the high bit and m is the number of bits set. But I can't be arsed to work out the fiddly part.

Dunno how this compares to >>19, I haven't looked closely but >>19 looks very similar to this, or possibly the faster one I mentioned (but unlikely, the other one doesn't need nested loops.)

Name: >>91 2012-04-18 19:33

Coorection: log(n) should be n.

Name: Anonymous 2012-04-18 21:11

NOBODY FUCKING CARES!

Name: >>87 2012-04-18 23:03

>>88
it's beautiful!

Name: bampu pantsu 2012-05-29 4:25

bampu pantsu

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List