/prog/ - Fast positive integer power function

Name: Anonymous 2012-04-15 6:00

I'm trying to figure out how to compute an integer power of a double efficiently with as little overhead as possible (trying to avoid loops and recursion to minimize the amount of calls and jumps). Calculation speed is the priority.
So far I got this trivial piece of code:

double f(double a,char b){

if(b==2)return a*a;

if(b==3)return a*a*a;

if(b==4)return a*a*a*a;

if(b==5)return a*a*a*a*a;

if(b==6)return a*a*a*a*a*a;

if(b==7)return a*a*a*a*a*a*a;

if(b==8)return a*a*a*a*a*a*a*a;

if(b==9)return a*a*a*a*a*a*a*a*a;

        return a*a*a*a*a*a*a*a*a*a;

}

...which of course assumes the exponent is at most 10. The code can easily be expanded if higher exponents are needed.
I'm thinking this is probably optimal, but I know you /prog/riders have some magic tricks up your sleeve, so hit me with them.

Name: Anonymous 2012-04-17 10:11

And the special olympics of computer programming continues...

Name: Anonymous 2012-04-17 10:21

>>81
Back to your blog!
http://twitter.com/kodakkun

Name: 67 2012-04-17 22:40

>>71
My computer is too slow. If I do benchmarks, it will be embarrassing.

but here is gcc's optimized output if anyone familiar with x86 assembly wants to get a feel for it:



power3longlong:

  pushl %ebp

  movl  %esp, %ebp

  subl  $32, %esp

  movl  8(%ebp), %eax

  movl  %eax, -24(%ebp)

  movl  12(%ebp), %eax

  movl  %eax, -20(%ebp)

  movl  16(%ebp), %eax

  movl  %eax, -32(%ebp)

  movl  20(%ebp), %eax

  movl  %eax, -28(%ebp)

  fldl  -24(%ebp)

  fstpl -8(%ebp)

  fld1

  fstpl -16(%ebp)

  jmp .L15

.L17:

  movl  -32(%ebp), %eax

  andl  $1, %eax

  testb %al, %al

  je  .L16

  fldl  -16(%ebp)

  fmull -8(%ebp)

  fstpl -16(%ebp)

.L16:

  fldl  -8(%ebp)

  fmull -8(%ebp)

  fstpl -8(%ebp)

  movl  -32(%ebp), %eax

  movl  -28(%ebp), %edx

  shrdl $1, %edx, %eax

  shrl  %edx

  movl  %eax, -32(%ebp)

  movl  %edx, -28(%ebp)

.L15:

  movl  -32(%ebp), %eax

  movl  -28(%ebp), %edx

  orl %edx, %eax

  testl %eax, %eax

  jne .L17

  fldl  -16(%ebp)

  leave

  ret

Name: Anonymous 2012-04-18 1:15

▲
▲ ▲

Name: Anonymous 2012-04-18 1:15

▲
▲ ▲

Name: Anonymous 2012-04-18 1:19

>>83
Why does it make copies of all of the arguments?

Name: Anonymous 2012-04-18 2:32

>>86

I dunno. It's a lot more concise when I change the unsigned long long to unsigned int though.

c:



double power3int(double x, unsigned int n) {

  double xs = x; // always x^(2^i) power on the ith run of the loop with i starting at zero.

  double mul = 1.0;

  for(; n; xs = xs*xs, n>>=1) {

    if(n & 1) {

      mul *= xs;

    }

  }

  return mul;

}

gcc -S -O3:



power3int:

  pushl %ebp

  movl  %esp, %ebp

  movl  16(%ebp), %eax

  fldl  8(%ebp)

  testl %eax, %eax

  je  .L16

  fld1

  jmp .L19

  .p2align 4,,7

  .p2align 3

.L21:

  fxch  %st(1)

  fmul  %st(0), %st

  fxch  %st(1)

.L19:

  testb $1, %al

  je  .L17

  fmul  %st(1), %st

.L17:

  shrl  %eax

  jne .L21

  fstp  %st(1)

  popl  %ebp

  ret

.L16:

  fstp  %st(0)

  fld1

  popl  %ebp

  ret

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-04-18 7:55

>>86
Because gcc has horrible code generation (and equally horrible assembler syntax), especially for x87.

>>87
Try this.



power3int:

 fld1

 mov eax, [esp+12]

 fld q[esp+4]

 jmp p3i_1

p3i_2:

 fmul st(1), st(0)

p3i_3:

 fld st(0)

 fmulp

p3i_1:

 shr eax, 1

 jc p3i_2

 jnz p3i_3

 fstp st(0)

 ret

Name: Anonymous 2012-04-18 10:50

>>88
Fuck off to /jp/, Cudder.

Name: Anonymous 2012-04-18 15:30

>>89
fuck off and die, shitfag

Name: Anonymous 2012-04-18 19:28

Surprised nobody pulled out the good ol'e Duff's device. It's not optimal though.



double h(double a, char b) {

    double acc = 1.0;

    

    while( b ) {

        int m = 0;

        int d = b;

        int t = a;



        while( d>>=1 ) {

            t*=t;

            m++;

        }

        acc *= t;

        

        b &= ~(1<<(m));

    }



    return acc;

}

This is slow garbage, but 30% faster than pow() on average, probably slower on b=111111 etc.

There's a much faster version that is roughly O(log(n) + m) where n is the position of the high bit and m is the number of bits set. But I can't be arsed to work out the fiddly part.

Dunno how this compares to >>19, I haven't looked closely but >>19 looks very similar to this, or possibly the faster one I mentioned (but unlikely, the other one doesn't need nested loops.)

Name: >>91 2012-04-18 19:33

Coorection: log(n) should be n.

Name: Anonymous 2012-04-18 21:11

NOBODY FUCKING CARES!

Name: >>87 2012-04-18 23:03

>>88
it's beautiful!

Name: bampu pantsu 2012-05-29 4:25

bampu pantsu

Fast positive integer power function

1 Name: Anonymous 2012-04-15 6:00

81 Name: Anonymous 2012-04-17 10:11

82 Name: Anonymous 2012-04-17 10:21

83 Name: 67 2012-04-17 22:40

84 Name: Anonymous 2012-04-18 1:15

85 Name: Anonymous 2012-04-18 1:15

86 Name: Anonymous 2012-04-18 1:19

87 Name: Anonymous 2012-04-18 2:32

88 Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-04-18 7:55

89 Name: Anonymous 2012-04-18 10:50

90 Name: Anonymous 2012-04-18 15:30

91 Name: Anonymous 2012-04-18 19:28

92 Name: >>91 2012-04-18 19:33

93 Name: Anonymous 2012-04-18 21:11

94 Name: >>87 2012-04-18 23:03

95 Name: bampu pantsu 2012-05-29 4:25

Name: Anonymous 2012-04-15 6:00

Name: Anonymous 2012-04-17 10:11

Name: Anonymous 2012-04-17 10:21

Name: 67 2012-04-17 22:40

Name: Anonymous 2012-04-18 1:15

Name: Anonymous 2012-04-18 1:15

Name: Anonymous 2012-04-18 1:19

Name: Anonymous 2012-04-18 2:32

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-04-18 7:55

Name: Anonymous 2012-04-18 10:50

Name: Anonymous 2012-04-18 15:30

Name: Anonymous 2012-04-18 19:28

Name: >>91 2012-04-18 19:33

Name: Anonymous 2012-04-18 21:11

Name: >>87 2012-04-18 23:03

Name: bampu pantsu 2012-05-29 4:25