Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon.

Pages: 1-4041-

Fast Square Root Challenge

Name: Anonymous 2010-07-05 16:04

Design a fast square root function which computes double floating point numbers faster than SQRTSD(SSE2). Bonus for precision.

Name: Anonymous 2010-07-05 16:07

In before:
* people implementing the SICP algorithm in toy languages
* people implementing the SICP algorithm in C#
* Xarn winning the challenge

Name: Anonymous 2010-07-05 16:08

What a hideous challenge.

Name: Anonymous 2010-07-05 16:55

inline double zoom_zoom_sqrt(int _)
{
    return 0.0;
}


I had to sacrifice some precision for speed, but I'll gladly forfeit my bonus points.

Name: Anonymous 2010-07-05 17:09

>>4
#define sqrt_float(a) (1.f)
My version is more accurate due to round-off errors in values closer to 1.

Name: Anonymous 2010-07-05 18:16

1.f
Pig disgusting.

Name: Anonymous 2010-07-05 20:17

Name: Anonymous 2010-07-11 1:29

static inline uintmax_t isqrt(uintmax_t n){
 uintmax_t r = 0;
 // fixed this so it should work even if uintmax_t is an odd number of bits!
 // ... i've never seen an implementation where that's the case, but i realized
 // that the code i had here before would break if someone was perverse enough
 // to do that, so i fixed it.
 for(uintmax_t i = ~(UINTMAX_MAX >> 2) & UINTMAX_MAX / 3; i; i >>= 2)
  if(n >= (i | r)){
   n -= i | r;
   r = r >> 1 | i;
  } else r >>= 1;
 return r;
}

// 6 digits after the decimal point for floating-point-using idiots:
#define SQRT(n) (isqrt((uintmax_t)(n) * 1000000000000ULL /* 100⁶ */) / 1000000.0L /* 10⁶ */)


it's a lot faster than SQRTSD on my machine, because SQRTSD requires at least as much time as it would take to buy and install a new CPU (and probably a new motherboard as well).

Name: Anonymous 2010-07-11 1:33

double sqrts[18446744073709551616] = {
 /* the contents of this array are left as an exercise for the reader */
double fastsqrt(double f) {
 int64_t *idx = (int64_t*)&f;
 return sqrts[*idx];
}

Name: Anonymous 2010-07-11 2:04

>>8
static inline uintmax_t isqrt(uintmax_t) __attribute__ ((const, nothrow, optimize(2)));

static inline uintmax_t isqrt(uintmax_t n)
{ static const uintmax_t start = ~(UINTMAX_MAX >> 2) & UINTMAX_MAX / 3;
  uintmax_t r = 0;
  for(uintmax_t t, i = start; i; i >>= 2)
  { t = i | r;
    r >>= 1;
    if(t > n)
    { n -= t;
      r |= i; }}
  return r; }

Name: Anonymous 2010-07-11 2:25


static inline uintmax_t isqrt(uintmax_t) __attribute__ ((const, nothrow, optimize(2)));

static inline uintmax_t isqrt(uintmax_t n)
{ static const uintmax_t start = ~(UINTMAX_MAX >> 2) & UINTMAX_MAX / 3;
  uintmax_t r = 0;
  for(uintmax_t t, i = start; i; i >>= 2)
  { t = i | r;
    r >>= 1;
    if(t > n)
    { n -= t;
      r |= i; }}
  return r; }

Name: Anonymous 2010-07-11 2:26



static inline uintmax_t isqrt(uintmax_t) __attribute__ ((const, nothrow, optimize(2)));

static inline uintmax_t isqrt(uintmax_t n)
{ static const uintmax_t start = ~(UINTMAX_MAX >> 2) & UINTMAX_MAX / 3;
  uintmax_t r = 0;
  for(uintmax_t t, i = start; i; i >>= 2)
  { t = i | r;
    r >>= 1;
    if(t > n)
    { n -= t;
      r |= i; }}
  return r; }

Name: Anonymous 2010-07-11 3:21

>>9
$ bc -l <<EOF
18446744073709551616 * 4/1024^5
EOF
65536.00000000000000000000

That's a big LUT.

Name: Anonymous 2010-07-11 3:53

Name: Anonymous 2010-07-11 4:26

>>14
>Patented variation:
>r = (v ^ mask) - mask;
Americans patent basic logical operations?

Name: Anonymous 2010-07-11 5:09

Somebody had better post invsqrt from Quake.

Name: Anonymous 2010-07-11 5:14

>>16

float Q_rsqrt( float number )
{
  long i;
  float x2, y;
  const float threehalfs = 1.5F;

  x2 = number * 0.5F;
  y  = number;
  i  = * ( long * ) &y;  // evil floating point bit level hacking
  i  = 0x5f3759df - ( i >> 1 ); // what the fuck?
  y  = * ( float * ) &i;
  y  = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
  // y  = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed

  #ifndef Q3_VM
  #ifdef __linux__
    assert( !isnan(y) ); // bk010122 - FPE?
  #endif
  #endif
  return y;
}

Name: Anonymous 2010-07-11 6:40

>>15
Unfortunately, this method has been patented in the USA on June 6, 2000 by Vladimir Yu Volkonsky and assigned to Sun Microsystems. On August 13, 2006, Yuriy Kaminskiy told me that the patent is likely invalid because the method was published well before the patent was even filed, such as in How to Optimize for the Pentium Processor by Agner Fog, dated November, 9, 1996. Yuriy also mentioned that this document was translated to Russian in 1997, which Vladimir could have read. Moreover, the Internet Archive also has an old link to it.

Name: Anonymous 2010-07-11 7:43

double sqrt(double n) { return n; }

This returns the square root squared, so you'll have to unsquare it first.

Name: Anonymous 2010-07-11 7:59

bool sqrt(double n, double r){ return n * n == r; }

Name: Anonymous 2010-07-11 8:08

>>18
do you have a few spare millions to prove them wrong?

Name: Anonymous 2010-07-11 8:12



(define (SQRT) f=X/y, f=y )

Name: Anonymous 2010-07-11 8:29

>>21
no court anywhere in the world is going to recognize that patent as valid. no millions are needed.

Name: Anonymous 2010-07-11 9:33

>>23
I lol'd

Name: Anonymous 2010-07-11 9:48

>>24
Back to Reddit, please. Outside of your mom's basement, legitimate patent challenges almost always result in the patent being revoked.

Name: Anonymous 2010-07-11 9:56

>>25
In your liberal version of reality perhaps.

Name: Anonymous 2010-07-11 10:47

>>23
The USA are part of the world.
Ergo your wrong bitch.

Name: Anonymous 2010-07-11 21:50

>>26
Reality has a well-known liberal bias.

Name: Anonymous 2010-07-12 1:58

>>26-28
Re-read the very first phrase of >>25 over and over until you understand its meaning and apply it to your life. We all thank you in advance!

Name: !!M8okqD4wRfs+xmp 2011-10-25 19:40

HAX MY ANUS

Name: !!fZtpcBbz48X1gRQ 2011-10-25 19:40

HAX MY ANUS

Name: !!jG4z7FpXgN+jbLs 2011-10-25 19:40

HAX MY ANUS

Name: !!If5TCDSbQb9BwVR 2011-10-25 19:40

HAX MY ANUS

Name: !!mJxde8AUB5oQyXO 2011-10-25 19:41

HAX MY ANUS

Name: !!TEWTXyX6KE3RIsy 2011-10-25 19:41

HAX MY ANUS

Name: !!JNG31hJ79YbC1fi 2011-10-25 19:41

HAX MY ANUS

Name: !!rGxutZjWR7LZWNO 2011-10-25 19:41

HAX MY ANUS

Name: !!QSvwrQMMkoH22b4 2011-10-25 19:41

HAX MY ANUS

Name: Anonymous 2013-07-30 22:07

HAX MY ANUS

Name: Anonymous 2013-07-30 22:54


function result = sqrtX(n)

solval = 0;

tartval = 1;

check = n - (tartval*tartval);

while(abs(check) > 0.01)

  solval = check / (tartval * 2);

  tartval = tartval + solval;

  check = n - (tartval*tartval);

  endwhile;

result = tartval;

endfunction

Name: Anonymous 2013-07-30 23:52

'sqrt' is pronounced 'squirt'.

Name: Anonymous 2013-07-30 23:54

'wasp' is pronounced 'waseepeenoo'.

Name: Anonymous 2013-07-31 0:26

did skippy do it better?

Name: Anonymous 2013-07-31 1:08

OMG

tartval = n;

Name: Anonymous 2013-07-31 3:57

...best square root function ever even? xD

Name: Anonymous 2013-07-31 4:07

(a+b)^3 next?

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2013-07-31 4:24


          SQRTSD   FSQRT
AMD K8    24-27    12-27
AMD K10   24-27    35
Bulldozer 24-26    10-53
P4        38       43
P4E       40       45
Pentium M 5-58     8-9(!)
Bobcat    24       31
Merom     6-58     6-69
Wolfdale  6-20     6-20
Nehalem   7-32     27
Sandy     10-21    10-24
Haswell   9-18     8-23

Atom      60       71
Clownlake 53       50


Some models use a fixed-latency algorithm, some use a variable one. You are NOT going to beat the latter with anything. The trend is to go variable for the performance line and fixed for the mobile/lower-power. Clearly CISC superiority.

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2013-07-31 4:32

And FYI everyone doing the >>=2 thing: first of all this isn't a floating point sqrt, and secondly executing just that one statement alone 32 times is going to take at least 32 cycles. Now factor in the cost of doing the test + branch and everything inside the loop body... can't beat microcode.

Name: Anonymous 2013-07-31 5:46

http://en.wikipedia.org/wiki/List_of_Intel_codenames
Intel has historically named integrated circuit (IC) development projects after geographical names of towns, rivers or mountains near the location of the Intel facility responsible for the IC. Many of these are in the American West, particularly in Oregon (where most of Intel's CPU projects are designed; see famous codenames). As Intel's development activities have expanded, this nomenclature has expanded to Israel and India...
Wellsburg     Chipset     PCH for two- and four-socket servers based on the Grantley-EP platform. Successor to Patsburg.     Reference unknown.     2012

Name: Anonymous 2013-07-31 5:59

Gesher     CPU architecture     A processor microarchitecture, the successor to Nehalem. Renamed to Sandy Bridge after it was discovered that Gesher is also the name of a political party in Israel.[19]     The Hebrew word for 'bridge'.     2011

Name: Anonymous 2013-07-31 6:30

Gesher (political party) - Wikipedia, the free encyclopedia
en.wikipedia.org/wiki/Gesher_(political_party)‎
Gesher (Hebrew: גֶּשֶׁר, lit. Bridge), officially Gesher - National Social Movement (Hebrew: גשר - תנועה חברתית לאומית‎, Gesher - Teno'a Hevratit Le'umit) was a ...?

Name: Anonymous 2013-07-31 8:38



function result = cubertX(n)

iter=0;

pdata = [];

tartval = n/8.0;

solval = 0;

check = n - (tartval*tartval*tartval)

while(abs(check) > 1.0e-8)

  iter++;

  pdata(iter) = tartval;

  solval = check / (tartval * tartval * 3);

  tartval = tartval + solval;

  check = n - (tartval*tartval*tartval);

  if(iter>30)

    break;

    endif;

  endwhile;

result = tartval;

iter

plot(pdata)

endfunction

Don't change these.
Name: Email:
Entire Thread Thread List