#include <stdint.h>

static inline uintmax_t isqrt(uintmax_t n){
    uintmax_t r = 0;
    // fixed this so it should work even if uintmax_t is an odd number of bits!
    // ... i've never seen an implementation where that's the case, but i realized
    // that the code i had here before would break if someone was perverse enough
    // to do that, so i fixed it.
    for(uintmax_t i = ~(UINTMAX_MAX >> 2) & (UINTMAX_MAX / 3); i; i >>= 2)
        if(n >= (i | r)){
            n -= i | r;
            r = r >> 1 | i;
        } else r >>= 1;
    return r;
}
// 6 digits after the decimal point for floating-point-using idiots:
#define SQRT(n) (isqrt((uintmax_t)(n) * 1000000000000ULL /* 100⁶ */) / 1000000.0L /* 10⁶ */)
it's a lot faster than SQRTSD on my machine, because SQRTSD requires at least as much time as it would take to buy and install a new CPU (and probably a new motherboard as well).
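For anyone who wants to poke at this: a minimal sketch that restates the loop with braces, checks the floor-square-root property, and wraps the scaling trick from the macro. The names `isqrt_demo`, `isqrt_is_floor_root` and `sqrt_micro` are made up here, not from the post, and the sketch assumes uintmax_t has an even number of bits (64 on the usual platforms).

```c
#include <assert.h>
#include <stdint.h>

/* Same digit-by-digit method as the post, restated with braces.
   Assumption: uintmax_t has an even bit width, so the start value
   is the largest power of four that fits. */
static uintmax_t isqrt_demo(uintmax_t n) {
    uintmax_t r = 0;
    for (uintmax_t i = ~(UINTMAX_MAX >> 2) & (UINTMAX_MAX / 3); i; i >>= 2) {
        if (n >= (i | r)) {
            n -= i | r;
            r = r >> 1 | i;
        } else {
            r >>= 1;
        }
    }
    return r;
}

/* Returns 1 if r = isqrt_demo(n) satisfies r*r <= n < (r+1)*(r+1).
   Uses division instead of multiplication so nothing overflows. */
static int isqrt_is_floor_root(uintmax_t n) {
    uintmax_t r = isqrt_demo(n);
    if (r != 0 && r > n / r) return 0;   /* r*r would exceed n */
    if (r + 1 <= n / (r + 1)) return 0;  /* (r+1)*(r+1) would still fit in n */
    return 1;
}

/* Square root of n in millionths, using the macro's scaling by 100^6.
   Only meaningful while n * 10^12 fits in uintmax_t; beyond that the
   product wraps around. */
static uintmax_t sqrt_micro(uintmax_t n) {
    assert(n <= UINTMAX_MAX / 1000000000000ULL);
    return isqrt_demo(n * 1000000000000ULL);
}
```

With a 64-bit uintmax_t, `sqrt_micro(2)` gives 1414213, i.e. 1.414213, and the multiplication wraps once n exceeds 18446744.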
Name: Anonymous 2010-07-11 1:33
double sqrts[18446744073709551616] = {
    /* the contents of this array are left as an exercise for the reader */
};
double fastsqrt(double f) {
    int64_t *idx = (int64_t*)&f;
    return sqrts[*idx];
}
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y = number;
    i = * ( long * ) &y; // evil floating point bit level hacking
    i = 0x5f3759df - ( i >> 1 ); // what the fuck?
    y = * ( float * ) &i;
    y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
    // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed

    return y;
}
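Note that the snippet above reads the float's bits through a `long *`, which only lines up when long is 32 bits wide; on LP64 systems long is 64 bits. A hedged sketch of the same trick using `uint32_t` and `memcpy` instead of pointer casts follows. The name `rsqrt_portable` is made up here, and the sketch assumes float is a 32-bit IEEE 754 single.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide, matching uint32_t. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Approximate 1/sqrt(number) for positive finite inputs. */
static float rsqrt_portable(float number) {
    const float threehalfs = 1.5F;
    float x2 = number * 0.5F;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);            /* copy the float's bits into an integer */
    i = 0x5f3759df - (i >> 1);           /* same magic constant as the snippet above */
    memcpy(&y, &i, sizeof y);            /* copy the bits back into a float */
    y = y * (threehalfs - (x2 * y * y)); /* one Newton-Raphson step */
    return y;
}
```

The result is an approximation, so compare it against a tolerance rather than an exact value: `rsqrt_portable(4.0f)` lands close to 0.5 but not on it.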
>>15 Unfortunately, this method has been patented in the USA on June 6, 2000 by Vladimir Yu Volkonsky and assigned to Sun Microsystems. On August 13, 2006, Yuriy Kaminskiy told me that the patent is likely invalid because the method was published well before the patent was even filed, such as in "How to Optimize for the Pentium Processor" by Agner Fog, dated November 9, 1996. Yuriy also mentioned that this document was translated to Russian in 1997, which Vladimir could have read. Moreover, the Internet Archive also has an old link to it.
Name: Anonymous 2010-07-11 7:43
double sqrt(double n) { return n; }
This returns the square root squared, so you'll have to unsquare it first.
Some models use a fixed-latency algorithm, some use a variable one. You are NOT going to beat the latter with anything. The trend is to go variable for the performance line and fixed for the mobile/lower-power. Clearly CISC superiority.
And FYI to everyone doing the >>=2 thing: first of all, this isn't a floating-point sqrt, and secondly, executing just that one statement alone 32 times is going to take at least 32 cycles. Now factor in the cost of doing the test + branch and everything inside the loop body... can't beat microcode.
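To put a number on the loop count: a small sketch that counts how many times the `i >>= 2` loop from the isqrt post runs. The function name `isqrt_iterations` is made up here, and the sketch assumes uintmax_t has no padding bits and an even width.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumption: even bit width, so the start value is a power of four. */
static_assert((sizeof(uintmax_t) * CHAR_BIT) % 2 == 0,
              "sketch assumes an even bit width");

/* Counts the iterations of the isqrt loop; the count does not depend on n. */
static unsigned isqrt_iterations(void) {
    unsigned count = 0;
    for (uintmax_t i = ~(UINTMAX_MAX >> 2) & (UINTMAX_MAX / 3); i; i >>= 2)
        count++;
    return count;
}
```

The count comes out to half the bit width, so 32 on a 64-bit uintmax_t, and every one of those iterations also pays for a compare and a branch.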
http://en.wikipedia.org/wiki/List_of_Intel_codenames
Intel has historically named integrated circuit (IC) development projects after geographical names of towns, rivers or mountains near the location of the Intel facility responsible for the IC. Many of these are in the American West, particularly in Oregon (where most of Intel's CPU projects are designed; see famous codenames). As Intel's development activities have expanded, this nomenclature has expanded to Israel and India...
Wellsburg (Chipset, 2012): PCH for two- and four-socket servers based on the Grantley-EP platform. Successor to Patsburg. Reference unknown.
Gesher (CPU architecture, 2011): A processor microarchitecture, the successor to Nehalem. Renamed to Sandy Bridge after it was discovered that Gesher is also the name of a political party in Israel. The Hebrew word for 'bridge'.
Gesher (political party) - Wikipedia, the free encyclopedia
en.wikipedia.org/wiki/Gesher_(political_party)
Gesher (Hebrew: גֶּשֶׁר, lit. Bridge), officially Gesher - National Social Movement (Hebrew: גשר - תנועה חברתית לאומית, Gesher - Teno'a Hevratit Le'umit) was a ...?