>>6
It isn't the recursion that is slowing it down (in theory). An optimizing compiler can convert any program routine using tail recursion to iteration. Tail calls to other functions could be tricky, you would need to get into the details of the calling conventions used on the target platform. But with tail recursion alone, you don't need to get there. Transforming the routine syntactically is sufficient:
int collatzLen(int c, long long n) {
return (n==1 ? c : collatzLen(c+1, (n%2==0 ? n/2 : 3*n+1)));
}
-->>>
int collatzLen(int c, long long n) {
if(n==1) {
return c;
} else {
if(n%2==0) {
return collatzLen(c+1, n/2);
} else {
return collatzLen(c+1, 3*n+1);
}
}
}
-->>>
int collatzLen(int c, long long n) {
start:
if(n==1) {
return c;
} else {
if(n%2==0) {
int t1 = c+1;
long long t2 = n/2;
c = t1;
n = t2;
goto start;
} else {
int t1 = c+1;
long long t2 = 3*n+1;
c = t1;
n = t2;
goto start;
}
}
}
And you can factor out the common subexpression, with t1 and c+1:
int collatzLen(int c, long long n) {
start:
if(n==1) {
return c;
} else {
int t1 = c+1;
c = t1;
if(n%2==0) {
long long t2 = n/2;
n = t2;
goto start;
} else {
long long t2 = 3*n+1;
n = t2;
goto start;
}
}
}
and because the new values of c and n are calculated independantly, both variables can be changed inplace. Temporay values would be needed if it was something like c = n + c; n = 5*c + 9*n;
int collatzLen(int c, long long n) {
start:
if(n==1) {
return c;
} else {
c = c+1;
if(n%2==0) {
n = n/2;
goto start;
} else {
n = 3*n+1;
goto start;
}
}
}
And then the compiler could do the bit wise operations:
int collatzLen(int c, long long n) {
start:
if(n==1) {
return c;
} else {
c = c+1;
if(n&1) {
n = 3*n+1;
goto start;
} else {
n <<= 1;
goto start;
}
}
}
and 3*n+1 = (1+2)*n+1 = n + 2*n + 1 = n + (n<<1) + 1 = and_with_carry(n, (n<<1)). A compentent compiler should be able to detect this. This optimization is always possible when you multiply by a constant.
Although, one thing in the naive version that will slow it down in comparison to the optimized version is that it calls the collatzLen function from main inside a loop. That will take more time. Because the function is called only once, there is no danger of inlining the function causing duplicate code everywhere to clog the instruction cache. So this could be inlined. I don't know if compilers auto inline though. If the compiler was unaware of how post linking was going to be done, and if the colletzLen function was not static, a non inlined version of the function would need to remain. That wouldn't stop the compiler from inlining when it was called in this file though.