Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

Per-function RLE return stacks

Name: Anonymous 2012-02-28 16:56

Notation: a { b c }
calling a
entering a
calling b
calling c
leaving a

Example: a { b { c1{d{e}} c2{d{e@}} } c { d { c { d { c { f i{a} i{a@} b{c1} } f } f } f } f } c {df} }

Stack at first @: (e_locals)(d)(d_locals)(c2)(c2_locals)(b)(b2_locals)(a)(a_locals)

Stack at second @: top(a_locals)(i)(i_locals)(c)(c_locals)(d)(d_locals)(c)(c_locals)(d)(d_locals)(c)(c_locals)(a)(a_locals)

Problems: Stack grows even when local variables take up 0 bytes because return addresses take up space too.

Proposed solution: One returnstack with RLE for each function.

At first @:
b: (a)x1
c2: (b)x1
d: (c2)x1
e: (d)x1

At second @:
a: (i)x1
c: (d)x2 (a)x1
d: (c)x2
i: (c)x1

Result: Tailcall that's not at the tail.

Variation: The per-function stack can be fixed-sized. In that case old entries that fall off the per-function stack go into the main-stack.

Discuss.

Name: Anonymous 2012-02-29 8:51

OP Here.

I just realized that the per-function stack only needs to store a single entry (address+counter). (I'll call it PFSES from now on.)

Example:
Trace: b{a{a{a{a{a{a{a}}}}}}}
Stack: (7)(6)(5)(4)(3)(a)(2)(b)(1) # the normal one

When the inner-most function a(7) returns the return address isn't stored in the frames of the calls 7-3 because it doesn't change (it's always 'a'). Instead of scanning the stack, the PFSES is read. It reads "a" in this case. The counter decremented.

When the execution reaches the outer-most a(2) the counter is at 0 and a new return-address is loaded from the normal stack into the PFSES and then used.

The PFSES' have fixed addresses. No lookups. 64 functions can store their return addresses in one cache line. (Assuming 64bit / 512b cacheline) The counters needs additional memory, so maybe 51 entries? I'd call that pretty cache-friendly. Also, all of this reduces the load onto the normal stack.

>>10
I agree. I'd like linked-list-stack-frames as optional compiler feature though. (For co-routines and JIT, etc.)

Do we have a compiler framework to implement stuff like that?

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List