/prog/ - Per-function RLE return stacks

Name: Anonymous 2012-02-28 16:56

Notation: a { b c }
calling a
entering a
calling b
calling c
leaving a

Example: a { b { c1{d{e}} c2{d{e@}} } c { d { c { d { c { f i{a} i{a@} b{c1} } f } f } f } f } c {df} }

Stack at first @: (e_locals)(d)(d_locals)(c2)(c2_locals)(b)(b2_locals)(a)(a_locals)

Stack at second @: top(a_locals)(i)(i_locals)(c)(c_locals)(d)(d_locals)(c)(c_locals)(d)(d_locals)(c)(c_locals)(a)(a_locals)

Problems: Stack grows even when local variables take up 0 bytes because return addresses take up space too.

Proposed solution: One returnstack with RLE for each function.

At first @:
b: (a)x1
c2: (b)x1
d: (c2)x1
e: (d)x1

At second @:
a: (i)x1
c: (d)x2 (a)x1
d: (c)x2
i: (c)x1

Result: Tailcall that's not at the tail.

Variation: The per-function stack can be fixed-sized. In that case old entries that fall off the per-function stack go into the main-stack.

Discuss.

Name: Anonymous 2012-02-28 19:03

Locals and return addresses together considered harmful.



Local linked stack:

{a's locals}{b's locals}{c's locals}

Return stack:

{a's return address}{b's return address}{c's return address}

The contiguous parts of the local area grow upwards instead of downwards so a buffer overflow will only affect the function's own locals before it runs into unmapped memory and causes an exception. You can use a canary value to detect buffer overflows without destroying useful data. The local stack can be a linked list so "stack overflow" only happens when the heap cannot allocate any more memory. Arguments are a sub-block of the locals. The return stack does not contain pointers into any part of itself so it can be relocated anywhere in memory or split into multiple linked blocks. Only call, return, unwind (pop n entries and return), and tail call work on the return stack. This is how it works on z/Architecture, except registers are also stored on the return stack and the stack is in a separate protected address space that can only be modified by linkage instructions with a stack pointer that can only be accessed by those instructions or privileged code.

Name: Anonymous 2012-02-28 22:13

>>2
Linked list stack
Enjoy your slow stack! Mmm mmm cache misses

Name: Anonymous 2012-02-29 0:37

>>3
Not really.

Name: Anonymous 2012-02-29 1:27

>>3
Enjoy your unsafe stack! Mmm mmm stack smashing

Name: Anonymous 2012-02-29 3:50

call/cock

Name: Anonymous 2012-02-29 4:14

small/cock

Name: Anonymous 2012-02-29 4:44

>>7
for you, maybe.

Name: Anonymous 2012-02-29 5:38

>>4
If the nodes were all contiguous then we're back with the problem >>2 wanted to solve by spreading the stack items out to unknown addresses. Now we just have a linked list with every node contiguous, and things are smashable in the conventional way again.

Name: >>9 2012-02-29 5:41

>>5
I'm not saying we should keep return addresses with local variables; I'd be fine with having two separate stacks, but I wouldn't make them fucking linked lists. It'd be bad enough anyways spreading out function call / frame info into two separate memory locations instead of packing it together, in terms of access efficiency.

Name: Anonymous 2012-02-29 8:51

OP Here.

I just realized that the per-function stack only needs to store a single entry (address+counter). (I'll call it PFSES from now on.)

Example:
Trace: b{a{a{a{a{a{a{a}}}}}}}
Stack: (7)(6)(5)(4)(3)(a)(2)(b)(1) # the normal one

When the inner-most function a(7) returns the return address isn't stored in the frames of the calls 7-3 because it doesn't change (it's always 'a'). Instead of scanning the stack, the PFSES is read. It reads "a" in this case. The counter decremented.

When the execution reaches the outer-most a(2) the counter is at 0 and a new return-address is loaded from the normal stack into the PFSES and then used.

The PFSES' have fixed addresses. No lookups. 64 functions can store their return addresses in one cache line. (Assuming 64bit / 512b cacheline) The counters needs additional memory, so maybe 51 entries? I'd call that pretty cache-friendly. Also, all of this reduces the load onto the normal stack.

>>10
I agree. I'd like linked-list-stack-frames as optional compiler feature though. (For co-routines and JIT, etc.)

Do we have a compiler framework to implement stuff like that?

Name: Anonymous 2012-02-29 8:54

Argh, my writing is so messed up. Fix:

The return address "a" isn't stored in the frames of the calls 7-3 because it doesn't change. So when the inner-most function a(7) returns, the PFSES is read instead of scanning the stack. The counter is decremented.

Name: Anonymous 2012-02-29 12:57

>>11
But this only offers an advantage if you're making calls to the same function a series of times. {b {a {b {a {b}}}}} and you're back to storing each return address as we do now.

The big case I can see for making the same function call lots of times would be tail calls, which would transformed (hopefully) before this anyways.

Name: Anonymous 2012-02-29 13:31

How does Haskell do multiple stacks? I can't read the papers about its implementation without a PhD in category theory.

Name: Anonymous 2012-02-29 20:10

>>14
I can't read the papers about its implementation without a PhD in category theory.
That is what we call "job security".

Per-function RLE return stacks

1 Name: Anonymous 2012-02-28 16:56

2 Name: Anonymous 2012-02-28 19:03

3 Name: Anonymous 2012-02-28 22:13

4 Name: Anonymous 2012-02-29 0:37

5 Name: Anonymous 2012-02-29 1:27

6 Name: Anonymous 2012-02-29 3:50

7 Name: Anonymous 2012-02-29 4:14

8 Name: Anonymous 2012-02-29 4:44

9 Name: Anonymous 2012-02-29 5:38

10 Name: >>9 2012-02-29 5:41

11 Name: Anonymous 2012-02-29 8:51

12 Name: Anonymous 2012-02-29 8:54

13 Name: Anonymous 2012-02-29 12:57

14 Name: Anonymous 2012-02-29 13:31

15 Name: Anonymous 2012-02-29 20:10