
Code explanation

Name: Anonymous 2012-01-11 12:49

Can somebody explain why this code outputs what it does?

// tested with Core 2 Duo, Core 2 Quad and Xeon
// tested with gcc4.1.2 gcc4.4.3 and gcc4.6.1
// compile with: gcc -O0 -m32
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <setjmp.h>

jmp_buf p;
void (*q)();

const char *data =
    "\x8b\x44\x24\x04\x8b\x5c\x24\x08"
    "\x8b\x00\x8b\x1b\x31\xc3\x31\xd8"
    "\x31\xc3\x8b\x4c\x24\x04\x89\x01"
    "\x8b\x4c\x24\x08\x89\x19\xc3\x90"
    "\x55\x89\xe5\x8b\x45\x04\xc9\xc3"
    "\x55\x90\x90\x89\xe5\x90\x90\x90"
    "\x8b\x45\x08\x89\x45\x04\xc9\xc3"
    "\x60\x09\x0e\x13\x14\x01\x0c\x0c"
    "\xc0\x07\x05\x0e\x14\x0f\x0f\x60"
    "\x00\x67\x6f\x74\x6f\x20\x63\x6f"
    "\x6e\x73\x69\x64\x65\x72\x65\x64"
    "\x20\x68\x61\x72\x6d\x66\x75\x6c"
    "\x6c\x00\x90\x90\x1c\x1b\x0a\x20";

int f(int x)
{
    static int b = 0; static int s = 0;
    int a = 0, t;
    if (!s) {
        a = b; b = x;
    } else {
        a = x; t = b;
        do {
            a ^= b;
            b = (a^b) & b;
            b <<= 1;
        } while (b);
        b = t;
    }
    s = (s+1) % 2;
    return a;
}

int g(int i, int *j)
{
    *j = i;
    i = (int) putchar;
    if (*j == (48 << 1)) 
        __asm volatile (
                "movl 8(%ebp),%eax;"
                "leave;"
                "ret"
                );
    return (int) puts;
}

void h(int i)
{
    int b;
    q = (void(*)()) g(i++[data],&b);
    for (f(b);*(data+i)!=b;++i,f(b))
        q(f(i[data])%0xff);
}

void sh(int s)
{
    if (s == 010)
        ((void(*)())g(s,&s))("F");
    longjmp(p,s);
}

int main(void)
{
    int base, addr = 0xffffffff, offs = 16;
    int a = 11, b = 32, i = 25;
    int s = 8, t = 1, u = 4;
    ((void(*)()) data)(&a,&b);
    ((void(*)()) data)(&b,&t);
    ((void(*)()) data)(&t,&s);
    addr ^= a;
       a ^= addr;
    addr ^= a;
    base = ((int(*)())data+addr)();
    if (a == -1)
        goto over;
    puts("A");

    base = (1<<3) | ((f(addr) + f(offs)) & ~0xff);
    h(base+addr+offs);
    exit(0);

over:
    signal(t,sh);signal(s,sh);signal(u,sh);

    if (!(s = setjmp(p))) {
        q = (void(*)()) g(0x30, &a);
        q(data + a + i);
        s = a / (b-1);
        puts("B");
    } else if (s == 0xb) {
        puts("C");
        ((int(*)(int)) data+addr+(offs/2))(base);
    } else {
        puts("D");
        *((int*) base+s) = 0xffffffff;
    }
   
    puts("E");
    return 1;
}
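Two building blocks of OP's program can be restated in portable C. This is a sketch under stated assumptions, not OP's code, and the names xor_swap, add_by_carries and f_sketch are hypothetical. On a 32-bit x86 target with arguments passed on the stack (cdecl), the first 31 bytes of data disassemble to mov eax,[esp+4]; mov ebx,[esp+8]; mov eax,[eax]; mov ebx,[ebx]; xor ebx,eax; xor eax,ebx; xor ebx,eax; two stores back through the pointers; ret. That is an XOR swap of *arg1 and *arg2, the same trick main applies by hand to addr and a. The else branch of f is a carry-propagating adder, so successive calls to f alternate between remembering their argument and returning argument plus remembered value.

```c
#include <assert.h>

/* Hypothetical portable equivalent of the first 31 bytes of OP's data
   blob: exchange *x and *y with three XORs, no temporary. */
void xor_swap(int *x, int *y)
{
    assert(x != y);  /* XOR swap would zero the value if x == y */
    *x ^= *y;
    *y ^= *x;
    *x ^= *y;
}

/* Addition by carry propagation, as in the else branch of OP's f.
   unsigned keeps the left shift well defined. */
unsigned add_by_carries(unsigned a, unsigned b)
{
    while (b) {
        unsigned carry = a & b;  /* bits that produce a carry */
        a ^= b;                  /* sum ignoring carries */
        b = carry << 1;          /* carries move one bit left */
    }
    return a;
}

/* Hypothetical restatement of f's state: OP's static b and s. */
static unsigned stored = 0;
static int phase = 0;

unsigned f_sketch(unsigned x)
{
    unsigned result;
    if (phase == 0) {        /* even call: remember x, return old value */
        result = stored;
        stored = x;
    } else {                 /* odd call: x + stored, stored unchanged */
        result = add_by_carries(x, stored);
    }
    phase ^= 1;
    return result;
}
```

The sketch uses unsigned because OP's b <<= 1 can shift a carry into the sign bit of an int, which C99 leaves undefined. Assuming f is in its even phase when h is entered, h's loop pairs a storing call f(b) with an adding call f(i[data]), so each byte it passes to q is offset by b, until a byte equal to b ends the loop.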

Name: Anonymous 2012-01-12 8:37

>>199
| architecture dependent != undefined behavior
This is incorrect; read the standard.
| undefined behavior == non-deterministic behavior
Whether this is correct or not is undefined; read the standard.

You must understand, the program in OP features both undefined behavior in the form of architecture (compiler, compiler flag, OS) dependencies and undefined behavior in the form of non-determinism.

Name: Anonymous 2012-01-12 8:38

>>200
The nature of undefined behavior is such that you can never guarantee that it will always produce the same output.

If you run the program in >>125, it too will probably output the same values a couple of times, but it's equally likely not to output the same value.

Name: Anonymous 2012-01-12 8:39

>>201
| the program in OP features both undefined behavior in the form of architecture (compiler, compiler flag, OS)
True

| and undefined behavior in the form of non-determinism.
How?

Name: Anonymous 2012-01-12 8:43

>>203
Or more precisely, where?

Name: Anonymous 2012-01-12 8:43

>>203
| True
Then the program's output is undefined.

| How?
If you want to know some of the C violations just compile it with -pedantic, that will tell you about most of them.

Name: Anonymous 2012-01-12 8:45

>>202
You're retarded. I ran the program in >>125 five times and it output 10739700 every single time.

u mad

Name: Anonymous 2012-01-12 8:46

>>205
| Then the program's output is undefined.
But OP specified it to "when ran on the proper architecture". So dismissing it entirely as non-deterministic is simply over-simplifying it. As I said, architecture dependent code does not imply non-deterministic behavior.

| If you want to know some of the C violations just compile it with -pedantic, that will tell you about most of them.
Yes I know, but he also never claimed it to be proper C anyway, in which case he wouldn't have mixed in inline assembly or running machine code.

Name: Anonymous 2012-01-12 8:47

>>206
Hello OP

Name: Anonymous 2012-01-12 8:48

>>205
>>207 continued
Any compiled binary will in that case be non-deterministic, if we follow your logic.

Name: Anonymous 2012-01-12 8:54

>>207
| But OP specified it to "when ran on the proper architecture". So dismissing it entirely as non-deterministic is simply over-simplifying it. As I said, architecture dependent code does not imply non-deterministic behavior.
It doesn't imply anything beyond that of being undefined, which means that you can't guarantee that it's deterministic or non-deterministic; especially since GCC is allowed to do anything it wants, you can't guarantee that GCC doesn't use undefined behavior to compile it, so that it will sometimes compile to the same program and other times it won't.

| Yes I know, but he also never claimed it to be proper C anyway, in which case he wouldn't have mixed in inline assembly or running machine code.
Compiling with -pedantic won't point out the use of __asm, since that is valid gnu89. What it does point out, however, is the use of undefined operations like conversion between object pointer types and function pointer types and arithmetic involving function pointer types.

But even if this were an extended C that had a stack and __asm, where you were allowed to convert between object pointer types and function pointer types and to do arithmetic involving function pointer types, the use of __asm leaves the program undefined, as the GCC page specifically states that significant side-effects (like alteration of the stack as in the example) cause undefined behavior.

No matter how you twist and turn this, you won't be able to guarantee that the program will output the same thing no matter which compiler flags you're using; you can't even guarantee that it will compile to the same program.

Name: Anonymous 2012-01-12 8:57

>>209
| Any compiled binary will in that case be non-deterministic, if we follow your logic.
This is incorrect.

Name: Anonymous 2012-01-12 9:03

>>210
| so that it will sometimes compile to the same program and other times it won't
wat

| you can't even guarantee that it will compile to the same program.
wat

This is becoming ridiculous. I'm guessing that you are a Lisp programmer at heart. Either that or a mathematician fixated on definitions.

| It doesn't imply anything beyond that of being undefined, which means that you can't guarantee that it's deterministic or non-deterministic; especially since GCC is allowed to do anything it wants, you can't guarantee that GCC doesn't use undefined behavior to compile it, so that it will sometimes compile to the same program and other times it won't.
Circular logic.

1. The program is undefined (assumption)
2. GCC uses undefined behavior to compile it because of 1.
3. Therefore, 1 is true.

That is just, pardon my French, bullshit.

| __asm leaves the program undefined, as the GCC page specifically states that significant side-effects (like alteration of the stack as in the example) cause undefined behavior.
But so does longjmp, therefore longjmp must also be undefined behavior, right?

Name: Anonymous 2012-01-12 9:06

>>211
But you (or the guy I was responding to) pointed out in >>201 and >>205 that architecture dependent code is undefined/non-deterministic. This means that every binary, which most certainly IS architecture dependent, also must be undefined/non-deterministic if one is to follow that logic.

As you see, clearly that is a contradiction. Thus, architecture dependent code does not at all imply undefined/non-deterministic behaviour.

To be clear: in this case "undefined/non-deterministic" doesn't mean both; which one applies depends on what you mean.

Name: Anonymous 2012-01-12 9:09

longjmp also alters the stack...

Name: Anonymous 2012-01-12 9:09

>>212
| This is becoming ridiculous. I'm guessing that you are a Lisp programmer at heart. Either that or a mathematician fixated on definitions.
Prove to me that GCC doesn't rely on undefined behavior to compile undefined behavior, and I'll believe that the source code compiles to the same executable every time. The fact is that you can't guarantee that, because GCC is allowed to do whatever it wants with undefined behavior. You can, however, review the source code of GCC and find that there is no undefined behavior in the way it handles undefined behavior in the source of the programs it compiles, in which case you can guarantee that the program will compile to the same executable every time.

| 1. The program is undefined (assumption)
I have proven that.
| 2. GCC uses undefined behavior to compile it because of 1.
No, I said you can't guarantee that it doesn't, so it might use undefined behavior to compile it.
| 3. Therefore, 1 is true.
No, you are mixing the source code of the program with the executable it is turned into.

| But so does longjmp, therefore longjmp must also be undefined behavior, right?
No, longjmp and setjmp are well defined. You might use them to cause undefined behavior, but you may also use them in a manner that guarantees there is no undefined behavior.

Name: Anonymous 2012-01-12 9:12

>>215
So how again does the __asm clause alter the stack in an undefined way?

Name: Anonymous 2012-01-12 9:13

>>213
You are mixing the source code of the program with the program it compiles to.

>>214
C has no concept of "the stack", but it does have longjmp, so it is impossible for longjmp as defined by the C standard to behave in a way that alters the stack. Read the standard.

Name: Anonymous 2012-01-12 9:14

>>216
Because all the inline assembly does is move the address of putchar into eax and then return. In other words, the function returns a function pointer.
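What >>218 describes can be restated without inline assembly. Assuming gcc -O0 -m32 keeps the parameter i in its stack slot at 8(%ebp), the asm loads the value just stored there ((int) putchar) into eax and returns from g early, so g hands back putchar when *j == 96 (48 << 1) and puts otherwise. The sketch below uses the hypothetical names writer_fn and pick_writer, and stores the result in a common function pointer type instead of an int: converting between function pointer types and back yields a pointer that compares equal to the original (C99 6.3.2.3), while calling through the wrong type is not defined.

```c
#include <assert.h>
#include <stdio.h>

/* Generic function pointer type used only for storage (hypothetical). */
typedef void (*writer_fn)(void);

/* Portable stand-in for OP's g: stores i through j, then selects
   putchar for 96 (48 << 1) and puts for anything else. */
writer_fn pick_writer(int i, int *j)
{
    *j = i;
    if (*j == (48 << 1))
        return (writer_fn) putchar;
    return (writer_fn) puts;
}
```

A caller converts back before calling, e.g. int (*out)(int) = (int (*)(int)) pick_writer(96, &k); out('x');, which is defined because out then has putchar's real type.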

Name: Anonymous 2012-01-12 9:16

>>216
It uses the leave instruction.

Name: Anonymous 2012-01-12 9:16

>>212
I'm not >>1-215, this is my first post in this thread.
| But so does longjmp, therefore longjmp must also be undefined behavior, right?
The specifications of setjmp and longjmp mention no alteration of the stack: setjmp saves the environment in a jmp_buf, and longjmp restores the environment saved in the jmp_buf.
They may be implemented as certain alterations of the stack, but that doesn't mean it is the only way to do it. Their behaviour is well-defined.
http://pubs.opengroup.org/onlinepubs/7908799/xsh/longjmp.html
http://pubs.opengroup.org/onlinepubs/7908799/xsh/setjmp.html
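As a sketch of the "used in a manner defined by the standard" case argued in >>215 and >>220: setjmp appears in one of the contexts C99 7.13.1.1 permits (compared against an integer constant as the whole controlling expression of an if), longjmp is called while the function that called setjmp is still active, and the local that is assigned after setjmp is declared volatile, as C99 7.13.2.1 requires for its value to be determinate after a jump. The names checked_div and div_or are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper: bails out through longjmp on a zero divisor. */
static int checked_div(int a, int b)
{
    if (b == 0)
        longjmp(env, 1);  /* setjmp in div_or then returns 1 */
    return a / b;
}

/* Returns a / b, or fallback when b == 0. */
int div_or(int a, int b, int fallback)
{
    /* volatile: a non-volatile local changed between setjmp and
       longjmp would be indeterminate after the jump */
    volatile int result = fallback;
    if (setjmp(env) == 0) {
        result = checked_div(a, b);  /* direct path, no jump */
    }
    return result;  /* after a jump, result still holds fallback */
}
```

The jump never leaves div_or's lifetime and never re-enters a returned function, which is what keeps this usage within the standard.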

Name: Anonymous 2012-01-12 9:16

>>217
No, the premise is clearly that undefined code compiles (using a compiler that may or may not use undefined behavior) into a program with undefined behavior. Correct?

Name: Anonymous 2012-01-12 9:18

>>219
Which restores the stack.

>>220
But in the implementation where they do alter the stack, using the logic applied above means that longjmp must be undefined.

Name: Anonymous 2012-01-12 9:19

>>220
Again, C has no concept of the stack; setjmp and longjmp can not alter the stack, as there is no stack to alter. Read the standard. Those pages are implementation details.

Name: Anonymous 2012-01-12 9:19

>>221 continued
This means that if architecture dependent code is undefined, then any binary must be undefined. That is a contradiction.

Name: Anonymous 2012-01-12 9:21

>>223
| setjmp and longjmp can not alter the stack
This is ridiculous and purely nit-picking from your side.

Name: Anonymous 2012-01-12 9:23

>>222
| Which restores the stack.
Of the wrong function.

| But in the implementation where they do alter the stack, using the logic applied above means that longjmp must be undefined.
You are wrong on two counts. First, the premise was that you may not use __asm to alter the stack. Second, you are assuming that longjmp causes undefined behavior even when it is invoked in a manner that the standard specifies is not undefined behavior.

>>224
You are still mixing architecture dependent source code with the program the source code compiles to.

Name: Anonymous 2012-01-12 9:24

>>225
What is your point? The standard specifies what behavior is undefined and what behavior isn't undefined, it's clearly in the scope of this discussion.

Name: Anonymous 2012-01-12 9:24

>>226
| Of the wrong function.
No, it does not.

Name: Anonymous 2012-01-12 9:25

>>226-227
But longjmp uses __asm__ to alter the stack on many architectures.

Name: Anonymous 2012-01-12 9:26

>>226
| Of the wrong function.
What do you mean? Are you under the assumption that __asm is a function and not a macro?

Name: Anonymous 2012-01-12 9:28

>>228
The moment it is called, the program is undefined: you are still executing in the function named g when you invalidate the stack, and the GNU C extension page explicitly states that doing so causes undefined behavior.

Name: Anonymous 2012-01-12 9:29

>>231
It doesn't alter the stack other than returning in the same manner the function itself returns.

Name: Anonymous 2012-01-12 9:29

>>230
No it introduces to g the stack context of the callee of g, which is invalid.

>>229
What is your point?

Name: Anonymous 2012-01-12 9:30

>>232
What is your point? The moment leave is called from the __asm function, the program is undefined; it doesn't matter what instruction comes afterward. GCC might do what you expect it to do and it might not; the point is that it's undefined.

Name: Anonymous 2012-01-12 9:31

>>233
| No it introduces to g the stack context of the callee of g, which is invalid.
First of all, g is the callee, but assuming you meant caller: No it doesn't, it simply adds a return from one of the variables in g.

| What is your point?
Any implementation of longjmp that alters the stack results in undefined behaviour, applying your logic.

Name: Anonymous 2012-01-12 9:33

why is OP using __asm and not __asm__ or asm?

Name: Anonymous 2012-01-12 9:33

>>234
What is your point? leave does nothing but restore the stack to the state that it was in when g was called, and ret pops the return address from the stack and sets the ip to that value.

If you claim that leave alters the stack, surely ret must do as well.
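To make the two descriptions concrete, here is a toy model of what the frame instructions do to esp, ebp and eip. It is not an x86 emulator: the struct, the 16-word stack and the word-indexed addressing are hypothetical simplifications. It assumes g was compiled with the usual -O0 prologue (push ebp; mov ebp, esp), and shows that leave (mov esp, ebp; pop ebp) undoes that prologue and ret (pop eip) consumes the return address. It does not settle whether GCC permits doing this inside an asm statement.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack machine, just enough state to model frame setup and
   teardown. esp and ebp are word indices into mem, not byte addresses. */
enum { STACK_WORDS = 16 };

struct cpu {
    uint32_t mem[STACK_WORDS];
    uint32_t esp, ebp, eip;
};

static void push(struct cpu *c, uint32_t v) { c->mem[--c->esp] = v; }
static uint32_t pop(struct cpu *c)          { return c->mem[c->esp++]; }

/* call: push the return address, then jump to target */
static void call(struct cpu *c, uint32_t ret_addr, uint32_t target)
{
    push(c, ret_addr);
    c->eip = target;
}

/* usual -O0 prologue: push ebp; mov ebp, esp */
static void prologue(struct cpu *c)
{
    push(c, c->ebp);
    c->ebp = c->esp;
}

/* leave == mov esp, ebp; pop ebp */
static void leave(struct cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
}

/* ret == pop eip */
static void ret(struct cpu *c)
{
    c->eip = pop(c);
}
```

Running call, prologue, a few words of locals, leave and ret in sequence returns esp and ebp to their values from before the call and sets eip to the pushed return address.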

Name: Anonymous 2012-01-12 9:34

>>236
It doesn't matter. You could use any of those; they all mean the same thing to GCC. Here on /prog/, on the other hand, we tend to overfocus on correctness.

Name: Anonymous 2012-01-12 9:35

>>235
| First of all, g is the callee, but assuming you meant caller: No it doesn't, it simply adds a return from one of the variables in g.
You are probably right about the callee/caller. What is still right, however, is that leave alters the stack such that the program is undefined.

| Any implementation of longjmp that alters the stack results in undefined behaviour, applying your logic.
I never claimed that; that's a lie, and it's also incorrect. I stated that the use of __asm to alter the stack causes undefined behavior, and I also stated that usage of longjmp in a manner defined by the C standard does not yield undefined behavior.

Name: Anonymous 2012-01-12 9:37

>>237
| What is your point? leave does nothing but restore the stack to the state that it was in when g was called, and ret pops the return address from the stack and sets the ip to that value.
So you admit that leave alters the stack? Then the program is undefined, as it is being called from __asm.
