Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

C and goto

Name: Anonymous 2012-06-19 6:00

Is it okay to use goto in C for functions like this? (It's actually C++, but I'm doing C-style development.) Is this function too long, or is it okay due to the nature of null-terminated string processing? It's been a long time since I've done C-style development. Any other recommendations (except "use Lisp or XYZ language instead")?

ssize_t normalize_path(char* restrict dest, size_t dest_max, char const* restrict p, size_t n) noexcept {
    size_t i, m, length;
    ssize_t retval;
    char const* q;
    char* buffer, * state;
    char const** parts, **new_parts;
    size_t parts_size, new_parts_size;
    char const* default_parts[32];

    // validate arguments
    if ((dest_max > (SSIZE_MAX + 1)) || (!dest && dest_max) || (!p && n)) {
        errno = EINVAL;
        return -1;
    }

    length = 0;

    if (!n) {
        // empty path, normalize to current directory
        goto normalize_curdir;
    }
       
    if (*p == '/') {
        // POSIX paths may begin with one or two slashes, but three or more
        // are treated as a single slash
        ++length; ++p; --n;
        if (n && (*p == '/')) {
            ++p; --n;
            q = strncchr(p, '/', n);
            if (!q) {
                q = p + n;
            }

            if (q == p) {
                ++length;
            }
            else {
                n -= q - p;
                p = q;
            }
        }

        // copy one or two slashes into destination buffer
        for (i = 0, m = (length < dest_max) ? length : dest_max - 1; i < m; ++i) {
            dest[i] = '/';
        }

        if (!n) {
            goto null_terminate;
        }
    }

    // create a local copy of input path for tokenization
    buffer = static_cast<char*>(malloca(n + 1));
    if (!buffer) {
        return -1;
    }

    *static_cast<char*>(mempcpy(buffer, p, n)) = '\0';

    // tokenize the local path and normalize parts into stack
    i = 0;
    retval = 0;
    state = nullptr;
    parts = default_parts;
    parts_size = sizeof(default_parts) / sizeof(default_parts[0]);

    for (q = strtok_r(buffer, path<char>::sepset, &state); q; q = strtok_r(nullptr, path<char>::sepset, &state)) {
        // skip part if curdir, pop top part off stack if pardir, and if
        // the path is absolute (length is non-zero), eat all of the
        // redundant pardir parts
        if (*q == '.') {
            if (q[1] == '\0') {
                continue;
            }
            else if ((q[1] == '.') && (q[2] == '\0')) {
                if (i > 0) {
                    --i;
                    continue;
                }
                else if (length > 0) {
                    continue;
                }
            }
        }

        // resize the path parts stack if space is exhausted
        if (i >= parts_size) {
            if (parts_size >= (SIZE_MAX / (3 * sizeof(char const*)))) {
                errno = ENOMEM;
                retval = -1;
                goto cleanup;
            }

            new_parts_size = (parts_size * 3) / 2;
            new_parts = static_cast<char const**>(malloc(new_parts_size * sizeof(char const*)));
            if (!new_parts) {
                retval = -1;
                goto cleanup;
            }

            memcpy(new_parts, parts, parts_size * sizeof(char const*));
            if (parts != default_parts) {
                free(parts);
            }

            parts = new_parts;
            parts_size = new_parts_size;
        }

        // push part onto stack
        parts[i++] = q;
    }

    // rejoin the path parts in normalized form
    if (length < dest_max) {
        retval = join_path(dest + length, dest_max - length, parts, i);
    }
    else {
        retval = join_path(nullptr, 0, parts, i);
    }

cleanup:

    // free temporary buffers
    if (parts != default_parts) {
        free(parts);
    }

    freea(buffer);

    // update length with full length from join
    if (retval < 0) {
        return -1;
    }

    length += static_cast<size_t>(retval);
    if (!length) {
        goto normalize_curdir;
    }
    else if (length > SSIZE_MAX) {
        errno = EOVERFLOW;
        return -1;
    }

    return static_cast<ssize_t>(length);

normalize_curdir:

    // empty path, normalize to current directory
    UP_ASSERT(!length);
    ++length;
    if (dest_max > 0) {
        dest[0] = '.';
    }

null_terminate:

    // null terminate the destination buffer
    if (length < dest_max) {
        dest[length] = '\0';
    }
    else if (dest_max > 0) {
        dest[dest_max - 1] = '\0';
    }

    return static_cast<ssize_t>(length);
}

Name: Anonymous 2012-06-21 6:29

>>39
Yes, I suppose you could just walk back to the previous '/' separator in the out buffer when ".." is encountered, and have some additional logic to handle when the end is reached.

The current code doesn't actually allocate from the heap any storage in most cases. Most paths aren't going to be more than 64 levels deep. It's just for rare edge cases.

That said, doing it your way is probably better, it doesn't need to touch characters in the input buffer more than once, and has a lesser instruction count.

Name: Anonymous 2012-06-21 20:19

>>41
Actually, I realized I do need the parts stack, because I can't rely on the destination buffer being there or being large enough, so once its exhausted it's useless for back-tracking to previous parts. Users pass in a null/empty destination buffer when wanting to compute the length of the normalized path in characters, so they can allocate storage and recall the function.

But then I realized that there's always an upper-bound for the depth of the path as a function of its length. It'll never be more than half of the path's length in characters. So then I just malloca the parts stack once and cut out the huge block for resizing it and its a big win. malloca is simply a preprocessor macro which uses alloca for small allocations that fit on the guard page (typically 1024 or 4096 bytes depending on the platform) or malloc otherwise, so for paths of 256 characters or less, there's almost not memory allocation overhead. In the end, it shaved off a few KB of machine instructions.

Here's the final version:

LIBUPCOREAPI
ssize_t normalize_path(char* restrict dest, size_t dest_max, char const* restrict p, size_t n) noexcept {
    char const* const p_end = p + n;
    size_t parts_depth, parts_size, length, i;
    char const** parts;
       
    // validate arguments
    if ((dest_max > (SSIZE_MAX + 1)) || (!dest && dest_max) || (!p && n)) {
        errno = EINVAL;
        return -1;
    }

    // POSIX paths may begin with one or two slashes, but three or more
    // are treated as a single slash
    for (length = 0; (p < p_end) && (*p == '/'); ++p) {
        if (length < dest_max) {
            dest[length] = '/';
        }
        if (++length > 2) {
            length = 1;
            break;
        }
    }

    // allocate parts stack with upper-bound of n / 2
    parts_depth = 0;
    parts_size = n / 2;
    parts = static_cast<char const**>(malloca(parts_size * sizeof(char const*)));
       
    // tokenize the local path and normalize parts into stack
    while (p < p_end) {
        if (*p == '/') {
            ++p;
            continue;
        }
        else if ((p[0] == '.') && ((p[1] == '/') || (p[1] == '\0'))) {
            p += 2;
            continue;
        }
        else if ((p[0] == '.') && (p[1] == '.') && ((p[2] == '/') || (p[2] == '\0'))) {
            if (parts_depth) {
                --parts_depth;
                p += 3;
                continue;
            }
            else if (length) {
                p += 3;
                continue;
            }
        }

        assert(parts_depth < parts_size);
        parts[parts_depth] = p;
        ++parts_depth;

        for ( ; (p < p_end) && (*p != '/'); ++p) ;
    }

    // rejoin the path parts in normalized form
    for (i = 0; i < parts_depth; ++i) {
        for (p = parts[i]; (p < p_end) && (*p != '/'); ++length, ++p) {
            if (length < dest_max) {
                dest[length] = *p;
            }
        }

        if (i != (parts_depth - 1)) {
            if (length < dest_max) {
                dest[length] = '/';
            }
            ++length;
        }
    }

    // free temporary buffers
    freea(parts);

    // check for error conditions
    if (length > SSIZE_MAX) {
        errno = EOVERFLOW;
        return -1;
    }

    // check for empty path, normalize to current directory
    if (!length) {
        ++length;
        if (dest_max) {
            dest[0] = '.';
        }
    }

    // null terminate the destination buffer
    if (length < dest_max) {
        dest[length] = '\0';
    }
    else if (dest_max) {
        dest[dest_max - 1] = '\0';
    }

    return static_cast<ssize_t>(length);
}

Name: Anonymous 2012-06-21 20:46

Use Lisp.  C is shit.

Name: Anonymous 2012-06-21 22:55

>>43
C is shit outside of kernel design. And probably compilers.

Name: Anonymous 2012-06-21 23:00

>>44
For compilers, you want something that can easily manipulate AST trees, i.e. not C.

Name: Anonymous 2012-06-21 23:09

>>44
C is shit outside of kernel design.
U MENA ASSEMBLY

Name: Anonymous 2012-06-21 23:27

>>44
C is excellent for operating systems. The base of such systems could even be extended with languages like Lisp or Python.

Name: Anonymous 2012-06-21 23:38

>>47
How so? If you mean by playing with non-standard extensions, even Haskell can do that: http://programatica.cs.pdx.edu/House/

Name: Anonymous 2012-06-22 3:04

>>48
Software from the kernel, the compiler and the core utilities are well served with C. Low level libraries like GTK and X11 are fine with C. Extending the system means having bindings to high level languages through solutions like Guile or Swig. The base of a complex software (such as Gimp) can be implemented using C and then extended using Scheme.

Name: Anonymous 2012-06-22 4:20

>>49
GTK
oh no you didn't

Name: Anonymous 2012-06-22 4:31

>>50
It's a low level library that provides a base for other work. It's also written in C.

Name: Anonymous 2012-06-22 4:34

>>51
GTK is not a low level library and it uses Glib, an abomination that completely disfigure C.

Name: Anonymous 2012-06-22 5:16

>>52
GTK+ is the base for the GUI analogous to the win32api. It is low level software by that measure

Name: Anonymous 2012-06-22 11:38

>>53
I thought the equivalent of the win32 API was X11, I haven't ever touched it though. Either way GTK+ is very high level considering it brings in object oriented and metaobject facilities.

Name: Anonymous 2012-06-22 11:40

>>42
Why don't you just use C++ features if you're going to use C++ instead of C?

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List