Is it okay to use goto in C for functions like this? (It's actually C++, but I'm doing C-style development.) Is this function too long, or is it okay due to the nature of null-terminated string processing? It's been a long time since I've done C-style development. Any other recommendations (except "use Lisp or XYZ language instead")?
ssize_t normalize_path(char* restrict dest, size_t dest_max, char const* restrict p, size_t n) noexcept {
size_t i, m, length;
ssize_t retval;
char const* q;
char* buffer, * state;
char const** parts, **new_parts;
size_t parts_size, new_parts_size;
char const* default_parts[32];
if (!n) {
// empty path, normalize to current directory
goto normalize_curdir;
}
if (*p == '/') {
// POSIX paths may begin with one or two slashes, but three or more
// are treated as a single slash
++length; ++p; --n;
if (n && (*p == '/')) {
++p; --n;
q = strncchr(p, '/', n);
if (!q) {
q = p + n;
}
if (q == p) {
++length;
}
else {
n -= q - p;
p = q;
}
}
// copy one or two slashes into destination buffer
for (i = 0, m = (length < dest_max) ? length : dest_max - 1; i < m; ++i) {
dest[i] = '/';
}
if (!n) {
goto null_terminate;
}
}
// create a local copy of input path for tokenization
buffer = static_cast<char*>(malloca(n + 1));
if (!buffer) {
return -1;
}
// tokenize the local path and normalize parts into stack
i = 0;
retval = 0;
state = nullptr;
parts = default_parts;
parts_size = sizeof(default_parts) / sizeof(default_parts[0]);
for (q = strtok_r(buffer, path<char>::sepset, &state); q; q = strtok_r(nullptr, path<char>::sepset, &state)) {
// skip part if curdir, pop top part off stack if pardir, and if
// the path is absolute (length is non-zero), eat all of the
// redundant pardir parts
if (*q == '.') {
if (q[1] == '\0') {
continue;
}
else if ((q[1] == '.') && (q[2] == '\0')) {
if (i > 0) {
--i;
continue;
}
else if (length > 0) {
continue;
}
}
}
// resize the path parts stack if space is exhausted
if (i >= parts_size) {
if (parts_size >= (SIZE_MAX / (3 * sizeof(char const*)))) {
errno = ENOMEM;
retval = -1;
goto cleanup;
}
That looks like an overly complicated way to do it, if all you want to do is remove duplicate '/', strip '.', and have '..' strip the previous component.
As long as both have length remaining, copy characters from source to dest. In the case of "./", don't advance the dest and the next copy will overwrite it naturally. In the case of "../", backup to immediately after the 2nd-last "/" seen (e.g. /parent/../ -> /). In the case of "/", update the last and 2nd-last "/" position for the parent normalisation. No need to even allocate memory, this can be done in-place.
Name:
Anonymous2012-06-19 7:01
Yes, it is perfectly ok to use GOTO. As a side note, there are many language purists that hold a different opinion. It might be because they assume you will goto to any wild location labelling it spaghetti. "continue" and "break" in loops can also be considered a goto, but to a well defined location.
GOTO to a well defined location -- that is a location that will not create variables with undefined values, or deeper into hierarchy -- and you have enough ammo to defend your call.
Name:
Anonymous2012-06-19 7:29
I made this in 25 minutes. It's ugly but it works and is efficient.
I hate it, when in C, the code execution implicitily goes to next line, so I always explicitily write where the execution should go.
I also never create other functions that `main`, since calling them is so slow. I've developed a naming convention, where every line has label like [PROC NAME]_[LOCAL LINE NUMBER]. This is very efficient in keeping the code simple (yeah, I think following KISS principle is good).
In my opinion, this results in extremely easy code to read. There is never need for guessin where the execution goes next. Always expicit goto.
>>10
That's what ZUN's scripting language looks like. Confirmed for Japanese quality.
Name:
Anonymous2012-06-19 11:11
Actually I think we could create some kind of convention on how to use goto in a good way...
For example goto is usually criticized for making the code unreadable because it's hard to know where the goto actually go's, but then we could that by convention endb goes to the end of block:
if (..) {
if (..) {
goto endb1;
else
..
}
}endb1:
A Vim script could manage very easily the number of the block.
My conclusion is that people hates the goto statement because they don't have imagination.
>>5,7
Thanks for this. I tried it out, and it failed a few of my test cases, so when modifying it to fit my function signature, I made a few changes to fix them (I can't assume the input string is null-terminated, nor can I assume the destination buffer is large enough to accommodate the output). But it cuts out the buffer allocation for use with strtok and the join_path call, thus speeding things up quite a bit which is awesome, so thanks. I debated about whether to cap the path depth and just return EOVERFLOW and not reallocate on the heap for additional space like in your code, but decided to keep that in to make it more robust for edge cases. I also didn't need to worry about Windows style paths, as I have another set of path manipulation functions for that platform which handle it, similar to how Python does it.
LIBUPCOREAPI
ssize_t normalize_path(char* restrict dest, size_t dest_max, char const* restrict p, size_t n) noexcept {
char const* default_parts[64];
char const** parts, **new_parts;
char const* const p_end = p + n;
size_t parts_depth, parts_size, new_parts_size, length, i;
// POSIX paths may begin with one or two slashes, but three or more
// are treated as a single slash
for ( ; (p < p_end) && (*p == '/'); ++p) {
if (length < dest_max) {
dest[length] = '/';
}
if (++length > 2) {
length = 1;
break;
}
}
// tokenize the local path and normalize parts into stack
parts = default_parts;
parts_size = sizeof(default_parts) / sizeof(default_parts[0]);
parts_depth = 0;
for ( ; p < p_end; ++p) {
// skip part if curdir, pop top part off stack if pardir, and if
// the path is absolute (length is non-zero), eat all of the
// redundant pardir parts
if (*p == '/') {
continue;
}
else if ((p[0] == '.') && ((p[1] == '/') || (p[1] == '\0'))) {
++p;
continue;
}
else if ((p[0] == '.') && (p[1] == '.') && ((p[2] == '/') || (p[2] == '\0'))) {
if (parts_depth) {
--parts_depth;
p += 2;
continue;
}
else if (length) {
p += 2;
continue;
}
}
// resize the path parts stack if space is exhausted
if (parts_depth >= parts_size) {
if (parts_size >= (SIZE_MAX / (3 * sizeof(char const*)))) {
errno = ENOMEM;
length = SIZE_MAX;
goto cleanup;
}
// push part onto stack
parts[parts_depth] = p;
++parts_depth;
// find start of next part
for ( ; (p < p_end) && (*p != '/'); ++p) ;
}
// rejoin the path parts in normalized form
for (i = 0; i < parts_depth; ++i) {
for (p = parts[i]; (p < p_end) && (*p != '/'); ++length, ++p) {
if (length < dest_max) {
dest[length] = *p;
}
}
if (i != (parts_depth - 1)) {
if (length < dest_max) {
dest[length] = '/';
}
++length;
}
}
>>15 Use Boost.Tokenizer.
And import nearly 17 million lines of half-assed code into your project and quadruple compile-times? No thanks. And I don't use strtok, but rather the safer reenterant version, strtok_r.
C++ is a good language. It is not a perfect language because it inherits from C. C is a flawed language where many things are left undefined. C is an ancient artifact that serves no purpose outside of the domain of kernel design. Because of the improvements made upon C to form C++, beginning programmers and veteran programmers alike may be led astray, thinking that modern C usage is a good idea. It is a mistake to believe the success of C++ justifies the continued use and popularity of C. Just because C++ is successful does not mean the language it has inherited from is of high quality.
Name:
Anonymous2012-06-19 17:03
This isn't C, also your code is unreasonably complicated.
Name:
Anonymous2012-06-19 17:05
>>19
Whether you like it or not (and I don't see how anyone can like it), the C++ standard library, libstdc++ or whatever your compiler uses, is required for pretty much all useful C++ language features
>>19
>Boost also increases executable bloat
No it does not, not any more than using other libraries or even your own handwritten stuff to accomplish the same thing.
Name:
Anonymous2012-06-19 18:36
>>23
Yes it does. When you're doing game development or artificial intelligence where every byte and cycle can count, you can't afford using inefficient language or library features. Even C++ exception handling causes problems because it increases the instruction count for stack unwinding, which can increase the number of cache misses.
Name:
Anonymous2012-06-19 18:37
>>22
Namespaces, lambdas, const, noexcept, constexpr, r-value references don't require any use of the Standard C++ Library.
>>28
Boost is written in C++, and not required to be inefficient. If it isn't inefficient, then it must be decently implemented.
Name:
Anonymous2012-06-19 21:15
>>27
Yes it is. Prove that it isn't. Extraordinary claims require extraordinary evidence. Show me the benchmarks between Boost.Filesystem and regular C-string processing + POSIX syscalls.
>>28
C is not required to be efficient either, ``faggot.''
Name:
Anonymous2012-06-20 3:19
Check out these doubles
Name:
Anonymous2012-06-20 3:22
>>31
Yeah, ignore him, he's probably using the Tiny C Compiler.
Name:
Anonymous2012-06-20 3:23
>>33
You're lucky, if >>34 was 4 seconds faster, you would have missed your dubs.
Name:
Anonymous2012-06-20 4:56
>>25
>Namespaces
Fairly useless
>lambdas
Gimmick
>const
Not a C++ feature
>noexcept
Is the point of this solely to force the compiler into stupid situations? throw() in C++ generally results in worse code
>constexpr
Pretty meh. for a feature
>r-value references
How is this any different from: bool operator=(SomeObject& rhs)
And it wouldn't surprise me if more than one of these features required the C++ standard library. Hell, even virtual functions requires it.
>>14
You don't need dynamic allocation AT ALL. Your code is still much more complicated and inefficient than necessary. Here's a skeleton of what it should look like:
I didn't read it through, but I like your style. I think as long as it's just one function that you're not changing often, you're going to be fine.
Name:
Anonymous2012-06-21 6:29
>>39
Yes, I suppose you could just walk back to the previous '/' separator in the out buffer when ".." is encountered, and have some additional logic to handle when the end is reached.
The current code doesn't actually allocate from the heap any storage in most cases. Most paths aren't going to be more than 64 levels deep. It's just for rare edge cases.
That said, doing it your way is probably better, it doesn't need to touch characters in the input buffer more than once, and has a lesser instruction count.
Name:
Anonymous2012-06-21 20:19
>>41
Actually, I realized I do need the parts stack, because I can't rely on the destination buffer being there or being large enough, so once its exhausted it's useless for back-tracking to previous parts. Users pass in a null/empty destination buffer when wanting to compute the length of the normalized path in characters, so they can allocate storage and recall the function.
But then I realized that there's always an upper-bound for the depth of the path as a function of its length. It'll never be more than half of the path's length in characters. So then I just malloca the parts stack once and cut out the huge block for resizing it and its a big win. malloca is simply a preprocessor macro which uses alloca for small allocations that fit on the guard page (typically 1024 or 4096 bytes depending on the platform) or malloc otherwise, so for paths of 256 characters or less, there's almost not memory allocation overhead. In the end, it shaved off a few KB of machine instructions.
// POSIX paths may begin with one or two slashes, but three or more
// are treated as a single slash
for (length = 0; (p < p_end) && (*p == '/'); ++p) {
if (length < dest_max) {
dest[length] = '/';
}
if (++length > 2) {
length = 1;
break;
}
}
// allocate parts stack with upper-bound of n / 2
parts_depth = 0;
parts_size = n / 2;
parts = static_cast<char const**>(malloca(parts_size * sizeof(char const*)));
// tokenize the local path and normalize parts into stack
while (p < p_end) {
if (*p == '/') {
++p;
continue;
}
else if ((p[0] == '.') && ((p[1] == '/') || (p[1] == '\0'))) {
p += 2;
continue;
}
else if ((p[0] == '.') && (p[1] == '.') && ((p[2] == '/') || (p[2] == '\0'))) {
if (parts_depth) {
--parts_depth;
p += 3;
continue;
}
else if (length) {
p += 3;
continue;
}
}
// rejoin the path parts in normalized form
for (i = 0; i < parts_depth; ++i) {
for (p = parts[i]; (p < p_end) && (*p != '/'); ++length, ++p) {
if (length < dest_max) {
dest[length] = *p;
}
}
if (i != (parts_depth - 1)) {
if (length < dest_max) {
dest[length] = '/';
}
++length;
}
}
// free temporary buffers
freea(parts);
// check for error conditions
if (length > SSIZE_MAX) {
errno = EOVERFLOW;
return -1;
}
// check for empty path, normalize to current directory
if (!length) {
++length;
if (dest_max) {
dest[0] = '.';
}
}
// null terminate the destination buffer
if (length < dest_max) {
dest[length] = '\0';
}
else if (dest_max) {
dest[dest_max - 1] = '\0';
}
>>48
Software from the kernel, the compiler and the core utilities are well served with C. Low level libraries like GTK and X11 are fine with C. Extending the system means having bindings to high level languages through solutions like Guile or Swig. The base of a complex software (such as Gimp) can be implemented using C and then extended using Scheme.
>>52
GTK+ is the base for the GUI analogous to the win32api. It is low level software by that measure
Name:
Anonymous2012-06-22 11:38
>>53
I thought the equivalent of the win32 API was X11, I haven't ever touched it though. Either way GTK+ is very high level considering it brings in object oriented and metaobject facilities.
Name:
Anonymous2012-06-22 11:40
>>42
Why don't you just use C++ features if you're going to use C++ instead of C?