So, /prog/, I'm writing a general library in C, with the aim of being somewhat inspired by libraries such as glib, but far lighter and cleaner. So far, I have a linked list implementation and a Unicode string implementation (this is incomplete), totalling 670 SLOC. The build system is cmake, the code is fully documented with doxygen. Of course, the library is in its infancy and far from finished; what other things should I implement? Any data structures? OS abstractions?
You'll end up inventing a shitty implementation of Common Lisp, so stop now.
Name:
Anonymous2011-09-12 23:31
>>4
Question: how should I implement type agnosticism with the data structures? Void pointers or macros? Currently my linked list uses void pointers, because I feel dirty using macros.
Name:
Anonymous2011-09-12 23:35
>>6
Use void pointers. If you do it with macros, you'll end up with unreadable garbage (from a user's perspective).
First I read: with the aim of being somewhat inspired by libraries such as glib, but far lighter and cleaner.
But then, immediately after:
>So far, I have a linked list implementation
How about you get rid of the linked-list implementation altogether, linked lists are poor data structures on modern computers. Dynamic arrays with overcommitting of memory to ammortize allocation cost is far superior. If you want to be lean and mean, focus only on the core data-structures that can be widely used with good performance, and less on the data-structures that are only taught about in elementary first year programming classes at university for historical reasons.
It's why ArrayList in Java and List in C#/.NET are in fact dynamic arrays and not linked-lists.
>Unicode string implementation (this is incomplete), totalling 670 SLOC.
A unicode string implementation in C in under 670 LOC? Something tells me you're doing it wrong. There's no way you can have proper conformance for UTF-8/UTF-16/UTF-32 character decoding, which is very important for security reasons. Do you know how big of a security problem shoddy Unicode text decoding code has caused the world?
>>9
I think I'll keep the linked list implementation, because it's fairly complete and nice. I'll also implement dynamic arrays.
The Unicode string implementation is incomplete, and by incomplete, I mean, it's currently unusable; I've basically just implemented the data structures. There is no UTF-{8,16,32} encoding/decoding yet, nor are there even formatting or searching functions.
Name:
Anonymous2011-09-12 23:57
>>9 Dynamic arrays with overcommitting of memory to ammortize allocation cost is far superior.
Enjoy your O(n) insertion.
Name:
Anonymous2011-09-13 0:05
>>9
I don't totally agree. A linked list with a proper memory manager with a freelist and whatnot is just fine, performance-wise.
>>12
only occasionally. That's the point of the overcommitting part. But dynamic arrays have problems with separation and composition. In a manually memory managed language, separation and composition aren't a good idea anyway, but, I mean, nobody would use a Lisp where cells were actually an abstraction over dynamic arrays. It would be horribly slow. Every call to car and cdr would have to break apart an array. Gross.
My point is that lists lend themselves really well to functional programming/recursion/automatic memory management. They are even, marginally, more optimized for those tasks.
The more I think about what you're trying to accomplish, the more I realize that you and the people who have created similar libraries for C in the past are going completely against the grain regarding the strengths of C and falling into one of the major weaknesses of the languages.
General purpose data structure toolkits in C, including glib and friends, are all gargbage. And I'm afraid that anything you concoct will have no use other than to satisfy your curiosity. They have far worse performance compared to equivalents in the C++ Standard Library, due to function call overhead, pointer indirection with void pointers with the underlying objects allocated out-of-band, emphasis on linked lists and trees, and so on and so forth. And with C data structure toolkits, there's no easy way to abstract the underlying allocator like in C++... typically, everything is hardcoded to call malloc/free or global function pointers to functions with the same signature.
You want a dynamic array of objects of a given structure type?
Given:
typedef struct
{
int value;
char* name;
/* some other shit */
}
entry;
How fucking hard is it to write:
[code]
typedef struct
{
entry *entries;
size_t num_entries;
// whole bunch of other arrays and shit for all of the other data you want to keep track within this subsystem or module of the program
}
higher_level_data_structure;
higher_level_data_structure ctx;
ctx.num_entries = /* whatever */;
ctx.entries = (entry *)malloc(sizeof(entry) * ctx.num_entries);
// initialize other data within ctx here
And changing it to use a linear arena allocator instead of malloc isn't that difficult. On the otherhand dressing it up in a bunch of shit like glib doesn't add anything to your program.
As for things like associative data structures like trees or hash-maps, you don't need them for most thing. A lot of time, most of your data is immutable or constant, so you can just use qsort to sort your arrays and use bsearch (or a manual inlined binary-search loop, it's not that much code) to get O(log n) searches, like a tree, but with better performance due to better cache locality.
And where you actually need constant time insertion or removal, writing an optimized hard-coded associative hash-table with quadratic probing is only like 200 lines of code if you only implement the actual operations you need for your use cases. That's fucking nothing.
Stop fighting against the weaknesses of C and embrace the strengths.
Name:
Anonymous2011-09-13 0:13
>>14
good post. If you're trying to actually get stuff done in C, get stuff done in C the C way. Otherwise go all out and write/download a high level language implementation.
Name:
Anonymous2011-09-13 0:17
>>12
You don't need insertion for most things, you just need to add it to the end of the sequence, in which case dynamic arrays have ammortized constant time tail insertion.
>>13 My point is that lists lend themselves really well to functional programming/recursion/automatic memory management.
Sounds like Lisp hacker syndrome. Every call to car or cdr would not need to break apart an array, they need only provide a bounded view overtop of the underlying immutable array. How the fuck due you think lists work in Clojure? They sure as hell aren't linked lists underneath the hood. However, unlike CL, Clojure has strong referential transparency and immutability, so perhaps it wouldn't work quite as well for CL.
So in fact, I would argue that (immutable) arrays lend themselves far better to real functional than lists.
>>20
Failure aside, the guy did have a point. Algorithmic toolkit libraries in C are considered harmful.
Name:
Anonymous2011-09-13 0:42
[code]dicks[/code]
Name:
Anonymous2011-09-13 0:48
>>16
most dynamic arrays are implemented in such a way that they never over-commit more than double the memory. (ie they double in size every time they grow)
I think it's totally reasonable to assume that the average case for an append to be two lists of the same size.
So no I'd say the average case would require O(n), not constant time.
Also one of the basic things in lisp is decomposing lists into function arguments with the "apply" function. I guess... I guess you could come up with a hack for a dynamic array implementation, but it would not be nearly as straightforward as "the list is now the argument stack. call the function."
also here's a purely functional function: one that takes a list and returns a new list that is twice as big and has each member repeated twice in a row. (a b c) --> (a a b b c c). (it's in a sort of lispy/haskelly pseudocode.)
(recursive-function (a)
when a
append (list (car a) (car a)) (self (cdr a)))
and so: first it would make a bunch of small arrays, and then go about appending them together in the worst way possible because the "head" would move /backward/ each time.
Gross, right? with cells you just move pointers around. Kills the cache but at least the time complexity is 1) good and 2) easy to reason about.
And Clojure's implementation is based on the JVM. I think it's safe to say that we are talking about some kind of typical bit of modern hardware, and not the JVM.
(when a
cons (car a) (cons (car a) (self (cdr a))))
in this case the linked list implementation is much faster, and the dynamic array implementation is still very slow. Even worse, probably, because it appends backward big arrays onto the tails of small arrays.
Name:
n3n7i2011-09-13 1:40
Linked lists are easy to sort, but struggle with random access?
Name:
Anonymous2011-09-13 1:48
>>26
when do you actually use random access? I can think of one situation: image processing.
Everywhere else you are probably going to write some kind of tree data structure to optimize access for special purposes. At least this is what physics and graphcis algorithms do.
arrays are actually not that useful. The only nice thing about them is that they let you more easily utilize the structure of memory (which not coincidentally is more like a tree than a giant 1D plane) to obtain better performance through cache coherency. So instead of making your own trees and doing whatever you want with them, low level advocates say we should work with the trees we are given.
Name:
n3n7i2011-09-13 2:14
I've been mucking around with databases... they're everywhere
...Could you index a linked-list[ for random(ID-based) access]? i guess you'd need an ..array? of pointers? =)
Name:
Anonymous2011-09-13 2:23
>>27
maybe image processing is the key to the economy like the Internet was in the 90's. Huge databases like Google are scary, but you get used to them. Can you imagine what the 1890's would have been like with electrification!
Name:
n3n7i2011-09-13 2:47
>>... they're everywhere [-Direct- Random access on at least one table// using Primary / foreign keys] (Retrieving Data..?) // Sorting isn't super important in this case..
...An array of int's can effectively act as pointers to the elements in another array, probably with more 'stable' results? (No serializing plain old int's either...)
Can also Look-up in both directions // ID->Pos & Pos->ID
And Merge tables !
Name:
n3n7i2011-09-13 3:30
[tblX]
PK-[Xid], { tblXIndex-[Xindex] }, data here....
a dense(?) index field
Should allow for the usual array style sorting on look-up tables?...
Eg. get item id W on a sorted array?
check tblX[item W].Xid == W /or?/ tblX[item W].Xindex = W
true ->item hasn't moved // we got it
else Xindex should point to our moved item...
tblX[ tblX[W].Xindex ].Xid = W (Ref. Integrity?)
if it doesn't ... the index hasn't been updated properly(?)
but it should be able to be fixed (its redundant?)
...that ichess code *remember?* used more than one foreign key (int --pointer) per record, so it kind of was a tree-like(??) struct across many 1D tables..?
{Each Board -> many(?) new possible boards} main tree
{all boards are split up and stored} inv-tree (Roots?)
Name:
Anonymous2011-09-13 9:27
>>9 A unicode string implementation in C in under 670 LOC? Something tells me you're doing it wrong. There's no way you can have proper conformance for UTF-8/UTF-16/UTF-32 character decoding, which is very important for security reasons. Do you know how big of a security problem shoddy Unicode text decoding code has caused the world?
Well, after some time working on the string implementation, I have a fully conformant UTF-8 decoder in 67 lines:
gstring gstring_from_utf8(gstring_serial t) {
gstring s = gvector_alloc();
intmax_t i, w = 0, ww = 0, b;
gchar c = 0;
for (i = 0; i < t->l; i++) {
b = (intmax_t) t->d[i];
if (b < 0x80) {
/* ASCII byte */
if (w) {
/* premature ending to previous multi-byte sequence */
gvector_append(s, (void *) 0xfffd);
w = 0;
}
/* append the character as it is */
gvector_append(s, (void *) b);
} else if (b < 0xc0) {
/* continuation byte */
if (!w) {
/* unwanted continuation byte */
gvector_append(s, (void *) 0xfffd);
} else {
/* keep constructing temporary character */
c |= (b & 0x3f) << (--w * 6);
if (!w) {
/* construction done; append the character */
if (gstring_char_valid(c)) {
/* character is valid Unicode */
if ((c < 0x80) ||
(c < 0x800 && ww > 1) ||
(c < 0x10000 && ww > 2) ||
(c < 0x200000 && ww > 3) ||
(c < 0x4000000 && ww > 4) ||
(ww > 5)) {
/* overlong; reject the character */
gvector_append(s, (void *) 0xfffd);
} else {
/* good, allow the character */
gvector_append(s, (void *) c);
}
} else {
gvector_append(s, (void *) 0xfffd);
}
}
}
} else if (b < 0xfe) {
/* start byte of multi-byte sequence */
if (w) {
/* premature ending to previous multi-byte sequence */
gvector_append(s, (void *) 0xfffd);
}
w = ww =
(b < 0xe0) ? 1 :
(b < 0xf0) ? 2 :
(b < 0xf8) ? 3 :
(b < 0xfc) ? 4 : 5;
c = (b & ((1 << (6 - w)) - 1)) << (w * 6);
} else {
/* invalid bytes 0xFE/0xFF */
gvector_append(s, (void *) 0xfffd);
}
}
if (w) {
/* end of stream and we're still waiting for continuation bytes */
gvector_append(s, (void *) 0xfffd);
}
return s;
}
I've tested this implementation with incomplete sequences, unwanted continuation bytes, invalid bytes and overlong forms, and the function will not crash, nor output non-standard characters. A simple test program proves its usability:
>>37 2011 not using a Linux-based OS, most of which include DejaVu Sans
Condescending smirks and baseball bats aside, I seriously hope you don't do this.
Name:
Anonymous2011-09-13 9:47
>>38
I don't normally run amateur OSes, but when i do i pick LoseThos.
Name:
Anonymous2011-09-13 9:51
>>39
you can't read /prog/ in LoseThos. It can be coded in EBCDIC for all its worth.
Name:
Anonymous2011-09-13 10:29
ooc already has you beat by miles.
Name:
Anonymous2011-09-13 15:17
>>28
use keys that aren't numbers. They are just better.
hints you shouldn't be using a number for something: adding two of them doesn't make sense and is not useful.
>>42
...Care to explain? I can't see how i'd use anything else..
Yeah adding two keys doesn't make a lot of sense...
Being able to increment a key with X+=1; is kind of nice though...
>>46,48-49 Please stop decreasing the average quality of posts where the name field is equal to the string "n3n7i".
Thank you.
Name:
n3n7i2011-09-14 3:29
>>50 Technically, you could just about call it a linked list?
Except index isn't actually linked to the record it resides in...
It could easily be turned into a linked version though?
Name:
Anonymous2011-09-14 3:38
n3n7i is trying so hard, it's adorable
Name:
n3n7i2011-09-14 3:40
if Index is taken to represent a Next pointer
aStruct[1].Value = "B" // .Next = 4
aStruct[4].Value = "A" // .Next = 3
aStruct[3].Value = "D" // .Next = 2
aStruct[2].Value = "C" // .Next = 1 (Loop?)
...Sorted without even moving the array entries?
...Its a linked list Array =)
...This may be useful >> ? Co-ordinate systems in databases??
Eg. you use two ints to locate an entry, with one representing the TABLE!, and the other the position in that table?
Name:
n3n7i2011-09-15 23:13
=) ... hee hee =)
... Or I could use tries >> ? Then build an IA =)
... heeheehee =) play chess >> though? with Tries? ...
... like n=b >> b<n =)
Name:
n3n7i2011-09-15 23:19
... Sorry my bad, not Tries... Mean hash >> hash tables? =)
Or ... Just linked list for index, ...? =)
indexes with >> linked list
... Might try some libs ... =)
... heehee =)
>>65 + >>69 should be enough for a basic word-store, at least
//to clarify; how do i 'Define' a structure-variant on the fly, malloc-ing a hard-coded struct definition is no trouble, but can I 'soft-code' a struct?
ghioiod kliorud io'vm ghavmiobngh a seiotzrurue gherue
ctzghruklghru fjtzghaghbn
Name:
Anonymous2011-09-17 6:46
>>1
C should have all this in stdlib, but it sucks so it doesn't. Let's see...
- I hope the Unicode strings library is not zero-terminated.
- A similar non-zero-terminated strings library for binrary-safe octet streams without any specific character encoding should be useful too, unless your Unicode strings library can handle invalid data and be used for this.
- Cons lists (as in Lisp)
- Variable-length, dynamically allocated vectors (as in Python)
- Dictionaries (as they work Python)
- Utilities to deal with and convert these as necessary
- Optional garbage collector? Don't implement a new one, see if you can include any other project.
- I'd leave OS abstractions last as you can use POSIX functionality, but if you get to it, overflows and shit C is missing, common terminal handling, low-level I/O, common ioctls, files and directories, sockets, threads and common process launching, handling and signaling are my top preferences.
>>85
Yes. I haven't seen any C compilers that do JIT. The only way a programming language can use cons as a base building block and not be slow ass fuck is if it has a JIT, or a REALLY intelligent optimizer.