
Inserting

Name: Anonymous 2009-09-03 7:48

are there any open/write functions for C which do not overwrite chars in a file, but insert them?

or do i have to write a shitty loop that just reads, stores and writes a char?

also, discuss file and string handling libraries, because most that i have seen suck.

Name: Anonymous 2009-09-03 8:03

No, there aren't. You'll have to use your shitty loop, although you can make it QUITE a bit less shitty (in terms of performance) by reading/writing large blocks instead.

What exactly is sucking about the I/O ("file handling") routines in C?
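A minimal sketch of the block-based loop described above, assuming the file is opened in update mode ("r+b"). The function name, the 64 KiB block size, and the (minimal) error handling are my own:

```c
#include <stdio.h>

/* Insert n bytes at offset pos by shifting the file's tail forward,
 * copying 64 KiB blocks from the end backwards so no block is
 * overwritten before it has been read. Error handling kept minimal. */
static int finsert(FILE *f, long pos, const char *data, long n)
{
    char block[65536];
    long end, left;

    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    end = ftell(f);     /* current file size */
    left = end - pos;   /* number of tail bytes that must move */

    while (left > 0) {
        long chunk = left < (long)sizeof block ? left : (long)sizeof block;
        long src = pos + left - chunk;
        fseek(f, src, SEEK_SET);
        if (fread(block, 1, (size_t)chunk, f) != (size_t)chunk)
            return -1;
        fseek(f, src + n, SEEK_SET);
        fwrite(block, 1, (size_t)chunk, f);
        left -= chunk;
    }
    fseek(f, pos, SEEK_SET);
    fwrite(data, 1, (size_t)n, f);
    return fflush(f) == 0 ? 0 : -1;
}
```

Still proportional to the file size per call, but with far fewer syscalls than a byte-at-a-time loop.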

Name: Anonymous 2009-09-03 8:14

>>2
it's just pissing me off that there's no builtin functionality for this. small shit like this is easy to implement, but i still think it should be part of the standard library.

also, i assume C++ streams work almost the same?

Name: Anonymous 2009-09-03 8:23

>>3
Other languages are much the same.

Name: Anonymous 2009-09-03 8:28

>>3
> small shit like this is easy to implement but it still should be part of the standard library i think.
It's not as easy as you seem to think. The hard disk can't insert data like that (unless the data comes in, say, 4 kB blocks), so crappy manipulation is in fact needed at some point, and this manipulation is quite expensive. Worse, the amount of crappy manipulation needed is proportional not to the amount of data inserted, but to the number of steps in which you insert it.

For example, if you insert a block of 10 bytes, the whole file after the insertion point has to be rewritten once. If you instead insert a block of 1 byte 10 times, the whole file after the insertion point has to be rewritten 10 times.

This fact makes a standard library function of limited use here. In most cases, the proper way to solve this problem is to read the whole file into memory, do your manipulation work there, and write everything back to disk when you're done. If that's not possible (if the file is too big to keep in memory), it may be best to write a modified copy of the data to a temporary file, and then replace the old file with the new one. (Of course, this has downsides as well.)
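The temporary-file variant can be sketched as follows (the function names and the fixed temp path are illustrative; real code should check every call and use mkstemp or similar for the temp file):

```c
#include <stdio.h>

/* Copy n bytes from in to out; n < 0 means "copy until EOF". */
static long copy_bytes(FILE *in, FILE *out, long n)
{
    char buf[65536];
    long copied = 0;
    while (n != 0) {
        size_t want = (n < 0 || n > (long)sizeof buf) ? sizeof buf : (size_t)n;
        size_t got = fread(buf, 1, want, in);
        if (got == 0)
            break;
        fwrite(buf, 1, got, out);
        copied += (long)got;
        if (n > 0)
            n -= (long)got;
    }
    return copied;
}

/* Stream the original file into a temp file, inserting n bytes of data
 * at offset pos, then replace the original via rename() (atomic on
 * POSIX when both paths are on the same filesystem). */
static int insert_via_tempfile(const char *path, long pos,
                               const char *data, long n)
{
    FILE *in = fopen(path, "rb");
    FILE *out = fopen("insert.tmp", "wb");
    if (!in || !out) {
        if (in) fclose(in);
        if (out) fclose(out);
        return -1;
    }
    copy_bytes(in, out, pos);        /* everything before the insertion point */
    fwrite(data, 1, (size_t)n, out); /* the new bytes */
    copy_bytes(in, out, -1);         /* the rest of the original file */
    fclose(in);
    fclose(out);
    return rename("insert.tmp", path);
}
```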

> also, i assume C++ streams work almost the same?
Yes, they do, for the same reasons.

Name: Anonymous 2009-09-03 8:35

>>5
and this is what fucking bothers me so much: the only ways to implement this are to copy the whole file to memory and operate from there, which would eat up lots of space, or a "Read -> Store -> Write" loop, which would have to go through the whole file. either way, not very elegant solutions. i just wanted to know if there was any "easy" and efficient way to do it.
seems like not. not a big deal.

Name: Anonymous 2009-09-03 8:37

>>6
It is easy and efficient, if you are willing to redesign the computer.

Name: Anonymous 2009-09-03 8:40

>>6
If you'd like to learn more of this matter, look into filesystem designs. They do lots of work to make common file operations as efficient as possible.

Out of interest, why do you need to insert data into the middle of a file in the first place? I think that's quite rare.

Name: Anonymous 2009-09-03 8:49

>>8
i want to patch certain large files fast and with low heap usage.

BTW: there is no function in libc which can insert into buffers? i assume i would have to implement a read-store-write loop for a buffer too? after allocating memory of course.

Name: Anonymous 2009-09-03 8:50

>>9
I don't think there is one in libc, but you could have a look at what some text editors use and pilfer their functions.

Name: Anonymous 2009-09-03 8:56

>>9
Unfortunately, you can't patch large files fast. For this reason, many file formats that are meant to be both large and frequently edited tend to be designed like filesystems internally.

> BTW: there is no function in libc which can insert into buffers? i assume i would have to implement a read-store-write loop for a buffer too? after allocating memory of course.
Not that I know of, but that's easily fixed:

void meminsert(char *buffer, int bufferLength, int index, char *insertData, int insertDataSize)
{
    // assumes buffer has room for the extra insertDataSize bytes;
    // bufferLength is the number of valid bytes currently in the buffer
    memmove(buffer + index + insertDataSize, buffer + index, bufferLength - index);
    memcpy(buffer + index, insertData, insertDataSize);
}

Note that this is still rather slow, and that calling it once with much data to insert is way faster than calling it often with small chunks to insert.

Name: Anonymous 2009-09-03 9:01

>>10
Many text editors use a buffer per line, with two buffers for the line on which the cursor is: one for the part before the cursor, one for the part after.

Name: Anonymous 2009-09-03 10:25

>>11
To be clear: insertDataSize should be an int, not a char * as I first typed, and the memmove count has to be the number of bytes after the insertion point, not insertDataSize :/
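With the corrections applied, a compilable version looks like this. The bufferLength parameter (the number of valid bytes currently in the buffer) is an addition of mine; without it the function cannot know how long the tail to shift is:

```c
#include <string.h>

/* Insert insertDataSize bytes at position index. The caller must ensure
 * the allocation behind buffer has room for the extra bytes. */
static void meminsert(char *buffer, int bufferLength,
                      int index, const char *insertData, int insertDataSize)
{
    memmove(buffer + index + insertDataSize,    /* destination: past the new data */
            buffer + index,                     /* source: the old tail           */
            (size_t)(bufferLength - index));    /* tail length, NOT insertDataSize */
    memcpy(buffer + index, insertData, (size_t)insertDataSize);
}
```

For example, calling meminsert(buf, 11, 5, ", cruel", 7) on a buffer holding "Hello world" yields "Hello, cruel world".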

Name: Anonymous 2009-09-03 10:37

>>12
Many[which?] text editors use a buffer per line, with two buffers for the line on which the cursor is: one for the part before the cursor, one for the part after.[citation needed]

Name: Anonymous 2009-09-03 11:07

>>5
> unless the data comes in, say, 4kB blocks
And even then, the fucking filesystems provide no interface to do so, so you're fucked either way. You can truncate at the end, but that's it.

Allowing arbitrary byte editing wouldn't hurt as much as some people assume: you'd just need 3 extra bytes per file extent, and it would hurt performance a bit when used, but it'd still beat rewriting the whole file. It's true it's not a very big use case, though, so probably not worth implementing.

>>12
This is false. They use two buffers in total: one for everything before the last insertion, and one for everything after. One of them has a bit of padding to avoid a realloc on every new char.

Source: recovering text files from memory dumps of several crashed/hung editors. I also read it somewhere a long time ago.

Name: Anonymous 2009-09-03 11:07

>>14
I know that kate uses the former technique; I'm not sure whether it uses the latter. I haven't worked with the code of any other text editors, but I've heard both techniques being described as common in discussions about the topic. No citations though, and I can't guarantee that the source in question knew what he was talking about.

Name: Anonymous 2009-09-03 11:14

>>15
> And even then, the fucking filesystems provide no interface to do so, so you're fucked either way. You can truncate at the end, but that's it.
Very true, it's just theoretically possible.

> Allowing arbitrary byte editing wouldn't hurt as much as some people assume
Could you expand on that? I don't see a way that wouldn't hurt performance significantly in the long run (read: when this feature has been used lots of times).

> This is false. They use two buffers in total: one for everything before the last insertion, and one for everything after.
I know that this isn't true for kate, see >>16. I may have been wrong on the "many" part though.

Name: Anonymous 2009-09-03 11:26

>>17
Extents have 3 parts (in ext4 at least): logical position inside the file, length, physical block (you need all this data for random access to extents, and also to allow for stuff like sparse files). You'd need to add 12 bits (for 4K blocks) to "logical position inside the file" and another 12 bits to "length", converting these to bytes instead of blocks, and you're done.

It'd hurt performance, of course, but consider it's the same with sparse files (and also filesystem-level compression). The only new thing is that you'd lose block-level alignment, but that's not too bad. It'd be a new tool to use. Systems that need it now (databases come to mind) just reimplement their own filesystem-inside-file and are paying the cost already, with their own fragmentation and internal free lists.
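A rough sketch of what such an extent record could look like. Field names and exact widths are my guesses; real ext4 packs the 48-bit physical block into ee_start_hi/ee_start_lo and has no byte-granular fields:

```c
#include <stdint.h>

/* Hypothetical byte-granular extent: ext4-style fields plus the two
 * proposed 12-bit byte offsets for 4 KiB blocks. Names illustrative. */
struct byte_extent {
    uint32_t logical_block;  /* logical position inside the file, in blocks */
    uint16_t block_count;    /* length of the extent, in blocks */
    uint16_t physical_hi;    /* high 16 bits of the 48-bit physical block */
    uint32_t physical_lo;    /* low 32 bits of the physical block */
    uint16_t start_byte;     /* NEW: byte offset into the first block (12 bits used) */
    uint16_t length_bytes;   /* NEW: byte-granular length adjustment (12 bits used) */
};

/* Where the extent's data starts on disk, in bytes. */
static uint64_t extent_disk_offset(const struct byte_extent *e,
                                   uint64_t block_size)
{
    uint64_t phys = ((uint64_t)e->physical_hi << 32) | e->physical_lo;
    return phys * block_size + e->start_byte;
}
```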

Name: Anonymous 2009-09-03 11:53

>>18
You'd lose quite a bit of speed, because it now takes a lot of effort to reassemble all the extents. It's also very inefficient in terms of disk space usage: a 1-byte extent would require ~16 bytes of metadata.

On second thought, most of the speed lost is CPU speed, which isn't nearly as limited as hard disk speed. Maybe in this age of ever-faster processors and only slowly improving (in terms of speed) hard disks, this is in fact a good design.

Name: Anonymous 2009-09-03 16:12

GAP BUFFERS! GAP BUFFERS! GAP BUFFERS ARE THE STANDARD!!!
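For anyone who hasn't met them: a gap buffer keeps the text in one array with a hole (the "gap") at the cursor, so insertion is a plain write, and the cost is proportional to cursor movement rather than buffer size. A minimal fixed-capacity sketch; real editors grow the array when the gap fills:

```c
#include <string.h>

#define GB_CAP 1024  /* fixed capacity for brevity */

struct gapbuf {
    char text[GB_CAP];
    int  gap_start;   /* cursor position == first byte of the gap */
    int  gap_end;     /* one past the last byte of the gap */
};

static void gb_init(struct gapbuf *g)
{
    g->gap_start = 0;
    g->gap_end = GB_CAP;
}

/* Move the cursor: shift the bytes between the old and new position
 * across the gap. Cost is proportional to the distance moved. */
static void gb_move(struct gapbuf *g, int pos)
{
    if (pos < g->gap_start) {
        int n = g->gap_start - pos;
        memmove(g->text + g->gap_end - n, g->text + pos, (size_t)n);
        g->gap_start = pos;
        g->gap_end -= n;
    } else if (pos > g->gap_start) {
        int n = pos - g->gap_start;
        memmove(g->text + g->gap_start, g->text + g->gap_end, (size_t)n);
        g->gap_start = pos;
        g->gap_end += n;
    }
}

/* Insert at the cursor: just fill the gap. Assumes n <= gap size. */
static void gb_insert(struct gapbuf *g, const char *s, int n)
{
    memcpy(g->text + g->gap_start, s, (size_t)n);
    g->gap_start += n;
}

/* Copy the logical contents (everything except the gap) into out. */
static int gb_read(const struct gapbuf *g, char *out)
{
    int before = g->gap_start;
    int after = GB_CAP - g->gap_end;
    memcpy(out, g->text, (size_t)before);
    memcpy(out + before, g->text + g->gap_end, (size_t)after);
    out[before + after] = '\0';
    return before + after;
}
```

Note this is exactly the "two buffers split at the last insertion" scheme from >>15, stored in a single allocation.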

Name: Anonymous 2009-09-04 0:33

>>11
This is how a lot of simple text editors work, and since even in the early days of computing a CPU could shift the entire contents of main memory by one byte in a few milliseconds, it's not surprising that this method has remained quite standard.

Name: Anonymous 2009-09-04 9:51

>>21
Duff's Device?

Name: Anonymous 2010-12-17 1:23

Are you GAY?
Are you a NIGGER?
Are you a GAY NIGGER?

If you answered "Yes" to all of the above questions, then GNAA (GAY NIGGER ASSOCIATION OF AMERICA) might be exactly what you've been looking for!

