* I don't have a problem with write blocking until all the data has been written; in fact I think that's the Right behaviour.
Okay, good. This is what plan9 does specifically, and what probably all serious UNIX operating systems normally do, even though POSIX allows for early return.
* After some consideration, I think it should be guaranteed that `read` will return early if there is insufficient data in the buffer, because that is necessary for interactive programs to work and is more or less the natural, expected behaviour.
This is what read(2) has always done. It fills the buffer with bytes read from the file descriptor, and if you get less than the size of the buffer, either you've reached end-of-file, or you're reading from a pipe, terminal or socket. However, read(2) might block until there's at least something to read, which can happen when the writing process block-buffers its output. If you want to check whether reading will block, poll the file or perform an explicit nonblocking read (which might return nothing).
* It must not be guaranteed that read will never return less data than the corresponding write has written; in other words the OS should be allowed to introduce additional fragmentation.
There is no corresponding write to a read. You don't write(2) into a read(2), you write to file descriptors and read from file descriptors. Once written to a file descriptor there's no information whether it was written in one call to write(2) or ten.
Nobody does what you are criticizing, but if they did, you'd be correct in that it'd be harmful. But this is not what the "one write" that read(1) promises is about. read(1) will write once, one line, which will be read in full and parsed again for newline on the receiving end. Whether this is read with one call to read(2) or ten doesn't matter. If you run read(1) with the option of reading two lines, it might write both before any single line has even been read.
* It probably should not be guaranteed that read will never return more data than a single write has written; in other words the OS should be allowed to remove fragmentation when it has enough data available.
Yes, of course, and nobody does that. Again, there's no mapping between individual calls to write(2) and read(2), you only read and write to file descriptors, usually asynchronously.
Name:
Anonymous2013-03-03 11:53
I LIKE MY WOMEN LIKE I LIKE MY FILE-SYSTEMS
REISERFS
Name:
Anonymous2013-03-03 17:26
>>121 * It must not be guaranteed that read will never return less data than the corresponding write has written; in other words the OS should be allowed to introduce additional fragmentation. Nobody does what you are criticizing, but if they did, you'd be correct in that it'd be harmful. read(1) will write once, one line, which will be read in full and parsed again for newline on the receiving end.
If the OS is allowed to introduce fragmentation, then when you execute read <8k_char_line.txt | myprogram, and the pipe actually pipes data through a network socket, then `myprogram` will receive the 8k-character line produced by a single write(2) in read(1) as 5 or so 1400-byte read(2) calls, even if it passes an 8k buffer of its own.
This is sane behaviour.
Plan9 documentation strongly implies that Plan9 is guaranteed to implement insane behaviour, where upon execution of write(2) by read(1) the OS will actually send the buffer as a message, with a header saying that it's 8k characters long; on the other end Plan9 will patiently wait until it gets all the data, dynamically reallocating the receive buffer, and only then will it allow `myprogram` to read(2) from it, returning the entire buffer at once if `myprogram` supplies a sufficiently large buffer of its own.
This is insane behaviour.
http://swtch.com/usr/local/plan9/src/cmd/read.c strongly suggests that the documentation is correctly documenting the insanity of the approach; otherwise read(1) could use a static buffer just like cat(1) and split the output across several write(2) calls, because why not, if the OS is allowed to do that anyway?
>>124 http://swtch.com/usr/local/plan9/src/cmd/read.c strongly suggests that the documentation is correctly documenting the insanity of the approach; otherwise read(1) could use a static buffer just like cat(1) and split the output across several write(2) calls, because why not, if the OS is allowed to do that anyway?
Avoiding syscall overhead, perhaps? In some cases it might be better to do multiple realloc's (since many of these will return immediately without brk'ing into the system) to avoid doing multiple writes.
Name:
Anonymous2013-03-04 8:23
>>130 Avoiding syscall overhead, perhaps?
Why doesn't cat do the same then? Why don't they use a much bigger buffer? Nah, that's grasping at straws, they do exactly what they say they do, and what they say is a harmful thing.
First, the file descriptor (arg[0]) is turned into a channel, then the channel device's write function is called directly with the same arguments. There's nothing special the OS does directly, rather it's all handled by the device. The device writes the data in a suitable way, depending on its nature. devtab is made at compile time with the mkdevc script.
There's nothing special the OS does directly, rather it's all handled by the device. The device writes the data in a suitable way, depending on its nature.
So if the device does not preserve the guarantees made in the documentation, then documentation ends up being wrong, the retardation you can see in the read(1) code ends up being pointless, and all userspace programs written by naive morons who trusted the documentation are broken.
I don't quite follow what you're trying to prove.
Name:
Anonymous2013-03-04 17:17
>>133
Look, you do see what the difference between cat(1) and read(1) is, correct? Every character is checked so the newline can be found and the line written out immediately. When the output is piped to another process, it'll be able to process that line directly. Another program, for instance cat(1), would sometimes wait until an 8k buffer is filled or EOF is reached before writing anything. That is inconsequential for batch processing, but bad for interactive use. There you want your line as soon as it's available.
You're getting hung up on the "single write" in the man page, and read too much into it. A program that read and wrote a single character at a time could be used for the same purpose as read(1). The line would be available straight away. The only difference is that read(1) buffers until a whole line has been read. One write instead of n. Is the dynamic buffer size bad and overkill? Maybe. Ask the author why a static buffer and multiple writes weren't used. Again, most lines are shorter than 8k, the buffer size cat(1) uses, and most lines will fill up quickly.
If your input is already unbuffered or line buffered, you don't gain anything. If the process that writes to read(1) buffers its output for more than a line, this is also very little help. But can you now tell what the difference between these two lines is?
And you seemed convinced that the operating system did this or that based on your interpretation of the manual of a userland program. Don't be so quick to jump to conclusions just because Uriel claiming cat -v was bad made you sad. Look at the sources directly and you can see what read(1) does and doesn't do.
>>134
tl;dr read(1) limits buffering to 1 line to avoid blocking the next process in an interactive pipe; cat doesn't care about this; >>133 has his panties in a twist. Sounds good.
Name:
Anonymous2013-03-04 22:12
C is fucking shit.
Name:
Anonymous2013-03-05 6:21
>>134 Ask the author why a static buffer and multiple writes weren't used.
Because the documentation says that `read` uses a single write. It doesn't say that read issues a write immediately after encountering newline, it's pretty clear about what it does.
But can you now tell what the difference between these two lines is?
I don't think you meant cat/read my-file, you wanted them to read from the terminal, no? With all that interactivity and stuff... Now you seem to be missing the fact that cat does not read the entire 8k buffer before writing, `read(2)` returns immediately and `cat(1)` writes current input immediately as soon as it gets to the end of data you gave it interactively so far. Meditate on that bro.
There's no difference between
cat | my-interactive-program
read -m | my-interactive-program
assuming that my-interactive-program is not retarded. None whatsoever.
And you seemed convinced that the operating system did this or that based on your interpretation of the manual of a user land program.
You seem convinced that the OS works in a non-retarded way despite all available evidence in the form of documentation and the read(1) code, written by the same guy who wrote the OS itself, by the way. That is funny.
`cat(1)` writes current input immediately as soon as it gets to the end of data you gave it interactively so far.
Oh, you are a troll. cat cannot know how big the input is, it doesn't look for newlines, so unless it gets an EOF it will happily sit and wait for that 8K buffer to fill up.
If you think the OS is cooking the input to read(1), you're insane. If that were true there would be no need to have read(1) and cat(1) be different programs, as under your insane presumption they would behave identically for lines under 8K in length.
>>140
lisp may be dead, it's irrelevant, i like python more and it's much alive
manual memory management makes me cry ><
Name:
Anonymous2013-03-05 13:01
>>139 `cat(1)` writes current input immediately as soon as it gets to the end of data you gave it interactively so far. Oh, you are a troll. cat cannot know how big the input is, it doesn't look for newlines, so unless it gets an EOF it will happily sit and wait for that 8K buffer to fill up.
Holy shit. Go run cat(1) from your terminal and observe that it echoes your input immediately, line by line.
You're an idiot bro. Not because you didn't know that, but because it should have taken you about five seconds to check, but you didn't.
The reason cat on GNU/Linux spits out your line upon "enter" is the terminal's line discipline: in canonical mode the kernel delivers input to read(2) a line at a time. It's the kernel managing I/O and buffers, not the cat program itself. Christ.
read(2) will block until there's some input, it doesn't block by block-size. The kernel will attempt to wait for certain block sizes, but it isn't going to wait forever if there's data available but it doesn't match the block size.
It's a detail of the libc / kernel / environment of the OS, not a detail of the userland program.
Name:
Anonymous2013-03-06 8:26
>>148 It's a detail of the libc / kernel / environment of the OS, not a detail of the userland program.
Read the fucking thread, please. Especially >>139, and then earlier comments to see what the fuss is about.
Only the original UNIXv7 cat ever set a buffer, with setbuf(stdout, stdbuf).
BSD cat uses -u to set the buffer to NULL, but in the code there it doesn't ever seem to set the buffer otherwise, so it's likely always unbuffered, or buffered to the kernel's discretion.
I've read the fucking thread, ``faggot'', seeing as I posted when this discussion started, here >>76.
I'm not sure what the fuck you retards are even still arguing about.
The kernel does try to do block ``optimizations'' on read, even if it's set to be unbuffered.
You can check by sending 1 byte of data to another process with a delay between sends, and then sending blocks the size of the kernel's buffer with a longer delay.
As a last post, you're both retarded and neither of you have any idea about what you're saying, because:
`cat(1)` writes current input immediately as soon as it gets to the end of data you gave it interactively so far.
Oh, you are a troll. cat cannot know how big the input is, it doesn't look for newlines, so unless it gets an EOF it will happily sit and wait for that 8K buffer to fill up.
Both those statements are wrong. There is no "end of the data" you give it interactively. There's just what read(2) has returned. Regardless of whatever fucking OS you're on, and the Plan 9 cat you linked follows this same behaviour.
It knows the input has "finished" on EOF, which is either CTRL+D in the terminal, or the actual end of a file. Any program that uses read in blocking mode isn't aware of "waiting" for anything. It just gets data as the OS passes it to read.
Discussion over.
Name:
Anonymous2013-03-06 10:13
OK, nobody understands what the argument is about anymore.
Suppose you want to implement piping between local processes. The most straightforward way: when one process calls write it blocks; then, when another process calls read on the receiving end, the data is copied directly from the buffer supplied to write, and the write call doesn't return until all the data has been copied away. One pleasant property of this approach is that you're guaranteed interactivity: there will never be any data stuck in some internal OS buffer, because there are no internal OS buffers.
Another property is that, as long as the buffer used for reading is big enough, each chunk returned by read corresponds directly to a chunk written by write. So you can write a tokenizing program like read(1) with the -m option, which splits input into lines and writes each line with a single write call; each chunk of data returned by read(2) on the other end then corresponds to a line, and the program there doesn't need to strtok it again or anything.
Then there is buffering, like when you are sending data over a network and it would be inefficient to send a packet every time a program executes write, what if it writes data byte by byte, better to accumulate it in an internal OS buffer and send all at once. Unfortunately this would mean that while you get better throughput, you mostly lose interactivity.
A feeble mind might then conclude that these are the only two available options, so since we value interactivity in our pipes, we can assume a functional equivalent of the first approach, with all its side effects, is guaranteed.
This is wrong. The guarantee of interactivity is much weaker than the guarantee of 1-to-1 correspondence between writes and reads. Consider this: the OS guarantees that it will always mark written data for immediate sending, and always return all available data with read, but: when the network interface becomes available, the OS sends data immediately, but also all data it has accumulated so far, and only as much data as can fit in one packet, obviously. So suppose you execute write(1 byte), write(1000 bytes), write(1000 bytes), this results in the network packets containing 1 byte, 1400 bytes, 600 bytes.
You still get your interactivity, the OS never introduces any delays, you always get your data across as fast as the channel latency and throughput allow, but when you fill the channel capacity you get all the joys of a buffered channel, both in performance and in the fact that the OS cuts and splices your writes however it wants.
Therefore, a sane stream abstraction should guarantee interactivity but should not guarantee any correspondence between writes and reads, because that's an implementation detail produced by a particular naive implementation, and is not in any way or shape required for providing interactivity. The moment you want to pipe shit between programs running on two cores, you should be able to switch to a buffered implementation and have your programs run in parallel while minimizing synchronization frequency (but still preserving interactivity, of course!).
Now, back to our cats. Plan9 cat implementation guarantees interactivity because it doesn't do any buffering of its own, it always calls write as soon as read returns, therefore the guarantees provided by the OS regarding those are preserved. The fact that cat uses a fixed size buffer means that it might introduce fragmentation, but that doesn't matter because a sane OS doesn't guarantee the lack of fragmentation anyway. The same should apply to read(1), it should use the same 8k buffer and write it out whenever it fills up. The only important guarantee regarding read(1) is that when it encounters newline, it doesn't read past it, calls write and returns immediately. read -m should, on a sane OS, be in all respects identical to cat, and therefore shouldn't exist.
The fact that Plan9 read does have the -m option, goes out of its way to preserve the non-sane guarantee that it will not call write before it sees the newline or EOF, and has this behaviour documented, means that Rob Pike and whoever else wrote/reviewed it don't realize how harmful this shit is, that they added a documented misfeature to the stream abstraction, a feature that has nothing to do with it, is by and large useless, and makes distributed piping unnecessarily complicated and inefficient. Until they pull that particular log out of their collective eye their circlejerk about things Considered Harmful is ridiculous.
when the network interface becomes available, the OS sends data immediately, but also all data it has accumulated so far, and only as much data as can fit in one packet, obviously. So suppose you execute write(1 byte), write(1000 bytes), write(1000 bytes), this results in the network packets containing 1 byte, 1400 bytes, 600 bytes.
Are you describing a theoretical scenario, or alluding to the fact that this actually happens? Because it does not, at least not over TCP: boundaries for send or write are not preserved in TCP packets. Neither are they preserved when doing a read or recv on a socket descriptor. It'll send whatever data is in the send buffer all at once, and builds the packets from that. Boundaries are preserved in UDP, but there you lose the 1-to-1 correspondence anyway, because the packets aren't guaranteed to arrive.
when the network interface becomes available, the OS sends data immediately, but also all data it has accumulated so far, and only as much data as can fit in one packet, obviously. So suppose you execute write(1 byte), write(1000 bytes), write(1000 bytes), this results in the network packets containing 1 byte, 1400 bytes, 600 bytes. Are you describing a theoretical scenario, or alluding to the fact that this actually happens? Because it does not, at least not over TCP: boundaries for send or write are not preserved in TCP packets.
Are you drunk?
Name:
154 2013-03-06 12:21
By the way, I was pretty close, writing 1, 1000, 1000 over an actual network socket (not the loopback adapter!) results in 1, 1460, 540 bytes received.