/prog/ - considered harmful

Name: Anonymous 2013-03-06 10:13

OK, nobody understands what the argument is about anymore.

Suppose you want to implement piping between local processes. The most straightforward way: when one process calls write, it blocks; then, when another process calls read on the receiving end, the data is copied directly from the buffer supplied to write, and the write call doesn't return until all the data has been copied away. One pleasant property of this approach is that you're guaranteed interactivity: there will never be any data stuck in some internal OS buffer, because there are no internal OS buffers.

Another property is that as long as the buffer used for reading is big enough, each chunk returned by read corresponds directly to a chunk written by write. So you can write a tokenizing program like read(1) with the -m option, which splits input into lines and writes each line with a single write call; then each chunk of data returned by read(2) on the other end corresponds to a line, and the program there doesn't need to strtok it again or anything.

Then there is buffering, like when you are sending data over a network and it would be inefficient to send a packet every time a program executes write. What if it writes data byte by byte? Better to accumulate it in an internal OS buffer and send it all at once. Unfortunately this means that while you get better throughput, you mostly lose interactivity.

A feeble mind might then conclude that these are the only two available options, so since we value interactivity in our pipes, we can assume a functional equivalent of the first approach, with all its side effects guaranteed.

This is wrong. The guarantee of interactivity is much weaker than the guarantee of 1-to-1 correspondence between writes and reads. Consider this: the OS guarantees that it will always mark written data for immediate sending and always return all available data with read. But when the network interface becomes available, the OS sends not just the oldest write but all the data it has accumulated so far, up to as much as can fit in one packet, obviously. So suppose a packet holds 1400 bytes and you execute write(1 byte), write(1000 bytes), write(1000 bytes) in quick succession: this results in network packets containing 1 byte, 1400 bytes, 600 bytes.
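To make the arithmetic concrete, here is a toy model of that scenario. It is only a sketch under stated assumptions, not how any real network stack is written: the function name packet_sizes is made up, the 1400-byte packet capacity is inferred from the numbers in the example, and the model assumes the interface is idle for the first write and busy while every later write arrives.

```python
def packet_sizes(writes, capacity):
    """Return the payload size of each packet for a list of write sizes.

    Assumed model (not a real network stack): the interface is idle when
    the first write arrives, and every later write arrives while that
    first transmission is still in progress.
    """
    def drain(pending):
        # Cut the pending bytes into packets of at most `capacity` bytes.
        sizes = []
        while pending > 0:
            chunk = min(pending, capacity)
            sizes.append(chunk)
            pending -= chunk
        return sizes

    if not writes:
        return []
    # Idle interface: the first write is sent at once, on its own.
    packets = drain(writes[0])
    # Busy interface: the remaining writes pile up and are sent together.
    packets += drain(sum(writes[1:]))
    return packets


print(packet_sizes([1, 1000, 1000], 1400))  # [1, 1400, 600]
```

Nothing waits on a timer in this model, so interactivity is preserved, yet the packet boundaries have nothing to do with the write boundaries.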

You still get your interactivity: the OS never introduces any delays, and you always get your data across as fast as the channel latency and throughput allow. But when you fill the channel capacity you get all the joys of a buffered channel, both in performance and in the fact that the OS cuts and splices your writes however it wants.
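The same splicing can be seen locally on an ordinary kernel-buffered pipe. The sketch below assumes a POSIX-style pipe as exposed by Python's os.pipe; the helper name read_after_writes is made up for illustration. It issues one write per chunk and then a single read, which on such a pipe returns whatever is available rather than one chunk per write.

```python
import os


def read_after_writes(chunks, bufsize=4096):
    """Write each chunk with its own write call, then do a single read."""
    r, w = os.pipe()
    try:
        for chunk in chunks:
            # One write call per chunk; small writes fit in the pipe buffer.
            os.write(w, chunk)
        # A single read returns the data available so far, up to bufsize.
        return os.read(r, bufsize)
    finally:
        os.close(r)
        os.close(w)


data = read_after_writes([b"first line\n", b"second line\n"])
print(data)
```

On a typical Linux system the single read hands back both lines glued together, so the reader learns nothing about where one write ended and the next began.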

Therefore, a sane stream abstraction should guarantee interactivity but should not guarantee any correspondence between writes and reads, because that's an implementation detail produced by a particular naive implementation, and is not in any way or shape required for providing interactivity. The moment you want to pipe shit between programs running on two cores, you should be able to switch to a buffered implementation and have your programs run in parallel while minimizing synchronization frequency (but still preserving interactivity, of course!).
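What this asks of the reading side is small: it has to find its own message boundaries instead of trusting read. A minimal sketch, assuming newline-terminated records; the name lines_from_chunks is hypothetical, and chunks stands for whatever successive read calls happened to return.

```python
def lines_from_chunks(chunks):
    """Yield complete lines (newline included) from arbitrarily cut chunks."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        while True:
            head, sep, tail = pending.partition(b"\n")
            if not sep:
                # No complete line yet; wait for the next chunk.
                break
            yield head + sep
            pending = tail
    if pending:
        # The stream ended without a final newline; pass the rest along.
        yield pending


# The same stream cut two different ways gives the same lines.
print(list(lines_from_chunks([b"ab", b"c\nde", b"f\n"])))
print(list(lines_from_chunks([b"abc\ndef\n"])))
```

A reader written this way works the same whether the writer's chunks arrive intact, merged or split, which is exactly the freedom a buffered implementation needs.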

Now, back to our cats. The Plan9 cat implementation guarantees interactivity because it doesn't do any buffering of its own: it always calls write as soon as read returns, so the guarantees provided by the OS regarding those are preserved. The fact that cat uses a fixed-size buffer means that it might introduce fragmentation, but that doesn't matter because a sane OS doesn't guarantee the lack of fragmentation anyway. The same should apply to read(1): it should use the same 8k buffer and write it out whenever it fills up. The only important guarantee regarding read(1) is that when it encounters a newline, it doesn't read past it, but calls write and returns immediately. read -m should, on a sane OS, be in all respects identical to cat, and therefore shouldn't exist.
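The cat behaviour described here fits in a few lines. This is a sketch of the idea in Python, not the Plan9 source: the function name cat and its parameters are chosen for illustration, and the 8192-byte default mirrors the 8k buffer mentioned above.

```python
import os


def cat(in_fd, out_fd, bufsize=8192):
    """Copy in_fd to out_fd, writing as soon as each read returns."""
    while True:
        data = os.read(in_fd, bufsize)
        if not data:
            # An empty read means end of file.
            return
        # No buffering of our own: forward this chunk right away,
        # looping only because a write may accept fewer bytes than given.
        view = memoryview(data)
        while view:
            written = os.write(out_fd, view)
            view = view[written:]
```

Calling cat(0, 1) would copy standard input to standard output. Input larger than the buffer is forwarded in pieces, which is harmless as long as nobody downstream relies on write boundaries.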

The fact that Plan9 read does have the -m option, goes out of its way to preserve the non-sane guarantee that it will not call write before it sees the newline or EOF, and has this behaviour documented, means that Rob Pike and whoever else wrote/reviewed it don't realize how harmful this shit is, that they added a documented misfeature to the stream abstraction, a feature that has nothing to do with it, is by and large useless, and makes distributed piping unnecessarily complicated and inefficient. Until they pull that particular log out of their collective eye their circlejerk about things Considered Harmful is ridiculous.
