>>32
Thanks for taking the time to review all features!
But it is slightly irritating that FS designers think of this as a FS problem rather than an OS or VFS problem.
I agree wholeheartedly. I'd love to have this feature provided by the OS; I'd be using it already. Problem is, I can't find any that does, and since metadata without search is useless, I'm going to solve both issues in one shot by using a database. It'll be tied to this filesystem, but at least it'll get the job done until something better is available at a higher level.
Doesn't ZFS have that?
I don't know ZFS; I'll see if it does, how it's done, and how it's requested. I suppose a regular cp of files shouldn't use copy on write, because it'd be pretty hard for the FS to detect that you're copying files, and cp would read the source files anyway. What I plan to do is a special FS call or ioctl that creates copy-on-write files without even reading them.
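To make the idea concrete, here is a rough sketch of what such a clone call could boil down to once everything lives in a database. This is not code from the project: SQLite stands in for the real database, the `blobs`/`files` tables and all function names are made up for illustration, and sharing is tracked per whole file rather than per block.

```python
import sqlite3

def open_fs():
    # Hypothetical schema: file names point at shared, reference-counted contents.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB NOT NULL,
                            refs INTEGER NOT NULL);
        CREATE TABLE files (path TEXT PRIMARY KEY,
                            blob_id INTEGER NOT NULL REFERENCES blobs(id));
    """)
    return db

def create(db, path, data):
    with db:  # one transaction: commit on success, roll back on error
        cur = db.execute("INSERT INTO blobs (data, refs) VALUES (?, 1)", (data,))
        db.execute("INSERT INTO files (path, blob_id) VALUES (?, ?)",
                   (path, cur.lastrowid))

def clone(db, src, dst):
    """Copy-on-write clone: share the contents without ever reading them."""
    with db:
        (blob_id,) = db.execute(
            "SELECT blob_id FROM files WHERE path = ?", (src,)).fetchone()
        db.execute("UPDATE blobs SET refs = refs + 1 WHERE id = ?", (blob_id,))
        db.execute("INSERT INTO files (path, blob_id) VALUES (?, ?)",
                   (dst, blob_id))

def write(db, path, data):
    """Replace a file's contents, un-sharing it first if it is a clone."""
    with db:
        blob_id, refs = db.execute(
            "SELECT b.id, b.refs FROM files f JOIN blobs b ON b.id = f.blob_id "
            "WHERE f.path = ?", (path,)).fetchone()
        if refs == 1:
            db.execute("UPDATE blobs SET data = ? WHERE id = ?", (data, blob_id))
        else:
            db.execute("UPDATE blobs SET refs = refs - 1 WHERE id = ?", (blob_id,))
            cur = db.execute("INSERT INTO blobs (data, refs) VALUES (?, 1)", (data,))
            db.execute("UPDATE files SET blob_id = ? WHERE path = ?",
                       (cur.lastrowid, path))

def read(db, path):
    return db.execute(
        "SELECT b.data FROM files f JOIN blobs b ON b.id = f.blob_id "
        "WHERE f.path = ?", (path,)).fetchone()[0]
```

The clone only touches two small rows, which is why a dedicated call can be cheap no matter how large the file is, whereas cp has to read every byte.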
If two applications start a transaction on the same (set of) files concurrently, what happens when they both commit?
Transactions are (by default) isolated from each other, so neither would see the changes of the other. If a transaction changes something that was changed by (or is depended on by) another ongoing transaction, the later transaction is blocked until the earlier one either commits or rolls back, i.e. until it can be decided whether the change is acceptable.
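A small illustration of that behaviour, assuming SQLite as a stand-in (its locking is coarser than InnoDB's, and with `timeout=0` it reports the conflict as an error instead of waiting; a nonzero timeout would make the second writer wait). The table and file names are invented for the example.

```python
import os
import sqlite3
import tempfile

def connect(path):
    # isolation_level=None: BEGIN/COMMIT are issued by hand.
    # timeout=0: report a lock conflict at once instead of waiting.
    return sqlite3.connect(path, isolation_level=None, timeout=0)

def current(conn):
    # fetchall() runs the query to completion so no read lock lingers.
    return conn.execute(
        "SELECT data FROM files WHERE path = '/motd'").fetchall()[0][0]

def demo():
    events = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fs.db")
        a, b = connect(path), connect(path)
        a.execute("CREATE TABLE files (path TEXT PRIMARY KEY, data TEXT)")
        a.execute("INSERT INTO files VALUES ('/motd', 'v1')")

        # Application A starts a transaction and changes the file.
        a.execute("BEGIN IMMEDIATE")
        a.execute("UPDATE files SET data = 'v2-from-a' WHERE path = '/motd'")

        # Isolation: B still sees only the committed contents.
        events.append(current(b))

        # B wants to write too, but A has not committed or rolled back yet.
        try:
            b.execute("BEGIN IMMEDIATE")
        except sqlite3.OperationalError:
            events.append("b has to wait")

        # Once A commits, B can go ahead.
        a.execute("COMMIT")
        b.execute("BEGIN IMMEDIATE")
        b.execute("UPDATE files SET data = 'v3-from-b' WHERE path = '/motd'")
        b.execute("COMMIT")
        events.append(current(a))
        a.close()
        b.close()
    return events
```

Calling `demo()` returns the sequence of observations: what B read while A's transaction was open, the conflict marker, and the final committed contents.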
Transactions are great for several scenarios. Here are three of them:
- Atomicity: you can ensure a series of changes to the filesystem, for example hot replacing production files in a web server, will be done atomically. The changes become visible all at once, and the older data is used until it's all complete.
- Consistency: you can guarantee data is consistent before and after a transaction. In my example, if the system crashes while you're copying files, nothing gets copied, and you don't end up with a mixed version of the production files.
- Undo: you can undo a block of operations. Say you're writing an install script. You edit and delete all sorts of files, but something could fail at any time. Instead of working out a way to back up everything and then restore or undo all of your changes if something fails later on, you can just use a transaction and roll back on error.
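The atomic-replace and undo scenarios can be sketched in a few lines, again with SQLite standing in for the database. The `files` table, the `deploy` function and its `fail_on` parameter are hypothetical and exist only to simulate a failure halfway through.

```python
import sqlite3

def make_site():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, data TEXT)")
    with db:
        db.executemany("INSERT INTO files VALUES (?, ?)",
                       [("index.html", "old index"), ("app.js", "old app")])
    return db

def deploy(db, new_files, fail_on=None):
    """Replace several files in one transaction; any error undoes them all."""
    try:
        with db:  # commits if the block finishes, rolls back if it raises
            for path, data in new_files.items():
                if path == fail_on:
                    raise IOError(f"simulated failure at {path}")
                db.execute(
                    "INSERT OR REPLACE INTO files (path, data) VALUES (?, ?)",
                    (path, data))
        return True
    except IOError:
        return False

def snapshot(db):
    return dict(db.execute("SELECT path, data FROM files ORDER BY path"))
```

If `deploy` fails on the second file, the first file's replacement is rolled back along with it, so readers only ever see the complete old set or the complete new set.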
Where are you going to get the extra info? Does the VFS provide the separate "metadata modify" and "data modify" timestamps for you to keep track of?
I'll do what I can to detect it. I can surely tell when the user is editing metadata via my special interface (ioctls or whatever, still undecided).
I don't know what the database is needed for.
The database provides B-Tree indexes, transactions, various isolation levels, consistent snapshots, replication, network filesystems, etc. by itself, so if I use a database, creating a filesystem with such a pile of features would be much easier.
Why is your FS even concerning itself with where it is located?
It's the database that does, because everything (including the data) would be in the database. However, in the case of MySQL's InnoDB engine, the database won't really care either, as it works with both regular files and raw devices.
Ok, now I am convinced you're not just talking about making a FS.
Heh, yeah. Using the database, I could pull all kinds of stunts. For example, you'd have something like NFS if the database is installed somewhere else; the FS can connect to its database using TCP/IP. Only this "NFS" would support locking, transactions, and everything. The database can also replicate to a remote database, so I could support filesystem mirroring that way. I could even attempt to support 2-way/circular replication. Anything the database supports, my FS will support.
You mean like tail files?
More like what NTFS does: if the file is small enough, its contents are inserted in the metadata (the MFT in NTFS's case, the directory table in my case). There's no risk of people accessing others' files in case of corruption, and the database takes care of repair (as it would repair any table).
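As a sketch of how that could look in a directory table (an assumption about the layout, not the actual schema): small files go into an `inline_data` column, larger ones are split into rows of a separate `extents` table. The 64-byte `INLINE_LIMIT`, the chunk size and all names are invented for illustration, and SQLite stands in for the database.

```python
import sqlite3

INLINE_LIMIT = 64  # hypothetical threshold; a real one would need tuning

def open_fs():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE directory (path TEXT PRIMARY KEY,
                                size INTEGER NOT NULL,
                                inline_data BLOB);  -- NULL when stored out of line
        CREATE TABLE extents (path TEXT NOT NULL REFERENCES directory(path),
                              seq INTEGER NOT NULL,
                              data BLOB NOT NULL,
                              PRIMARY KEY (path, seq));
    """)
    return db

def store(db, path, data, chunk=4096):
    with db:  # metadata and contents change together or not at all
        db.execute("DELETE FROM extents WHERE path = ?", (path,))
        inline = data if len(data) <= INLINE_LIMIT else None
        db.execute("INSERT OR REPLACE INTO directory VALUES (?, ?, ?)",
                   (path, len(data), inline))
        if inline is None:
            db.executemany(
                "INSERT INTO extents VALUES (?, ?, ?)",
                [(path, i, data[off:off + chunk])
                 for i, off in enumerate(range(0, len(data), chunk))])

def load(db, path):
    (inline,) = db.execute(
        "SELECT inline_data FROM directory WHERE path = ?", (path,)).fetchone()
    if inline is not None:
        return inline  # small file: one row lookup, no extra table access
    rows = db.execute(
        "SELECT data FROM extents WHERE path = ? ORDER BY seq", (path,))
    return b"".join(row[0] for row in rows)
```

Reading a small file then costs a single row lookup in the directory table, which is the same benefit NTFS gets from keeping small files resident in the MFT.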
You've got an enormous amount of work ahead of you.
Lol, the catch is 95% of this work is already done and tested extensively, sitting there, waiting for me to connect to it from a FUSE driver.