I am writing a small, simple script that works with a flat-file database. It uses only one text file, which is quite small at the moment. However, this file has the potential to become big, because every time my script is used it adds new information to it.
To keep it simple, this is what my script does: It adds a new line to the top of the db file.
I have two options here:
1. Load the entire db file into a list (memory),
unshift new line,
then write it back into the file.
2. Create a temporary file,
write new line to temp file,
using a while loop (line-by-line) write the entire db file into the temp file,
then replace (rename) the original db file with the newer temp file.
It seems to me that option 2 is better because it doesn't load the entire file into memory. Of course, I'm just a novice and drawbacks are not as obvious to me.
So I guess my questions are: Am I correct in my assumption that option 2 is better? Or am I missing something? Is one option more cpu-intensive than the other?
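For reference, option 2 might look roughly like the sketch below. It is written in Python rather than the script's own language, and the function name `prepend_line` and the temp-file prefix are made up for illustration. Note that both options still rewrite the whole file on every insert, so the I/O cost grows with the file either way; the main difference is that option 2 keeps only one line in memory at a time.

```python
import os
import tempfile


def prepend_line(db_path, new_line):
    """Put new_line at the top of db_path without loading the file into memory."""
    db_dir = os.path.dirname(os.path.abspath(db_path))
    # Create the temp file next to the db so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=db_dir, prefix=".db-tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_line.rstrip("\n") + "\n")
            if os.path.exists(db_path):
                with open(db_path, "r", encoding="utf-8") as db:
                    for line in db:  # streams the old contents one line at a time
                        tmp.write(line)
        os.replace(tmp_path, db_path)  # swap the new file in place of the old one
    except BaseException:
        # Clean up the half-written temp file and leave the original db untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because the original file is only replaced at the very end, a crash halfway through leaves the old db intact. One caveat: `tempfile.mkstemp` creates the file readable and writable only by its owner, so the replaced db ends up with those permissions unless they are adjusted.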
Name:
Anonymous 2010-05-07 12:07
Go with option 3: Add the line to the bottom of the file instead.
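A minimal sketch of that, in Python for illustration (the name `append_entry` is hypothetical). Opening in append mode writes at the end of the file, so the cost of adding an entry does not depend on how big the db already is:

```python
def append_entry(db_path, entry):
    """Add one entry as the last line of db_path; the file is created if missing."""
    with open(db_path, "a", encoding="utf-8") as db:
        db.write(entry.rstrip("\n") + "\n")
```

The newest entry then sits at the bottom, so anything that wants newest-first order has to read the file in reverse.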
Name:
Anonymous 2010-05-07 12:09
# new entry first, then the old contents, then swap the files
printf '%s\n' "$newline" > tmpfile
cat db >> tmpfile
mv tmpfile db
Otherwise, just use mmap.
Name:
Anonymous 2010-05-07 12:11
>>2
But then when I try to read the db file, I would need to reverse the order so that the most recent entry is at the top. If I remember correctly, that would involve having to read the entire file into memory anyway, so I might as well use option 1.
Name:
Anonymous 2010-05-07 12:13
>>4
What language? You can fseek to the end of the file. That's pretty quick.
Name:
Anonymous 2010-05-07 12:22
>>5
I'm using Perl. I understand Perl has this sort of function, but my script lists all the entries from most recent to oldest. Unless there's a way to seek to the end, and then go backwards line-by-line.
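Going backwards line by line is possible: seek to the end, read fixed-size chunks from the back, and split each chunk on newlines. Below is a rough sketch in Python (the function name `lines_newest_first` and the `chunk_size` parameter are made up for illustration, and it assumes UTF-8 text with `\n` line endings). For Perl itself, the CPAN module File::ReadBackwards is meant for this kind of job.

```python
import os


def lines_newest_first(db_path, chunk_size=4096):
    """Yield the lines of db_path from last to first, reading fixed-size chunks."""
    with open(db_path, "rb") as db:
        db.seek(0, os.SEEK_END)
        pos = db.tell()
        if pos == 0:
            return  # empty file, nothing to yield
        tail = b""  # start of a line whose beginning lies in an earlier chunk
        at_end = True
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            db.seek(pos)
            pieces = (db.read(step) + tail).split(b"\n")
            if at_end and pieces[-1] == b"":
                pieces.pop()  # the file's trailing newline, not an empty entry
            at_end = False
            tail = pieces[0]  # may still be incomplete, so carry it over
            for piece in reversed(pieces[1:]):
                yield piece.decode("utf-8")
        yield tail.decode("utf-8")  # the first line of the file
```

Only one chunk plus one partial line is held in memory at a time, so listing the most recent entries does not require loading the whole db.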
I wouldn't use a single flat file, at least not while this program is running. Depending on the kind of data you are collecting, you should organize the data into categories and write each piece of information to its specific category. Give each category a buffer and, only when that buffer is full, write a new file with its contents into that category's subdirectory; use timestamps and keywords to keep track of the data.
At a preset scheduled time, take all the then-existing files in a category and compile the data in them into a single file based around these keywords and the timestamps for the category. Either remove all data for every given keyword except the most recent timestamp, or organize the timestamped entries for a given keyword. Do this only as often as you need to, since you are relegating all of the time-consuming tasks to this scheduled job.
You do nothing but writing at first; your upkeep is a lot of reading, a sort, then one write.
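A rough sketch of how that scheme could look, in Python for illustration. Everything here is an assumption layered on the description above: the names `CategoryBuffer` and `compact`, the `.part` and `compiled.txt` file names, the tab-separated line format (so keywords and values must not contain tabs or newlines), a single writer process, and the choice to keep only the most recent entry per keyword.

```python
import os
import time


class CategoryBuffer:
    """Collects entries for one category and writes a new file when full."""

    def __init__(self, root, category, capacity=100):
        self.dir = os.path.join(root, category)
        self.capacity = capacity
        self.entries = []
        self.flushed = 0
        os.makedirs(self.dir, exist_ok=True)

    def add(self, keyword, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self.entries.append((float(ts), keyword, value))
        if len(self.entries) >= self.capacity:
            self.flush()

    def flush(self):
        if not self.entries:
            return
        # Hypothetical naming: first timestamp plus a per-buffer counter.
        name = "%d-%06d.part" % (int(self.entries[0][0]), self.flushed)
        with open(os.path.join(self.dir, name), "w", encoding="utf-8") as f:
            for ts, keyword, value in self.entries:
                f.write("%r\t%s\t%s\n" % (ts, keyword, value))
        self.flushed += 1
        self.entries = []


def compact(category_dir, out_name="compiled.txt"):
    """Scheduled upkeep: merge all files, keeping the newest entry per keyword."""
    parts = sorted(n for n in os.listdir(category_dir) if n.endswith(".part"))
    sources = list(parts)
    if os.path.exists(os.path.join(category_dir, out_name)):
        sources.insert(0, out_name)  # keep results of earlier compactions
    latest = {}
    for name in sources:
        with open(os.path.join(category_dir, name), encoding="utf-8") as f:
            for line in f:
                ts, keyword, value = line.rstrip("\n").split("\t", 2)
                if keyword not in latest or float(ts) > latest[keyword][0]:
                    latest[keyword] = (float(ts), value)
    out_path = os.path.join(category_dir, out_name)
    tmp_path = out_path + ".tmp"
    newest_first = sorted(latest.items(), key=lambda kv: kv[1][0], reverse=True)
    with open(tmp_path, "w", encoding="utf-8") as out:
        for keyword, (ts, value) in newest_first:
            out.write("%r\t%s\t%s\n" % (ts, keyword, value))
    os.replace(tmp_path, out_path)
    for name in parts:
        os.remove(os.path.join(category_dir, name))
    return out_path
```

In normal operation the program only calls `add`, which is a cheap in-memory append with an occasional small file write; the reading, sorting, and rewriting all happen inside `compact`, which could be run from a scheduled job.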
>>10
Knuth is a moron, and while that root of all evil quote isn't the dumbest thing he's ever said (particularly in light of his recent religious rants), it's definitely in the top three.
>>11
Was that thing about programming languages being for humans to read and only incidentally for computers to execute one of his? Because if it wasn't I can't think of #2.
Well, I guess he did say that multiprocessing was overrated because he only ever did anything that used more than one processor for fifteen minutes in a week. I suppose that counts.
>>13
Nope that's a quote from SICP (preface to the first edition). Although given that Knuth created "literate programming" you'd hope he agreed with that sentiment.
Write comments only where the code is incapable of explaining itself.
Prefer self-explanatory code over explanatory comments. Avoid
`literate programming' like the plague.
Rationale: If the code is often incapable of explaining itself, then
perhaps it should be written in a more expressive language. This may
mean using a different programming language altogether, or, since we
are talking about Lisp, it may mean simply building a combinator
language or a macro language for the purpose. `Literate programming'
is the logical conclusion of languages incapable of explaining
themselves; it is a direct concession of the inexpressiveness of the
computer language implementing the program, to the extent that the
only way a human can understand the program is by having it rewritten
in a human language.
>>19
I have not. It's a ridiculous exaggeration that only serves to promulgate the myth that Ruby is acceptable. It's entirely possible to come up with catchy one-liners that aren't wrong.