
[BEGINNER] Trying to write efficient code

Name: Anonymous 2010-05-07 12:03

I am writing a small, simple script that works with a flat-file database. It uses only one text file, which is quite small at the moment. However, the file could grow large, because every time my script is used it adds new information to it.

To keep it simple, this is what my script does: It adds a new line to the top of the db file.

I have two options here:

1. Load the entire db file into a list (memory),
unshift new line,
then write it back into the file.

2. Create a temporary file,
write new line to temp file,
using a while loop (line-by-line) write the entire db file into the temp file,
then replace (rename) the original db file with the newer temp file.
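A minimal sketch of the two options, written in Python rather than the OP's Perl. The function names `prepend_in_memory` and `prepend_via_tempfile` are hypothetical. Option 1 holds the whole file in memory at once. Option 2 only ever holds one line, at the cost of a second file and a rename.

```python
import os
import tempfile


def prepend_in_memory(path, new_line):
    """Option 1: load the whole file, put the new line first, write everything back."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            lines = f.readlines()              # the entire file is now in memory
    except FileNotFoundError:
        lines = []
    lines.insert(0, new_line + "\n")           # Python's counterpart to Perl's unshift
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)


def prepend_via_tempfile(path, new_line):
    """Option 2: write the new line, stream the old file after it, then rename over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    # Creating the temp file next to the db keeps the final rename on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_line + "\n")
            try:
                with open(path, "r", encoding="utf-8") as old:
                    for line in old:           # only one line is held in memory at a time
                        tmp.write(line)
            except FileNotFoundError:
                pass                           # no db yet: the temp file holds just the new line
        os.replace(tmp_path, path)             # replace the original db with the temp file
    except BaseException:
        os.remove(tmp_path)                    # clean up the temp file if anything failed
        raise
```

One design point: `tempfile.mkstemp` creates the temp file readable and writable only by its owner, so after `os.replace` the db file may end up with different permissions than it had before. `os.replace` is atomic on POSIX when both paths are on the same filesystem, which is why the temp file is created in the db's own directory.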

It seems to me that option 2 is better because it doesn't load the entire file into memory. Of course, I'm just a novice and the drawbacks are not as obvious to me.

So I guess my questions are: Am I correct in my assumption that option 2 is better? Or am I missing something? Is one option more CPU-intensive than the other?

Name: Anonymous 2010-05-07 12:07

Go with option 3: Add the line to the bottom of the file instead.
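Appending only touches the end of the file, so its cost depends on the size of the new entry rather than the size of the file. A minimal Python sketch (`append_entry` is a hypothetical name):

```python
def append_entry(path, new_line):
    # Mode "a" creates the file if needed and positions every write at the
    # current end of the file, so existing contents are never read or rewritten.
    with open(path, "a", encoding="utf-8") as f:
        f.write(new_line + "\n")
```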

Name: Anonymous 2010-05-07 12:09

echo "$newline" > tmpfile
cat db >> tmpfile
mv tmpfile db

Otherwise, just use mmap.

Name: Anonymous 2010-05-07 12:11

>>2
But then when I try to read the db file, I would need to reverse the order so that the most recent entry is at the top. If I remember correctly, that would involve reading the entire file into memory anyway, so I might as well use option 1.

Name: Anonymous 2010-05-07 12:13

>>4
What language? You can fseek to the end of the file. That's pretty quick.

Name: Anonymous 2010-05-07 12:22

>>5
I'm using Perl. I understand Perl has this sort of function, but my script lists all the entries from most recent to oldest, so appending only helps if there's a way to seek to the end and then go backwards line by line.

Name: Anonymous 2010-05-07 12:27

use File::ReadBackwards;
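The general technique behind reading a file backwards (which File::ReadBackwards provides in Perl) is to seek to the end and read fixed-size blocks toward the start, splitting each block on newlines and carrying the partial line at the front of each block over to the next one. The Python sketch below illustrates that technique; it is not a port of the module, and `read_lines_backwards`, `chunk_size` and the UTF-8 default are assumptions. Only one chunk plus one partial line is held in memory at a time.

```python
import os


def read_lines_backwards(path, chunk_size=8192, encoding="utf-8"):
    """Yield the lines of a file from last to first, reading fixed-size blocks from the end."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)          # seek() returns the new absolute offset
        position = size
        remainder = b""                        # start of a line whose beginning is not read yet
        at_end_of_file = True
        while position > 0:
            read_size = min(chunk_size, position)
            position -= read_size
            f.seek(position)
            pieces = (f.read(read_size) + remainder).split(b"\n")
            remainder = pieces[0]              # may continue further back; finish it next round
            complete = pieces[1:]              # every piece after a newline is a whole line
            if at_end_of_file and complete and complete[-1] == b"":
                complete.pop()                 # drop the empty piece after the final newline
            at_end_of_file = False
            for line in reversed(complete):
                yield line.decode(encoding)
        if size > 0:
            yield remainder.decode(encoding)   # the file's first line
```

Splitting on `b"\n"` assumes Unix line endings. Because lines are decoded only once they are complete, a multi-byte UTF-8 character that straddles two chunks is reassembled before decoding.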

Name: Anonymous 2010-05-07 12:34

>>3
do this and you'll be fine

Name: Anonymous 2010-05-07 12:39

I wouldn't use a single flat file, at least not while this program is running. Depending on the kind of data you are collecting, organize the data into categories and write each piece of information to its own category. Give each category a buffer and, only when that buffer is full, write it out as a new file in that category's subdirectory; use timestamps and keywords to keep track of the data.

At a preset scheduled time, take all the then-existing files in a category and compile their data into a single file, organized by keyword and timestamp. Either remove all data for each keyword except the entry with the most recent timestamp, or keep all the timestamped entries for each keyword in order. Run this as infrequently as you can, since you are relegating all of the time-consuming work to this scheduled task.

At first you do nothing but write; your upkeep is a lot of reading, a sort, then one write.
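A rough Python sketch of the buffer-then-compact scheme described above, under several assumptions: each entry is a (timestamp, keyword, text) triple stored tab-separated, keyword and text contain no tabs or newlines, and nothing else writes to a category directory while `compact` runs. `CategoryBuffer`, `compact`, and the `.log` / `compacted.log` file names are all hypothetical.

```python
import os
import time
from collections import defaultdict


class CategoryBuffer:
    """Hypothetical sketch: buffer entries per category, write each full buffer as a new file."""

    def __init__(self, root, buffer_size=100):
        self.root = root
        self.buffer_size = buffer_size
        self.buffers = defaultdict(list)
        self.flushes = 0

    def add(self, category, keyword, text, timestamp=None):
        # Assumes keyword and text contain no tabs or newlines (entries are stored tab-separated).
        ts = time.time() if timestamp is None else timestamp
        self.buffers[category].append((ts, keyword, text))
        if len(self.buffers[category]) >= self.buffer_size:
            self.flush(category)

    def flush(self, category):
        entries = self.buffers.pop(category, [])
        if not entries:
            return
        directory = os.path.join(self.root, category)
        os.makedirs(directory, exist_ok=True)
        # The counter keeps two flushes in the same clock tick from choosing the same name.
        name = f"{time.time_ns()}-{self.flushes}.log"
        self.flushes += 1
        with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
            for ts, keyword, text in entries:
                f.write(f"{ts!r}\t{keyword}\t{text}\n")


def compact(root, category, latest_only=True):
    """Scheduled upkeep: read every .log file in a category, sort, and write one file."""
    directory = os.path.join(root, category)
    inputs = [n for n in os.listdir(directory) if n.endswith(".log")]
    entries = []
    for name in inputs:
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            for line in f:
                ts, keyword, text = line.rstrip("\n").split("\t", 2)
                entries.append((float(ts), keyword, text))
    entries.sort(key=lambda entry: entry[0], reverse=True)   # newest first
    if latest_only:
        seen, kept = set(), []
        for entry in entries:
            if entry[1] not in seen:           # first hit is the newest entry for that keyword
                seen.add(entry[1])
                kept.append(entry)
        entries = kept
    tmp_path = os.path.join(directory, "compacted.tmp")
    with open(tmp_path, "w", encoding="utf-8") as f:
        for ts, keyword, text in entries:
            f.write(f"{ts!r}\t{keyword}\t{text}\n")
    for name in inputs:
        os.remove(os.path.join(directory, name))
    os.replace(tmp_path, os.path.join(directory, "compacted.log"))
    return entries
```

Passing `latest_only=False` keeps the full timestamped history for each keyword instead of deduplicating. Because the compacted file also ends in `.log`, the next scheduled run folds it in together with any newer flush files.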

Name: Anonymous 2010-05-07 16:36

Premature optimization is the root of billable hours of consultancy.

Name: Anonymous 2010-05-07 17:45

>>10
Knuth is a moron, and while that root of all evil quote isn't the dumbest thing he's ever said (particularly in light of his recent religious rants), it's definitely in the top three.

Name: Anonymous 2010-05-07 17:56

>>11
You really think that? Wow.

Name: Anonymous 2010-05-07 18:04

>>11
Was that thing about programming languages being for humans to read and only incidentally for computers to execute one of his? Because if it wasn't I can't think of #2.

Well, I guess he did say that multiprocessing was overrated because he only ever did anything that used more than one processor for fifteen minutes in a week. I suppose that counts.

Name: Anonymous 2010-05-07 18:23

>>13
Nope that's a quote from SICP (preface to the first edition). Although given that Knuth created "literate programming" you'd hope he agreed with that sentiment.

Name: Anonymous 2010-05-07 18:31

>>14

Write comments only where the code is incapable of explaining itself.
Prefer self-explanatory code over explanatory comments.  Avoid
`literate programming' like the plague.

  Rationale:  If the code is often incapable of explaining itself, then
  perhaps it should be written in a more expressive language.  This may
  mean using a different programming language altogether, or, since we
  are talking about Lisp, it may mean simply building a combinator
  language or a macro language for the purpose.  `Literate programming'
  is the logical conclusion of languages incapable of explaining
  themselves; it is a direct concession of the inexpressiveness of the
  computer language implementing the program, to the extent that the
  only way a human can understand the program is by having it rewritten
  in a human language.


http://mumble.net/~campbell/scheme/style.txt

Name: Anonymous 2010-05-07 18:31

>>14
I know Abelson said it as well, but I think it was originally a Knuth quote. Either way, it's dumb.

Name: Anonymous 2010-05-07 18:42

>>15
Unfortunately, most Schemers don't write self-explanatory code and don't comment either ;_;

>>16
Care to elucidate? What's so dumb about prizing code legibility?

Name: Anonymous 2010-05-07 18:59

>>17
There's nothing wrong with readable code. The problem is pretending you can get away with treating the computer as an afterthought.

Name: Anonymous 2010-05-07 20:18

>>18
Congratulations: you've misunderstood the quote.

Name: Anonymous 2010-05-07 20:19

I only use high-performance language constructs for high-performance individuals such as myself.

Name: Anonymous 2010-05-07 20:32

>>19
I have not. It's a ridiculous exaggeration that only serves to promulgate the myth that Ruby is acceptable. It's entirely possible to come up with catchy one-liners that aren't wrong.

Name: Anonymous 2010-05-07 20:32

>>19
CONGRATULATE MY ANUS

Name: Anonymous 2010-05-07 20:45

>>21
Ruby
Really.

Name: Anonymous 2010-05-07 21:07

>>21
Ruby is acceptable.

Name: Anonymous 2010-05-07 21:24

>>24
See? This is what I'm fucking talking about.

Name: Anonymous 2010-05-07 22:50

>>25
Slow down, you were asking for that one.

Name: Anonymous 2010-05-07 22:54

>>25
It certainly is acceptable for the tasks it's intended for.
If you misuse a language it's your own fault.

Name: Anonymous 2010-05-07 23:46

>>21
...the quote is about authoring code, not designing or implementing languages, you nitwit.

Name: Anonymous 2010-05-08 12:55

>>28
AUTHOR MYANUS

Name: Anonymous 2010-05-08 13:18

>>28
The distinction is meaningless, you cockpouch.

Name: Anonymous 2010-05-08 13:21

>>30
MEANINGLESS MY ANUS

Name: Anonymous 2010-05-08 16:25

Only the first 9 posts are on topic.

This truly is /prog/.

Name: Anonymous 2010-05-08 16:41

>>32
The topic was dumb. We made it more interesting.

Name: Anonymous 2010-05-08 17:02

>>33
Says who?

Name: Anonymous 2010-05-08 17:11

>>34
Says >>33. Do you have difficulty reading?

Name: Anonymous 2010-05-08 17:45

>>35
I know you said something but all I read was Says >>33. Do you have difficulty reading?

Name: Anonymous 2010-05-08 18:31

>>30
You have five posts to prove that.

Name: Anonymous 2010-05-08 19:08

>>37
PROVE MY ANUS

Name: Anonymous 2010-05-08 23:04

>>38
haxio ergo anus est

Name: Anonymous 2010-05-09 0:06

Speaking of the topic, no one actually answered OP's question.

In other words, these 39 posts have been an elaborate NO EXCEPTIONS

Never change, /prog/.

Name: Anonymous 2010-05-09 2:11

>>1
Actually, both options do roughly the same amount of work if you think about it. Let x be the number of characters in the file and y the length of the new entry:
1:
a. Read entire file (time: x reads)
b. Prepend new entry in memory (time: y writes)
c. Write entire file (time: x + y writes)
Total: 2x + 2y

2:
a. Write new entry to temp file (time: y writes)
b. Read old file line by line (cumulative time: x reads)
c. Write each line to temp file (cumulative time: x writes)
Total: 2x + y
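To check the tally, here is a small Python sketch that runs both options against in-memory files and counts only the bytes that pass through a file; `CountingFile`, `option_1` and `option_2` are hypothetical names. Counted this way, both come out to 2x + y: the extra y in the total for (1) is the in-memory prepend, which costs no file I/O. Either way, both are linear in the size of the file; the practical difference is that (1) needs the whole file in memory at once.

```python
import io


class CountingFile:
    """In-memory stand-in for a file that tallies the bytes read from and written to it."""

    def __init__(self, data=b""):
        self._buffer = io.BytesIO(data)
        self.bytes_read = 0
        self.bytes_written = 0

    def read(self):
        data = self._buffer.read()
        self.bytes_read += len(data)
        return data

    def readline(self):
        data = self._buffer.readline()
        self.bytes_read += len(data)
        return data

    def write(self, data):
        self.bytes_written += self._buffer.write(data)

    def getvalue(self):
        return self._buffer.getvalue()

    def io_total(self):
        return self.bytes_read + self.bytes_written


def option_1(db, entry):
    data = db.read()                   # a. read entire file: x bytes read
    new_contents = entry + data        # b. prepend in memory: no file I/O
    out = CountingFile()               # the db reopened for writing (truncated)
    out.write(new_contents)            # c. write entire file: x + y bytes written
    return out


def option_2(db, entry):
    tmp = CountingFile()
    tmp.write(entry)                   # a. write new entry: y bytes written
    while True:
        line = db.readline()           # b. read old file line by line: x bytes read in total
        if not line:
            break
        tmp.write(line)                # c. copy each line: x bytes written in total
    return tmp
```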


Timing is also important: if you do (1), it would probably be faster to load all the data once at the start of the program, add new entries continuously until the program closes, then write everything in one shot when done. The trade-off is that you will need to push around a lot of volatile data in memory at any given time, especially if you expect to add sorted data. (2), on the other hand, will only work efficiently if you create temporary files throughout the program's run and compile them at a fixed interval; even then, it will still bog your program down in file writing and management and make newly added data inconvenient to search.
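A minimal Python sketch of the load-once, write-once approach for (1); `BufferedDb` and the `.tmp` suffix are hypothetical. New entries are kept in a list and reversed only when written, since inserting at the front of a Python list on every add would shift every existing element.

```python
import os


class BufferedDb:
    """Hypothetical sketch: load once, collect new entries in memory, write once on exit."""

    def __init__(self, path):
        self.path = path
        self.old_lines = []
        self.new_lines = []                # kept oldest-first; reversed once at write time

    def __enter__(self):
        try:
            with open(self.path, encoding="utf-8") as f:
                self.old_lines = f.read().splitlines()
        except FileNotFoundError:
            self.old_lines = []
        return self

    def add(self, entry):
        # Appending is cheap; inserting at index 0 each time would shift every element.
        self.new_lines.append(entry)

    def entries(self):
        """Newest first, including entries not yet written to disk."""
        return list(reversed(self.new_lines)) + self.old_lines

    def __exit__(self, exc_type, exc, tb):
        tmp_path = self.path + ".tmp"      # hypothetical temp name next to the db
        with open(tmp_path, "w", encoding="utf-8") as f:
            for line in self.entries():
                f.write(line + "\n")
        os.replace(tmp_path, self.path)
        return False                       # never swallow exceptions
```

Usage is `with BufferedDb("db.txt") as db: db.add(...)`. The volatility trade-off from the post shows up directly: if the process is killed outright before the `with` block exits, everything added since start-up is lost.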

Think of it this way: if your program is expected to run at a consistent rate, it will have to call file reading and writing services consistently with either of your methods taken at face value. So, no, (2) will probably not be very efficient for a constantly running and polling program, at least not one with an exceptional growth curve, but (1) presents its own concerns.
