import md5
import os

for root, dirs, files in os.walk(os.path.abspath('')):
    for f in files:
        e = os.path.splitext(f)[1]
        if e and e not in ['.py', '.db']:
            path = os.path.join(root, f)  # join with root, or rename breaks below the top level
            h = md5.new(file(path, 'rb').read()).hexdigest()
            n = os.path.join(root, h + e)
            try:
                os.rename(path, n)
            except WindowsError:  # target already exists on Windows
                os.unlink(n)
                os.rename(path, n)
            print n + " " + f
How do I optimize this? I'm watching SICP right now.
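A sketch of the same renamer on modern Python (3.x): hashlib instead of the long-deprecated md5 module, and a context manager so file handles get closed. This version runs against a throwaway temp directory with one sample file so it's safe to execute as-is; point `top` at a real directory at your own risk.

```python
# Hedged sketch of the md5-rename loop on Python 3. The temp directory,
# sample filename, and its contents are made up for the demonstration.
import hashlib
import os
import tempfile

top = tempfile.mkdtemp()
with open(os.path.join(top, 'pic.jpg'), 'wb') as fh:
    fh.write(b'example bytes')

renamed = []
for root, dirs, files in os.walk(top):
    for f in files:
        ext = os.path.splitext(f)[1]
        if ext and ext not in ('.py', '.db'):
            old = os.path.join(root, f)
            with open(old, 'rb') as fh:
                digest = hashlib.md5(fh.read()).hexdigest()
            new = os.path.join(root, digest + ext)
            if os.path.exists(new):  # replaces the try/except WindowsError dance
                os.unlink(new)
            os.rename(old, new)
            renamed.append((old, new))
            print(new, old)
```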
Name: Anonymous 2009-01-21 10:07
If you're working with files of any significant size, reading them in chunks will speed things up, save memory, and increase your number of lines of code, thus providing job security.
import hashlib

hasher = hashlib.md5()
fh = file(whatever, 'rb')
chunk = fh.read(4096)  # whatever; look up the size of your disk blocks and make this a multiple of that
while chunk:
    hasher.update(chunk)
    chunk = fh.read(4096)
fh.close()
digest = hasher.hexdigest()
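The chunked read above, wrapped into a reusable Python 3 helper so it can replace the one-shot `.read()` in the renamer. The function name and the 64 KiB default are my choices, not anything from the thread; 64 KiB is just an arbitrary multiple of the common 4 KiB block size.

```python
# Hedged sketch: chunked md5 of a file, constant memory regardless of file size.
import hashlib

def md5_file(path, chunk_size=65536):
    h = hashlib.md5()
    with open(path, 'rb') as fh:
        chunk = fh.read(chunk_size)
        while chunk:
            h.update(chunk)
            chunk = fh.read(chunk_size)
    return h.hexdigest()
```

Feeding the hash incrementally like this gives the same digest as hashing the whole file in one go, since md5 is a streaming algorithm.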
But really, the speed of this code is pretty much entirely dependent on the implementation of your hashing algorithm and the speed of your disk, and since md5 isn't particularly processor intensive, even a modest implementation will outrun typical disk read speeds. You can tell where the bottleneck lies by comparing the processor load and disk activity graphs in a performance monitoring tool; I'd be almost certain that, for your code, you'll spend more time waiting for I/O than actually computing anything, and only a tiny fraction of that in the Python interpreter.
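One rough way to see that md5 itself is cheap: hash a buffer that's already in memory, which takes the disk out of the picture entirely and leaves only the CPU cost. The 16 MiB size is arbitrary; the measured throughput will vary by machine, but on anything modern it should comfortably exceed what a single spinning disk can deliver.

```python
# Hedged sketch: CPU-side md5 throughput with no disk I/O involved.
import hashlib
import time

data = b'x' * (16 * 1024 * 1024)  # 16 MiB, already resident in memory
start = time.perf_counter()
hashlib.md5(data).hexdigest()
elapsed = time.perf_counter() - start
mb_per_s = 16 / elapsed
print('md5 throughput: %.0f MB/s in memory' % mb_per_s)
```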
... oh, fuck, I just accidentally started writing a serious answer. Sorry.