Recently, a bug was found in the software my employer makes: the same record number was issued to two or more clients, so different clients could end up holding different records with the same number.
The bug was fixed, but we wanted to find out whether any of our customers had been affected by it. A normal directory compare wouldn't work, because all the software we found could only compare two folders at a time; we needed to compare the folders of multiple clients to see if any of them shared file names with any others, which would be an indication that the bug had occurred.
So, I came up with an algorithm and wrote the code to do just that. The C# program I wrote can find 200,000+ duplicated records in 60+ network folders in about two minutes.
Challenge: come up with an algorithm that, given multiple folders, gives a list of file names shared between two or more folders, and for each duplicated file name found, a list of folders in which it is found.
>>5
I don't think you thought your approach through, buddy, because it is, y'know, stupid as fuck. What you should do is create a dictionary using the filenames as keys and lists of directories as values.
what I would do is:
have a dictionary where keys = filenames and values = lists of directories
walk through the directories; for each file:
  obtain the list for the current filename from the dictionary (an empty list if it isn't there yet)
  append the current directory to that list
  store the list back in the dictionary under the filename
after:
foreach key, value in dictionary:
  if value.length > 1:
    print key    ;; filename
    print value  ;; directories it appears in
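The pseudocode above could be sketched in Python like this — a minimal sketch, assuming the folder paths are already known; the `.lower()` call is my assumption to make the comparison case-insensitive, drop it if you want exact matching:

```python
import os
from collections import defaultdict

def find_shared_filenames(folders):
    """Return {filename: [folders]} for names that appear in 2+ folders."""
    dirs_by_name = defaultdict(list)
    for folder in folders:
        for name in os.listdir(folder):
            # lowercase the key to compare filenames case-insensitively
            # (an assumption; remove .lower() for a case-sensitive compare)
            dirs_by_name[name.lower()].append(folder)
    # keep only the filenames seen in more than one folder
    return {name: dirs for name, dirs in dirs_by_name.items() if len(dirs) > 1}
```

The defaultdict takes care of the "obtain list / append / store back" dance in one step.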
Name:
Anonymous 2009-01-19 20:38
>>11
I'd estimate that to be roughly 3 lines of Haskell, 11 lines of Perl, 25 lines of Python, 250 lines of C.
Name:
Anonymous 2009-01-19 20:39
>>12 250 lines of C
Excluding the implementation of the dictionary. Easily >1k lines of code.
[Damn, I really need to clean out my image collection, seems I have some stuff in triplicate.]
Name:
Anonymous 2009-01-19 22:22
>>16
Uh, the whole point is that different records could have the same name, so a checksum wouldn't accomplish the task!
>>7 >>11
Yep, that's basically what I did at first. The final implementation is a bit different, though. I use a case-insensitive SortedDictionary<string, TreeNode>, because I display the results in a TreeView control, and I was told to sort the results in it.
>>8
More like 400 lines of C# now, not including the generated GUI designer code.
>>12
If you use a hash table, it'll be more than 3 lines of Haskell, at least if you want it to be legible, because the included hash table uses IO out the ass.
Name:
DADDYUNIQ 2009-01-19 22:32
Namesin
i'll tell you if you pay me enough ;)
Name:
Anonymous 2009-01-19 23:01
4 lines of Haskell, if you don't count the imports.
import System
import System.Directory
import Data.Map hiding (map, filter)
Seriously though, I'm interested in the expressiveness of commonly used languages. Does Python fare that bad? I thought it was similar to Perl in code size (for legible code).
Name:
Anonymous 2009-01-20 0:47
>>23
Programming is not about program size but efficiency.
Name:
Anonymous 2009-01-20 0:54
>>26
Shorter programs are invariably more efficient to write. The only effective way to reduce program size is to create better abstractions, which just happen to be the best way to write code more efficiently.
you didn't seriously write 250 lines of code in C# for that, did you?
Name:
Anonymous 2009-01-20 2:55
one line of any language.
well, any language except assembly or FIOC.
Name:
Anonymous 2009-01-20 3:12
i dunno, but this sounds stupefyingly simple to me: finding duplicate filenames across MULTIPLE folders. so basically you can just pull up one anonymous list of ALL files in ALL the folders searched, sort it, and keep the entries that show up more than once
like someone already mentioned, this can be done with POSIX tools like ls, sort, and uniq in any POSIX shell
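That sort-based idea (dump everything into one list, sort, keep the repeats) can be sketched in Python as well — purely illustrative, standing in for an `ls`/`sort`/`uniq -d` pipeline, and assuming the folder paths are passed in:

```python
import os
from itertools import groupby

def sort_based_duplicates(folders):
    """Sort one flat (filename, folder) list; equal names end up adjacent."""
    entries = sorted((name, folder)
                     for folder in folders
                     for name in os.listdir(folder))
    result = {}
    # group adjacent equal filenames, like piping sorted output through uniq
    for name, group in groupby(entries, key=lambda pair: pair[0]):
        dirs = [folder for _, folder in group]
        if len(dirs) > 1:  # the `uniq -d` step: keep only repeated names
            result[name] = dirs
    return result
```

Same output as the dictionary approach; the sort just replaces the hashing.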
The advantage of languages which require hundreds of lines of code is that you can optimize and improve every algorithm.
With higher abstractions you can't do anything but use the common built-in components, which cannot be optimized.
_________________________
orbis terrarum delenda est
Name:
Anonymous 2009-01-20 4:03
>>33
You can use a good optimizing compiler, or you could keep two versions: one where you simply implemented it concisely and it worked properly, and one where you went and OMGOPTIMIZED it. It's common for some types of software to have a routine written in a high-level language, then have the same routine written in assembly (where speed is critical) with SIMD instructions; the application can then choose the optimized version if the CPU allows it. The original goal is to get everything working and implemented according to your vision/reference/whatever, then to optimize what you can once everything is already in a good state. As they say: "premature optimization is the root of all evil."
Now go READ YOUR SICP
Name:
Anonymous 2009-01-20 4:08
>>33
Without the ability to create decent abstractions, complex algorithms become prohibitive to implement.
>>37
Just use the functions you wrote earlier to solve the same problem. e.g. every script I use might use this one:
function tag(x, y) {
    if (!y) { return document.getElementsByTagName(x); }
    return x.getElementsByTagName(y);
}
_________________________
orbis terrarum delenda est