Recently, a bug was found in the software my employer makes in which the same record number was issued to two or more clients, so that there would be different records with the same number on more than one client.
The bug was fixed, but we wanted to find out if any of our customers had been affected by this bug. A normal directory compare wouldn't work, because all the software we found could only compare between two folders; however, we need to compare the folders of multiple clients to see if any of them share file names with any others, which would be an indication that the bug had occurred.
So, I came up with an algorithm and wrote the code to do just that. The C# program I wrote can find 200,000+ duplicated records in 60+ network folders in about two minutes.
Challenge: come up with an algorithm that, given multiple folders, gives a list of file names shared between two or more folders, and for each duplicated file name found, a list of folders in which it is found.
>>5
I don't think you thought your approach through, buddy, because it is, y'know, stupid as fuck. What you should do is create a dictionary using the filenames as keys and lists of directories as values.
what I would do is:
have a dictionary where keys = filenames and value = list of directories
walk through directories
obtain list for current filename in dictionary
append current directory to list
store list for current filename in dictionary
after:
foreach key,value in dictionary
print key ;; filename
if value.length > 1: print value ;; directories
Name:
Anonymous2009-01-19 20:38
>>11
I'd estimate that to be roughly 3 lines of Haskell, 11 lines of perl, 25 lines of Python, 250 lines of C
Name:
Anonymous2009-01-19 20:39
>>12 250 lines of C
Excluding implementation of dictionary. Easily >1k lines of code
[Damn, I really need to clean out my image collection, seems I have some stuff in triplicate.]
Name:
Anonymous2009-01-19 22:22
>>16
Uh, the whole point is that different records could have the same name, so a checksum wouldn't accomplish the task!
>>7 >>11
Yep, that's basically what I did at first. The final implementation is a bit different, though. I use a case-insensitive SortedDictionary<string, TreeNode>, because I display the results in a TreeView control, and I was told to sort the results in it.
>>8
More like 400 lines of C# now, not including the generated GUI designer code.
>>12
If you use a hash table, it'll be more than 3 lines of Haskell, at least if you want it to be legible, because the included hash table uses IO out the ass.
Name:
DADDYUNIQ2009-01-19 22:32
Namesin
i'll tell you if you payme enough ;)
Name:
Anonymous2009-01-19 23:01
4 lines of Haskell, if you don't count the imports.
import System
import System.Directory
import Data.Map hiding (map, filter)
Seriously though, I'm interested in the expressiveness of commonly used languages. Does Python fare that bad? I thought it was similar to Perl in code size (for legible code).
Name:
Anonymous2009-01-20 0:47
>>23
Programming is not about program size but efficiency.
Name:
Anonymous2009-01-20 0:54
>>26
Shorter programs are invariably more efficient to write. The only effective way to reduce program size is to create better abstractions, which just happen to be the best way to write code more efficiently.
you didn't seriously write 250 lines of code in C# for that, did you?
Name:
Anonymous2009-01-20 2:55
one line of any language.
well, any language except assembly or FIOC.
Name:
Anonymous2009-01-20 3:12
i dunno but this sounds stupifying simple to me, finding unique filenames across MULTIPLE folders. so basically you can just pull up an anonymous list of ALL files in ALL the folders searched and sort it uniquely
like someone already mentioned, this can be done with posix tools like ls and sort in any posix shell
The advantage of languages which require hundreds of lines of code is that you can optimize and improve every algorithm.
With higher abstractions you can't do nothing but use the common built-in components which cannot be optimized.
_________________________
orbis terrarum delenda est
Name:
Anonymous2009-01-20 4:03
>>33
You can use a good optimizing compilter, or you could kep 2 versions, one where you simply implemented it concisely and it worked properly, and one where you went and OMGOPTIMIZED it. It's common for some types of soft to have a routine written in a language, then have the same routine written in assembly(where speed is critical) with SIMD instructions, then the application may chose to use the optimized version if the CPU allows it. The original goal is to get everything working and implemented according to your vision/reference/whatever, then to optimize what you can when everything is already in good state. As they say: "premature optimization is the root of all evil."
Now go READ YOUR SICP
Name:
Anonymous2009-01-20 4:08
>>33
Without the ability to create decent abstractions, complex algorithms become prohibitive to implement.
>>37
Just use the functions you wrote earlier to solve the same problem. e.g. every script i use might use this one:
function tag(x,y){if(!y){return document.getElementsByTagName(x)}else{return x.getElementsByTagName(y)}};
_________________________
orbis terrarum delenda est
What difference does it make if an algorithm finishes in a second or a millisecond? Wouldn't the fact that you can write the program in a fraction of the time matter more?
>>41
"What difference does it make if an algorithm finishes in a second or a millisecond?"
What difference would it make if algorithm finishes in 100hours or 6 minutes?
_________________________
orbis terrarum delenda est
1 second=1000milliseconds
100hours=100*60=6000minutes(about 4 days)
6 minutes=6000/1000minutes
The difference between 4 days and 6minutes is obvious,now?
_________________________
orbis terrarum delenda est
>>30
Actually, the total program is about 350 lines, but that includes all the GUI shit. The algorithm is about 40 lines, but that includes extra shit to handle creating TreeNodes and filtering to only look at certain files.
Name:
Anonymous2009-01-20 13:14
>>55
shit, C# still sucks. Also why write a GUI for it? Isn't a command line tool enough for your needs?
>>56
are you going to open a command line prompt and type out the command with parameters every time you use the tool?
For single-purpose one-use program its ok,but what about more permanent solution,like a convenient GUI interface?
_________________________
orbis terrarum delenda est
Name:
Anonymous2009-01-20 13:29
>>57
While your main point is correct, >>1's program fits in the category of single or rare use
>>58
However its not confined to the domain of problem of >>1
and can be used with any duplicated files.
_________________________
orbis terrarum delenda est
>>60
The more important question is, who will use it? I rightly assume that our clients and my manager aren't familiar enough with the command line, so I made a nice GUI. If this were purely a one-time internal tool, I might not care to write a GUI, but this tool will be used by multiple people at multiple times, to ensure that they aren't affected by the bug, and that they won't be in the future if the bug regresses.
Name:
Anonymous2009-01-20 15:22
Write a man page. Thread over.
Name:
Anonymous2009-01-20 15:58
>>57
Leave the command line open all the time. /thread