I have a set of data:
Sally has attribute A and attribute B
Bob has attribute D and attribute R
John has attribute S and attribute T
Melissa has attribute G and attribute M
etc. There are over 300 names and 21 different attributes (let's call them A, B, ..., T, U) in varying combinations.
I want to pick out a set of 5 names, with 2 attributes each, whose attributes together fit a given pattern, for example A,B,C,D,E,F,H,I,J,K.
What's the best way to go about this? I have no language preference, though I do have some experience in bash, python and perl. It would have to take input from a text file (either tab-delimited or CSV) and output which names to pick and what their attributes are.
Languages don't really matter for something like this as much as the data structure you hold the names in. It also depends on how many *times* you want to do this. If you're doing it once, just search naively: it's the fastest you'll get. Otherwise, you're better off grouping the names by attribute (attribute -> names) than storing each name in a hashmap that points to its attributes (name -> attributes).
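For what it's worth, here is a rough Python sketch of the "do it once, search naively" route. It assumes a CSV with one row per person in the form name,attribute,attribute; the SAMPLE text and the function names load_people and names_within are made up for illustration, not anything from the OP's actual file.

```python
import csv
import io

# Hypothetical sample in the assumed format: name, attribute, attribute.
SAMPLE = """Sally,A,B
Bob,D,R
John,S,T
Melissa,G,M
"""

def load_people(handle):
    """Read name,attr1,attr2 rows into a dict of name -> set of attributes."""
    people = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        name, attrs = row[0], row[1:]
        people[name] = set(a.strip() for a in attrs)
    return people

def names_within(people, pattern):
    """Naive scan: names whose attributes all fall inside the pattern."""
    wanted = set(pattern)
    return sorted(name for name, attrs in people.items() if attrs <= wanted)

people = load_people(io.StringIO(SAMPLE))
print(names_within(people, "ABCDEFHIJK"))  # prints ['Sally']
```

For a tab-delimited file, passing delimiter="\t" to csv.reader should be the only change needed. This only narrows the list down to candidate names; picking the final set of 5 is a separate step.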
Name:
Anonymous 2010-01-14 19:26
SQL comes to mind for some reason.
Name:
Anonymous 2010-01-14 19:30
>>4
Since he's reading from a file perhaps txtsushi? Does anyone have experience with it?
>>12
Use hashtables (or something equivalent) then. A full brute-force search would have bad complexity, but your input seems tiny enough that it wouldn't matter much.
Name:
Anonymous 2010-01-15 0:25
>>12
Don't let the number of times you have to loop intimidate you. There are plenty of canned sort algorithms that you should be able to modify to help you. For this specific problem, you might want to sort all the names into "buckets", one per attribute, each holding the names of the people with that attribute. Whether you make these "buckets" simple arrays or something else is up to you. The point is that you don't have to search every person's name: you check what your pattern is, take the next part of the pattern, then search the corresponding attribute buckets for names they have in common.
For example:
Sally has attribute A and attribute B
Bob has attribute D and attribute E
John has attribute C and attribute A
Melissa has attribute B and attribute C
is stored as:
A: Sally, John
B: Sally, Melissa
C: John, Melissa
D: Bob
E: Bob
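A minimal Python sketch of building those buckets, assuming the data is already in memory as a name -> attributes dict (the names PEOPLE and build_buckets are just illustrative):

```python
from collections import defaultdict

# The four people from the example above.
PEOPLE = {
    "Sally": ("A", "B"),
    "Bob": ("D", "E"),
    "John": ("C", "A"),
    "Melissa": ("B", "C"),
}

def build_buckets(people):
    """Invert name -> attributes into attribute -> list of names."""
    buckets = defaultdict(list)
    for name, attrs in people.items():
        for attr in attrs:
            buckets[attr].append(name)
    return buckets

buckets = build_buckets(PEOPLE)
for attr in sorted(buckets):
    print(attr + ": " + ", ".join(buckets[attr]))
```

On Python 3.7 or later, where dicts keep insertion order, this should print the same five lines as the listing above. A name that shows up in two buckets (Sally in A and B) covers both of those attributes at once.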
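;;; Assumed helper (its definition is not shown here): return the list
;;; of persons stored under ATTRIBUTE in the hashtable.
(defun lookup-attribute (attribute attribute-hashtable)
  (gethash attribute attribute-hashtable))

(defun lookup-persons-with-attributes (attributes attribute-hashtable)
  (remove-duplicates
   (loop
     for attribute in attributes
     appending (lookup-attribute attribute attribute-hashtable))
   :test #'equal))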
;;; And an example usage scenario:
;;; You'll normally have to parse the input somehow, but for
;;; simplicity's sake, I'm using an alist here.
;(defparameter *person-attributes*
;  '(("Sally" . (:a :b)) ; one could use strings instead of keywords if required
;    ("Bob" . (:d :r))
;    ("John" . (:s :t))
;    ("Melissa" . (:g :m)))
;  "Person attribute association list.")
On the other hand, that code might not do exactly what the OP wanted, as I didn't fully understand the problem. In that example the search is inclusive (OR), but the OP might have wanted something like AND instead. Such a thing would be trivial to implement as well (a one-line change to lookup-persons-with-attributes to take the intersection of the lookups instead of appending them). If one wants to limit the number of results, either have the loop break early, or just subseq the final result (another line of code, tops).
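To make the OR/AND difference concrete, here is a small Python sketch (Python rather than Lisp only because the OP mentioned knowing it). BUCKETS is a hypothetical attribute -> names table, and lookup_any, lookup_all and first_n are illustrative names, not part of the code above.

```python
# Hypothetical attribute -> names table.
BUCKETS = {
    "A": {"Sally", "John"},
    "B": {"Sally", "Melissa"},
    "C": {"John", "Melissa"},
}

def lookup_any(attributes, buckets):
    """OR: everyone holding at least one of the attributes."""
    found = set()
    for attr in attributes:
        found |= buckets.get(attr, set())
    return found

def lookup_all(attributes, buckets):
    """AND: only those holding every one of the attributes."""
    sets = [buckets.get(attr, set()) for attr in attributes]
    return set.intersection(*sets) if sets else set()

def first_n(names, n):
    """Cap the result size, in the spirit of SUBSEQ on the final list."""
    return sorted(names)[:n]

print(sorted(lookup_any(["A", "B"], BUCKETS)))
print(sorted(lookup_all(["A", "B"], BUCKETS)))
```

With this table the OR lookup for A and B gives John, Melissa and Sally, while the AND lookup gives only Sally.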
>>27
Yes, OP's problem can be trivially solved in SQL as well, but OP has been quite ambiguous about his requirements. I can solve this in just about any general-purpose language; the amount of code needed depends on the capabilities of the language. Special-purpose languages like SQL also fit the problem domain fine.
So I have a data set of 300 zoos with two animals in each. There are 21 species of animal across these zoos and no zoo has two of the same animal. So it'd be something like:
Zoo1 Hippo Panda
Zoo2 Elephant Camel
Zoo3 Hippo Zebra
Zoo4 Bear Giraffe
Zoo5 Giraffe Zebra
Zoo6 Lion Tiger
Zoo7 Camel Hippo
Zoo8 Bear Tiger
...
How would I take a list of those zoos (again, 300 zoos in the list) and create a combination of 3 of these zoos which has only a specific set of animals (bear, camel, giraffe, hippo, lion and tiger, for instance) with no duplicates between them?
Step 1: cabal install txt-sushi
Step 2: Write an SQL query with tssql, using the first-line headers from the csv as column names
Step 3: Use awk or something to get unique results
>>31
He is trying to get the animals to interbreed, having two of the same species would only distract them.
Name:
Anonymous 2010-01-15 17:33
So let me get this straight, would this solve your problem?
1) generate all possible zoo triples
2) filter out all results where duplicate animals are seen
3) inspect the remaining results for specified animals
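Those three steps can be sketched in Python roughly as follows. ZOOS holds just the eight rows quoted in the question, and matching_groups is an illustrative name; it also assumes "inspect for specified animals" means the triple must contain exactly the requested species.

```python
from itertools import combinations

# First eight rows of the zoo list from the question.
ZOOS = {
    "Zoo1": ("Hippo", "Panda"),
    "Zoo2": ("Elephant", "Camel"),
    "Zoo3": ("Hippo", "Zebra"),
    "Zoo4": ("Bear", "Giraffe"),
    "Zoo5": ("Giraffe", "Zebra"),
    "Zoo6": ("Lion", "Tiger"),
    "Zoo7": ("Camel", "Hippo"),
    "Zoo8": ("Bear", "Tiger"),
}

def matching_groups(zoos, wanted, size=3):
    """Yield groups of `size` zoos whose animals are exactly `wanted`."""
    wanted = set(wanted)
    # Step 1: every possible group of `size` zoos.
    for group in combinations(sorted(zoos), size):
        animals = [a for zoo in group for a in zoos[zoo]]
        if len(animals) != len(set(animals)):
            continue  # step 2: a species appears twice
        if set(animals) == wanted:  # step 3: exactly the requested species
            yield group

wanted = ["Bear", "Camel", "Giraffe", "Hippo", "Lion", "Tiger"]
for group in matching_groups(ZOOS, wanted):
    print(group)  # prints ('Zoo4', 'Zoo6', 'Zoo7')
```

With the full list of 300 zoos there are 300*299*298/6 = 4,455,100 triples to check, which is small enough for plain brute force.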
Next comes your query. Now, I'm going to assume we're only working in SQL. Because you have a list of criteria, and not just a scalar value, we'll use the structure SQL gives us for lists (tables! wooo) and create a temporary table to select against.
CREATE TEMPORARY TABLE criteria (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    animal_id INT NOT NULL
);
Next, let's fill the table with our criteria. I like cats, so let's do one like this.
INSERT INTO criteria (animal_id) VALUES
    ((SELECT id FROM animals a WHERE a.name = 'tiger')),
    ((SELECT id FROM animals a WHERE a.name = 'lion')),
    ((SELECT id FROM animals a WHERE a.name = 'kitten'));
All that's left now is to conjure the true spirit of SQL and do our selection.
Here is one method to achieve the results you need.
SELECT l.name
FROM locations l
WHERE l.id IN (  -- IN, not =: several locations may satisfy the criteria
    SELECT r.location_id
    FROM relationships AS r, criteria AS c
    WHERE r.animal_id = c.animal_id
    GROUP BY r.location_id
    HAVING COUNT(r.animal_id) = (
        SELECT COUNT(animal_id) FROM criteria
    )
);
+-----------+
| name      |
+-----------+
| St. Louis |
+-----------+
To explain this query, let's go from farthest-nested-in outward.
1. First, we gather the size of the criteria (which will be 3).
2. Then, we select from our relationships table where the animal_ids match the criteria table. Simple enough. But that's not all -- we also count how many animal_ids were matched for each location. If that count equals the size of our criteria table, then we know that every animal in our criteria is accounted for at that location.
3. The last step is simply resolving the location_ids we got from our criteria selection into readable names.
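If the counting trick in step 2 is easier to follow outside SQL, here is roughly the same logic as a Python sketch. The RELATIONSHIPS rows are hypothetical stand-ins (the thread never shows the real table contents, and "Memphis" is made up), and locations_with_all is an illustrative name.

```python
from collections import Counter

# Hypothetical rows standing in for the relationships table, already
# resolved to names: (location name, animal name).
RELATIONSHIPS = [
    ("St. Louis", "tiger"),
    ("St. Louis", "lion"),
    ("St. Louis", "kitten"),
    ("St. Louis", "bear"),
    ("Memphis", "tiger"),
    ("Memphis", "lion"),
]

def locations_with_all(relationships, criteria):
    """Locations whose matched-animal count equals the criteria size."""
    criteria = set(criteria)
    # Deduplicate rows so a repeated pair cannot inflate the count.
    matched = Counter(loc for loc, animal in set(relationships)
                      if animal in criteria)
    return sorted(loc for loc, n in matched.items() if n == len(criteria))

print(locations_with_all(RELATIONSHIPS, ["tiger", "lion", "kitten"]))
# prints ['St. Louis']
```

The hypothetical Memphis matches only two of the three criteria animals, so its count falls short and it is dropped, just as the HAVING clause would drop it.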
If you're using MySQL, you might have trouble with this query, because MySQL does not let a single query refer to the same temporary table more than once. If that happens, you can work around the problem using a user-defined variable.
SET @criteria_size = (SELECT COUNT(animal_id) FROM criteria);
SELECT l.name
FROM locations l
WHERE l.id IN (  -- IN, not =: several locations may satisfy the criteria
    SELECT r.location_id
    FROM relationships AS r, criteria AS c
    WHERE r.animal_id = c.animal_id
    GROUP BY r.location_id
    HAVING COUNT(r.animal_id) = @criteria_size
);
(No worries: that variable only lives for the session and will be gone once you disconnect.)
>>39
It's still going to take bugger all time to run, or are you running a computer from the 1980s?
Name:
Anonymous 2010-01-16 12:23
My computer has vacuum tubes.
Name:
Anonymous 2010-01-16 12:59
>>37
Python 2.6.4 (r264:75706, Nov 2 2009, 14:38:03)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from math import factorial
>>> factorial(300)
306057512216440636035370461297268629388588804173576999416776741259476533176716867465515291422477573349939147888701726368864263907759003154226842927906974559841225476930271954604008012215776252176854255965356903506788725264321896264299365204576448830388909753943489625436053225980776521270822437639449120128678675368305712293681943649956460498166450227716500185176546469340112226034729724066333258583506870150169794168850353752137554910289126407157154830282284937952636580145235233156936482233436799254594095276820608062232812387383880817049600000000000000000000000000000000000000000000000000000000000000000000000000L