/prog/ - what is the best language to do this in?

Name: Anonymous 2010-01-14 19:13

I have a set of data:
Sally has attribute A and attribute B
Bob has attribute D and attribute R
John has attribute S and attribute T
Melissa has attribute G and attribute M

etc. There are over 300 names and 21 different attributes (let's call them A, B, ..., T, U) in varying combinations.

I want to pick out a set of 5 names with 2 attributes each that fit in a pattern of (for example) A,B,C,D,E,F,H,I,J,K

What's the best way to go about this? I'm programming language neutral, though I do have some experience in bash, python and perl. It would have to take input from a text file (either tab delimited or csv) and output which names to pick and what their attributes are.

Name: Anonymous 2010-01-14 19:17

I'm going to guess Prolog

Name: Anonymous 2010-01-14 19:19

Languages don't really matter for something like this as much as the data structure you hold the names in. It also depends how many *times* you want to do this. If you're doing it once, just search naively: it's the fastest you'll get. Otherwise, you're better off storing names that have a certain attribute together than storing the names as a hashmap to their attributes.

Name: Anonymous 2010-01-14 19:26

SQL comes to mind for some reason.

Name: Anonymous 2010-01-14 19:30

>>4
Since he's reading from a file perhaps txtsushi? Does anyone have experience with it?

Name: Anonymous 2010-01-14 19:33

Sometimes with support for lists, hashtables and possibly symbols should work fine for this.

Name: Anonymous 2010-01-14 19:38

>>6
We could call it Lithp.

Name: Anonymous 2010-01-14 19:41

>>7
Yes. Very yes.

Name: Anonymous 2010-01-14 19:43

>>1
perl, of course. next question please.

Name: Anonymous 2010-01-14 20:56

You shouldn't be asking about language. You should be asking about algorithm and implementation.

Name: Anonymous 2010-01-14 22:16

use amb

Name: Anonymous 2010-01-14 23:59

>>3
Well the 300 odd people have attributes anywhere in the range of A to U. But I am only looking for certain 12 attribute combinations (6 people).

>>10
How should I go about implementing this? I was thinking just a straight loop but for 300 people that would be a lot of loops.

Name: Anonymous 2010-01-15 0:09

>>12
Use hashtables (or something equivalent) then, but while doing a full bruteforce search would have bad complexity, your input seems tiny enough that it wouldn't matter much.

Name: Anonymous 2010-01-15 0:25

>>12
Don't let the number of times you have to loop intimidate you. There are plenty of canned sort algorithms that you should be able to modify to help you. In the case of the specific problem, you might want to sort all the names into "buckets" that represent different attributes; their contents is the people's names. Whether you want to make these "buckets" into simple arrays or whatnot, that's up to you. The thing is that you don't have to search each person's name, you just need to check what your pattern is, get the next part of the pattern, then search the corresponding attribute buckets for any same name.

For example:
Sally has attribute A and attribute B
Bob has attribute D and attribute E
John has attribute C and attribute A
Melissa has attribute B and attribute C

is stored as:
A: Sally, John
B: Sally, Melissa
C: John, Melissa
D: Bob
E: Bob

Name: Anonymous 2010-01-15 0:52

(defun build-attribute-hashtable (person-attribute-alist) (let ((attribute-hashtable (make-hash-table :test #'equal))) (dolist (person-attribute-pair person-attribute-alist attribute-hashtable) (destructuring-bind (name . attributes) person-attribute-pair (dolist (attribute attributes) (push name (gethash attribute attribute-hashtable))))))) (declaim (inline lookup-attribute)) (defun lookup-attribute (attribute attribute-hashtable) (gethash attribute attribute-hashtable)) (defun lookup-persons-with-attributes (attributes attribute-hashtable) (remove-duplicates (loop for attribute in attributes appending (lookup-attribute attribute attribute-hashtable)) :test #'equal)) ;;; And an example usage scenario: ;;; You'll normally have to parse the input somehow, but for ;;; simplicity's sake, I'm using an alist here. ;(defparameter *person-attributes* ; '(("Sally" . (;; one could use strings instead of keywords if required ; :a :b)) ; ("Bob" . (:d :r)) ; ("John" . (:s :t)) ; ("Melissa" . (:g :m))) ; "Person attribute association list.") ;CL-USER> (build-attribute-hashtable *person-attributes*) ;#<HASH-TABLE :TEST EQUAL :COUNT 8 {24CB7489}> ;;; Upon inspection, the hashtable looks like this: ;#<HASH-TABLE {24CB7489}> ;-------------------- ;Count: 8 ;Size: 16 ;Test: EQUAL ;Rehash size: 1.5 ;Rehash threshold: 1.0 ;[clear hashtable] ;Contents: ;:A = ("Sally") [remove entry] ;:B = ("Sally") [remove entry] ;:D = ("Bob") [remove entry] ;:G = ("Melissa") [remove entry] ;:M = ("Melissa") [remove entry] ;:R = ("Bob") [remove entry] ;:S = ("John") [remove entry] ;:T = ("John") [remove entry] ;CL-USER> (lookup-persons-with-attributes '(:a :b :d :r :s) *) ;("Sally" "Bob" "John")
Easy isn't it?

Name: Anonymous 2010-01-15 10:03

>Sally has attribute A and attribute B
Which one is the cup size?

Name: Anonymous 2010-01-15 10:23

>>16
A is the cup size of course. Have you forgotten where you are?

Name: Anonymous 2010-01-15 11:09

>>15
what in god's name is that

Name: Anonymous 2010-01-15 11:12

>>18
A program that does what OP asked. It only took me some 15 minutes to write and test, so why not?

Name: Anonymous 2010-01-15 11:41

Here's a collect macro using amb that is not terribly robust or scalable.

(define-syntax collect

  (syntax-rules ()

    ((_ ((a b ...) c ...) e ...)

     (let ((collection empty))

       (let ((*failure* (λ() (reverse collection))))

         (letrec-syntax ((amb (syntax-rules ()

                                ((amb) (*failure*))

                                ((amb ?x) ?x)

                                ((amb ?x ?y)

                                 (let ((old-failure *failure*))

                                   ((call-with-current-continuation

                                     (lambda (cc)

                                       (set! *failure* 

                                             (lambda () 

                                               (set! *failure* old-failure)

                                               (cc (lambda () ?y))))

                                       (lambda () ?x))))))

                                ((amb ?x ?rest (... ...))

                                 (amb ?x (amb ?rest (... ...))))))

                         (ambify (syntax-rules ()

                                   ((_ ((?a ?b (... ...)) ?c (... ...)) ?proc ?arg (... ...))

                                    (let ((?a (amb ?b (... ...))))

                                      (ambify (?c (... ...)) ?proc ?a ?arg (... ...))))

                                   ((_ () ?proc ?arg (... ...)) (let ((result (?proc ?arg (... ...))))

                                                                  (if result

                                                                      (if (eq? result

                                                                               (if (empty? collection)

                                                                                   empty

                                                                                   (first collection)))

                                                                          (amb)

                                                                          (set! collection (cons result collection)))

                                                                      (amb))

                                                                  (amb))))))

           (ambify ((a b ...) c ...) e ...)))))))



(collect ((hot 'harry 'draco 'hermione)

          (wizard 'harry 'draco 'hermione 'ron 'dumbledore)

          (male 'harry 'draco 'ron 'dumbledore)

          (female 'hermione)

          (nice-hair 'hermione 'draco))

         (λ traits

           (let ((hot (first traits))

                 (nice-hair (last traits)))

             (if (eq? hot nice-hair)

                 hot

                 #f))))

; => (draco hermione)



(let ((dice-rolls (collect ((die1 1 2 3 4 5 6)

                            (die2 1 2 3 4 5 6)

                            (die3 1 2 3 4 5 6))

                           (λ dice dice))))

  dice-rolls)

 ; => ((1 1 1)

 ;     (2 1 1) ...) all the ways to roll three dice

Name: Anonymous 2010-01-15 12:25

Forget it, it's NP-complete.

Name: Anonymous 2010-01-15 12:37

>>21
what does that mean?

Name: Anonymous 2010-01-15 12:38

>>19
what lang?

Name: >>15 2010-01-15 12:46

>>23
Common Lisp.

>>20's implementatation is in Scheme.

Name: Anonymous 2010-01-15 12:47

>>22
It means that the only solution is "try all posibilities".

Name: >>15 2010-01-15 12:58

On the other hand, that code might not do exactly what the OP wanted as I didn't understand the problem fully. In that example the search is inclusive(OR), but the OP might have wanted something like AND instead. Of course such a thing would be trivial to implement as well (one line change to lookup-persons-with-attributes to use set-intersection instead of appending). If one wants to limit the number of results, either have the loop break early, or just subseq the final result (another line of code tops).

Name: Anonymous 2010-01-15 15:08

>>26
It seemed highly analogous to SELECT ... WHERE to me, which is basically just list comprehension.

Name: Anonymous 2010-01-15 16:14

>>27
Yes, OP's problem can be trivially solved in SQL as well, but OP has been quite ambigous about his requirements. I can solve this in just about any general purpose language, however the amount of code that would be needed depends on the capabilities of the language. Special-purpose languages like SQL also fit the problem domain fine.

Name: Anonymous 2010-01-15 16:34

>>28
What's ambiguous? Maybe I can clarify.

So I have a data set of 300 zoos with two animals in each. There are 21 species of animal across these zoos and no zoo has two of the same animal. So it'd be something like:
Zoo1 Hippo Panda
Zoo2 Elephant Camel
Zoo3 Hippo Zebra
Zoo4 Bear Giraffe
Zoo5 Giraffe Zebra
Zoo6 Lion Tiger
Zoo7 Camel Hippo
Zoo8 Bear Tiger
...

How would I take a list of those zoos (again, 300 zoos in the list) and create a combination of 3 of these zoos which only has a specific set of animsl (bear, camel, giraffe, hippo, lion and tiger for instance) with no duplicates between them?

Name: Anonymous 2010-01-15 17:09

Step 1: cabal install txt-sushi
Step 2: Write SQL query using first line headers from csv as column names and tssql
Step 3: Use awk or something to get unique results

Name: Anonymous 2010-01-15 17:13

No duplicates? What?

Name: Anonymous 2010-01-15 17:18

>>31
He is trying to get the animals to interbreed, having two of the same species would only distract them.

Name: Anonymous 2010-01-15 17:33

So let me get this straight, would this solve your problem?
1) generate all possible zoo triples
2) filter out all results where duplicate animals are seen
3) inspect the remaining results for specified animals

Would this solve your problem?

Name: Anonymous 2010-01-15 18:14

>>33
lol enjoy your O(n!) complexity

Name: Anonymous 2010-01-15 18:40

>>34
I will, along with my job

Name: Anonymous 2010-01-15 19:22

>>35
You're not impressing in me a good opinion of your employer if you were hired despite giving inefficient advice.

Name: Anonymous 2010-01-15 19:28

>>36
It's only 300 lines of data, I think his employer would care more that he just got on with it and let the program run for the whole x seconds.

Name: Anonymous 2010-01-16 1:57

OP, you have a set of data that you'd like to query for a particular result set. Therefore, your best option is an SQL.

As for your particular problem: in SQL, that would be relational division. It can be tough to wrap your head around.

First, we'll set up a database:

CREATE DATABASE zoos;



CREATE TABLE zoos.locations (

  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,

  name VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL

);



CREATE TABLE zoos.animals (

  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,

  name VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL

);



CREATE TABLE zoos.relationships (

  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,

  location_id INT NOT NULL,

  animal_id INT NOT NULL

);

Next, let's insert some dummy data:

INSERT INTO animals (name) VALUES

('tiger'),

('lion'),

('cheetah'),

('giraffe'),

('cobra'),

('kitten');



INSERT INTO locations (name) VALUES

('Bronx'),

('San Diego'),

('Chicago'),

('St. Louis');



INSERT INTO relationships (location_id, animal_id) VALUES

(1, 1),

(1, 3),

(1, 4),

(1, 6),

(2, 1),

(2, 3),

(3, 2),

(3, 5),

(3, 6),

(4, 1),

(4, 2),

(4, 6);

Next comes your query. Now, I'm going to assume we're only working in SQL. Because you have a list of criteria, and not just a scalar value, we'll use the structure SQL gives us for lists (tables! wooo) and create a temporary table to select against.

CREATE TEMPORARY TABLE criteria (

  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,

  animal_id INT NOT NULL

);

Next, let's fill the table with our criteria. I like cats, so let's do one like this.

INSERT INTO criteria (animal_id) VALUES

((SELECT id FROM animals a WHERE a.name = "tiger")),

((SELECT id FROM animals a WHERE a.name = "lion")),

((SELECT id FROM animals a WHERE a.name = "kitten"));

All that's left now is to conjuring the true spirit of SQL and do our selection.

Here is one method to achieve the results you need.

SELECT l.name

FROM locations l

WHERE l.id = (

  SELECT r.location_id

  FROM relationships AS r, criteria AS c

  WHERE r.animal_id = c.animal_id

  GROUP BY r.location_id

  HAVING COUNT(r.animal_id) = (

    SELECT COUNT(animal_id) FROM criteria

  )

);

+-------------+

|  name       |

+-------------+

|  St. Louis  |

+-------------+

To explain this query, let's go from farthest-nested-in outward.

1. First, we gather the size of the criteria (which will be 3).
2. Then, we select from our relationships table where the animal_ids match the criteria table. Simple enough. But that's not all -- we also count how many animal_ids were matched for each location. If the amount is the same size as our criteria table, then we know that every animal in our criteria is accounted for in that location.
3. The last step is simply resolving the location_ids we got from our criteria selection into readable names.

If you're using MySQL, you might have trouble with this query because of some limitation they have with temporary tables. If that happens, you can work around the problem using a stored variable.

SET @criteria_size = (SELECT COUNT(animal_id) FROM criteria);

SELECT l.name

FROM locations l

WHERE l.id = (

  SELECT r.location_id

  FROM relationships AS r, criteria AS c

  WHERE r.animal_id = c.animal_id

  GROUP BY r.location_id

  HAVING COUNT(r.animal_id) = @criteria_size

);

(No worries: that temporary variable will be deleted once you disconnect.)

Name: Anonymous 2010-01-16 3:13

>>37
it's only 126000 possible triples

Name: Anonymous 2010-01-16 5:56

>>39
It's still going to take bugger all time to run, or are you running a computer from the 1980s?

Name: Anonymous 2010-01-16 12:23

My computer has vacuum tubes.

Name: Anonymous 2010-01-16 12:59

>>37

Python 2.6.4 (r264:75706, Nov  2 2009, 14:38:03)

[GCC 4.4.1] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>> from math import factorial

>>> factorial(300)

306057512216440636035370461297268629388588804173576999416776741259476533176716867465515291422477573349939147888701726368864263907759003154226842927906974559841225476930271954604008012215776252176854255965356903506788725264321896264299365204576448830388909753943489625436053225980776521270822437639449120128678675368305712293681943649956460498166450227716500185176546469340112226034729724066333258583506870150169794168850353752137554910289126407157154830282284937952636580145235233156936482233436799254594095276820608062232812387383880817049600000000000000000000000000000000000000000000000000000000000000000000000000L

Name: Anonymous 2010-01-16 13:05

>>42
Back to /combinatorics101/, please.

Name: Anonymous 2010-01-16 13:12

>>42
He needs to pick a triple out of 300 options => 300 * 300 * 300 => 27000000

Name: Anonymous 2010-01-16 13:26

Decorator Pattern!!!!!!1

Name: Anonymous 2010-01-16 13:39

>>44
Nope, it's 300*21*20, read his posts again. Or maybe you'd better off without doing it.

Name: Anonymous 2010-01-16 13:53

Total number of rolls of 4 6-sided die: 6³. See if you can figure it out fags.

Name: Anonymous 2010-01-16 14:03

>>47
3 dice. 3.

Name: Anonymous 2010-12-06 9:53

^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^{^Ba}c}k}}t}o}}/}b}/},}}`}`}G}N}A}A}}F}a}g}g}o}t}'}'}

what is the best language to do this in?

1 Name: Anonymous 2010-01-14 19:13

2 Name: Anonymous 2010-01-14 19:17

3 Name: Anonymous 2010-01-14 19:19

4 Name: Anonymous 2010-01-14 19:26

5 Name: Anonymous 2010-01-14 19:30

6 Name: Anonymous 2010-01-14 19:33

7 Name: Anonymous 2010-01-14 19:38

8 Name: Anonymous 2010-01-14 19:41

9 Name: Anonymous 2010-01-14 19:43

10 Name: Anonymous 2010-01-14 20:56

11 Name: Anonymous 2010-01-14 22:16

12 Name: Anonymous 2010-01-14 23:59

13 Name: Anonymous 2010-01-15 0:09

14 Name: Anonymous 2010-01-15 0:25

15 Name: Anonymous 2010-01-15 0:52

16 Name: Anonymous 2010-01-15 10:03

17 Name: Anonymous 2010-01-15 10:23

18 Name: Anonymous 2010-01-15 11:09

19 Name: Anonymous 2010-01-15 11:12

20 Name: Anonymous 2010-01-15 11:41

21 Name: Anonymous 2010-01-15 12:25

22 Name: Anonymous 2010-01-15 12:37

23 Name: Anonymous 2010-01-15 12:38

24 Name: >>15 2010-01-15 12:46

25 Name: Anonymous 2010-01-15 12:47

26 Name: >>15 2010-01-15 12:58

27 Name: Anonymous 2010-01-15 15:08

28 Name: Anonymous 2010-01-15 16:14

29 Name: Anonymous 2010-01-15 16:34

30 Name: Anonymous 2010-01-15 17:09

31 Name: Anonymous 2010-01-15 17:13

32 Name: Anonymous 2010-01-15 17:18

33 Name: Anonymous 2010-01-15 17:33

34 Name: Anonymous 2010-01-15 18:14

35 Name: Anonymous 2010-01-15 18:40

36 Name: Anonymous 2010-01-15 19:22

37 Name: Anonymous 2010-01-15 19:28

38 Name: Anonymous 2010-01-16 1:57

39 Name: Anonymous 2010-01-16 3:13

40 Name: Anonymous 2010-01-16 5:56

41 Name: Anonymous 2010-01-16 12:23

42 Name: Anonymous 2010-01-16 12:59

43 Name: Anonymous 2010-01-16 13:05

44 Name: Anonymous 2010-01-16 13:12

45 Name: Anonymous 2010-01-16 13:26

46 Name: Anonymous 2010-01-16 13:39

47 Name: Anonymous 2010-01-16 13:53

48 Name: Anonymous 2010-01-16 14:03

50 Name: Anonymous 2010-12-06 9:53

51 Name: Anonymous 2010-12-24 6:36