/prog/ - percentages

Name: Anonymous 2011-11-09 20:15

Help /prog/ I suck at maths.

I have a range of countries with associated percentages.

How can I base an if condition around that percentage so if I have Switzerland at 44%, there is a 44% chance of evaluating to Switzerland.

Name: Anonymous 2011-11-09 20:20

Create a 100 element integer array, fill each element with a country id according to its percentage.

Then choose a random number < 100, and that's your country.

For more precision increase the array length as necessary (10000 elements won't be a problem at all and gives you two decimal places).

Name: Anonymous 2011-11-09 20:22

Oh derp.

Thanks anon.

Name: Anonymous 2011-11-09 20:38

A slightly less wasteful in memory than >>2 suggested is to just
sum up the precentages (or in my case, I prefer to think of it in a more general case: weights), then pick a random number between 0 and the total, and select the item that matches when the weights are added up to the selected random number. It will have the same effect, but not be limited to normalized 1.0 (or 100% total) probabilies, instead you can just use whatever numbers as weights. Here's some sample code that I wrote a while ago for doing this:
(defun weighted-random-pick (groups) (let* ((total-weight (reduce #'+ (mapcar #'second groups))) (choice (random total-weight)) (current 0) (last-total 0)) (loop for (name weight) in groups do (setq last-total current) (incf current weight) (when (<= last-total choice current) (return name)))))

And a test:
; seen as precentages CL-USER> (loop repeat 10 collect (weighted-random-pick '((Macaroni 30) (Pizza 50) (/prog/-flavored-aisu-cream 20)))) (PIZZA MACARONI MACARONI /PROG/-FLAVORED-AISU-CREAM MACARONI MACARONI PIZZA MACARONI PIZZA PIZZA) ; seen as weights (in this case, having it choose something based on personal interest in doing that activity at that moment) CL-USER> (weighted-random-pick '((study 60) (code 50) (watch-anime 40) (play-games 15.5))) STUDY CL-USER> (weighted-random-pick '((study 60) (code 50) (watch-anime 40) (play-games 15.5))) CODE

Name: Anonymous 2011-11-09 20:46

A slightly less wasteful in /prog/ posts and trolling value than >>4 suggested is to just
google weighted random numbers

It also scales way better.

Name: Anonymous 2011-11-09 20:50

>>4
The algorithm is probably correct but that "test" is hardly reassuring. http://dilbert.com/fast/2001-10-25/

Name: Anonymous 2011-11-09 20:59

>>6
PRNG will be PRNG (My implementation is SBCL which uses some optimized version of MT19937(Mersenne twister)).

(defun mean (sequence) (/ (reduce #'+ sequence) (length sequence))) CL-USER> (loop repeat 5 collect (mean (loop for i from 1 to 10 collect (random 1.0)))) (0.5666071 0.45750922 0.53701687 0.48007727 0.70273066)
As you can see, given n=10 samples, it can sometimes go quite far from the ideal average of 0.5.
The larger number of samples, the smaller the deviation
CL-USER> (loop repeat 5 collect (mean (loop for i from 1 to 1000 collect (random 1.0)))) (0.49154118 0.48997444 0.50767094 0.48656407 0.48607567) CL-USER> (loop repeat 5 collect (mean (loop for i from 1 to 100000 collect (random 1.0)))) (0.4987231 0.50044996 0.49934453 0.500932 0.50077873)

Name: Anonymous 2011-11-09 21:04

>>7
The question is why the original test used ten (and two!) samples, instead of 1000 or 100000. Of course you can't tell much from so few samples; why even post the test in that case?

Name: Anonymous 2011-11-09 21:07

>>8
It was mostly to illustrate how it's supposed to be used, it wasn't meant to be a test of how random a particular implementation's PRNG is. It still picked the most likely case on average, which is enough of an indication that it likely works. The actual randomness depends on the implementation of the random function (or you could always use a better source of randomness if needed).

Name: Anonymous 2011-11-10 1:28

>>4

This implementation will take linear time in the worst case. If you create an array of key value pairs, where the key is the partial sum of the weights and the value is the object associated with the current weight, and then do binary search on this array using a random number as the key to look for, you will get a log n query time. The array method mentioned above will take constant time, and the number of slots needed isn't necessarily that large. If all of the weights are integers, then you can calculate the greatest common divisor of all the weights, and divide out that factor. Then the total weight will be the sum of these.

Name: Anonymous 2011-11-10 9:17

>>10
Wouldn't you still have to load the numbers into the array, which scales linearly with time? Then wouldn't the complexity of the array be n+logn while the other is just like n?

Name: 4 2011-11-10 22:59

>>10
The implementation that should be used depends on >>1's needs, which he didn't state exactly.
If the data is subject to change (precentages, items to be chosen), or one-time-use only, my implementation will take O(n) time, which would be optimal. If the data is not subject to change and will be reused, your array-based implementation is faster as the memory only needs to be initialized once.

percentages

1 Name: Anonymous 2011-11-09 20:15

2 Name: Anonymous 2011-11-09 20:20

3 Name: Anonymous 2011-11-09 20:22

4 Name: Anonymous 2011-11-09 20:38

5 Name: Anonymous 2011-11-09 20:46

6 Name: Anonymous 2011-11-09 20:50

7 Name: Anonymous 2011-11-09 20:59

8 Name: Anonymous 2011-11-09 21:04

9 Name: Anonymous 2011-11-09 21:07

10 Name: Anonymous 2011-11-10 1:28

11 Name: Anonymous 2011-11-10 9:17

12 Name: 4 2011-11-10 22:59