On Monte Carlo Probability syntax

Question

Let 20 people, including exactly 3 women, seat themselves randomly at 4 tables (denoted A, B, C, D) of 5 persons each, with all arrangements equally likely. Let X be the number of tables at which no women sit. Write a numpy Monte Carlo simulation to estimate the expectation of X and also estimate the probability p that no women sit at table A. Run the simulation for 3 cases (100, 1000, and 10000 trials).

I would like to define a function that uses numpy's random.permutation function to estimate the expected value of X for a variable number of trials. I understand how to do this with pen and paper: iterate through my collection of probabilities and multiply them together to get the total probability of the event. This is what I have so far:

T = 4       # number of tables
N = 20      # number of persons. Assumption: N is a multiple of T.
K = 5       # capacity per table
W = 3       # number of women. Assumption: first W of N persons are women.
M = 100     # number of trials

collection = []

for i in range(K):
    # P(the (i+1)-th person seated at a table is a man, given the first i were men)
    x = ((N - W) - i) / (N - i)
    collection.append(x)

If I examine my collection, this is the output: [0.85, 0.8421052631578947, 0.8333333333333334, 0.8235294117647058, 0.8125]

Answer 1

Implementation

Here is a naïve implementation of your Monte Carlo simulation. It is not designed for performance; instead, it lets you cross-check the setup and inspect the details:

import collections
import numpy as np

def runMonteCarlo(nw=3, nh=20, nt=4, N=20):
    """
    Run Monte Carlo Simulation
    """

    def countWomen(c, nt=4):
        """
        Count Number of Women per Table
        """
        x = np.array(c).reshape(nt, -1).T  # Split permutation into tables (one column per table)
        return np.sum(x, axis=0)           # Sum women per table

    # Initialization:
    comp = np.array([1]*nw + [0]*(nh-nw)) # Composition: 1=woman, 0=man
    x = []                                # Counts of tables without any woman
    p = 0                                 # Count of trials with no woman at table A

    for k in range(N):
        c = np.random.permutation(comp)   # Random permutation, table composition
        w = countWomen(c, nt=nt)          # Count women per table
        nc = np.sum(w!=0)                 # Count tables with at least one woman
        x.append(nt - nc)                 # Store count of tables without any woman
        p += int(w[0]==0)                 # No woman at table A?
        #if k % 100 == 0:
            #print(c, w, nc, nt-nc, p)

    # Rationalize (count->frequency)
    r = collections.Counter(x)
    r = {k:r.get(k, 0)/N for k in range(nt+1)}
    p /= N
    return r, p

Running the simulation for the three cases:

for n in [100, 1000, 10000]:
    s = runMonteCarlo(N=n)
    E = sum([k*v for k,v in s[0].items()])
    print('N=%d, P(X=k) = %s, p=%s, E[X]=%s' % (n, *s, E))

Output:

N=100, P(X=k) = {0: 0.0, 1: 0.43, 2: 0.54, 3: 0.03, 4: 0.0}, p=0.38, E[X]=1.6
N=1000, P(X=k) = {0: 0.0, 1: 0.428, 2: 0.543, 3: 0.029, 4: 0.0}, p=0.376, E[X]=1.601
N=10000, P(X=k) = {0: 0.0, 1: 0.442, 2: 0.5235, 3: 0.0345, 4: 0.0}, p=0.4011, E[X]=1.5924999999999998
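As a cross-check, this case is small enough to compute exactly: with all arrangements equally likely, the set of seats occupied by the 3 women is uniform over the C(20,3) = 1140 possible seat sets. A minimal sketch, assuming seats 0 to 4 form table A, 5 to 9 table B, and so on (`exact_distribution` is a hypothetical helper, not part of the original answer):

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def exact_distribution(n_women=3, n_people=20, n_tables=4):
    """Exact P(X=k), E[X] and p by enumerating every equally likely set of women's seats."""
    seats_per_table = n_people // n_tables   # assumes n_people is a multiple of n_tables
    counts = Counter()                       # counts[k] = number of seat sets with X = k
    table_a_free = 0                         # seat sets with no woman at table A (table 0)
    total = 0
    for seats in combinations(range(n_people), n_women):
        occupied = {s // seats_per_table for s in seats}  # tables with at least one woman
        counts[n_tables - len(occupied)] += 1
        table_a_free += 0 not in occupied
        total += 1
    dist = {k: Fraction(counts[k], total) for k in range(n_tables + 1)}
    expectation = sum(k * v for k, v in dist.items())
    return dist, expectation, Fraction(table_a_free, total)


dist, e_x, p = exact_distribution()
print({k: float(v) for k, v in dist.items()})
print(e_x, float(e_x))   # 91/57, about 1.5965
print(p, float(p))       # 91/228, about 0.3991
```

The exact values, P(X=1) = 25/57 ≈ 0.439, P(X=2) = 10/19 ≈ 0.526, P(X=3) = 2/57 ≈ 0.035, E[X] = 91/57 ≈ 1.596 and p = 91/228 ≈ 0.399, agree with the estimates above.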

To plot the distribution of the last run (N=10000):

import matplotlib.pyplot as plt
import pandas as pd

axe = pd.DataFrame.from_dict(s[0], orient='index').plot(kind='bar')  # s holds the last run
axe.set_title("Monte Carlo Simulation")
axe.set_xlabel('Random Variable, $X$')
axe.set_ylabel('Frequency, $F(X=k)$')
axe.grid()
plt.show()
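Because the loop above draws one permutation at a time, a vectorized variant can draw all trials at once. A minimal sketch, assuming NumPy 1.17+ for `np.random.default_rng`; it swaps `np.random.permutation` for the argsort-of-uniforms trick to get one independent random permutation per row, and `run_monte_carlo_vectorized` is a hypothetical name:

```python
import numpy as np


def run_monte_carlo_vectorized(n_trials, n_women=3, n_people=20, n_tables=4, seed=None):
    """Estimate E[X] and p with all trials drawn at once (one row per trial)."""
    rng = np.random.default_rng(seed)
    comp = np.array([1] * n_women + [0] * (n_people - n_women))  # 1 = woman, 0 = man
    # argsort of i.i.d. uniforms gives an independent random permutation per row
    perms = rng.random((n_trials, n_people)).argsort(axis=1)
    seating = comp[perms]                                        # shape (n_trials, n_people)
    # consecutive blocks of n_people // n_tables seats form tables A, B, C, D
    women_per_table = seating.reshape(n_trials, n_tables, -1).sum(axis=2)
    x = (women_per_table == 0).sum(axis=1)                       # X for each trial
    p = (women_per_table[:, 0] == 0).mean()                      # fraction with no woman at A
    return float(x.mean()), float(p)


for n in [100, 1000, 10000]:
    e_x, p = run_monte_carlo_vectorized(n, seed=42)
    print(f"N={n}, E[X]={e_x:.4f}, p={p:.4f}")
```

The model is the same as in the loop version (a uniformly random arrangement of the composition array), so the estimates should converge to the same values; only the bookkeeping moves into array operations.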

Divergence with alternative version

Caution: this method does not answer the stated problem!

If we implement another version of the simulation that changes how the random experiment is performed, as follows:

import random
import collections

def runMonteCarlo2(nw=3, nh=20, nt=4, N=20):
    """
    Run Monte Carlo Simulation
    """

    def one_experiment(nt, nw):
        """
        Table setup (suggested by @Inon Peled)
        """
        return set(random.randint(0, nt-1) for _ in range(nw)) # Sample nw times from 0 <= k <= nt-1

    c = collections.Counter()             # Counter for the distribution of X
    p = 0                                 # Count of trials with no woman at table A

    for k in range(N):
        exp = one_experiment(nt, nw)      # Set of tables with at least one woman
        c.update([nt - len(exp)])         # Update the distribution of X
        p += int(0 not in exp)            # No woman at table A (table 0)

    # Rationalize:
    r = {k:c.get(k, 0)/N for k in range(nt+1)}
    p /= N

    return r, p

Output:

N=100, P(X=k) = {0: 0.0, 1: 0.41, 2: 0.51, 3: 0.08, 4: 0.0}, p=0.4, E[X]=1.67
N=1000, P(X=k) = {0: 0.0, 1: 0.366, 2: 0.577, 3: 0.057, 4: 0.0}, p=0.426, E[X]=1.691
N=1000000, P(X=k) = {0: 0.0, 1: 0.37462, 2: 0.562787, 3: 0.062593, 4: 0.0}, p=0.42231, E[X]=1.687973

This second version converges toward different values; it is clearly not equivalent to the first version and does not answer the same question.

Discussion

To determine which implementation is correct, I computed the sample spaces and probabilities for both. The first version is the correct one because it takes into account that the probability of a woman sitting at a given table depends on who has already been seated. The second version ignores this, which is why it needs to know neither how many people there are nor how many can sit at each table.
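The dependence can be made concrete for table A. In the first model each woman takes one of the remaining free seats, so her chance of avoiding table A shrinks as earlier women fill seats elsewhere; in the second, each woman picks a table independently, ignoring capacity. A small sketch comparing both with exact fractions (both function names are hypothetical):

```python
from fractions import Fraction


def p_a_free_with_capacity(n_women=3, n_people=20, seats_per_table=5):
    """Seat the women one by one; each takes a free seat uniformly at random."""
    p = Fraction(1)
    for i in range(n_women):
        free_seats = n_people - i                        # seats left for woman i+1
        free_outside_a = n_people - seats_per_table - i  # of those, seats not at table A
        p *= Fraction(free_outside_a, free_seats)
    return p


def p_a_free_with_replacement(n_women=3, n_tables=4):
    """Each woman picks a table independently, ignoring capacity (second version)."""
    return Fraction(n_tables - 1, n_tables) ** n_women


for name, p in [("with capacity", p_a_free_with_capacity()),
                ("with replacement", p_a_free_with_replacement())]:
    # By symmetry and linearity of expectation, E[X] = 4 * P(a given table has no woman)
    print(name, "p =", p, "E[X] =", 4 * p)
```

The first gives p = 15/20 · 14/19 · 13/18 = 91/228 ≈ 0.399 and E[X] = 91/57 ≈ 1.596; the second gives p = 27/64 ≈ 0.422 and E[X] = 27/16 ≈ 1.688, matching the values each simulation converges to.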

This is a nice problem because both versions give close results. An important part of the work is setting up the Monte Carlo inputs correctly.

Answer 2

You can multiply the items of a collection using functools.reduce in Python 3.x:

from functools import reduce
event_probability = reduce(lambda x, y: x*y, collection)

So in your code:

from functools import reduce

T = 4       # number of tables
N = 20      # number of persons. Assumption: N is a multiple of T.
K = 5       # capacity per table
W = 3       # number of women. Assumption: first W of N persons are women.
M = 100     # number of trials

collection = []

for i in range(K):
    x = (((N-W)-i)/(N-i))
    collection.append(x)

event_probability = reduce(lambda x, y: x*y, collection)

print(collection)
print(event_probability)

Output:

[0.85, 0.8421052631578947, 0.8333333333333334, 0.8235294117647058, 0.8125] # collection
0.3991228070175438 # event_probability

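This product is the exact probability that no woman sits at table A (all five of its seats go to men), so you can use it to check the Monte Carlo estimate of p.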
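On Python 3.8+, `math.prod` performs the same multiplication without a lambda, and the product equals a ratio of binomial coefficients, C(17,5)/C(20,5): the number of ways to fill table A with men only, over all ways to choose its five occupants. A short sketch reusing the question's variable names:

```python
import math
import operator
from functools import reduce

N, W, K = 20, 3, 5  # persons, women, seats per table (as in the question)

collection = [((N - W) - i) / (N - i) for i in range(K)]

p_reduce = reduce(operator.mul, collection)     # same as reduce(lambda x, y: x*y, ...)
p_prod = math.prod(collection)                   # Python 3.8+
p_comb = math.comb(N - W, K) / math.comb(N, K)   # C(17, 5) / C(20, 5), Python 3.8+

print(p_reduce, p_prod, p_comb)
```

All three agree up to floating-point rounding; the `math.comb` form avoids building the intermediate list.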

Answer 3

Do you have to explicitly simulate the seating? If not, simply draw 3 times at random, with replacement, from 1..4 to simulate one seating:

import random

def one_experiment():
    return set(random.randint(1, 4) for _ in range(3))  # Distinct tables with women.

The desired values are then obtained as follows, where N is the number of experiments in each case:

expectation_of_X = sum(4 - len(one_experiment()) for _ in range(N)) / float(N)
probability_no_women_table_1 = sum(1 not in one_experiment() for _ in range(N)) / float(N)

For large N, the values you get should be approximately p = (3/4)^3 ≈ 0.42 and E[X] = 3^3/4^2 ≈ 1.69.
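The E[X] formula follows from linearity of expectation over one indicator variable per table; in this model each woman chooses a table independently, so a table is free of women with probability (3/4)^3:

```latex
Let $I_j = 1$ if table $j$ has no woman, so $X = I_A + I_B + I_C + I_D$.
With independent table choices, $P(I_j = 1) = (3/4)^3$ for every table $j$, hence
\[
  p = \left(\tfrac{3}{4}\right)^3 = \tfrac{27}{64} \approx 0.42, \qquad
  E[X] = \sum_{j} P(I_j = 1) = 4 \cdot \tfrac{27}{64} = \tfrac{27}{16} = \tfrac{3^3}{4^2} \approx 1.69 .
\]
```

The same indicator argument applies when each table has exactly 5 seats, with C(17,5)/C(20,5) in place of (3/4)^3.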

On Monte Carlo Probability syntax

Question

3 answers

solution1
2 ACCEPTED 2019-03-20 09:22:09

Implementation

Divergence with alternative version

Discussion

solution2
1 2019-03-20 08:50:43

solution3
0 2019-03-20 08:07:24
