On Monte Carlo Probability syntax

Let 20 people, including exactly 3 women, seat themselves randomly at 4 tables (denoted A, B, C, D) of 5 persons each, with all arrangements equally likely. Let X be the number of tables at which no women sit. Write a numpy Monte Carlo simulation to estimate the expectation of X and also estimate the probability p that no women sit at table A. Run the simulation for 3 cases (100, 1000, 10000 trials).

I would like to define a function that uses numpy's random.permutation function to estimate the expected value of X for a variable number of trials. I understand how to do this with pen and paper: I iterate through my collection of probabilities and multiply them together to obtain the total probability of the event. This is what I have so far:

T = 4       # number of tables
N = 20      # number of persons. Assumption: N is a multiple of T.
K = 5       # capacity per table
W = 3       # number of women. Assumption: first W of N persons are women.
M = 100     # number of trials

collection = []

for i in range(K):
    # Probability that seat i of a table goes to a man, given that seats 0..i-1 did
    x = (((N-W)-i)/(N-i))
    collection.append(x)

If I examine my collection, this is my output: [0.85, 0.8421052631578947, 0.8333333333333334, 0.8235294117647058, 0.8125]

Implementation

Here is a naive implementation of your Monte Carlo simulation. It is not designed to be performant; instead, it lets you cross-check the setup and see the details:

import collections
import numpy as np

def runMonteCarlo(nw=3, nh=20, nt=4, N=20):
    """
    Run Monte Carlo Simulation
    """

    def countWomen(c, nt=4):
        """
        Count Number of Women per Table
        """
        x = np.array(c).reshape(nt, -1).T  # Split permutation into tables (one column per table)
        return np.sum(x, axis=0)           # Sum women per table

    # Initialization:
    comp = np.array([1]*nw + [0]*(nh-nw)) # Composition: 1=woman, 0=man
    x = []                                # Number of woman-free tables, one entry per trial
    p = 0                                 # Number of trials with no woman at table A

    for k in range(N):
        c = np.random.permutation(comp)   # Random permutation, i.e. one random seating
        w = countWomen(c, nt=nt)          # Count women per table
        nc = np.sum(w!=0)                 # Count tables with at least one woman
        x.append(nt - nc)                 # Store count of tables without any woman
        p += int(w[0]==0)                 # Does table A have no woman?
        #if k % 100 == 0:
            #print(c, w, nc, nt-nc, p)

    # Rationalize (count->frequency)
    r = collections.Counter(x)
    r = {k:r.get(k, 0)/N for k in range(nt+1)}
    p /= N
    return r, p

Running the simulation:

for n in [100, 1000, 10000]:
    s = runMonteCarlo(N=n)
    E = sum([k*v for k,v in s[0].items()])
    print('N=%d, P(X=k) = %s, p=%s, E[X]=%s' % (n, *s, E))

Returns:

N=100, P(X=k) = {0: 0.0, 1: 0.43, 2: 0.54, 3: 0.03, 4: 0.0}, p=0.38, E[X]=1.6
N=1000, P(X=k) = {0: 0.0, 1: 0.428, 2: 0.543, 3: 0.029, 4: 0.0}, p=0.376, E[X]=1.601
N=10000, P(X=k) = {0: 0.0, 1: 0.442, 2: 0.5235, 3: 0.0345, 4: 0.0}, p=0.4011, E[X]=1.5924999999999998
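
Since the implementation above favors readability over speed, a vectorized variant might look like the following. This is only a sketch and not part of the original answer: the function name run_monte_carlo_vectorized, the seed parameter, and the use of numpy's Generator.permuted (available from NumPy 1.20) are assumptions made for illustration. It keeps the same convention of 1=woman, 0=man and consecutive blocks of seats forming a table.

```python
import numpy as np

def run_monte_carlo_vectorized(n_trials, nw=3, nh=20, nt=4, seed=None):
    """Estimate E[X] and p = P(no woman at table A) in a single batch."""
    rng = np.random.default_rng(seed)
    comp = np.array([1] * nw + [0] * (nh - nw))       # 1 = woman, 0 = man
    # One row per trial; each row is shuffled independently of the others
    seats = rng.permuted(np.tile(comp, (n_trials, 1)), axis=1)
    # Consecutive blocks of nh // nt seats form one table
    women = seats.reshape(n_trials, nt, nh // nt).sum(axis=2)
    empty = (women == 0)                              # shape (n_trials, nt)
    est_x = empty.sum(axis=1).mean()                  # mean number of woman-free tables
    est_p = empty[:, 0].mean()                        # fraction of trials with table A woman-free
    return est_x, est_p

for n in [100, 1000, 10000]:
    est_x, est_p = run_monte_carlo_vectorized(n, seed=0)
    print('N=%d, E[X]=%s, p=%s' % (n, est_x, est_p))
```

The estimates should land close to those of runMonteCarlo for the same number of trials, since only the bookkeeping differs and not the random experiment.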

Plotting the distribution gives:

import pandas as pd
axe = pd.DataFrame.from_dict(s[0], orient='index').plot(kind='bar')
axe.set_title("Monte Carlo Simulation")
axe.set_xlabel('Random Variable, $X$')
axe.set_ylabel('Frequency, $F(X=k)$')
axe.grid()

Divergence with alternative version

Caution: this method does not answer the stated problem!

Suppose we implement another version of the simulation, in which the random experiment is performed as follows:

import random
import collections

def runMonteCarlo2(nw=3, nh=20, nt=4, N=20):
    """
    Run Monte Carlo Simulation
    """

    def one_experiment(nt, nw):
        """
        Table setup (suggested by @Inon Peled)
        """
        return set(random.randint(0, nt-1) for _ in range(nw)) # Sample nw times from 0 <= k <= nt-1

    c = collections.Counter()             # Empty Table counter
    p = 0                                 # Probability of there is no woman at table A  

    for k in range(N):
        exp = one_experiment(nt, nw)      # Select table with at least one woman
        c.update([nt - len(exp)])         # Update Counter X distribution
        p += int(0 not in exp)            # There is no woman at table A (table 0)

    # Rationalize:
    r = {k:c.get(k, 0)/N for k in range(nt+1)}
    p /= N

    return r, p

It returns:

N=100, P(X=k) = {0: 0.0, 1: 0.41, 2: 0.51, 3: 0.08, 4: 0.0}, p=0.4, E[X]=1.67
N=1000, P(X=k) = {0: 0.0, 1: 0.366, 2: 0.577, 3: 0.057, 4: 0.0}, p=0.426, E[X]=1.691
N=1000000, P(X=k) = {0: 0.0, 1: 0.37462, 2: 0.562787, 3: 0.062593, 4: 0.0}, p=0.42231, E[X]=1.687973

This second version converges towards different values. It is clearly not equivalent to the first version and does not answer the same question.

Discussion

To determine which implementation is the correct one, I computed the sample spaces and probabilities for both implementations. The first version appears to be the correct one because it takes into account that the probability of a woman sitting at a given table depends on who has been seated before. The second version does not take this into account, which is why it does not need to know how many people there are or how many can sit at each table.
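
One way such a comparison could be carried out is by brute-force enumeration of both sample spaces. The sketch below is not the code used for the answer; the helper names exact_seating, exact_independent and expectation are hypothetical. It assumes that in the seating model the women occupy a uniformly random set of 3 of the 20 seats, and that in the alternative model each woman picks one of the 4 tables independently.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def exact_seating(nw=3, nh=20, nt=4):
    """Exact P(X=k) when the women occupy a uniformly random set of seats."""
    size = nh // nt                                  # seats per table
    counts = Counter()
    for seats in combinations(range(nh), nw):        # every possible set of women's seats
        occupied = {s // size for s in seats}        # tables with at least one woman
        counts[nt - len(occupied)] += 1
    total = sum(counts.values())
    return {k: Fraction(counts[k], total) for k in range(nt + 1)}

def exact_independent(nw=3, nt=4):
    """Exact P(X=k) when each woman picks a table independently."""
    counts = Counter()
    for tables in product(range(nt), repeat=nw):     # every sequence of table choices
        counts[nt - len(set(tables))] += 1
    total = nt ** nw
    return {k: Fraction(counts[k], total) for k in range(nt + 1)}

def expectation(dist):
    """E[X] for a distribution given as {k: P(X=k)}."""
    return sum(k * p for k, p in dist.items())

for name, dist in [('seating', exact_seating()), ('independent', exact_independent())]:
    print(name, {k: float(p) for k, p in dist.items()}, float(expectation(dist)))
```

Under these assumptions the seating model gives E[X] = 91/57 (about 1.5965) and the independent model gives E[X] = 27/16 = 1.6875, which is consistent with the two sets of simulation results settling on different values.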

This is a nice problem to ask because both answers give close results. An important part of the work is setting up the Monte Carlo inputs correctly.

You can multiply the items in a collection using functools.reduce in Python 3.x.

from functools import reduce
event_probability = reduce(lambda x, y: x*y, collection)

So in your code:

from functools import reduce

T = 4       # number of tables
N = 20      # number of persons. Assumption: N is a multiple of T.
K = 5       # capacity per table
W = 3       # number of women. Assumption: first W of N persons are women.
M = 100     # number of trials

collection = []

for i in range(K):
    x = (((N-W)-i)/(N-i))
    collection.append(x)

event_probability = reduce(lambda x, y: x*y, collection)

print(collection)
print(event_probability)

Output:

[0.85, 0.8421052631578947, 0.8333333333333334, 0.8235294117647058, 0.8125] # collection
0.3991228070175438 # event_probability

Then you can use the result to complete your code.
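
As a possible next step, the same product can be turned into an exact reference value for E[X], against which the simulation can be checked. The sketch below is an addition, not part of the original answer; it assumes Python 3.8 or later for math.prod and math.comb, and the variable names p_product, p_closed and expected_x are chosen here for illustration.

```python
from math import comb, prod

T, N, K, W = 4, 20, 5, 3   # tables, persons, seats per table, women

# P(no woman at one given table): all K seats of that table go to men
p_product = prod((N - W - i) / (N - i) for i in range(K))
p_closed = comb(N - W, K) / comb(N, K)   # same value in hypergeometric form

# By symmetry every table has the same chance of being woman-free,
# so linearity of expectation gives E[X] = T * p
expected_x = T * p_closed

print(p_product, p_closed, expected_x)
```

Here p is about 0.3991 and E[X] is about 1.5965, so a Monte Carlo estimate based on random seatings should approach these values as the number of trials grows.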

Do you have to explicitly simulate the seating? If not, simply draw 3 times at random with replacement from 1..4 to simulate one seating, that is:

import random

def one_experiment():
    return set(random.randint(1, 4) for _ in range(3))  # Distinct tables with women.

The desired values are then obtained as follows, where N is the number of experiments for any case.

expectation_of_X = sum(4 - len(one_experiment()) for _ in range(N)) / float(N)
probability_no_women_table_1 = sum(1 not in one_experiment() for _ in range(N)) / float(N)

For large N, the values you get under this independent-draw model should be approximately p = (3/4)^3 ≈ 0.4219 and E[X] = 3^3/4^2 = 1.6875.
