Sampling from a normal distribution using data from a Python pandas DataFrame

Question

I am trying to sample from normal distributions using means and standard deviations stored in pandas DataFrames.

For example:

means= numpy.arange(10)
means=means.reshape(5,2)

This produces a 5×2 array holding the values 0 through 9.

And:

sts=numpy.arange(10,20)
sts=sts.reshape(5,2)

This produces a 5×2 array holding the values 10 through 19.

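How can I generate another pandas DataFrame of the same shape whose values are sampled from normal distributions with the corresponding means and standard deviations?

That is, position 0,0 of the new DataFrame would be sampled from a normal distribution with mean=0 and standard deviation=10, and so on.

My function so far: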

    def make_distributions(self):
        num_data_points,num_species= self.means.shape
        samples=[]
        for i,j in zip(self.means,self.stds):
            for k,l in  zip(self.means[i],self.stds[j]):
                samples.append( numpy.random.normal(k,l,self.n) )

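This samples from the distributions, but I am struggling to put the data back into a DataFrame with the same shape as the mean and standard deviation DataFrames. Does anyone have suggestions on how to do this?

Thanks in advance.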

Answer 1

You can use numpy.random.normal to sample from a normal distribution.
IIUC, this is probably the simplest approach, taking advantage of broadcasting:

import numpy as np
np.random.seed(1) # only for demonstration
np.random.normal(means,sts)

array([[ 16.24345364,  -5.72932055],
       [ -4.33806103, -10.94859209],
       [ 16.11570681, -29.52308045],
       [ 33.91698823,  -5.94051732],
       [ 13.74270373,   4.26196287]])

Check that it works:

np.random.seed(1)
print(np.random.normal(0, 10))  # first draw: mean 0, std 10
print(np.random.normal(1, 11))  # second draw: mean 1, std 11

16.2434536366
-5.72932055015
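
For a statistical check rather than a comparison of two draws, one option is to draw many values per cell and compare the empirical mean and standard deviation with the inputs. This is only a sketch: the sample count n and the use of numpy.random.default_rng are choices made here, not part of the original answer.

```python
import numpy as np

means = np.arange(10).reshape(5, 2)
sts = np.arange(10, 20).reshape(5, 2)

rng = np.random.default_rng(1)  # seeded only so the check is repeatable
n = 200_000                     # illustrative number of draws per cell

# size=(n, 5, 2) broadcasts the (5, 2) parameters across n independent draws
draws = rng.normal(means, sts, size=(n,) + means.shape)

emp_mean = draws.mean(axis=0)  # per-cell empirical mean, shape (5, 2)
emp_std = draws.std(axis=0)    # per-cell empirical standard deviation

# Largest per-cell deviation from the requested parameters
print(np.abs(emp_mean - means).max())
print(np.abs(emp_std - sts).max())
```

Both printed values should be small relative to the standard deviations (10 to 19), which indicates that each cell is using its own mean and standard deviation.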

If you need a pandas DataFrame:

import pandas as pd
pd.DataFrame(np.random.normal(means,sts))
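
If the means and standard deviations are already DataFrames with meaningful labels, the result can carry those labels over. A minimal sketch, assuming both frames share the same index and columns; the column names "a" and "b" and the seeded Generator are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

# Example inputs mirroring the question: 5x2 frames of means and standard deviations
means = pd.DataFrame(np.arange(10).reshape(5, 2), columns=["a", "b"])
stds = pd.DataFrame(np.arange(10, 20).reshape(5, 2), columns=["a", "b"])

rng = np.random.default_rng(0)  # seeded only for reproducibility

# Element-wise draw: cell (i, j) uses the mean and standard deviation at (i, j)
samples = pd.DataFrame(
    rng.normal(means.to_numpy(), stds.to_numpy()),
    index=means.index,
    columns=means.columns,
)
print(samples.shape)
```

Passing index and columns explicitly keeps the output aligned with the inputs, so samples can be compared or joined with means and stds by label.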

Answer 2

I would use a dictionary to construct this DataFrame. Assuming the means and standard deviations share the same index and columns:

import numpy
import pandas as pd

means = numpy.arange(10)
means = pd.DataFrame(means.reshape(5, 2))
stds = numpy.arange(10, 20)
stds = pd.DataFrame(stds.reshape(5, 2))

samples = {}
for i in means.columns:
    col = {}
    for j in means.index:
        # two draws per cell, using the matching mean and standard deviation
        col[j] = numpy.random.normal(means.loc[j, i], stds.loc[j, i], 2)
    samples[i] = col

print(pd.DataFrame(samples))

#                                  0                                1
#0  [0.0760974520154, 3.29439282825]  [11.1292510583, 0.318246201796]
#1   [-25.4518020981, 19.2176263823]   [17.0826945017, 9.36179435872]
#2    [14.5402484325, 8.33808246538]   [6.96459947914, 26.5552235093]
#3  [0.775891790613, -2.09168601369]   [2.38723023677, 15.8099942902]
#4  [-0.828518484847, 45.4592922652]   [26.8088977308, 16.0818556353]

Or reset the DataFrame's dtype and reassign the values:

import itertools
samples = means * 0
samples = samples.astype(object)  # object dtype so each cell can hold an array

for i, j in itertools.product(means.index, means.columns):
    samples.at[i, j] = numpy.random.normal(means.loc[i, j], stds.loc[i, j], 2)
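
Storing arrays inside object-dtype cells makes later arithmetic awkward. An alternative sketch, assuming several draws per cell are wanted: broadcast once to an (n, rows, columns) array and stack the draws under a MultiIndex. The level names "draw" and "row", n = 2, and the seeded Generator are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

means = pd.DataFrame(np.arange(10).reshape(5, 2))
stds = pd.DataFrame(np.arange(10, 20).reshape(5, 2))

n = 2                           # draws per cell, as in the loop-based versions
rng = np.random.default_rng(0)  # seeded only for reproducibility

# draws[d, r, c] is draw d for the cell at row r, column c
draws = rng.normal(means.to_numpy(), stds.to_numpy(), size=(n,) + means.shape)

# Stack the n draws vertically: one block of rows per draw, columns unchanged
index = pd.MultiIndex.from_product([range(n), means.index], names=["draw", "row"])
samples = pd.DataFrame(
    draws.reshape(-1, means.shape[1]), index=index, columns=means.columns
)

print(samples.loc[0])  # the first draw, same shape as means
```

Each samples.loc[d] is an ordinary float DataFrame with the same shape as means, so per-draw results can be used directly in numeric operations.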

2 answers

Answer 1
4 Accepted 2016-03-18 15:19:05

Answer 2
1 2016-03-18 15:03:23
