
Combined mean and standard deviation from a collection of NumPy arrays of different shapes

Let's say I have NumPy arrays with shapes:

(682, 89, 138)
(2668, 76, 89)
(491, 62, 48)

How should I calculate the mean and standard deviation of all three arrays combined? If they had the same shape, I could use np.stack() and then take the mean and std of the resulting array.

Is it possible to do this when the dimensions differ in size? Or would I have to reshape before getting the mean and std?

We can apply the formulas for the mean and the standard deviation directly to get those two scalar values across all input arrays, without concatenating or stacking (which can be costly, especially for large NumPy arrays). Let's do it in steps: first the mean, then the standard deviation, since the std computation uses the mean.

Getting the combined mean value:

We start with the mean. For this, we get the sum of each array as a scalar, add those sums together, and finally divide by the total number of elements in all arrays.
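As a small worked example of the steps just described, here is a sketch with two tiny arrays of different shapes. The values are made up so the arithmetic can be checked by hand; they are not from the question.

```python
import numpy as np

# Two small arrays with different shapes (illustrative values only).
a = np.array([[1.0, 2.0], [3.0, 4.0]])   # shape (2, 2), sum = 10
b = np.array([[[5.0, 6.0, 7.0]]])        # shape (1, 1, 3), sum = 18

total_sum = a.sum() + b.sum()            # 10 + 18 = 28
total_count = a.size + b.size            # 4 + 3 = 7
combined_mean = total_sum / total_count

print(combined_mean)  # 4.0
```

Only the per-array sums and sizes are needed, so the shapes never have to match.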

Getting the combined standard deviation value:

For standard deviation, we have the formula:

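$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

where \mu is the mean and N is the number of elements.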

So, we take the combined mean value obtained in the previous step, use the std formula to get the sum of squared differences from that mean over every array, divide by the total number of elements across all arrays, and then apply the square root.
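Written out for several arrays, the two steps above amount to the following. The notation is an assumption made here for illustration: K arrays, where array k has n_k elements x_{k,i} once flattened.

```latex
N = \sum_{k=1}^{K} n_k, \qquad
\mu = \frac{1}{N} \sum_{k=1}^{K} \sum_{i=1}^{n_k} x_{k,i}, \qquad
\sigma = \sqrt{\frac{1}{N} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \left( x_{k,i} - \mu \right)^2}
```

This is the population standard deviation (division by N), which matches the default behavior of NumPy's std().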

Implementation

Let's say the input arrays are a and b. One solution would be:

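import numpy as np

N = float(a.size + b.size)         # total number of elements
mean_ = (a.sum() + b.sum())/N      # combined mean
# Population std: squared deviations from the combined mean, averaged, then rooted
std_ = np.sqrt((((a - mean_)**2).sum() + ((b - mean_)**2).sum())/N)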

Sample run for verification

In [266]: a = np.random.rand(3,4,2)
     ...: b = np.random.rand(2,5,3)
     ...: 

In [267]: N = float(a.size + b.size)
     ...: mean_ = (a.sum() + b.sum())/N
     ...: std_ = np.sqrt((((a - mean_)**2).sum() + ((b - mean_)**2).sum())/N)
     ...: 

In [268]: mean_
Out[268]: 0.47854757879348042

In [270]: std_
Out[270]: 0.27890341338373376

Now, to verify, let's stack the flattened arrays and then use the built-in mean and std methods:

In [271]: A = np.hstack((a.ravel(), b.ravel()))

In [273]: A.mean()
Out[273]: 0.47854757879348037

In [274]: A.std()
Out[274]: 0.27890341338373376

List of arrays as input

For a list holding all those arrays, we need to iterate through them, like so:

A = [a,b,c] # input list of arrays

N = float(sum([i.size for i in A]))
mean_ = sum([i.sum() for i in A])/N
std_ = np.sqrt(sum([((i-mean_)**2).sum() for i in A])/N)
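The loop above could also be wrapped in a reusable function. The following is a minimal sketch, not a NumPy API: combined_mean_std is a hypothetical helper name, and the random test arrays are placeholders. It assumes a non-empty collection of non-empty arrays.

```python
import numpy as np

def combined_mean_std(arrays):
    """Population mean and std (ddof=0) over all elements of `arrays`.

    `combined_mean_std` is a hypothetical helper name, not a NumPy function.
    """
    arrays = list(arrays)
    n = sum(arr.size for arr in arrays)              # total element count
    mean = sum(arr.sum() for arr in arrays) / n      # combined mean
    # Sum of squared deviations from the combined mean, one array at a time.
    sq_dev = sum(((arr - mean) ** 2).sum() for arr in arrays)
    return mean, np.sqrt(sq_dev / n)

rng = np.random.default_rng(0)
arrays = [rng.random((3, 4, 2)), rng.random((2, 5, 3)), rng.random((7, 4))]
mean_, std_ = combined_mean_std(arrays)

# Cross-check against the flatten-and-concatenate approach.
flat = np.concatenate([arr.ravel() for arr in arrays])
print(np.isclose(mean_, flat.mean()), np.isclose(std_, flat.std()))
```

Note that (arr - mean) still allocates a temporary the size of each individual array, so the saving is in never building one array that holds everything at once.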

Sample run:

In [301]: a = np.random.rand(3,4,2)
     ...: b = np.random.rand(2,5,3)
     ...: c = np.random.rand(7,4)
     ...: 

In [302]: A = [a,b,c] # input list of arrays
     ...: N = float(sum([i.size for i in A]))
     ...: mean_ = sum([i.sum() for i in A])/N
     ...: std_ = np.sqrt(sum([((i-mean_)**2).sum() for i in A])/N)
     ...: print mean_, std_
     ...: 
0.47703535428 0.293308550786

In [303]: A = np.hstack((a.ravel(), b.ravel(), c.ravel()))
     ...: print A.mean(), A.std()
     ...: 
0.47703535428 0.293308550786
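A related variant, which is not part of the approach above, applies when only each array's own size, mean, and std are available. The sum of squared deviations from the combined mean splits, per array, into that array's own variance plus the squared offset of its mean from the combined mean. The sketch below assumes population statistics (ddof=0); combine_stats is a hypothetical helper name.

```python
import numpy as np

def combine_stats(counts, means, stds):
    """Combine per-array counts, means and population stds (ddof=0).

    `combine_stats` is a hypothetical helper name used for illustration.
    """
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    n = counts.sum()
    mean = (counts * means).sum() / n
    # Each array contributes its own variance plus the squared offset
    # of its mean from the combined mean, weighted by its element count.
    var = (counts * (stds ** 2 + (means - mean) ** 2)).sum() / n
    return mean, np.sqrt(var)

rng = np.random.default_rng(0)
arrays = [rng.random((3, 4, 2)), rng.random((2, 5, 3)), rng.random((7, 4))]
mean_, std_ = combine_stats(
    [arr.size for arr in arrays],
    [arr.mean() for arr in arrays],
    [arr.std() for arr in arrays],
)

# Cross-check against the flatten-and-concatenate approach.
flat = np.concatenate([arr.ravel() for arr in arrays])
print(np.isclose(mean_, flat.mean()), np.isclose(std_, flat.std()))
```

This lets each array be reduced with the ordinary mean() and std() calls, possibly at different times, before the results are merged.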

