How to use numpy to calculate mean and standard deviation of an irregular shaped array
I have a numpy array that contains many samples of varying length:
Samples = np.array([[1001, 1002, 1003],
... ,
[1001, 1002]])
I want to (elementwise) subtract the mean of the array, then divide by the standard deviation of the array. Something like:
newSamples = (Samples-np.mean(Samples))/np.std(Samples)
Except that doesn't work for irregularly shaped arrays; np.mean(Samples) fails with:
unsupported operand type(s) for /: 'list' and 'int'
I assume this happens because numpy sets a static size for each axis and can't handle a sample of a different size when it encounters one. What is an approach to solve this using numpy?
Example input:
Sample = np.array([[1, 2, 3],
[1, 2]])
After subtracting the mean and then dividing by the standard deviation:
Sample = array([[-1.06904497, 0.26726124, 1.60356745],
[-1.06904497, 0.26726124]])
Don't make ragged arrays. Just don't.
Numpy can't do much with them, and any code you might make for them will always be unreliable and slow because numpy doesn't work that way. It turns them into object dtypes:
Sample
array([[1, 2, 3], [1, 2]], dtype=object)
Almost no numpy functions work on those. In this case the objects are list objects, which makes your code even more confusing, as you either have to switch between list and ndarray methods or stick to list-safe numpy methods. This is a recipe for disaster, as anyone noodling around with the code later (even yourself, if you forget) will be dancing in a minefield.
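To see what "object dtype" means here, a minimal sketch follows. It assumes a recent NumPy, where dtype=object has to be passed explicitly because ragged input is no longer converted implicitly; the variable name sample is only illustrative.

```python
import numpy as np

# Assumption: dtype=object is given explicitly, since recent NumPy
# versions refuse to build a ragged array without it.
sample = np.array([[1, 2, 3], [1, 2]], dtype=object)

print(sample.dtype)       # the element type is generic Python objects
print(sample.shape)       # one axis only: the rows are not a second axis
print(type(sample[0]))    # each element is a plain Python list
```

The array is effectively a 1D container of two Python lists, so the vectorized arithmetic that numpy is built for never sees the individual numbers.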
There are two things you can do with your data to make things work better:
The first method is to index and flatten.
i = np.cumsum(np.array([len(x) for x in Sample]))
flat_sample = np.hstack(Sample)
This preserves the index of the end of each sample in i, while keeping the sample as a 1D array.
The other method is to pad one dimension with np.nan and use nan-safe functions:
m = np.array([len(x) for x in Sample]).max()
nan_sample = np.array([x + [np.nan] * (m - len(x)) for x in Sample])
So to do your calculations, you can use flat_sample and apply the same formula as in the question:
new_flat_sample = (flat_sample - np.mean(flat_sample)) / np.std(flat_sample)
and use i to recreate your original array (or a list of arrays, which I recommend; see np.split):
new_list_sample = np.split(new_flat_sample, i[:-1])
[array([-1.06904497, 0.26726124, 1.60356745]),
array([-1.06904497, 0.26726124])]
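The flatten, normalize, and split steps above can be gathered into one helper. This is only a sketch: the function name normalize_ragged is hypothetical, it uses np.concatenate on plain lists rather than np.hstack on an object array, and it does not guard against a zero standard deviation.

```python
import numpy as np

def normalize_ragged(samples):
    """Standardize variable-length sequences using the global mean and std.

    Hypothetical helper: takes a list of lists and returns a list of 1D arrays.
    """
    lengths = [len(s) for s in samples]
    # Flatten everything into a single 1D float array
    flat = np.concatenate([np.asarray(s, dtype=float) for s in samples])
    normalized = (flat - flat.mean()) / flat.std()
    # Split points are the cumulative lengths, without the final total
    return np.split(normalized, np.cumsum(lengths)[:-1])

result = normalize_ragged([[1, 2, 3], [1, 2]])
for row in result:
    print(row)
```

Returning a list of arrays rather than a ragged array keeps every element a regular ndarray, which is the reason the answer recommends np.split.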
Alternatively, use nan_sample, but you will need to replace np.mean and np.std with np.nanmean and np.nanstd:
new_nan_sample = (nan_sample - np.nanmean(nan_sample)) / np.nanstd(nan_sample)
array([[-1.06904497, 0.26726124, 1.60356745],
[-1.06904497, 0.26726124, nan]])
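The padding approach can be sketched as a small helper as well. The name pad_with_nan is hypothetical; it fills a preallocated NaN array instead of concatenating lists, so it also accepts rows that are tuples or arrays.

```python
import numpy as np

def pad_with_nan(samples):
    """Hypothetical helper: pack variable-length rows into a 2D float array,
    filling the unused trailing cells with NaN."""
    width = max(len(s) for s in samples)
    padded = np.full((len(samples), width), np.nan)
    for row, s in enumerate(samples):
        padded[row, :len(s)] = s
    return padded

padded = pad_with_nan([[1, 2, 3], [1, 2]])

# Global standardization; the nan-aware reductions skip the padding
normalized = (padded - np.nanmean(padded)) / np.nanstd(padded)

# A regular 2D layout also allows per-row statistics
row_means = np.nanmean(padded, axis=1)

print(normalized)
print(row_means)
```

One design point in favour of padding is that the result stays a regular 2D array, so axis-wise reductions such as the per-row mean work directly, at the cost of storing the NaN filler.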
@MichaelHackman (following the comment remark). That's weird, because when I compute the overall std and mean and then apply them, I obtain a different result (see code below).
import numpy as np
# dtype=object is needed on recent NumPy, which raises ValueError for ragged input without it
Samples = np.array([[1, 2, 3],
                    [1, 2]], dtype=object)
c = np.hstack(Samples)  # gives [1, 2, 3, 1, 2]
mean, std = np.mean(c), np.std(c)
newSamples = np.asarray([(np.array(xi) - mean) / std for xi in Samples], dtype=object)
print(newSamples)
# two arrays: [-1.06904497, 0.26726124, 1.60356745] and [-1.06904497, 0.26726124]
Edit: added np.asarray() and moved the mean, std computation outside the loop, following Imanol Luengo's excellent comments (thanks!).