Binning a numpy array
I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not the same size) and then calculate the mean of each of those bins.
I suspect there is numpy, scipy, or pandas functionality to do this.
Example:
data = [4,2,5,6,7,5,4,3,5,7]
For a bin size of 2:
bin_data = [(4,2),(5,6),(7,5),(4,3),(5,7)]
bin_data_mean = [3,5.5,6,3.5,6]
For a bin size of 3:
bin_data = [(4,2,5),(6,7,5),(4,3,5)]
bin_data_mean = [3.67,6,4]
Just use reshape and then mean(axis=1).
As the simplest possible example:
import numpy as np
data = np.array([4,2,5,6,7,5,4,3,5,7])
print(data.reshape(-1, 2).mean(axis=1))
More generally, we'd need to do something like this to drop the last bin when the array length is not an exact multiple of the bin width:
import numpy as np
width=3
data = np.array([4,2,5,6,7,5,4,3,5,7])
result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)
print(result)
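The trim-and-reshape step above can be wrapped in a small helper. This is only a sketch: `bin_means` is a hypothetical name, not a NumPy function, and it assumes the input is one-dimensional.

```python
import numpy as np

def bin_means(data, width):
    """Mean of consecutive, non-overlapping bins of `width` elements.

    Hypothetical helper (not part of NumPy). Assumes 1-D input;
    a trailing partial bin is dropped.
    """
    data = np.asarray(data)
    n_bins = data.size // width        # number of complete bins
    trimmed = data[:n_bins * width]    # drop the leftover elements
    return trimmed.reshape(n_bins, width).mean(axis=1)

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_means(data, 2))
print(bin_means(data, 3))
```

With a width of 3, the tenth element (7) is dropped, leaving three complete bins.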
Since you already have a numpy array, to avoid for loops, you can use reshape and consider the new dimension to be the bin:
In [33]: data.reshape(2, -1)
Out[33]:
array([[4, 2, 5, 6, 7],
[5, 4, 3, 5, 7]])
In [34]: data.reshape(2, -1).mean(0)
Out[34]: array([ 4.5, 3. , 4. , 5.5, 7. ])
Actually, this only works if the size of data is divisible by n. I'll edit in a fix.
Looks like Joe Kington has an answer that handles that.
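It may help to compare how the two reshape layouts group the elements. The sketch below reuses the question's data and assumes a bin size of 2. Note that reshape(n, -1).mean(0) averages elements that are data.size // n positions apart, while reshape(-1, n).mean(1) averages adjacent elements, which is the grouping shown in the question.

```python
import numpy as np

data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])
n = 2  # bin size

# Rows are the two halves of the array, so each column pairs
# elements that are data.size // n positions apart: (4, 5), (2, 4), ...
strided = data.reshape(n, -1).mean(axis=0)

# Each row holds n adjacent elements: (4, 2), (5, 6), ...
adjacent = data.reshape(-1, n).mean(axis=1)

print(strided)   # [4.5 3.  4.  5.5 7. ]
print(adjacent)  # [3.  5.5 6.  3.5 6. ]
```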
Try this, using standard Python (NumPy isn't necessary for this). Assuming Python 2.x is in use (on Python 3, replace xrange with range):
data = [ 4, 2, 5, 6, 7, 5, 4, 3, 5, 7 ]
# example: for n == 2
n=2
partitions = [data[i:i+n] for i in xrange(0, len(data), n)]
partitions = partitions if len(partitions[-1]) == n else partitions[:-1]
# the above produces a list of lists
partitions
=> [[4, 2], [5, 6], [7, 5], [4, 3], [5, 7]]
# now the mean
[sum(x)/float(n) for x in partitions]
=> [3.0, 5.5, 6.0, 3.5, 6.0]
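For Python 3, the same idea might look like the sketch below. It is a variation, not the answer's original code: instead of building a partial final partition and then discarding it, the loop stops early. The names `last_start` and `means` are introduced here for illustration.

```python
data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
n = 3  # bin size

# Only start a bin where n elements are still available,
# so no partial bin is ever created.
last_start = len(data) - n + 1
partitions = [data[i:i + n] for i in range(0, last_start, n)]

# In Python 3, / is true division, so float() is not needed.
means = [sum(p) / n for p in partitions]

print(partitions)  # [[4, 2, 5], [6, 7, 5], [4, 3, 5]]
print(means)
```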
I just wrote a function that applies this to an array of any size along whichever dimension you want.
func is the function you want to apply to each bin (np.max for max pooling, np.mean for an average, ...).
import numpy as np

def binArray(data, axis, binstep, binsize, func=np.nanmean):
    data = np.array(data)
    dims = np.array(data.shape)
    argdims = np.arange(data.ndim)
    # swap the axis to bin along with axis 0
    argdims[0], argdims[axis] = argdims[axis], argdims[0]
    data = data.transpose(argdims)
    # apply func to each window of binsize elements, starting every binstep
    data = [func(np.take(data, np.arange(int(i*binstep), int(i*binstep+binsize)), 0), 0)
            for i in np.arange(dims[axis]//binstep)]
    # swap the axes back
    data = np.array(data).transpose(argdims)
    return data
In your case it will be:
data = [4,2,5,6,7,5,4,3,5,7]
bin_data_mean = binArray(data, 0, 2, 2, np.mean)
Or, for a bin size of 3:
bin_data_mean = binArray(data, 0, 3, 3, np.mean)