
Binning a numpy array


I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is shorter than the others) and then calculate the mean of each of those bins.

I suspect there is NumPy, SciPy, or pandas functionality to do this.

Example:

data = [4,2,5,6,7,5,4,3,5,7]

For a bin size of 2:

bin_data = [(4,2),(5,6),(7,5),(4,3),(5,7)]
bin_data_mean = [3,5.5,6,3.5,6]

For a bin size of 3:

bin_data = [(4,2,5),(6,7,5),(4,3,5)]
bin_data_mean = [3.67,6,4]

Just use reshape and then mean(axis=1).

As the simplest possible example:

import numpy as np

data = np.array([4,2,5,6,7,5,4,3,5,7])

print(data.reshape(-1, 2).mean(axis=1))

More generally, we'd need to do something like this to drop the last bin when the array length isn't an even multiple of the bin width:

import numpy as np

width=3
data = np.array([4,2,5,6,7,5,4,3,5,7])

result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)

print(result)
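The trim-then-reshape idea above can be wrapped in a small reusable function. This is only a sketch: the name bin_reduce and its func parameter are hypothetical, and it assumes 1-D input.

```python
import numpy as np


def bin_reduce(data, width, func=np.mean):
    """Reduce consecutive, non-overlapping bins of `width` values with `func`.

    Hypothetical helper: trailing values that don't fill a whole bin are dropped.
    """
    data = np.asarray(data)
    n_full = data.size // width          # number of complete bins
    trimmed = data[: n_full * width]     # drop the incomplete tail
    # each row of the reshaped array is one bin; reduce across the row
    return func(trimmed.reshape(-1, width), axis=1)


data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_reduce(data, 2))           # bin means for width 2
print(bin_reduce(data, 3))           # bin means for width 3 (last value dropped)
print(bin_reduce(data, 3, np.max))   # max pooling instead of averaging
```

Passing the reduction in as func keeps the trimming and reshaping logic in one place; any NumPy reduction that accepts an axis argument works.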

Since you already have a numpy array, you can avoid for loops by using reshape and treating the new dimension as the bin:

In [33]: data.reshape(-1, 2)
Out[33]: 
array([[4, 2],
       [5, 6],
       [7, 5],
       [4, 3],
       [5, 7]])

In [34]: data.reshape(-1, 2).mean(1)
Out[34]: array([3. , 5.5, 6. , 3.5, 6. ])

Actually, this only works if the size of data is divisible by the bin size. I'll edit in a fix.

It looks like Joe Kington has an answer that handles that.
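Which axis the reshape splits off determines what gets averaged. A short comparison, assuming a bin size n = 2: reshape(-1, n).mean(axis=1) averages each run of n consecutive values, while reshape(n, -1).mean(axis=0) averages values that are len(data) // n positions apart, which is not binning. Both raise ValueError when len(data) is not divisible by n, which is why the trimming step is needed.

```python
import numpy as np

data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])
n = 2  # bin size

# Rows of reshape(-1, n) are consecutive runs: [4, 2], [5, 6], [7, 5], ...
contiguous = data.reshape(-1, n).mean(axis=1)

# Columns of reshape(n, -1) pair data[i] with data[i + len(data) // n]:
# [4, 5], [2, 4], [5, 3], ... These are strided groups, not bins.
strided = data.reshape(n, -1).mean(axis=0)

print(contiguous)
print(strided)

# A length that is not a multiple of n cannot be reshaped directly.
try:
    data[:9].reshape(-1, n)
except ValueError as err:
    print("reshape failed:", err)
```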

Try this, using standard Python (NumPy isn't necessary for this). It runs on Python 2.x or 3.x:

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]

# example: for n == 2
n = 2
partitions = [data[i:i+n] for i in range(0, len(data), n)]
# drop the last partition if it is shorter than n
partitions = partitions if len(partitions[-1]) == n else partitions[:-1]

# the above produces a list of lists
print(partitions)
# => [[4, 2], [5, 6], [7, 5], [4, 3], [5, 7]]

# now the mean
print([sum(x)/float(n) for x in partitions])
# => [3.0, 5.5, 6.0, 3.5, 6.0]
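Another standard-library sketch, swapping in a different technique: the zip(*[iter(data)] * n) grouping idiom. Because zip stops at its shortest input, an incomplete final group is dropped automatically, so no explicit length check is needed. This assumes Python 3 (where / is true division); bin_means is a hypothetical name.

```python
def bin_means(data, n):
    """Means of consecutive, non-overlapping groups of n items (Python 3)."""
    # n references to the *same* iterator, so zip pulls n consecutive items
    # per tuple and stops as soon as fewer than n remain.
    groups = zip(*[iter(data)] * n)
    return [sum(g) / n for g in groups]


data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_means(data, 2))  # [3.0, 5.5, 6.0, 3.5, 6.0]
print(bin_means(data, 3))  # three bins; the trailing 7 is dropped
```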

I just wrote a function that applies this to an array of any size and any number of dimensions.

  • data is your array
  • axis is the axis along which you want to bin
  • binstep is the number of points between the starts of consecutive bins (a binstep smaller than binsize gives overlapping bins)
  • binsize is the size of each bin
  • func is the function you want to apply to each bin (np.max for max pooling, np.mean for an average, ...); it is called as func(bin, 0), so it must accept an axis argument
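To make the roles of binstep and binsize concrete, here is a small sketch (the helper name bin_ranges is hypothetical) that lists the half-open [start, stop) index range of every complete bin along an axis of a given length. Bin i starts at i * binstep and covers binsize points.

```python
def bin_ranges(length, binstep, binsize):
    """Return (start, stop) pairs for each complete bin along an axis.

    Bin i covers indices from i * binstep up to (not including) i * binstep + binsize.
    Bins that would run past the end of the axis are dropped.
    """
    n_bins = max(0, (length - binsize) // binstep + 1)
    return [(i * binstep, i * binstep + binsize) for i in range(n_bins)]


print(bin_ranges(10, 2, 2))  # non-overlapping bins of 2 points
print(bin_ranges(10, 2, 3))  # overlapping: each bin shares one point with the next
```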

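import numpy as np

def binArray(data, axis, binstep, binsize, func=np.nanmean):
    data = np.array(data)
    dims = np.array(data.shape)
    # swap `axis` with axis 0 so that bins can always be taken along axis 0
    argdims = np.arange(data.ndim)
    argdims[0], argdims[axis] = argdims[axis], argdims[0]
    data = data.transpose(argdims)
    # number of complete bins; bins that would run past the end are dropped
    nbins = (dims[axis] - binsize) // binstep + 1
    data = [func(np.take(data, np.arange(int(i*binstep), int(i*binstep + binsize)), 0), 0)
            for i in np.arange(nbins)]
    # stack the per-bin results and swap the axes back
    data = np.array(data).transpose(argdims)
    return data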

In your case it would be:

data = [4,2,5,6,7,5,4,3,5,7]
bin_data_mean = binArray(data, 0, 2, 2, np.mean)

or, for a bin size of 3:

bin_data_mean = binArray(data, 0, 3, 3, np.mean)
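For comparison, here is a vectorized sketch of the same idea without the Python-level loop, swapping in numpy.lib.stride_tricks.sliding_window_view (assumes NumPy >= 1.20; bin_windows is a hypothetical name). It builds every window of length binsize along axis as a view, keeps every binstep-th one, and reduces over the window axis, so func must accept an axis keyword (np.mean, np.nanmean and np.max do).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def bin_windows(data, axis, binstep, binsize, func=np.nanmean):
    """Apply `func` to bins of `binsize` points taken every `binstep` points along `axis`."""
    a = np.asarray(data)
    axis = axis % a.ndim  # normalize negative axes
    # Every window of length binsize along `axis`, as a view; the window
    # itself becomes a new last axis.
    windows = sliding_window_view(a, binsize, axis=axis)
    # Keep only the windows that start at multiples of binstep.
    index = [slice(None)] * windows.ndim
    index[axis] = slice(None, None, binstep)
    windows = windows[tuple(index)]
    # Reduce each window to a single value.
    return func(windows, axis=-1)


data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_windows(data, 0, 2, 2, np.mean))  # same bins as binArray(data, 0, 2, 2, np.mean)
print(bin_windows(data, 0, 3, 3, np.mean))
```

Because the windows are views, no bin data is copied until func reduces it; sliding_window_view raises ValueError if binsize is larger than the axis.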

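Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.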

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM