Binning a numpy array

Question

I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not the same size) and then calculate the mean of each of those bins.

I suspect there is numpy, scipy, or pandas functionality to do this.

Example:

data = [4,2,5,6,7,5,4,3,5,7]

For a bin size of 2:

bin_data = [(4,2),(5,6),(7,5),(4,3),(5,7)]
bin_data_mean = [3,5.5,6,3.5,6]

For a bin size of 3:

bin_data = [(4,2,5),(6,7,5),(4,3,5)]
bin_data_mean = [3.67,6,4]

Answer 1

Just use reshape and then mean(axis=1).

As the simplest possible example:

import numpy as np

data = np.array([4,2,5,6,7,5,4,3,5,7])

print(data.reshape(-1, 2).mean(axis=1))

More generally, to drop the last bin when the array length is not an exact multiple of the bin width, we'd need to do something like this:

import numpy as np

width=3
data = np.array([4,2,5,6,7,5,4,3,5,7])

result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)

print(result)
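The trim-then-reshape step can be wrapped in a small helper. This is only a minimal sketch for 1-D input; the function name `bin_means` and its parameter names are illustrative and not part of NumPy.

```python
import numpy as np

def bin_means(data, width):
    """Mean of consecutive, non-overlapping bins of `width` samples.

    Trailing samples that do not fill a whole bin are dropped.
    (Hypothetical helper for 1-D data; the name is not a NumPy API.)
    """
    data = np.asarray(data)
    n_full = (data.size // width) * width  # number of samples that fit into whole bins
    return data[:n_full].reshape(-1, width).mean(axis=1)

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_means(data, 2))  # five bins of two samples each
print(bin_means(data, 3))  # three bins; the final sample (7) is dropped
```

Slicing before reshaping is what makes the reshape safe: reshape(-1, width) raises an error if the number of elements is not a multiple of width.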

Answer 2

Since you already have a numpy array, to avoid for loops, you can use reshape and consider the new dimension to be the bin:

In [33]: data.reshape(-1, 2)
Out[33]: 
array([[4, 2],
       [5, 6],
       [7, 5],
       [4, 3],
       [5, 7]])

In [34]: data.reshape(-1, 2).mean(1)
Out[34]: array([3. , 5.5, 6. , 3.5, 6. ])

Actually, this only works if the size of data is divisible by n, the bin size. I'll edit in a fix.

Looks like Joe Kington has an answer that handles that.
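The orientation of the reshape matters for time series data, because NumPy fills the reshaped array row by row (C order). The sketch below compares the two layouts using the question's data; the variable names are illustrative.

```python
import numpy as np

data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])
n = 2  # bin size

# One row per bin: each row holds n adjacent samples.
per_bin = data.reshape(-1, n)
print(per_bin.mean(axis=1))   # means of (4,2), (5,6), (7,5), (4,3), (5,7)

# n rows instead: column i pairs data[i] with data[i + 5],
# which are not neighbouring samples in the series.
strided = data.reshape(n, -1)
print(strided.mean(axis=0))   # means of (4,5), (2,4), (5,3), (6,5), (7,7)
```

Only the first layout, reshape(-1, n) followed by mean(axis=1), averages neighbouring samples, which is what the question asks for.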

Answer 3

Try this, using standard Python (NumPy isn't necessary for this). Assuming Python 2.x is in use:

data = [ 4, 2, 5, 6, 7, 5, 4, 3, 5, 7 ]

# example: for n == 2
n=2
partitions = [data[i:i+n] for i in xrange(0, len(data), n)]
partitions = partitions if len(partitions[-1]) == n else partitions[:-1]

# the above produces a list of lists
partitions
=> [[4, 2], [5, 6], [7, 5], [4, 3], [5, 7]]

# now the mean
[sum(x)/float(n) for x in partitions]
=> [3.0, 5.5, 6.0, 3.5, 6.0]
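For readers on Python 3, the same idea might look like the sketch below. It swaps xrange for range and uses statistics.mean from the standard library; stopping the start indices early, so that a short tail is never created, is a variation on the snippet above rather than the original author's code.

```python
from statistics import mean

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
n = 3  # bin size

# Only generate start indices that leave room for a full bin of n items.
starts = range(0, len(data) - n + 1, n)
partitions = [data[i:i + n] for i in starts]
means = [mean(p) for p in partitions]

print(partitions)
print(means)
```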

Answer 4

I wrote a function that applies this to an array of any size or number of dimensions.

data is your array
axis is the axis you want to bin along
binstep is the number of points between the starts of consecutive bins (a step smaller than binsize gives overlapping bins)
binsize is the size of each bin

func is the function you want to apply to each bin (np.max for max pooling, np.mean for an average, ...)

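import numpy as np

def binArray(data, axis, binstep, binsize, func=np.nanmean):
    data = np.array(data)
    dims = np.array(data.shape)
    # Move the axis being binned to the front.
    argdims = np.arange(data.ndim)
    argdims[0], argdims[axis] = argdims[axis], argdims[0]
    data = data.transpose(argdims)
    # Count only the bins that fit entirely inside the array, so that
    # overlapping bins never index past the end.
    nbins = (dims[axis] - binsize) // binstep + 1
    data = [func(np.take(data, np.arange(int(i*binstep), int(i*binstep + binsize)), 0), 0)
            for i in np.arange(nbins)]
    # Restore the original axis order.
    data = np.array(data).transpose(argdims)
    return data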

In your case it will be:

data = [4,2,5,6,7,5,4,3,5,7]
bin_data_mean = binArray(data, 0, 2, 2, np.mean)

Or, for a bin size of 3:

bin_data_mean = binArray(data, 0, 3, 3, np.mean)
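For overlapping bins on a 1-D array, a similar result can be sketched with NumPy's sliding_window_view. This is an alternative technique, not the function above, and it assumes NumPy 1.20 or newer; the variable names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])
binsize, binstep = 3, 2  # consecutive bins share one sample

# Build every window of length `binsize`, then keep every `binstep`-th window.
windows = sliding_window_view(data, binsize, axis=0)[::binstep]
overlapping_means = windows.mean(axis=-1)

print(windows)            # bins start at indices 0, 2, 4 and 6
print(overlapping_means)
```

Because the windows are views on the original array, no data is copied until the mean is computed.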

4 answers

Answer 1: 19, accepted, 2014-02-20 22:42:12
Answer 2: 6, 2014-02-20 22:40:13
Answer 3: 5, 2014-02-20 22:39:15
Answer 4: 0, 2017-02-03 12:49:34
