Numpy - Averaging multiple columns of a 2D array

Question

Right now I am doing this by iterating, but there has to be a way to accomplish this task using numpy functions. My goal is to take a 2D array and average J columns at a time, producing a new array with the same number of rows as the original, but with columns/J columns.

So I want to take this:

J = 2 // two columns averaged at a time

[[1 2 3 4]
 [4 3 7 1]
 [6 2 3 4]
 [3 4 4 1]]

and produce this:

[[1.5 3.5]
 [3.5 4.0]
 [4.0 3.5]
 [3.5 2.5]]

Is there a simple way to accomplish this task? I also need a way such that if I never end up with an unaveraged remainder column. So if, for example, I have an input array with 5 columns and J=2, I would average the first two columns, then the last three columns.

Any help you can provide would be great.

Answer 1

data.reshape(-1,j).mean(axis=1).reshape(data.shape[0],-1)

If your j divides data.shape[1] , that is.

Example:

In [40]: data
Out[40]: 
array([[7, 9, 7, 2],
       [7, 6, 1, 5],
       [8, 1, 0, 7],
       [8, 3, 3, 2]])

In [41]: data.reshape(-1,j).mean(axis=1).reshape(data.shape[0],-1)
Out[41]: 
array([[ 8. ,  4.5],
       [ 6.5,  3. ],
       [ 4.5,  3.5],
       [ 5.5,  2.5]])

Answer 2

First of all, it looks to me like you're not averaging the columns at all, you're just averaging two data points at a time. Seems to me like your best off reshaping the array, so your that you have a Nx2 data structure that you can feed directly to mean . You may have to pad it first if the number of columns isn't quite compatible. Then at the end, just do a weighted average of the padded remainder column and the one before it. Finally reshape back to the shape you want.

To play off of the example provided by TheodrosZelleke:

In [1]: data = np.concatenate((data, np.array([[5, 6, 7, 8]]).T), 1)

In [2]: data
Out[2]: 
array([[7, 9, 7, 2, 5],
       [7, 6, 1, 5, 6],
       [8, 1, 0, 7, 7],
       [8, 3, 3, 2, 8]])

In [3]: cols = data.shape[1]

In [4]: j = 2

In [5]: dataPadded = np.concatenate((data, np.zeros((data.shape[0], j - cols % j))), 1)

In [6]: dataPadded
Out[6]: 
array([[ 7.,  9.,  7.,  2.,  5.,  0.],
       [ 7.,  6.,  1.,  5.,  6.,  0.],
       [ 8.,  1.,  0.,  7.,  7.,  0.],
       [ 8.,  3.,  3.,  2.,  8.,  0.]])

In [7]: dataAvg = dataPadded.reshape((-1,j)).mean(axis=1).reshape((data.shape[0], -1))

In [8]: dataAvg
Out[8]: 
array([[ 8. ,  4.5,  2.5],
       [ 6.5,  3. ,  3. ],
       [ 4.5,  3.5,  3.5],
       [ 5.5,  2.5,  4. ]])

In [9]: if cols % j:
    dataAvg[:, -2] = (dataAvg[:, -2] * j + dataAvg[:, -1] * (cols % j)) / (j + cols % j)
    dataAvg = dataAvg[:, :-1]
   ....:     

In [10]: dataAvg
Out[10]: 
array([[ 8.        ,  3.83333333],
       [ 6.5       ,  3.        ],
       [ 4.5       ,  3.5       ],
       [ 5.5       ,  3.        ]])

Numpy - Averaging multiple columns of a 2D array

Question

2 answers

solution1
4 2013-01-18 14:40:23

solution2
1 2013-01-18 14:40:57

Numpy - Averaging multiple columns of a 2D array

Question

2 answers

solution1 4 2013-01-18 14:40:23

solution2 1 2013-01-18 14:40:57

solution1
4 2013-01-18 14:40:23

solution2
1 2013-01-18 14:40:57