Calculate the mean of every x rows in a table and create a new table

Question

I have a long table of data (~200 rows by 50 columns) and I need to write code that calculates the mean of every two rows for each column, with the final output being a new table of the mean values. This is obviously crazy to do in Excel! I use Python 3 and I am aware of some similar questions: here, here and here. But none of these helps, as I need some elegant code that works with multiple columns and produces an organised data table. By the way, my original data table was imported using pandas and is defined as a DataFrame, but I could not find an easy way to do this in pandas. Help is much appreciated.

An example of the table (short version) is:

a   b   c   d
2   50  25  26
4   11  38  44
6   33  16  25
8   37  27  25
10  28  48  32
12  47  35  45
14  8   16  7
16  12  16  30
18  22  39  29
20  9   15  47

Expected mean table:

a    b     c     d
3   30.5  31.5  35
7   35    21.5  25
11  37.5  41.5  38.5
15  10    16    18.5
19  15.5  27    38

Answer 1

You can create an artificial group using df.index//2 (or, as @DSM pointed out, using np.arange(len(df))//2, so that it works for any index) and then use groupby:

df.groupby(np.arange(len(df))//2).mean()
Out[13]: 
      a     b     c     d
0   3.0  30.5  31.5  35.0
1   7.0  35.0  21.5  25.0
2  11.0  37.5  41.5  38.5
3  15.0  10.0  16.0  18.5
4  19.0  15.5  27.0  38.0
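
The same idea generalizes to any group size. The following is a minimal sketch rather than part of the original answer; the helper name mean_every_n is an assumption chosen for illustration, and the data is the sample table from the question. It uses positional labels so it does not depend on the index, and a trailing partial group is simply averaged over whatever rows remain.

```python
import numpy as np
import pandas as pd


def mean_every_n(df, n=2):
    """Average each consecutive block of n rows (hypothetical helper)."""
    # Positional labels 0, 0, ..., 1, 1, ... work for any index, not just 0..N-1.
    labels = np.arange(len(df)) // n
    return df.groupby(labels).mean()


# Sample data from the question.
df = pd.DataFrame({
    "a": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
    "b": [50, 11, 33, 37, 28, 47, 8, 12, 22, 9],
    "c": [25, 38, 16, 27, 48, 35, 16, 16, 39, 15],
    "d": [26, 44, 25, 25, 32, 45, 7, 30, 29, 47],
})

result = mean_every_n(df, n=2)
print(result)
```

Calling mean_every_n(df, n=5) would instead average blocks of five rows.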

Answer 2

You can approach this problem using df.rolling() to create a rolling average and then grab every second element using iloc:

df = df.rolling(2).mean()
df = df.iloc[1::2, :]  # rows 1, 3, 5, ... hold the means of pairs (0, 1), (2, 3), ...

Note that the first row of the rolling result is missing (NaN), because the window starts at the top and row i holds the mean of rows i-1 and i. That is why the slice starts at position 1. Also make sure to check that your data is sorted how you need it.
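
To see which slice offset lines up with the non-overlapping pairs, a small check helps. This is a sketch on a made-up single-column frame (the column name x and its values are assumptions): with a window of 2, the value at position i is the mean of rows i-1 and i, so positions 1, 3, 5 hold the means of pairs (0, 1), (2, 3), (4, 5).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [2, 4, 6, 8, 10, 12]})

# Row i of the rolling result is the mean of rows i-1 and i; row 0 is NaN.
rolled = df.rolling(2).mean()

# Positions 1, 3, 5 are the means of the pairs (0, 1), (2, 3), (4, 5).
pairs = rolled.iloc[1::2]

# Cross-check against the groupby approach from Answer 1.
grouped = df.groupby(np.arange(len(df)) // 2).mean()
print(np.allclose(pairs["x"].to_numpy(), grouped["x"].to_numpy()))
```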

Answer 3

A NumPythonic way would be to extract the elements as a NumPy array with df.values, reshape it to a 3D array with 2 elements along axis=1 and 4 along axis=2 (one per column), perform the average reduction along axis=1, and finally convert back to a dataframe, like so -

pd.DataFrame(df.values.reshape(-1,2,df.shape[1]).mean(1))

As it turns out, you can bring in NumPy's very efficient tool np.einsum to do this average-reduction as a combination of sum-reduction and scaling-down, like so -

pd.DataFrame(np.einsum('ijk->ik',df.values.reshape(-1,2,df.shape[1]))/2.0)

Please note that the proposed approaches assume that the number of rows is divisible by 2.

Also, as noted by @DSM, to preserve the column names you need to add columns=df.columns when converting back to a DataFrame, i.e. -

pd.DataFrame(...,columns=df.columns)
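
Putting these pieces together, here is a minimal sketch of the reshape approach that keeps the column names and makes the divisibility assumption explicit. The function name reshape_mean, the sample frame, and the choice to raise an error on leftover rows are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd


def reshape_mean(df, n=2):
    """Mean of every n rows via a 3D reshape (hypothetical helper)."""
    rows, cols = df.shape
    if rows % n != 0:
        # The reshape only works when the rows split evenly into blocks of n.
        raise ValueError(f"{rows} rows cannot be split into blocks of {n}")
    # Shape (rows // n, n, cols); averaging over axis 1 collapses each block.
    blocks = df.to_numpy().reshape(-1, n, cols)
    return pd.DataFrame(blocks.mean(axis=1), columns=df.columns)


df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=list("abcd"))
print(reshape_mean(df, n=2))
```

If the row count is not a multiple of n, one option is to fall back to the groupby approach, which averages the trailing partial block instead of failing.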

Sample run -

>>> df
    0   1   2   3
0   2  50  25  26
1   4  11  38  44
2   6  33  16  25
3   8  37  27  25
4  10  28  48  32
5  12  47  35  45
6  14   8  16   7
7  16  12  16  30
8  18  22  39  29
9  20   9  15  47
>>> pd.DataFrame(df.values.reshape(-1,2,df.shape[1]).mean(1))
    0     1     2     3
0   3  30.5  31.5  35.0
1   7  35.0  21.5  25.0
2  11  37.5  41.5  38.5
3  15  10.0  16.0  18.5
4  19  15.5  27.0  38.0
>>> pd.DataFrame(np.einsum('ijk->ik',df.values.reshape(-1,2,df.shape[1]))/2.0)
    0     1     2     3
0   3  30.5  31.5  35.0
1   7  35.0  21.5  25.0
2  11  37.5  41.5  38.5
3  15  10.0  16.0  18.5
4  19  15.5  27.0  38.0

Runtime tests -

In this section, let's test the performance of the three approaches listed so far, including @ayhan's solution with groupby.

In [24]: A = np.random.randint(0,9,(200,50))

In [25]: df = pd.DataFrame(A)

In [26]: %timeit df.groupby(df.index//2).mean() # @ayhan's solution
1000 loops, best of 3: 1.61 ms per loop

In [27]: %timeit pd.DataFrame(df.values.reshape(-1,2,df.shape[1]).mean(1))
1000 loops, best of 3: 317 µs per loop

In [28]: %timeit pd.DataFrame(np.einsum('ijk->ik',df.values.reshape(-1,2,df.shape[1]))/2.0)
1000 loops, best of 3: 266 µs per loop

Answer 4

# mean(level=0) was removed in pandas 2.0; groupby(level=0).mean() is the equivalent
df.set_index(np.arange(len(df)) // 2).groupby(level=0).mean()

Answer 5

In your case, as you want to average the rows, assuming your dataframe is named new:

new = new.groupby(np.arange(len(new)) // 2).mean()

If you want to average over the columns instead:

new = new.groupby(np.arange(len(new.columns)) // 2, axis=1).mean()
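
Recent pandas releases deprecate the axis argument of groupby, so a transpose-based variant may be worth considering. The following is a sketch under that assumption; the two-row sample frame and the name col_means are made up for illustration.

```python
import numpy as np
import pandas as pd

new = pd.DataFrame({
    "a": [2, 4], "b": [50, 11], "c": [25, 38], "d": [26, 44],
})

# Label the columns pairwise by position: a, b -> 0 and c, d -> 1.
col_labels = np.arange(len(new.columns)) // 2

# Group the transposed frame by rows, then transpose back.
col_means = new.T.groupby(col_labels).mean().T
print(col_means)
```

The resulting columns are named after the group labels (0 and 1), since each one combines two of the original columns.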

Answer 6

I got ValueError: Grouper and axis must be same length when I tried using numpy to create the artificial group. As an alternative, you can use itertools, which can generate the group labels for your DataFrame:

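import itertools
import pandas as pd

SAMPLE_SIZE = 2
# Repeat each label SAMPLE_SIZE times; groupby aligns the Series with df on the index.
label_series = pd.Series(itertools.chain.from_iterable(itertools.repeat(x, SAMPLE_SIZE) for x in df.index))
sampled_df = df.groupby(label_series).mean()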
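
A variant of the same itertools idea makes the label length explicit. In this sketch the labels are cut to exactly len(df) and given the same index as df, so the grouper matches the frame even when the index is not 0..N-1 or the row count is odd. The sample frame and its string index are assumptions for illustration.

```python
import itertools
import pandas as pd

SAMPLE_SIZE = 2
df = pd.DataFrame({"a": [2, 4, 6, 8, 10]}, index=list("vwxyz"))

# Lazily produce one label per block of SAMPLE_SIZE rows: 0, 0, 1, 1, 2, ...
labels = itertools.chain.from_iterable(
    itertools.repeat(i, SAMPLE_SIZE) for i in itertools.count()
)

# Take exactly len(df) labels and index them like df so groupby can align them.
label_series = pd.Series(list(itertools.islice(labels, len(df))), index=df.index)

sampled_df = df.groupby(label_series).mean()
print(sampled_df)
```

With five rows, the last group contains a single row, so its mean is just that row's value.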

6 answers

Answer 1  46  2016-04-23 12:13:12
Answer 2  20  2018-02-27 19:19:31
Answer 3  9  2016-04-23 12:18:46
Answer 4  5  2017-08-05 20:31:00
Answer 5  1  2021-01-07 13:23:23
Answer 6  0  2021-03-13 10:04:59
