如何根据名称对熊猫中的多列进行分组？

Question

I want to create a new dataframe with columns calculated as means of columns with similar names from this dataframe:我想创建一个新的数据框，其列计算为从此数据框中具有相似名称的列的平均值：

            B6_i     B6_ii    B6_iii  ...  BXD80_i   BXD80_ii   BXD81_i
data                                  ...                              
Cd38    0.598864 -0.225322  0.306926  ... -0.312190  0.281429  0.424752
Trim21  1.947399  2.920681  2.805861  ...  1.469634  2.103585  0.827487
Kpnb1  -0.458240 -0.417507 -0.441522  ... -0.314313 -0.153509 -0.095863
Six1    1.055255  0.868148  1.012298  ...  0.142565  0.264753  0.807692

The new dataframe should look like this:新的数据框应如下所示：

               B6     BXD80  ...    BXD81
data                                 
Cd38    -0.041416 -0.087859  ...  0.424752
Trim21  15.958981  3.091500  ...  0.827487
Kpnb1   -0.084471  0.048250  ... -0.095863
Six1     0.927383  0.037745  ...  0.807692

(like (B6_i + B6_ii + B6_iii)/3), based on all characters until the underscore "_") （如 (B6_i + B6_ii + B6_iii)/3），基于直到下划线“_”的所有字符）

Some columns are one of n columns, and others are singular (like 'BXD81_i'), so I need a method that can work with a varying number for each mean calculation.有些列是 n 列之一，而其他列是单数的（如“BXD81_i”），所以我需要一种方法，可以为每个平均计算使用不同的数字。

Answer 1

You can aggregate mean per columns by values before _ :您可以按_之前的值汇总每列的mean ：

df.columns = df.columns.str.split('_', expand=True)

df1 = df.groupby(level=0, axis=1).mean()

Or:或者：

df1 = df.groupby(lambda x: x.split('_')[0], axis=1).mean()
print (df1)
              B6     BXD80     BXD81
data                                
Cd38    0.226823 -0.015381  0.424752
Trim21  2.557980  1.786609  0.827487
Kpnb1  -0.439090 -0.233911 -0.095863
Six1    0.978567  0.203659  0.807692

如何根据名称对熊猫中的多列进行分组？

问题描述

1 个解决方案

解决方案1
1 已采纳 2022-05-27 05:33:25

如何根据名称对熊猫中的多列进行分组？

问题描述

1 个解决方案

解决方案1 1 已采纳 2022-05-27 05:33:25

解决方案1
1 已采纳 2022-05-27 05:33:25