[英]How to groupby multiple columns in pandas based on name?
I want to create a new dataframe with columns calculated as means of columns with similar names from this dataframe:我想创建一个新的数据框,其列计算为从此数据框中具有相似名称的列的平均值:
B6_i B6_ii B6_iii ... BXD80_i BXD80_ii BXD81_i
data ...
Cd38 0.598864 -0.225322 0.306926 ... -0.312190 0.281429 0.424752
Trim21 1.947399 2.920681 2.805861 ... 1.469634 2.103585 0.827487
Kpnb1 -0.458240 -0.417507 -0.441522 ... -0.314313 -0.153509 -0.095863
Six1 1.055255 0.868148 1.012298 ... 0.142565 0.264753 0.807692
The new dataframe should look like this:新的数据框应如下所示:
B6 BXD80 ... BXD81
data
Cd38 -0.041416 -0.087859 ... 0.424752
Trim21 15.958981 3.091500 ... 0.827487
Kpnb1 -0.084471 0.048250 ... -0.095863
Six1 0.927383 0.037745 ... 0.807692
(like (B6_i + B6_ii + B6_iii)/3), based on all characters until the underscore "_") (如 (B6_i + B6_ii + B6_iii)/3),基于直到下划线“_”的所有字符)
Some columns are one of n columns, and others are singular (like 'BXD81_i'), so I need a method that can work with a varying number for each mean calculation.有些列是 n 列之一,而其他列是单数的(如“BXD81_i”),所以我需要一种方法,可以为每个平均计算使用不同的数字。
You can aggregate mean
per columns by values before _
:您可以按
_
之前的值汇总每列的mean
:
df.columns = df.columns.str.split('_', expand=True)
df1 = df.groupby(level=0, axis=1).mean()
Or:或者:
df1 = df.groupby(lambda x: x.split('_')[0], axis=1).mean()
print (df1)
B6 BXD80 BXD81
data
Cd38 0.226823 -0.015381 0.424752
Trim21 2.557980 1.786609 0.827487
Kpnb1 -0.439090 -0.233911 -0.095863
Six1 0.978567 0.203659 0.807692
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.