Aggregation of several columns in pandas
I have the following data in a DataFrame df:
VALUE  COUNT  REGION   ID
  235     15      LP  139
  355     59      LP  102
  421      8      LP  127
  427    227      LP   90
  439      4      LP  133
  235     45      UP  139
  355    231      UP  102
  421    756      UP  127
  427     23      UP   90
  439     76      UP  133
I want to combine the data frame so that rows sharing the same 'VALUE' are merged and their counts are added up. The REGION column need not be included in the final dataframe. I tried the following:
df.groupby(['VALUE'])['COUNT'].sum()
How do I get it to return all columns (except REGION)?
You can tell aggregate to perform multiple actions on multiple columns. You did not mention what you want to do with the ID column, so here I take the first value in each group. Columns that can't be summed are usually silently dropped, and REGION is absent from the result here as well (it is not listed in the dict passed to aggregate).
In [51]: df.groupby('VALUE').aggregate({'COUNT': 'sum', 'ID': lambda x: x.iloc[0]})
Out[51]:
       COUNT   ID
VALUE
235       60  139
355      290  102
421      764  127
427      250   90
439       80  133
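
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and uses named aggregation with as_index=False so that VALUE comes back as a regular column. The names result and alt are just illustrative, and 'first' stands in for the lambda above (the two should agree as long as ID has no missing values). The second variant assumes ID is constant within each VALUE, which holds for this sample but may not hold for your real data.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "VALUE": [235, 355, 421, 427, 439] * 2,
    "COUNT": [15, 59, 8, 227, 4, 45, 231, 756, 23, 76],
    "REGION": ["LP"] * 5 + ["UP"] * 5,
    "ID": [139, 102, 127, 90, 133] * 2,
})

# Sum COUNT and keep the first ID seen for each VALUE.
# as_index=False keeps VALUE as an ordinary column instead of the index.
# REGION is not listed, so it does not appear in the result.
result = df.groupby("VALUE", as_index=False).agg(
    COUNT=("COUNT", "sum"),
    ID=("ID", "first"),
)

# Alternative, assuming ID is constant within each VALUE (true for this sample):
# group by both keys so ID is carried along without its own aggregation rule.
alt = df.groupby(["VALUE", "ID"], as_index=False)["COUNT"].sum()

print(result)
print(alt)
```

If ID can differ within a VALUE group, the second variant will produce one row per (VALUE, ID) pair rather than one row per VALUE, so the first variant is probably the safer default.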
(In general, the groupby docs are among the most useful things you'll read about pandas, IMHO.)