
Aggregation of several columns in pandas

I have the following data in dataframe df:

VALUE   COUNT   REGION  ID
235     15      LP      139
355     59      LP      102
421     8       LP      127
427     227     LP      90
439     4       LP      133
235     45      UP      139
355     231     UP      102
421     756     UP      127
427     23      UP      90
439     76      UP      133

I want to combine the data frame so that rows with the same VALUE are merged and their COUNT values are added up. The REGION column need not be included in the final dataframe. I tried the following:

df.groupby(['VALUE'])['COUNT'].sum()

How do I get it to return all columns (except REGION)?

You can tell aggregate to apply a different operation to each column by passing it a dictionary that maps column names to functions.

You did not mention what you want to do with the ID column, so here I take the first value in each group. REGION does not appear in the result because it is not listed in the dictionary passed to aggregate; only the columns named there are returned.

In [51]: df.groupby('VALUE').aggregate({'COUNT':np.sum, 'ID':lambda x:x.iloc[0]})
Out[51]: 
       COUNT   ID
VALUE            
235       60  139
355      290  102
421      764  127
427      250   90
439       80  133
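
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the question's data by hand and uses the string aliases 'sum' and 'first' in place of np.sum and the lambda. The as_index=False argument and the named-aggregation variant are my additions, not part of the question, and assume a reasonably recent pandas (0.25 or later for the keyword form).

```python
import pandas as pd

# Rebuild the data frame from the question.
df = pd.DataFrame({
    "VALUE": [235, 355, 421, 427, 439, 235, 355, 421, 427, 439],
    "COUNT": [15, 59, 8, 227, 4, 45, 231, 756, 23, 76],
    "REGION": ["LP"] * 5 + ["UP"] * 5,
    "ID": [139, 102, 127, 90, 133, 139, 102, 127, 90, 133],
})

# Sum COUNT and keep the first ID within each VALUE group.
# REGION is not listed in the dictionary, so it is left out of the result.
# as_index=False keeps VALUE as an ordinary column instead of the index.
result = df.groupby("VALUE", as_index=False).agg({"COUNT": "sum", "ID": "first"})
print(result)

# The same aggregation written with keyword (named) aggregation.
named = df.groupby("VALUE", as_index=False).agg(
    COUNT=("COUNT", "sum"),
    ID=("ID", "first"),
)
print(named)
```

Taking the first ID is safe here only because every VALUE maps to a single ID in this data; if that did not hold, grouping by both VALUE and ID would be the more careful choice.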

(In general, the groupby docs are among the most useful things you'll read about pandas, imho.)
