Sum grouped Pandas dataframe by single column

I have a Pandas dataframe:

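import pandas as pd

# Start from an empty frame and add three rows; group (1, S1) appears twice
test = pd.DataFrame(columns=['GroupID', 'Sample', 'SampleMeta', 'Value'])
test.loc[0, :] = '1', 'S1', 'S1_meta', 1
test.loc[1, :] = '1', 'S1', 'S1_meta', 1
test.loc[2, :] = '2', 'S2', 'S2_meta', 1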

I want to (1) group by two columns ('GroupID' and 'Sample'), (2) sum 'Value' per group, and (3) retain only the unique values of 'SampleMeta' per group. The desired result, with 'GroupID' and 'Sample' as the index, is:

                SampleMeta  Value
GroupID Sample                       
1       S1      S1_meta      2
2       S2      S2_meta      1 

df.groupby() and the .sum() method get close, but .sum() also concatenates the identical strings in the 'SampleMeta' column within each group. As a result, the 'S1_meta' value is duplicated:

g = test.groupby(['GroupID', 'Sample'])
# sum() runs on every non-key column, so the SampleMeta strings get joined too
print(g.sum())

                SampleMeta      Value
GroupID Sample                       
1       S1      S1_metaS1_meta  2
2       S2      S2_meta         1 
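The duplication comes from how sum treats text: on an object-dtype column it reduces the values with +, and + on Python strings means concatenation. A minimal illustration with standalone Series (the names meta, values and total exist only for this example):

```python
import pandas as pd

# An object-dtype column of strings, like SampleMeta within group (1, S1)
meta = pd.Series(['S1_meta', 'S1_meta'], dtype=object)

# sum() reduces with +, and + on Python strings concatenates them
total = meta.sum()
print(total)  # S1_metaS1_meta

# A numeric column is added arithmetically instead
values = pd.Series([1, 1])
print(values.sum())  # 2
```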

Is there a way to achieve the desired result using groupby() and associated methods? Merging the summed 'Value' per group with a separate 'SampleMeta' DataFrame works, but there must be a more elegant solution.

Well, you can include SampleMeta as part of the groupby:

print(test.groupby(['GroupID', 'Sample', 'SampleMeta']).sum())

                           Value
GroupID Sample SampleMeta       
1       S1     S1_meta         2
2       S2     S2_meta         1

If you don't want SampleMeta as part of the index when done, you can move it back into a column with reset_index:

# level=2 is SampleMeta, the third key of the MultiIndex
print(test.groupby(['GroupID', 'Sample', 'SampleMeta']).sum().reset_index(level=2))

               SampleMeta  Value
GroupID Sample                  
1       S1        S1_meta      2
2       S2        S2_meta      1
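Before relying on the three-key groupby, you can confirm that each ['GroupID','Sample'] pair has a single SampleMeta value. A minimal sketch, assuming the column names from the question; sample_meta_is_constant and the modified frame varied are hypothetical names used only for illustration:

```python
import pandas as pd


def sample_meta_is_constant(df):
    """Return True if every (GroupID, Sample) pair has exactly one
    distinct SampleMeta value, so grouping on all three keys is safe."""
    counts = df.groupby(['GroupID', 'Sample'])['SampleMeta'].nunique()
    return bool(counts.eq(1).all())


# The question's data, built in one call
test = pd.DataFrame({'GroupID': ['1', '1', '2'],
                     'Sample': ['S1', 'S1', 'S2'],
                     'SampleMeta': ['S1_meta', 'S1_meta', 'S2_meta'],
                     'Value': [1, 1, 1]})
print(sample_meta_is_constant(test))  # True

# Hypothetical variant: the second (1, S1) row has a different SampleMeta
varied = test.copy()
varied.loc[1, 'SampleMeta'] = 'S1_other'
print(sample_meta_is_constant(varied))  # False

# Grouping on all three keys now splits (1, S1) into two rows
split = varied.groupby(['GroupID', 'Sample', 'SampleMeta'])['Value'].sum()
print(split)
```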

The three-key groupby only works correctly if SampleMeta does not vary within a ['GroupID','Sample'] group; otherwise that group is split across several rows. If there is variation, you probably want to exclude SampleMeta from the groupby/sum entirely:

print(test.groupby(['GroupID', 'Sample'])['Value'].sum())

GroupID  Sample
1        S1        2
2        S2        1
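Not part of the original answer, but if the goal is one row per ['GroupID','Sample'] with SampleMeta kept as a column, pandas' named aggregation (available since pandas 0.25) can sum Value and reduce SampleMeta in a single agg call. The sketch below joins the distinct SampleMeta strings with ', '; summarize is a hypothetical helper and the separator is an arbitrary choice:

```python
import pandas as pd


def summarize(df):
    """One row per (GroupID, Sample): sum Value and keep the distinct
    SampleMeta strings, joined with ', ' (an arbitrary separator)."""
    return df.groupby(['GroupID', 'Sample']).agg(
        SampleMeta=('SampleMeta', lambda s: ', '.join(s.unique())),
        Value=('Value', 'sum'),
    )


test = pd.DataFrame({'GroupID': ['1', '1', '2'],
                     'Sample': ['S1', 'S1', 'S2'],
                     'SampleMeta': ['S1_meta', 'S1_meta', 'S2_meta'],
                     'Value': [1, 1, 1]})
result = summarize(test)
print(result)

# Hypothetical variant with two SampleMeta values in group (1, S1)
varied = test.copy()
varied.loc[1, 'SampleMeta'] = 'S1_other'
print(summarize(varied))
```

Swapping the lambda for 'first' keeps only the first SampleMeta value per group, which matches the three-key answer when the values are constant.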
