How to Sum by Column in Pandas DF and Remove Additional Rows

I have a dataframe in the form:

    Sales House  Station  Day  Date    Time  Daypart   Total  Unique Key
0   CARLTON      CARLTON  Mon  3AUG20  1213  DAYTIME   0      CARLTON_ 3AUG20
1   CARLTON      CARLTON  Mon  3AUG20  2307  POSTPEAK  30     CARLTON_ 3AUG20
2   CARLTON      CARLTON  Tue  4AUG20  1015  COFFEE    30     NaN
3   CARLTON      CARLTON  Tue  4AUG20  1027  COFFEE    30     CARLTON_ 4AUG20
4   CARLTON      CARLTON  Wed  5AUG20  1310  DAYTIME   30     CARLTON_ 5AUG20

The Unique Key column is just a column I added to try to make this process easier (please correct me if I am wrong). Essentially, I would like to sum the Total column using the Unique Key column, and also remove the extra rows associated with each Unique Key, leaving only one.

As an example, the above df would come out as shown below. In this instance the first and second rows (index 0 and 1) match, so their Total values should be summed and the second row removed.

    Sales House  Station  Day  Date    Time  Daypart   Total  Unique Key
0   CARLTON      CARLTON  Mon  3AUG20  1213  DAYTIME   30     CARLTON_ 3AUG20
1   CARLTON      CARLTON  Tue  4AUG20  1015  COFFEE    30     NaN
2   CARLTON      CARLTON  Tue  4AUG20  1027  COFFEE    30     CARLTON_ 4AUG20
3   CARLTON      CARLTON  Wed  5AUG20  1310  DAYTIME   30     CARLTON_ 5AUG20

Is there a way to easily do this?

It seems like you need the df.groupby() method.

I would try doing this in three steps:

aggregated = df.groupby(['Station', 'Date'])['Total'].sum().reset_index() # Getting sum
df = df.drop_duplicates(['Station', 'Date'])                              # Removing duplicated rows
df = df.drop('Total', axis=1).merge(aggregated, on=['Station', 'Date'])   # Merge back
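
To see what these three steps produce, here is a minimal self-contained sketch that rebuilds the sample rows from the question and runs the same calls. The variable names keys, first_rows and result are only illustrative, and the Unique Key column is left out because the steps group on Station and Date instead.

```python
import pandas as pd

# Sample rows copied from the question (the 'Unique Key' column is omitted
# here because the three steps group on 'Station' and 'Date').
df = pd.DataFrame({
    "Sales House": ["CARLTON"] * 5,
    "Station": ["CARLTON"] * 5,
    "Day": ["Mon", "Mon", "Tue", "Tue", "Wed"],
    "Date": ["3AUG20", "3AUG20", "4AUG20", "4AUG20", "5AUG20"],
    "Time": [1213, 2307, 1015, 1027, 1310],
    "Daypart": ["DAYTIME", "POSTPEAK", "COFFEE", "COFFEE", "DAYTIME"],
    "Total": [0, 30, 30, 30, 30],
})

keys = ["Station", "Date"]

# Step 1: one summed Total per (Station, Date) pair.
aggregated = df.groupby(keys)["Total"].sum().reset_index()

# Step 2: keep only the first row of each (Station, Date) pair.
first_rows = df.drop_duplicates(keys)

# Step 3: swap the per-row Total for the summed Total.
result = first_rows.drop("Total", axis=1).merge(aggregated, on=keys)

print(result)
```

On this sample the result has three rows, one per date. The two 4AUG20 rows are also combined, because they share the same Station and Date.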

Edited according to the comment: added the line df = df.drop_duplicates(['Station', 'Date']) in order to remove duplicates.
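
If the row with a missing Unique Key should stay separate, as it does in the question's expected output, one possible variation is to group on the Unique Key column itself and leave rows without a key untouched. The sketch below is an assumption about the intended behaviour, not part of the original answer; the names keyed, unkeyed, summed, collapsed and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question, including the 'Unique Key' column.
df = pd.DataFrame({
    "Sales House": ["CARLTON"] * 5,
    "Station": ["CARLTON"] * 5,
    "Day": ["Mon", "Mon", "Tue", "Tue", "Wed"],
    "Date": ["3AUG20", "3AUG20", "4AUG20", "4AUG20", "5AUG20"],
    "Time": [1213, 2307, 1015, 1027, 1310],
    "Daypart": ["DAYTIME", "POSTPEAK", "COFFEE", "COFFEE", "DAYTIME"],
    "Total": [0, 30, 30, 30, 30],
    "Unique Key": ["CARLTON_ 3AUG20", "CARLTON_ 3AUG20", np.nan,
                   "CARLTON_ 4AUG20", "CARLTON_ 5AUG20"],
})

# Assumption: rows with no key are kept exactly as they are.
keyed = df[df["Unique Key"].notna()]
unkeyed = df[df["Unique Key"].isna()]

# transform("sum") writes each key's summed Total back onto every row of that key.
summed = keyed.assign(Total=keyed.groupby("Unique Key")["Total"].transform("sum"))

# Keep only the first row for each key, then restore the original row order.
collapsed = summed.drop_duplicates("Unique Key")
result = pd.concat([collapsed, unkeyed]).sort_index().reset_index(drop=True)

print(result)
```

On the sample data this leaves four rows, each with a Total of 30, which matches the output shown in the question.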
