How to get cumulative sum of unique IDs with group by?

I am very new to Python and pandas, and I am working on a pandas DataFrame which looks like this:

Date     Time           ID   Weight
Jul-1     12:00         A       10
Jul-1     12:00         B       20
Jul-1     12:00         C       100
Jul-1     12:10         C       100
Jul-1     12:10         D       30
Jul-1     12:20         C       100
Jul-1     12:20         D       30
Jul-1     12:30         A       10
Jul-1     12:40         E       40
Jul-1     12:50         F       50
Jul-1     1:00          A       40

I am trying to group by date, time and ID and apply a cumulative sum such that if an ID is present again in a later time slot, its weight is only added once (uniquely). The resulting data frame would look like this:

Date     Time           Weight   
Jul-1     12:00         130     (10+20+100)
Jul-1     12:10         160     (10+20+100+30)
Jul-1     12:20         160     (10+20+100+30)
Jul-1     12:30         160     (10+20+100+30)
Jul-1     12:40         200     (10+20+100+30+40)
Jul-1     12:50         250     (10+20+100+30+40+50)
Jul-1     01:00         250     (10+20+100+30+40+50)

This is what I tried below; however, it still counts the weights multiple times:

df=df.groupby(['date','time','ID'])['Wt'].apply(lambda x: x.unique().sum()).reset_index()
df['cumWt']=df['Wt'].cumsum()

Any help would be really appreciated. Thanks a lot in advance!

The code below uses DataFrame.duplicated(), DataFrame.merge(), DataFrame.groupby() with sum(), and Series.cumsum() to arrive at the desired output:

# assumes df has the columns 'date', 'time', 'ID' and 'weight'
# creates a series holding the weight of the first occurrence of each ID
# (deduplicating on 'ID' rather than 'weight', so two IDs with equal weights both count)
unique_weights = df['weight'][~df.duplicated(['ID'])]
unique_weights = unique_weights.rename('consider_cum')

# merges the series into the original dataframe by index and replaces the ignored values with 0
df = df.merge(unique_weights.to_frame(), how='left', left_index=True, right_index=True)
df['consider_cum'] = df['consider_cum'].fillna(0)

# sums per date and time; sort=False keeps the slots in their original order
# (otherwise the string '1:00' would sort before '12:00')
df = df.groupby(['date', 'time'], sort=False)['consider_cum'].sum().reset_index()

# creates the cumulative sum column and presents the output
df['weight_cumsum'] = df['consider_cum'].cumsum()
df[['date', 'time', 'weight_cumsum']]

This produces the output requested in the question.
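
For readers who want something they can run end to end, here is a minimal sketch of the same idea without the merge step. It is not the answer's original code: the lowercase column names and the helper column new_weight are assumptions made for illustration, and the data is copied from the table in the question. A weight is counted only on the first row where its ID appears, the newly seen weights are summed per time slot, and the per-slot totals are then accumulated.

```python
import pandas as pd

# Sample data copied from the question; the column names are an assumption.
df = pd.DataFrame({
    "date": ["Jul-1"] * 11,
    "time": ["12:00", "12:00", "12:00", "12:10", "12:10", "12:20",
             "12:20", "12:30", "12:40", "12:50", "1:00"],
    "ID": ["A", "B", "C", "C", "D", "C", "D", "A", "E", "F", "A"],
    "weight": [10, 20, 100, 100, 30, 100, 30, 10, 40, 50, 40],
})

# True on the first row where each ID appears, False on every repeat.
first_seen = ~df.duplicated("ID")

# Keep the weight on first appearances and use 0 for repeats
# ("new_weight" is a hypothetical helper column).
df["new_weight"] = df["weight"].where(first_seen, 0)

# Sum the newly seen weight per time slot in the original slot order,
# then accumulate across slots.
result = (
    df.groupby(["date", "time"], sort=False)["new_weight"]
      .sum()
      .cumsum()
      .reset_index(name="weight_cumsum")
)
print(result)
```

Note that this sketch accumulates across the whole frame. If the running total should restart for each date, the cumulative sum would need to be taken per date instead, which the question does not specify.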
