
Get cumulative sum in pandas

Context

Datetime          Campaign_name   Status    Open_time
2022-03-15 00:00  Funny_campaign  Open
2022-03-15 01:00  Funny_campaign  Continue
2022-03-15 02:00  Funny_campaign  Continue
2022-03-15 03:00  Funny_campaign  Continue
2022-03-15 04:00  Funny_campaign  Close
2022-03-15 08:00  Funny_campaign  Open
2022-03-15 09:00  Funny_campaign  Continue
2022-03-15 10:00  Funny_campaign  Close

Problem

I need to calculate the time from open to close.

My code right now

There are two approaches I could take: get the open time on each 'Close' row, or a cumulative open_time on each 'Open' and 'Continue' row. Here is my attempt at the latter.

My code right now is almost fine: it doesn't count the time between Close and Open, but it forgets to sum the last time difference.

import pandas as pd

df["Datetime"] = pd.to_datetime(df["Datetime"])
# Minutes elapsed since the previous row (0 for the first row)
df["time_diff"] = df["Datetime"].diff().dt.total_seconds().div(60).fillna(0)
# Zero the diff on 'Close' rows
condition = df["Status"] == "Close"
df.loc[condition, "time_diff"] = 0
df["Cumulative time"] = df.groupby(["Campaign_name"])["time_diff"].cumsum()
df = df.drop(columns="time_diff")
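
If the goal is a running total of open minutes per campaign (one possible reading of the question), a minimal sketch is to zero the difference on the 'Open' rows rather than the 'Close' rows, so that the closed gap is skipped and the final interval before each 'Close' is kept. The sample frame below is rebuilt from the Context table, and the variable name `minutes` is illustrative, not from the original post.

```python
import pandas as pd

# Sample data reconstructed from the Context table above.
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2022-03-15 00:00", "2022-03-15 01:00", "2022-03-15 02:00",
        "2022-03-15 03:00", "2022-03-15 04:00", "2022-03-15 08:00",
        "2022-03-15 09:00", "2022-03-15 10:00",
    ]),
    "Campaign_name": ["Funny_campaign"] * 8,
    "Status": ["Open", "Continue", "Continue", "Continue", "Close",
               "Open", "Continue", "Close"],
})

# Minutes elapsed since the previous row, computed within each campaign.
minutes = (
    df.groupby("Campaign_name")["Datetime"].diff().dt.total_seconds().div(60)
)

# The diff on an 'Open' row covers the closed gap (or is NaN on the very
# first row), so replace it with 0 instead of zeroing the 'Close' rows.
minutes = minutes.mask(df["Status"].eq("Open"), 0.0)

# Running total of open minutes per campaign.
df["Cumulative time"] = minutes.groupby(df["Campaign_name"]).cumsum()
print(df["Cumulative time"].tolist())
# [0.0, 60.0, 120.0, 180.0, 240.0, 240.0, 300.0, 360.0]
```

This assumes the rows are already sorted by time within each campaign; if not, sort by "Campaign_name" and "Datetime" first.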

IIUC, you could start a new group at each 'Open' and use:

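df['Datetime'] = pd.to_datetime(df['Datetime'])

# a new group starts at every 'Open' row
group = df['Status'].eq('Open').cumsum()

# group_keys=False keeps the original index so the result aligns with df
df['Open_time'] = df.groupby(group, group_keys=False)['Datetime'].apply(lambda g: g-g.iloc[0])
# or, alternative syntax (leaves NaT instead of 0 on each 'Open' row)
# df['Open_time'] = df.groupby(group, group_keys=False)['Datetime'].apply(lambda g: g.diff().cumsum())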

Output:

             Datetime   Campaign_name    Status       Open_time
0 2022-03-15 00:00:00  Funny_campaign      Open 0 days 00:00:00
1 2022-03-15 01:00:00  Funny_campaign  Continue 0 days 01:00:00
2 2022-03-15 02:00:00  Funny_campaign  Continue 0 days 02:00:00
3 2022-03-15 03:00:00  Funny_campaign  Continue 0 days 03:00:00
4 2022-03-15 04:00:00  Funny_campaign     Close 0 days 04:00:00
5 2022-03-15 08:00:00  Funny_campaign      Open 0 days 00:00:00
6 2022-03-15 09:00:00  Funny_campaign  Continue 0 days 01:00:00
7 2022-03-15 10:00:00  Funny_campaign     Close 0 days 02:00:00
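
To see why the grouping works, it may help to look at the group labels on their own. The short sketch below uses only the "Status" column from the sample data: `eq('Open')` marks each 'Open' row as True, and the cumulative sum of that boolean mask increases by one at every 'Open', so all rows up to the next 'Open' share a label.

```python
import pandas as pd

# Status column from the sample data in the question.
status = pd.Series(
    ["Open", "Continue", "Continue", "Continue", "Close",
     "Open", "Continue", "Close"],
    name="Status",
)

is_open = status.eq("Open")   # True only on rows that start a session
group = is_open.cumsum()      # counter that ticks up at every 'Open'

print(group.tolist())  # [1, 1, 1, 1, 1, 2, 2, 2]
```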

Or to only assign to "Close":

df.loc[df['Status'].eq('Close'), 'Open_time'] = df.groupby(group, group_keys=False)['Datetime'].apply(lambda g: g-g.iloc[0])

Output:

             Datetime   Campaign_name    Status        Open_time
0 2022-03-15 00:00:00  Funny_campaign      Open              NaN
1 2022-03-15 01:00:00  Funny_campaign  Continue              NaN
2 2022-03-15 02:00:00  Funny_campaign  Continue              NaN
3 2022-03-15 03:00:00  Funny_campaign  Continue              NaN
4 2022-03-15 04:00:00  Funny_campaign     Close  0 days 04:00:00
5 2022-03-15 08:00:00  Funny_campaign      Open              NaN
6 2022-03-15 09:00:00  Funny_campaign  Continue              NaN
7 2022-03-15 10:00:00  Funny_campaign     Close  0 days 02:00:00

And for just the close-minus-open difference for each group:

df.groupby(group)['Datetime'].agg(lambda g: g.iloc[-1]-g.iloc[0])

Output:

Status
1   0 days 04:00:00
2   0 days 02:00:00
Name: Datetime, dtype: timedelta64[ns]
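
As a possible extension, the per-group difference can be turned into a small summary table with the duration in hours, with the session counter restarted for each campaign so that several campaigns in one frame do not share labels. This is a sketch under assumptions: the names `session`, `opened`, `closed` and `hours_open` are illustrative, and the rows are assumed to be sorted by time within each campaign.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2022-03-15 00:00", "2022-03-15 01:00", "2022-03-15 02:00",
        "2022-03-15 03:00", "2022-03-15 04:00", "2022-03-15 08:00",
        "2022-03-15 09:00", "2022-03-15 10:00",
    ]),
    "Campaign_name": ["Funny_campaign"] * 8,
    "Status": ["Open", "Continue", "Continue", "Continue", "Close",
               "Open", "Continue", "Close"],
})

# Session counter that restarts for every campaign.
session = (
    df["Status"].eq("Open").astype(int)
      .groupby(df["Campaign_name"]).cumsum()
      .rename("Session")
)

# First and last timestamp of each (campaign, session) pair.
summary = (
    df.groupby(["Campaign_name", session])["Datetime"]
      .agg(opened="first", closed="last")
      .reset_index()
)

# Duration of each session as a plain float number of hours.
summary["hours_open"] = (
    (summary["closed"] - summary["opened"]).dt.total_seconds() / 3600
)
print(summary[["Campaign_name", "Session", "hours_open"]])
```

For the sample data this yields two sessions of 4.0 and 2.0 hours, matching the timedelta output above.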
