Get cumulative sum in pandas
| Datetime | Campaign_name | Status | Open_time |
|---|---|---|---|
| 2022-03-15 00:00 | Funny_campaign | Open | |
| 2022-03-15 01:00 | Funny_campaign | Continue | |
| 2022-03-15 02:00 | Funny_campaign | Continue | |
| 2022-03-15 03:00 | Funny_campaign | Continue | |
| 2022-03-15 04:00 | Funny_campaign | Close | |
| 2022-03-15 08:00 | Funny_campaign | Open | |
| 2022-03-15 09:00 | Funny_campaign | Continue | |
| 2022-03-15 10:00 | Funny_campaign | Close | |

I need to calculate the time from open to close.
There are two approaches I could take: get the open time on each 'Close' row, or a cumulative open_time on each 'Open' and 'Continue' row. Here is my take on the latter.
My code right now is almost fine: it doesn't count the time between Close and Open, but it forgets to sum the last time difference.
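import pandas as pd
df["Datetime"] = pd.to_datetime(df["Datetime"])
df["time_diff"] = df["Datetime"].diff()
# difference in minutes as floats (astype("timedelta64[m]") is rejected by pandas 2.x)
df["time_diff"] = df["time_diff"].dt.total_seconds().div(60).fillna(0)
condition = df["Status"] == "Close"
df.loc[condition, "time_diff"] = 0
df["Cumulative time"] = df.groupby(["Campaign_name"])["time_diff"].cumsum()
# the positional axis argument of drop() was removed in pandas 2.0
df = df.drop(columns="time_diff")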
IIUC, you could start a new group at each 'Open' row and use:
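df['Datetime'] = pd.to_datetime(df['Datetime'])
# the running count of 'Open' rows labels each open-to-close session
group = df['Status'].eq('Open').cumsum()
# group_keys=False keeps the original index so the result aligns with df (needed on pandas >= 2.0)
df['Open_time'] = df.groupby(group, group_keys=False)['Datetime'].apply(lambda g: g-g.iloc[0])
# or, alternative syntax (leaves NaT rather than 0 on each 'Open' row)
# df['Open_time'] = df.groupby(group, group_keys=False)['Datetime'].apply(lambda g: g.diff().cumsum())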
Output:
              Datetime   Campaign_name    Status        Open_time
0  2022-03-15 00:00:00  Funny_campaign      Open  0 days 00:00:00
1  2022-03-15 01:00:00  Funny_campaign  Continue  0 days 01:00:00
2  2022-03-15 02:00:00  Funny_campaign  Continue  0 days 02:00:00
3  2022-03-15 03:00:00  Funny_campaign  Continue  0 days 03:00:00
4  2022-03-15 04:00:00  Funny_campaign     Close  0 days 04:00:00
5  2022-03-15 08:00:00  Funny_campaign      Open  0 days 00:00:00
6  2022-03-15 09:00:00  Funny_campaign  Continue  0 days 01:00:00
7  2022-03-15 10:00:00  Funny_campaign     Close  0 days 02:00:00
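For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the question's sample data by hand and swaps the `apply` call for `transform("first")`, which is my own substitution rather than the code above; the `session` variable and the `Open_minutes` column are names I made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
hours = [0, 1, 2, 3, 4, 8, 9, 10]
df = pd.DataFrame({
    "Datetime": pd.Timestamp("2022-03-15") + pd.to_timedelta(hours, unit="h"),
    "Campaign_name": ["Funny_campaign"] * 8,
    "Status": ["Open", "Continue", "Continue", "Continue", "Close",
               "Open", "Continue", "Close"],
})

# Every "Open" row starts a new session: 1, 1, 1, 1, 1, 2, 2, 2.
session = df["Status"].eq("Open").cumsum()

# Broadcast each session's first timestamp to all of its rows, then subtract.
session_start = df.groupby(session)["Datetime"].transform("first")
df["Open_time"] = df["Datetime"] - session_start

# Hypothetical extra column: the same duration expressed in minutes.
df["Open_minutes"] = df["Open_time"].dt.total_seconds() / 60
```

Because `transform` always returns a result indexed like the input, the assignment does not depend on how `apply` handles group keys in a given pandas version.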
Or to only assign to "Close":
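# group_keys=False keeps the original index so .loc can align the values (needed on pandas >= 2.0)
df.loc[df['Status'].eq('Close'), 'Open_time'] = df.groupby(group, group_keys=False)['Datetime'].apply(lambda g: g-g.iloc[0])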
Output:
              Datetime   Campaign_name    Status        Open_time
0  2022-03-15 00:00:00  Funny_campaign      Open              NaN
1  2022-03-15 01:00:00  Funny_campaign  Continue              NaN
2  2022-03-15 02:00:00  Funny_campaign  Continue              NaN
3  2022-03-15 03:00:00  Funny_campaign  Continue              NaN
4  2022-03-15 04:00:00  Funny_campaign     Close  0 days 04:00:00
5  2022-03-15 08:00:00  Funny_campaign      Open              NaN
6  2022-03-15 09:00:00  Funny_campaign  Continue              NaN
7  2022-03-15 10:00:00  Funny_campaign     Close  0 days 02:00:00
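A possible variant of the "Close"-only assignment is to compute the elapsed time for every row and then blank out the rows that are not "Close" with `Series.where`. This is only a sketch on hand-built sample data; `session` and `elapsed` are illustrative names, and the non-"Close" rows come out as `NaT` here, so the column keeps a timedelta dtype.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
hours = [0, 1, 2, 3, 4, 8, 9, 10]
df = pd.DataFrame({
    "Datetime": pd.Timestamp("2022-03-15") + pd.to_timedelta(hours, unit="h"),
    "Campaign_name": ["Funny_campaign"] * 8,
    "Status": ["Open", "Continue", "Continue", "Continue", "Close",
               "Open", "Continue", "Close"],
})

# Every "Open" row starts a new session.
session = df["Status"].eq("Open").cumsum()

# Time elapsed since the first row of each session.
elapsed = df["Datetime"] - df.groupby(session)["Datetime"].transform("first")

# Keep the value on "Close" rows only; all other rows become NaT.
df["Open_time"] = elapsed.where(df["Status"].eq("Close"))
```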
And for just the close-minus-open difference of each group:
df.groupby(group)['Datetime'].agg(lambda g: g.iloc[-1]-g.iloc[0])
Output:
Status
1   0 days 04:00:00
2   0 days 02:00:00
Name: Datetime, dtype: timedelta64[ns]
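If what you are after is instead a running total of open time per campaign that carries over from one session to the next (my reading of the `Cumulative time` column in the question, which may not be what was intended), one way to sketch it is to zero the step on "Open" rows rather than on "Close" rows. The column name `Cumulative_minutes` is hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
hours = [0, 1, 2, 3, 4, 8, 9, 10]
df = pd.DataFrame({
    "Datetime": pd.Timestamp("2022-03-15") + pd.to_timedelta(hours, unit="h"),
    "Campaign_name": ["Funny_campaign"] * 8,
    "Status": ["Open", "Continue", "Continue", "Continue", "Close",
               "Open", "Continue", "Close"],
})

# Time elapsed since the previous row of the same campaign.
step = df.groupby("Campaign_name")["Datetime"].diff()

# An "Open" row either starts the data or follows a closed period,
# so its step is not open time; replace it with zero.
step = step.mask(df["Status"].eq("Open"), pd.Timedelta(0))

# Running total of open time per campaign, in minutes.
minutes = step.dt.total_seconds() / 60
df["Cumulative_minutes"] = minutes.groupby(df["Campaign_name"]).cumsum()
```

With this choice the interval that ends on a "Close" row is counted, while the gap between a "Close" and the next "Open" is not.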