Calculate time between two different values in the same pandas column
My data looks like this:
Device Time Condition
D1 01/11/2019 00:00 issue
D1 01/11/2019 00:15 issue
D1 01/11/2019 00:30 issue
D1 01/11/2019 00:45 issue
D1 01/11/2019 01:00 issue
D1 01/11/2019 01:15 Resolved
D1 01/11/2019 01:30 Resolved
D2 01/11/2019 01:45 issue
D2 01/11/2019 02:00 Resolved
D1 01/11/2019 01:45 issue
D1 01/11/2019 02:00 Resolved
I need to create a new column with the time between the first issue and the first Resolved of each incident. I need a groupby statement that keeps only the first issue and the first Resolved per incident and then takes the time difference. When I group by Device and Condition, only one issue row is kept per device, so separate incidents of the same device get merged.
The desired output looks like this:
Device Time Condition durationTofix
D1 01/11/2019 00:00 issue
D1 01/11/2019 00:15 issue
D1 01/11/2019 00:30 issue
D1 01/11/2019 00:45 issue
D1 01/11/2019 01:00 issue
D1 01/11/2019 01:15 Resolved 01:15
D1 01/11/2019 01:30 Resolved
D2 01/11/2019 01:45 issue
D2 01/11/2019 02:00 Resolved 00:15
D1 01/11/2019 01:45 issue
D1 01/11/2019 02:00 Resolved 00:15
Since grouping by Device and Condition is not enough, I want to create an index column:
data["index"] = data.groupby(['Device','Condition']).??? # something like cumcount(), but cumcount() does not give the right numbering here
and then use a pivot table for the time calculation:
H = data.pivot_table(index=['index','Device'], columns=['Condition'], values='Time', aggfunc='first')
H['durationTofix'] = H['Resolved'] - H['issue']
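For illustration, here is a hypothetical sketch of how that pivot would behave once such an index column existed (the `index` values and the `Time`/`Condition` column names are assumed for the example):

```python
import pandas as pd

# Hypothetical sketch: assume an incident index already exists, one value
# per issue/Resolved pair. The pivot then puts the first 'issue' and the
# first 'Resolved' time of each incident side by side.
data = pd.DataFrame({
    "index": [0, 0, 1, 1],
    "Device": ["D1", "D1", "D2", "D2"],
    "Condition": ["issue", "Resolved", "issue", "Resolved"],
    "Time": pd.to_datetime(["2019-11-01 00:00", "2019-11-01 01:15",
                            "2019-11-01 01:45", "2019-11-01 02:00"]),
})
H = data.pivot_table(index=["index", "Device"], columns="Condition",
                     values="Time", aggfunc="first")
# duration of each incident
H["durationTofix"] = H["Resolved"] - H["issue"]
```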
Solution, if there is always at least one issue before Resolved per consecutive group of Device rows:
import pandas as pd

# convert to datetimes
df['Time'] = pd.to_datetime(df['Time'])
# consecutive groups per Device
g = df['Device'].ne(df['Device'].shift()).cumsum()
# mask of issue rows
m = df['Condition'].eq('issue')
# keep only the first row per consecutive group and Condition value
mask = ~df.assign(g=g).duplicated(['g','Condition'])
# forward-fill the time of the first issue within each group
s = df['Time'].where(mask & m).groupby(g).ffill()
# subtract and keep the result only on the first Resolved per group
df['durationTofix'] = df['Time'].sub(s).where(mask & df['Condition'].eq('Resolved'))
print (df)
Device Time Condition durationTofix
0 D1 2019-01-11 00:00:00 issue NaT
1 D1 2019-01-11 00:15:00 issue NaT
2 D1 2019-01-11 00:30:00 issue NaT
3 D1 2019-01-11 00:45:00 issue NaT
4 D1 2019-01-11 01:00:00 issue NaT
5 D1 2019-01-11 01:15:00 Resolved 01:15:00
6 D1 2019-01-11 01:30:00 Resolved NaT
7 D2 2019-01-11 01:45:00 issue NaT
8 D2 2019-01-11 02:00:00 Resolved 00:15:00
9 D1 2019-01-11 01:45:00 issue NaT
10 D1 2019-01-11 02:00:00 Resolved 00:15:00
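The snippet above assumes `df` already holds the question's data; a self-contained version (the dates are assumed to mean 1 November 2019) can be sketched as:

```python
import pandas as pd

# Sample data rebuilt from the question (day-first dates rewritten
# unambiguously as 2019-11-01)
df = pd.DataFrame({
    "Device": ["D1"] * 7 + ["D2"] * 2 + ["D1"] * 2,
    "Time": pd.to_datetime([
        "2019-11-01 00:00", "2019-11-01 00:15", "2019-11-01 00:30",
        "2019-11-01 00:45", "2019-11-01 01:00", "2019-11-01 01:15",
        "2019-11-01 01:30", "2019-11-01 01:45", "2019-11-01 02:00",
        "2019-11-01 01:45", "2019-11-01 02:00",
    ]),
    "Condition": ["issue"] * 5 + ["Resolved"] * 2
                 + ["issue", "Resolved", "issue", "Resolved"],
})

# consecutive runs of the same Device get one group id
g = df["Device"].ne(df["Device"].shift()).cumsum()
# mask of issue rows
m = df["Condition"].eq("issue")
# first row per (run, Condition) pair
mask = ~df.assign(g=g).duplicated(["g", "Condition"])
# time of the first issue, forward-filled within each run
s = df["Time"].where(mask & m).groupby(g).ffill()
# duration only on the first Resolved of each run
df["durationTofix"] = df["Time"].sub(s).where(mask & df["Condition"].eq("Resolved"))
```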
The biggest problem is how to group the issue/Resolved runs correctly; this can be done with a reversed cumsum:
df["Time"] = pd.to_datetime(df["Time"])
# a group boundary is a Resolved row immediately followed by an issue row;
# the reversed cumsum gives every row of one incident the same label
df["group"] = (df["Condition"].eq("Resolved") & df["Condition"].shift(-1).eq("issue"))[::-1].cumsum()[::-1]
# time between the first issue and the first Resolved of each group
df["diff"] = (df[~df.duplicated(["Condition", "group"])]
                .groupby("group")["Time"]
                .transform(lambda d: d.diff()))
print (df)
Device Time Condition group diff
0 D1 2019-01-11 00:00:00 issue 2 NaT
1 D1 2019-01-11 00:15:00 issue 2 NaT
2 D1 2019-01-11 00:30:00 issue 2 NaT
3 D1 2019-01-11 00:45:00 issue 2 NaT
4 D1 2019-01-11 01:00:00 issue 2 NaT
5 D1 2019-01-11 01:15:00 Resolved 2 01:15:00
6 D1 2019-01-11 01:30:00 Resolved 2 NaT
7 D2 2019-01-11 01:45:00 issue 1 NaT
8 D2 2019-01-11 02:00:00 Resolved 1 00:15:00
9 D1 2019-01-11 01:45:00 issue 0 NaT
10 D1 2019-01-11 02:00:00 Resolved 0 00:15:00
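The core of this answer is the reversed cumsum; its effect is easiest to see on a minimal `Condition` series in isolation:

```python
import pandas as pd

# A boundary is a 'Resolved' row immediately followed by an 'issue' row.
cond = pd.Series(["issue", "issue", "Resolved", "issue", "Resolved"])
boundary = cond.eq("Resolved") & cond.shift(-1).eq("issue")
# Running the cumsum from the end means every row up to and including a
# boundary shares one label, so each issue->Resolved run is one group.
group = boundary[::-1].cumsum()[::-1]
print(group.tolist())  # [1, 1, 1, 0, 0]
```

Note that the group ids count down (the last incident gets 0), which matches the `group` column in the output above.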
Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or link to the original. For any questions, contact yoyou2525@163.com.