Pandas: fill columns based on datetime data across the whole frame

Question

I have a Python dataframe, for example:

TimeStamp (UTC)	App	Status	Start Time	End Time
11/18/2021 17:13:01	App1	Passing		17:13:01
11/18/2021 17:07:28	App1	Failing	17:07:28
11/18/2021 16:31:11	App1	Failing	16:31:11
11/18/2021 16:15:22	App1	Passing		16:15:22
11/18/2021 16:07:51	App1	Failing	16:07:51
11/22/2021 13:56:18	App2	Passing		13:56:18
11/22/2021 03:43:33	App2	Failing	03:43:33
11/22/2021 02:48:06	App2	Failing	02:48:06
11/19/2021 10:30:21	App3	Passing		10:30:21
11/17/2021 13:42:11	App3	Failing	13:42:11

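This is a sample of the data; the real data looks the same, it just has more records. For each app, I need to calculate the downtime from the first Failing event to the first Passing event. If an app has several Passing statuses, I need both the downtime of each individual sequence and the total downtime of the app, in time format, with these values in separate columns.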
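As a worked check of this definition, App1's two sequences in the sample give 07:31 and 41:50, for a total of 49:21. The sketch below reproduces that arithmetic with pandas `Timestamp` subtraction; the timestamps are copied from the table above, rewritten in ISO form:

```python
import pandas as pd

# First sequence of App1: first Failing at 16:07:51, first Passing at 16:15:22
seq1 = pd.Timestamp("2021-11-18 16:15:22") - pd.Timestamp("2021-11-18 16:07:51")

# Second sequence: only the first Failing row (16:31:11) counts; 17:07:28 is ignored
seq2 = pd.Timestamp("2021-11-18 17:13:01") - pd.Timestamp("2021-11-18 16:31:11")

# Total downtime of the app is the sum of its sequences
total = seq1 + seq2

print(seq1, seq2, total)
```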

I am using Pandas for the CSV operations.

So the final dataframe would look like this:

TimeStamp (UTC)	App	Status	Start Time	End Time	Downtime	Total Downtime
11/18/2021 17:13:01	App1	Passing		17:13:01	41:50	49:21
11/18/2021 17:07:28	App1	Failing	17:07:28		41:50	49:21
11/18/2021 16:31:11	App1	Failing	16:31:11		41:50	49:21
11/18/2021 16:15:22	App1	Passing		16:15:22	07:31	49:21
11/18/2021 16:07:51	App1	Failing	16:07:51		07:31	49:21
11/22/2021 13:56:18	App2	Passing		13:56:18	11:08:12	668.12
11/22/2021 03:43:33	App2	Failing	03:43:33		11:08:12	668.12
11/22/2021 02:48:06	App2	Failing	02:48:06		11:08:12	668.12
11/19/2021 10:30:21	App3	Passing		10:30:21	44:48:10	2688.10
11/17/2021 13:42:11	App3	Failing	13:42:11		44:48:10	2688.10
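The expected output uses two duration styles: Downtime as [HH:]MM:SS and, for App2 and App3, Total Downtime as what appears to be total minutes followed by seconds (668.12 for 11:08:12, 2688.10 for 44:48:10). That reading is an inference from the numbers, not something the question states. A minimal sketch of formatting a pandas `Timedelta` either way; `to_hms` and `to_min_sec` are hypothetical helper names:

```python
import pandas as pd


def to_hms(td: pd.Timedelta) -> str:
    """HH:MM:SS, letting the hour field grow past 24 (e.g. 44:48:10)."""
    h, rem = divmod(int(td.total_seconds()), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02}:{m:02}:{s:02}"


def to_min_sec(td: pd.Timedelta) -> str:
    """Total minutes, a dot, then two-digit seconds (assumed reading of 668.12)."""
    m, s = divmod(int(td.total_seconds()), 60)
    return f"{m}.{s:02}"


print(to_hms(pd.Timedelta(hours=11, minutes=8, seconds=12)))
print(to_min_sec(pd.Timedelta(hours=11, minutes=8, seconds=12)))
```

Keeping the durations as `Timedelta` values until the very end, and formatting only for display, avoids doing arithmetic on strings.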

Any help would be appreciated.

I know these tables are not easy to read, but I had to format them as code before I could post on Stack Overflow.

Here is the code for the sample df:


import pandas as pd

data = {'TimeStamp': ['11/18/2021 17:13:01', '11/18/2021 17:07:28', '11/18/2021 16:31:11', '11/18/2021 16:15:22',
                      '11/18/2021 16:07:51', '11/22/2021 13:56:18', '11/22/2021 03:43:33', '11/22/2021 02:48:06',
                      '11/19/2021 10:30:21', '11/17/2021 13:42:11'],
        'App': ['App1', 'App1', 'App1', 'App1', 'App1', 'App2', 'App2', 'App2', 'App3', 'App3'],
        'Status': ['Passing', 'Failing', 'Failing', 'Passing', 'Failing', 'Passing', 'Failing', 'Failing', 'Passing', 'Failing']}

df = pd.DataFrame(data)

print(df)
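Because `TimeStamp` holds strings in MM/DD/YYYY order, it should be parsed with `pd.to_datetime` before any sorting: a plain string sort happens to be chronological within a single year but breaks across a year boundary. The sketch below shows this with hypothetical timestamps (not from the question) in the same format:

```python
import pandas as pd

# Hypothetical timestamps spanning a year boundary, same format as the sample
ts = pd.Series(["01/02/2022 09:00:00", "12/31/2021 23:00:00", "11/18/2021 17:13:01"])

# Lexicographic order: "01/..." sorts first even though it is the latest date
as_strings = ts.sort_values().tolist()

# Chronological order: parse with an explicit format, sort, then look the strings up by label
parsed = pd.to_datetime(ts, format="%m/%d/%Y %H:%M:%S")
as_datetimes = ts.loc[parsed.sort_values().index].tolist()

print(as_strings)
print(as_datetimes)
```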

Answer 1

You could do it like this:

import pandas as pd

# Setup: parse the timestamps first so that sorting is chronological
# (sorting the raw MM/DD/YYYY strings only works within a single year)
df = pd.DataFrame(data)
df["TimeStamp"] = pd.to_datetime(df["TimeStamp"], format="%m/%d/%Y %H:%M:%S")
df = df.sort_values(by=["App", "TimeStamp", "Status"], ignore_index=True)

# Seconds elapsed since the previous row; the first row has no predecessor,
# so back-fill it (it is reset to 0 in the loop below anyway)
df["Downtime"] = df["TimeStamp"].diff().bfill().dt.total_seconds()

# Iterate to deal with change of sequences: a row that follows a Passing row
# starts a new Failing -> Passing sequence, so its own gap does not count
df["group"] = 0
for i in df.index:
    if i == 0:
        df.loc[i, "Downtime"] = 0
        continue
    if df.loc[i - 1, "Status"] == "Passing":
        df.loc[i, "Downtime"] = 0
        df.loc[i:, "group"] += 1

# Total downtime per app (sum of the per-row gaps of that app)
df["Total Downtime"] = df.groupby("App")["Downtime"].transform("sum")

# Downtime per sequence (sum of the per-row gaps within each group)
df["Downtime"] = df.groupby("group")["Downtime"].transform("sum")


# Cleanup: drop the helper column and format seconds as HH:MM:SS
def to_hms(x):
    return f"{int(x // 3600):02}:{int((x % 3600) // 60):02}:{int(x % 60):02}"


df = df.drop(columns="group")
df["Downtime"] = df["Downtime"].apply(to_hms)
df["Total Downtime"] = df["Total Downtime"].apply(to_hms)

print(df)
# Outputs
            TimeStamp   App   Status  Downtime Total Downtime
0 2021-11-18 16:07:51  App1  Failing  00:07:31       00:49:21
1 2021-11-18 16:15:22  App1  Passing  00:07:31       00:49:21
2 2021-11-18 16:31:11  App1  Failing  00:41:50       00:49:21
3 2021-11-18 17:07:28  App1  Failing  00:41:50       00:49:21
4 2021-11-18 17:13:01  App1  Passing  00:41:50       00:49:21
5 2021-11-22 02:48:06  App2  Failing  11:08:12       11:08:12
6 2021-11-22 03:43:33  App2  Failing  11:08:12       11:08:12
7 2021-11-22 13:56:18  App2  Passing  11:08:12       11:08:12
8 2021-11-17 13:42:11  App3  Failing  44:48:10       44:48:10
9 2021-11-19 10:30:21  App3  Passing  44:48:10       44:48:10
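The row-by-row loop above can also be replaced by vectorised group operations. A minimal sketch, assuming (as in the sample) that a sequence starts on each app's first row and on every row that follows a Passing row; `seq` and `to_hms` are hypothetical names, and a sequence without a Passing row would get an empty downtime:

```python
import pandas as pd

data = {
    "TimeStamp": ["11/18/2021 17:13:01", "11/18/2021 17:07:28", "11/18/2021 16:31:11",
                  "11/18/2021 16:15:22", "11/18/2021 16:07:51", "11/22/2021 13:56:18",
                  "11/22/2021 03:43:33", "11/22/2021 02:48:06", "11/19/2021 10:30:21",
                  "11/17/2021 13:42:11"],
    "App": ["App1"] * 5 + ["App2"] * 3 + ["App3"] * 2,
    "Status": ["Passing", "Failing", "Failing", "Passing", "Failing",
               "Passing", "Failing", "Failing", "Passing", "Failing"],
}

df = pd.DataFrame(data)
df["TimeStamp"] = pd.to_datetime(df["TimeStamp"], format="%m/%d/%Y %H:%M:%S")
df = df.sort_values(["App", "TimeStamp"], ignore_index=True)

# A new sequence starts on the first row of each app or right after a Passing row
is_passing = df["Status"].eq("Passing")
new_app = df["App"].ne(df["App"].shift())
after_pass = is_passing.shift(fill_value=False)
df["seq"] = (new_app | after_pass).cumsum()

# First Failing and first Passing timestamp per sequence (min skips the NaT rows)
fail_start = df["TimeStamp"].where(df["Status"].eq("Failing")).groupby(df["seq"]).transform("min")
pass_end = df["TimeStamp"].where(is_passing).groupby(df["seq"]).transform("min")
df["Downtime"] = pass_end - fail_start

# Total per app: count each sequence once, then broadcast back to every row of the app
totals = df.drop_duplicates("seq").groupby("App")["Downtime"].sum()
df["Total Downtime"] = df["App"].map(totals)


def to_hms(td):
    """Format a Timedelta as HH:MM:SS, letting hours exceed 24."""
    if pd.isna(td):
        return ""
    h, rem = divmod(int(td.total_seconds()), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02}:{m:02}:{s:02}"


df["Downtime"] = df["Downtime"].map(to_hms)
df["Total Downtime"] = df["Total Downtime"].map(to_hms)
df = df.drop(columns="seq")
print(df)
```

With the sample data this should reproduce the Downtime and Total Downtime columns printed above. Measuring from the first Failing to the first Passing timestamp directly, rather than summing row gaps, also means extra Failing rows inside a sequence need no special handling.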

Solution 1
0 2021-12-11 08:45:43