How to calculate the accumulated time of a groupby function using pandas?
Calculate the duration from a date time column using pandas groupby or any other pandas function
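I am new to pandas, and my dataframe looks like the one shown below.
1. How can I calculate the duration (in days) for a particular 'ID' from the first status to the next status, and so on?
2. Count the IDs that have more than two failures with at least one maintenance between them.
3. Subset the data by the "Failure-Failure" and "Failure-Maintenance" patterns.
I have tried every combination of pandas groupby functions I could think of, for example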
df.groupby(['ID', 'Status' ]).size().reset_index(name='counts').sort_values(['counts'], ascending =False)
Create the DF from the following list of dictionaries
import pandas as pd
import numpy as np
sales = [ {'ID': '1', 'Status': 'Failure', 'Date': '2017-04-26'},
{'ID': '2', 'Status': 'Failure', 'Date': '2017-05-06'},
{'ID': '1', 'Status': 'Maintenance', 'Date': '2017-05-16'},
{'ID': '1', 'Status': 'Failure', 'Date': '2017-07-06'},
{'ID': '2', 'Status': 'Failure', 'Date': '2017-09-06'},
{'ID': '1', 'Status': 'Failure', 'Date': '2018-01-14'},
{'ID': '3', 'Status': 'Maintenance', 'Date': '2017-07-16'},
{'ID': '4', 'Status': 'Failure', 'Date': '2017-07-16'},
{'ID': '2', 'Status': 'Maintenance', 'Date': '2018-07-06'},
{'ID': '3', 'Status': 'Failure', 'Date': '2018-01-06'},
{'ID': '3', 'Status': 'Maintenance', 'Date': '2018-07-06'},
{'ID': '3', 'Status': 'Failure', 'Date': '2019-07-06'},
{'ID': '2', 'Status': 'Maintenance', 'Date': '2019-05-06'},
{'ID': '2', 'Status': 'Failure', 'Date': '2019-10-06'},
{'ID': '4', 'Status': 'Maintenance', 'Date': '2019-11-06'}]
df = pd.DataFrame(sales)
df['Date'] = pd.to_datetime(df['Date'])
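Expected output
2.1 The IDs that have more than one failure.
2.2 How many IDs have more than one failure with one maintenance between them, how many have two maintenances between two failures, and so on.
For question 3, after sorting the dataframe by 'ID' and 'Date', the data looks like this: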
         Date ID Status
0  2017-04-26  1      F
2  2017-05-16  1      M
3  2017-07-06  1      F
5  2018-01-14  1      F
1  2017-05-06  2      F
4  2017-09-06  2      F
8  2018-07-06  2      M
12 2019-05-06  2      M
13 2019-10-06  2      F
6  2017-07-16  3      M
9  2018-01-06  3      F
10 2018-07-06  3      M
11 2019-07-06  3      F
7  2017-07-16  4      F
14 2019-11-06  4      M
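Now, in ID 1 the rows at index 3 and 5 form an FF pattern, and in ID 2 the rows at index 1 and 4 form an FF pattern. ID 3 has no FF pattern, and neither does ID 4.
So the expected FF subset is the one given below.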
        Date ID Status
0 2017-07-06  1      F
1 2018-01-14  1      F
2 2017-05-06  2      F
3 2017-09-06  2      F
Similarly, the FM dataframe after subsetting looks like this
        Date ID Status
0 2017-04-26  1      F
1 2017-05-16  1      M
2 2017-09-06  2      F
3 2018-07-06  2      M
4 2018-01-06  3      F
5 2018-07-06  3      M
6 2017-07-16  4      F
7 2019-11-06  4      M
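I had a hard time understanding your question, but maybe these answers help you solve it completely, or at least keep you from getting stuck (in case I got the question wrong).
I still see three questions:
1. Calculate the duration (in days) for a specific 'ID' between one failure and the next status.
2. Count the IDs that have more than two failures.
3. How many of them have at least one maintenance in between.
For this you need pandas and NumPy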
import pandas as pd
import numpy as np
sales = [{'ID': '1', 'Status': 'Failure', 'Date': '2017-04-26'},
{'ID': '2', 'Status': 'Failure', 'Date': '2017-05-06'},
{'ID': '1', 'Status': 'Maintenance', 'Date': '2017-05-16'},
{'ID': '1', 'Status': 'Failure', 'Date': '2017-07-06'},
{'ID': '2', 'Status': 'Failure', 'Date': '2017-09-06'},
{'ID': '1', 'Status': 'Failure', 'Date': '2018-01-14'},
{'ID': '3', 'Status': 'Maintenance', 'Date': '2017-07-16'},
{'ID': '4', 'Status': 'Failure', 'Date': '2017-07-16'},
{'ID': '2', 'Status': 'Maintenance', 'Date': '2018-07-06'},
{'ID': '3', 'Status': 'Failure', 'Date': '2018-01-06'},
{'ID': '3', 'Status': 'Maintenance', 'Date': '2018-07-06'},
{'ID': '3', 'Status': 'Failure', 'Date': '2019-07-06'},
{'ID': '2', 'Status': 'Maintenance', 'Date': '2019-05-06'},
{'ID': '2', 'Status': 'Failure', 'Date': '2019-10-06'},
{'ID': '4', 'Status': 'Maintenance', 'Date': '2019-11-06'}]
df = pd.DataFrame(sales)
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['ID', 'Date', 'Status'])
print('{0}\n'.format(df))
# Question 2
# IDs with more than two failures
df_question2 = df.groupby(['ID', 'Status']) \
    .size().reset_index(name='Counts')
# Answer 2
# count only the Failure rows, so that maintenance counts are not included
more_than_two_failures = (df_question2['Status'] == 'Failure') & (df_question2['Counts'] > 2)
counts_of_more_than_two_failures = len(df_question2.loc[more_than_two_failures])
print('IDs with more than two failures : {0}'.format(counts_of_more_than_two_failures))
# Question 3
# one maintenance between failures
# work on a copy so that df keeps the original status labels
df_question3 = df.copy()
# encode the statuses: '1' = Failure, '0' = Maintenance
df_question3['Status'] = np.where(df['Status'] == 'Failure', '1', '0')
df_question3_status = df_question3.groupby('ID')['Status'].apply(list)
dict_question3 = df_question3_status.to_dict()
# Answer 3
for key, value in dict_question3.items():
    # drop leading/trailing maintenances, split on failures and
    # keep only the non-empty runs of maintenances between two failures
    _find_me = list(filter(None, ''.join(value).strip('0').split('1')))
    _has = bool(_find_me)
    print('ID {0} has maintenance between failures: {1}'.format(key, _has))
print('\n')
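The loop above only reports whether an ID has any maintenance between two failures. For the "one maintenance, two maintenances, and so on" part of the question, a minimal sketch could count the maintenances in each gap instead. It assumes the same records as the `sales` list; the names `rows`, `failures_so_far`, `gap_counts` and `gap_histogram` are made up for this illustration.

```python
import pandas as pd

# Same records as the `sales` list above, written as (ID, Status, Date) tuples.
rows = [
    ('1', 'Failure', '2017-04-26'), ('2', 'Failure', '2017-05-06'),
    ('1', 'Maintenance', '2017-05-16'), ('1', 'Failure', '2017-07-06'),
    ('2', 'Failure', '2017-09-06'), ('1', 'Failure', '2018-01-14'),
    ('3', 'Maintenance', '2017-07-16'), ('4', 'Failure', '2017-07-16'),
    ('2', 'Maintenance', '2018-07-06'), ('3', 'Failure', '2018-01-06'),
    ('3', 'Maintenance', '2018-07-06'), ('3', 'Failure', '2019-07-06'),
    ('2', 'Maintenance', '2019-05-06'), ('2', 'Failure', '2019-10-06'),
    ('4', 'Maintenance', '2019-11-06'),
]
df = pd.DataFrame(rows, columns=['ID', 'Status', 'Date'])
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['ID', 'Date']).reset_index(drop=True)

# 1 for a failure row, 0 for a maintenance row
failure_flag = df['Status'].eq('Failure').astype(int)
# running number of failures seen so far within each ID
df['failures_so_far'] = failure_flag.groupby(df['ID']).cumsum()
# total number of failures of the ID, repeated on every row of that ID
total_failures = failure_flag.groupby(df['ID']).transform('sum')

# a maintenance lies between two failures when at least one failure
# precedes it and at least one failure still follows it
between = (failure_flag.eq(0)
           & df['failures_so_far'].ge(1)
           & df['failures_so_far'].lt(total_failures))

# number of maintenances in each gap, keyed by (ID, number of the preceding failure)
gap_counts = df[between].groupby(['ID', 'failures_so_far']).size()
ids_with_maintenance_between = gap_counts.index.get_level_values('ID').unique().tolist()
# how many gaps contain exactly 1, 2, ... maintenances
gap_histogram = gap_counts.value_counts()

print(gap_counts)
print(ids_with_maintenance_between)
print(gap_histogram)
```

Here a gap is the stretch between one failure and the next failure of the same ID, so maintenances before the first failure or after the last one are ignored.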
# subset patterns
df = pd.DataFrame(sales)
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['ID', 'Date', 'Status'])
# work on a copy so that df keeps the original status labels
df_question3 = df.copy()
# encode the statuses: '0' = Failure, '1' = Maintenance
df_question3['Status'] = np.where(df['Status'] == 'Failure', '0', '1')
df_question3_patterns = df_question3.groupby('ID')['Status'].apply(list)
dict_question3 = df_question3_patterns.to_dict()
# F-F
# collect the matching row pairs (DataFrame.append was removed in pandas 2.0)
ff_frames = []
for key, statuses in dict_question3.items():
    rows_of_id = df_question3[df_question3['ID'] == key]
    for i in range(len(statuses) - 1):
        # only FF values: a failure immediately followed by another failure
        if statuses[i] == '0' and statuses[i + 1] == '0':
            # locate n and n+1 rows based on i index
            ff_frames.append(rows_of_id.iloc[[i, i + 1]])
df_ff_pattern = pd.concat(ff_frames) if ff_frames else df_question3.iloc[0:0].copy()
print('subset FF patterns')
# back-substitute status values
df_ff_pattern['Status'] = np.where(df_ff_pattern['Status'] == '0', 'F', 'M')
print(df_ff_pattern)
print('\n')
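# F-M
fm_frames = []
for key, statuses in dict_question3.items():
    rows_of_id = df_question3[df_question3['ID'] == key]
    for i in range(len(statuses) - 1):
        # only FM values: a failure ('0') immediately followed by a maintenance ('1')
        if statuses[i] == '0' and statuses[i + 1] == '1':
            # locate n and n+1 rows based on i index
            fm_frames.append(rows_of_id.iloc[[i, i + 1]])
print('subset FM patterns')
if fm_frames:
    df_fm_pattern = pd.concat(fm_frames)
    # back-substitute status values
    df_fm_pattern['Status'] = np.where(df_fm_pattern['Status'] == '0', 'F', 'M')
    print(df_fm_pattern)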
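The F-F and F-M loops above can also be written without explicit iteration. The following is only a sketch of that alternative, using `groupby(...).shift()` to compare each row with its neighbours inside the same ID. It assumes the same records as the `sales` list; the helper `subset_pattern` and the names `rows`, `df_ff` and `df_fm` are invented for this example.

```python
import pandas as pd

# Same records, in the same order, as the `sales` list above.
rows = [
    ('1', 'Failure', '2017-04-26'), ('2', 'Failure', '2017-05-06'),
    ('1', 'Maintenance', '2017-05-16'), ('1', 'Failure', '2017-07-06'),
    ('2', 'Failure', '2017-09-06'), ('1', 'Failure', '2018-01-14'),
    ('3', 'Maintenance', '2017-07-16'), ('4', 'Failure', '2017-07-16'),
    ('2', 'Maintenance', '2018-07-06'), ('3', 'Failure', '2018-01-06'),
    ('3', 'Maintenance', '2018-07-06'), ('3', 'Failure', '2019-07-06'),
    ('2', 'Maintenance', '2019-05-06'), ('2', 'Failure', '2019-10-06'),
    ('4', 'Maintenance', '2019-11-06'),
]
df = pd.DataFrame(rows, columns=['ID', 'Status', 'Date'])
df['Date'] = pd.to_datetime(df['Date'])
# keep the original index so the rows can be matched to the question's table
df = df.sort_values(['ID', 'Date'])

# status of the following and of the preceding row within the same ID
next_status = df.groupby('ID')['Status'].shift(-1)
prev_status = df.groupby('ID')['Status'].shift(1)


def subset_pattern(first, second):
    """Return the rows that start or end a `first` -> `second` pair within one ID."""
    starts = df['Status'].eq(first) & next_status.eq(second)
    ends = df['Status'].eq(second) & prev_status.eq(first)
    return df[starts | ends]


df_ff = subset_pattern('Failure', 'Failure')
df_fm = subset_pattern('Failure', 'Maintenance')

print(df_ff)
print(df_fm)
```

One difference from the loops: a row that belongs to two pairs (for example the middle failure of three consecutive failures) appears once here, whereas the loops would emit it twice.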
# Question 1
df_question1 = pd.DataFrame(sales)
df_question1['Date'] = pd.to_datetime(df_question1['Date'])
df_question1 = df_question1.reset_index().sort_values(['ID', 'Date', 'Status']).set_index(['ID', 'Status'])
df_question1['Difference'] = df_question1.groupby('ID')['Date'].transform(pd.Series.diff)
# Possible Answer 1
# all days in statuses
print(df_question1)
df_question1 = df_question1.reset_index()
df_question1_failure = df_question1.loc[df_question1['Status'] == 'Failure']
df_question1_failure_pre_diff = df_question1_failure[['ID', 'Difference']]
# filter by status
df_question1_maintenance = df_question1.loc[df_question1['Status'] == 'Maintenance']
df_question1_maintenance_pre_diff = df_question1_maintenance[['ID', 'Difference']]
# group by and sum
df_question1_failure_group = df_question1_failure_pre_diff.groupby('ID').sum()
df_question1_maintenance_group = df_question1_maintenance_pre_diff.groupby('ID').sum()
# Possible Answer 1
# days in status failure
print((df_question1_failure_group - df_question1_maintenance_group).abs())
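If what question 1 asks for is the number of days from each status to the next one within an ID, a per-row view may be easier to read than the summed differences. This is a sketch under that assumption, again using the same records as the `sales` list; the column names `NextStatus` and `DaysToNext` and the variable `transitions` are invented for the example.

```python
import pandas as pd

# Same records as the `sales` list above, written as (ID, Status, Date) tuples.
rows = [
    ('1', 'Failure', '2017-04-26'), ('2', 'Failure', '2017-05-06'),
    ('1', 'Maintenance', '2017-05-16'), ('1', 'Failure', '2017-07-06'),
    ('2', 'Failure', '2017-09-06'), ('1', 'Failure', '2018-01-14'),
    ('3', 'Maintenance', '2017-07-16'), ('4', 'Failure', '2017-07-16'),
    ('2', 'Maintenance', '2018-07-06'), ('3', 'Failure', '2018-01-06'),
    ('3', 'Maintenance', '2018-07-06'), ('3', 'Failure', '2019-07-06'),
    ('2', 'Maintenance', '2019-05-06'), ('2', 'Failure', '2019-10-06'),
    ('4', 'Maintenance', '2019-11-06'),
]
df = pd.DataFrame(rows, columns=['ID', 'Status', 'Date'])
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['ID', 'Date']).reset_index(drop=True)

# status and date of the following event of the same ID
df['NextStatus'] = df.groupby('ID')['Status'].shift(-1)
next_date = df.groupby('ID')['Date'].shift(-1)
# whole days from this event to the next one (NaN for the last event of an ID)
df['DaysToNext'] = (next_date - df['Date']).dt.days

# one row per transition; the last event of every ID has no successor
transitions = df.dropna(subset=['NextStatus'])
print(transitions[['ID', 'Status', 'NextStatus', 'DaysToNext']])

# average duration per kind of transition, e.g. Failure -> Maintenance
print(transitions.groupby(['Status', 'NextStatus'])['DaysToNext'].mean())
```

For ID 1, for example, this yields 20 days from the first failure to the maintenance, 51 days from the maintenance to the next failure, and 192 days between the last two failures.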
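If you think something is missing, please leave a comment and I will improve the answer. In any case, this may be just a starting point until someone else gets it fully right.
Hope it helps (: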