Use pandas groupby to fetch frequency count based on time intervals
I have a dataframe like the one shown below:
import numpy as np
import pandas as pd

df = pd.DataFrame({'subject_id':[1,1,1,2,2,2],
'start_time':['2130-03-25 18:51:47','2130-04-23 18:51:47','2130-04-23 18:51:47','2120-01-11 18:51:47','2120-01-11 18:51:47','2120-04-28 18:51:47'],
'test_time':['2130-03-26 14:51:47','2130-04-24 18:51:47','2130-04-25 18:51:47','2121-02-26 18:51:47','2121-02-26 18:51:47','2120-04-28 19:51:47'],
'test':['test1','test2','test2','test2','test3','test3']})
# Parse the string columns into datetime64 so they can be subtracted
df['start_time'] = pd.to_datetime(df['start_time'])
df['test_time'] = pd.to_datetime(df['test_time'])
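What I would like to do is:
a) Get the number of tests for each subject in every 24-hour window, counted from start_time. The time of each test is in the test_time column.
Example: by 24 hours I mean the intervals 0-24 hours, 24-48 hours, 48-72 hours, and so on.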
I tried the following:
# Hours elapsed between the start and the test
df['time_diff'] = (df.test_time - df.start_time) / pd.Timedelta(hours=1)
conditions = [
    (df['time_diff'] >= 0) & (df['time_diff'] <= 24),
    (df['time_diff'] > 24) & (df['time_diff'] <= 48),
    (df['time_diff'] > 48) & (df['time_diff'] <= 72)]
choices = ['0-24hrs','24-48hrs','48-72hrs']
# Rows matching no condition (including negative differences) get the default label
df['op'] = np.select(conditions, choices, default='Greater than 3 days')
df.groupby(['subject_id','test','op'])['test'].count()
However, the above produces output that is not in the format I want.
I would like my output to look like the one shown below.
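You can just add unstack, which moves the op level of the grouped result into columns; fill_value=0 puts a 0 wherever a subject/test pair has no rows in an interval: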
out = df.groupby(['subject_id','test','op'])['test'].count().unstack(fill_value=0).reset_index()
out
op  subject_id   test  0-24hrs  24-48hrs  Greater than 3 days
0            1  test1        1         0                    0
1            1  test2        1         1                    0
2            2  test2        0         0                    1
3            2  test3        1         0                    1
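As a possible alternative (a sketch, not part of the answer above), the intervals could be assigned with pd.cut instead of np.select, which keeps the bin edges and labels in one place. The names hours, bins and labels are illustrative only. One difference to be aware of: a negative time difference falls outside the bins here and becomes missing, whereas the np.select version labels it 'Greater than 3 days'. The sample data has no such rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1, 1, 1, 2, 2, 2],
    'start_time': ['2130-03-25 18:51:47', '2130-04-23 18:51:47', '2130-04-23 18:51:47',
                   '2120-01-11 18:51:47', '2120-01-11 18:51:47', '2120-04-28 18:51:47'],
    'test_time': ['2130-03-26 14:51:47', '2130-04-24 18:51:47', '2130-04-25 18:51:47',
                  '2121-02-26 18:51:47', '2121-02-26 18:51:47', '2120-04-28 19:51:47'],
    'test': ['test1', 'test2', 'test2', 'test2', 'test3', 'test3'],
})
df['start_time'] = pd.to_datetime(df['start_time'])
df['test_time'] = pd.to_datetime(df['test_time'])

# Hours elapsed between start_time and test_time
hours = (df['test_time'] - df['start_time']) / pd.Timedelta(hours=1)

# Right-closed bins: [0, 24], (24, 48], (48, 72], (72, inf)
bins = [0, 24, 48, 72, np.inf]
labels = ['0-24hrs', '24-48hrs', '48-72hrs', 'Greater than 3 days']

# include_lowest=True makes the first bin include exactly 0 hours;
# astype(str) turns the categorical result into plain strings for groupby
df['op'] = pd.cut(hours, bins=bins, labels=labels, include_lowest=True).astype(str)

out = (df.groupby(['subject_id', 'test', 'op'])
         .size()
         .unstack(fill_value=0)
         # Add any interval with no rows at all (here 48-72hrs) as a zero column
         .reindex(columns=labels, fill_value=0)
         .reset_index())
print(out)
```

On the sample data this gives the same counts as the table above, plus a 48-72hrs column that is all zeros because no test falls in that interval.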