Summarising pandas dataframe in single row
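I would appreciate some help summarising the dataframe detailed below into a single summary row, as shown in the desired output further down the page. Thanks in advance.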
import pandas as pd

employees = {'Name of Employee': ['Mark','Mark','Mark','Mark','Mark','Mark', 'Mark','Mark','Mark','Mark','Mark','Mark','Mark'],
'Department': ['21','21','21','21','21','21', '21','21','21','21','21','21','21'],
'Team': ['2','2','2','2','2','2','2','2','2','2','2','2','2'],
'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31', '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17', '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08', '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02', '2020-07-06 11:00:31'],
'Call Duration' : ['0.01178', '0.01736','0.01923','0.00911','0.01007','0.01206','0.01256','0.01006','0.01162','0.00733','0.01250','0.01013','0.01308'],
'ITT': ['NO','YES', 'NO', 'Follow up', 'YES','YES', 'NO', 'Follow up','YES','YES', 'NO','YES','YES']
}
df = pd.DataFrame(employees)
Desired output:
Name  Dept  Team  Start       End         Weeks  Total Calls  Ave. Call time  Sold  Rejected  more info
Mark  21    2     2020-02-19  2020-07-06  19.71  13           0.01207         7     4         2
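The logic I am trying to apply is (I suspect the syntax I have written below contains errors, but I hope you can still follow the calculation):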
Try combining named aggregation (pd.NamedAgg) with groupby:
# Convert the string columns to proper dtypes before aggregating
df['Log'] = pd.to_datetime(df['Log'])
df['Call Duration'] = df['Call Duration'].astype(float)

# One summary row per employee/team/department, using named aggregation
summary = (
    df.groupby(['Name of Employee', 'Team', 'Department'])
      .agg(Start=('Log', 'min'),
           End=('Log', 'max'),
           # time between the first and last log entry, expressed in weeks
           Weeks=('Log', lambda x: (x.max() - x.min()) / pd.Timedelta(weeks=1)),
           Total_Calls=('Log', 'count'),
           Avg_Call_Time=('Call Duration', 'mean'),
           Sold=('ITT', lambda x: (x == 'YES').sum()),
           Rejected=('ITT', lambda x: (x == 'NO').sum()),
           More_info=('ITT', lambda x: (x == 'Follow up').sum()))
)
print(summary)
Output:
                                                Start                  End      Weeks  Total_Calls  Avg_Call_Time  Sold  Rejected  More_info
Name of Employee Team Department
Mark             2    21          2020-02-19 09:01:17  2020-07-06 11:00:31  19.726114           13       0.012068     7         4          2
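The result above is close to the desired output but not identical: the index is still a MultiIndex, Start and End carry the time of day, and Weeks comes out as 19.726114 rather than 19.71. Below is a minimal sketch of one way to get to the requested layout. It rests on some assumptions of mine, not anything the asker confirmed: that 19.71 was computed between calendar dates (138 days / 7), that the average is rounded to 5 decimals, and that the column labels should match the desired output exactly. The boolean helper columns are just one way to avoid the lambdas.

```python
import pandas as pd

# Data copied from the question.
logs = ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31',
        '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17',
        '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08',
        '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02',
        '2020-07-06 11:00:31']
durations = ['0.01178', '0.01736', '0.01923', '0.00911', '0.01007', '0.01206',
             '0.01256', '0.01006', '0.01162', '0.00733', '0.01250', '0.01013',
             '0.01308']
itt = ['NO', 'YES', 'NO', 'Follow up', 'YES', 'YES', 'NO',
       'Follow up', 'YES', 'YES', 'NO', 'YES', 'YES']

df = pd.DataFrame({
    'Name of Employee': ['Mark'] * 13,
    'Department': ['21'] * 13,
    'Team': ['2'] * 13,
    'Log': pd.to_datetime(logs),
    'Call Duration': [float(d) for d in durations],
    'ITT': itt,
})

# Boolean flag columns turn the three ITT counts into plain sums.
df['Sold'] = df['ITT'].eq('YES')
df['Rejected'] = df['ITT'].eq('NO')
df['more info'] = df['ITT'].eq('Follow up')

# Labels with spaces are not valid keyword names, so unpack a dict instead.
summary = (
    df.groupby(['Name of Employee', 'Department', 'Team'])
      .agg(**{
          'Start': ('Log', 'min'),
          'End': ('Log', 'max'),
          'Total Calls': ('Log', 'count'),
          'Ave. Call time': ('Call Duration', 'mean'),
          'Sold': ('Sold', 'sum'),
          'Rejected': ('Rejected', 'sum'),
          'more info': ('more info', 'sum'),
      })
      .reset_index()
      .rename(columns={'Name of Employee': 'Name', 'Department': 'Dept'})
)

# Assumption: Weeks is measured between calendar dates, ignoring time of day.
start_day = summary['Start'].dt.normalize()
end_day = summary['End'].dt.normalize()
summary['Weeks'] = ((end_day - start_day) / pd.Timedelta(weeks=1)).round(2)
summary['Start'] = start_day.dt.date
summary['End'] = end_day.dt.date
summary['Ave. Call time'] = summary['Ave. Call time'].round(5)

# Put the columns in the order shown in the desired output.
summary = summary[['Name', 'Dept', 'Team', 'Start', 'End', 'Weeks',
                   'Total Calls', 'Ave. Call time', 'Sold', 'Rejected',
                   'more info']]
print(summary.to_string(index=False))
```

If the exact elapsed time matters more than matching 19.71, drop the two normalize() calls and divide End minus Start directly.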
You have a syntax error: you forgot the comma at the end of each key. Now you can work with this dataframe.
import pandas as pd
employees = {'Name': ['Mark','Mark','Mark','Mark','Mark','Mark', 'Mark','Mark','Mark','Mark','Mark','Mark','Mark'],
'Department': ['21','21','21','21','21','21', '21','21','21','21','21','21','21'],
'Team': ['2','2','2','2','2','2','2','2','2','2','2','2','2'],
'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31', '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17', '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08', '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02', '2020-07-06 11:00:31'],
'Call Duration' : ['0.01178', '0.01736','0.01923','0.00911','0.01007','0.01206','0.01256','0.01006','0.01162','0.00733','0.01250','0.01013','0.01308'],
'ITT': ['NO','YES', 'NO', 'Follow up', 'YES','YES', 'NO', 'Follow up','YES','YES', 'NO','YES','YES']
}
df = pd.DataFrame(employees)
print(df)
Output:
Name  Department  ...  Call Duration  ITT
Mark  21          ...  0.01178        NO
Mark  21          ...  0.01736        YES
Mark  21          ...  0.01923        NO
Mark  21          ...  0.00911        Follow up
Mark  21          ...  0.01007        YES
Mark  21          ...  0.01206        YES
Mark  21          ...  0.01256        NO
Mark  21          ...  0.01006        Follow up
Mark  21          ...  0.01162        YES
Mark  21          ...  0.00733        YES
Mark  21          ...  0.01250        NO
Mark  21          ...  0.01013        YES
Mark  21          ...  0.01308        YES
[13 rows x 6 columns]
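Note that every value in this dataframe is still a string, so "working with it" usually starts with converting the Log and Call Duration columns, as the other answer does. A small sketch of that step on the first three rows of the data (only a subset of the columns is used here to keep it short):

```python
import pandas as pd

# First three rows of the question's data; all values start out as strings.
df = pd.DataFrame({
    'Name': ['Mark', 'Mark', 'Mark'],
    'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31'],
    'Call Duration': ['0.01178', '0.01736', '0.01923'],
    'ITT': ['NO', 'YES', 'NO'],
})

# Convert timestamps and durations so min/max/mean behave numerically.
df['Log'] = pd.to_datetime(df['Log'])
df['Call Duration'] = df['Call Duration'].astype(float)
print(df.dtypes)

# Quick tally of the ITT outcomes.
itt_counts = df['ITT'].value_counts()
print(itt_counts)
```

Without the conversion, aggregations such as 'mean' on Call Duration would operate on strings rather than numbers.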