
Summarising pandas dataframe in single row

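I would appreciate some help summarising the dataframe detailed below into a single-row summary, as shown in the desired output further down the page. Thanks in advance.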

import pandas as pd

# Every value is stored as a string, including the timestamps and call durations.
employees = {'Name of Employee': ['Mark','Mark','Mark','Mark','Mark','Mark', 'Mark','Mark','Mark','Mark','Mark','Mark','Mark'],
             'Department': ['21','21','21','21','21','21', '21','21','21','21','21','21','21'],
             'Team': ['2','2','2','2','2','2','2','2','2','2','2','2','2'],
             'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31', '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17', '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08', '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02', '2020-07-06 11:00:31'],
             'Call Duration': ['0.01178', '0.01736','0.01923','0.00911','0.01007','0.01206','0.01256','0.01006','0.01162','0.00733','0.01250','0.01013','0.01308'],
             'ITT': ['NO','YES', 'NO', 'Follow up', 'YES','YES', 'NO', 'Follow up','YES','YES', 'NO','YES','YES']
             }

df = pd.DataFrame(employees)
    

Desired output:

Name  Dept  Team     Start      End      Weeks Total Calls  Ave. Call time  Sold  Rejected  more info
Mark   21    2    2020-02-19 2020-07-06  19.71      13          0.01207       7       4        2

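The logic I am trying to apply is as follows (I suspect the syntax I have written below is wrong, but I hope the calculations are still clear):

  • Start = the minimum date in df['Log']
  • End = the maximum date in df['Log']
  • Weeks = (maximum date in df['Log'] - minimum date in df['Log'])/7
  • Total Calls = df['Log'].count
  • Ave. Call time = (df['Call Duration'].sum)/(df['Log'].count)
  • Sold = (df['ITT']=='YES').count
  • Rejected = (df['ITT']=='NO').count
  • more info = (df['ITT']=='Follow up').count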
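The logic listed above can be sketched directly, without any grouping, for the single-employee case. This is a minimal illustration rather than a definitive solution: the variable names `log`, `duration` and `summary` are chosen here for the example, and the code assumes the frame holds one employee only. Note that `.count` on a boolean comparison counts every row, so the matches are totalled with `.sum()` instead.

```python
import pandas as pd

# Sample data copied from the question; every column is stored as strings.
df = pd.DataFrame({
    'Name of Employee': ['Mark'] * 13,
    'Department': ['21'] * 13,
    'Team': ['2'] * 13,
    'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31',
            '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17',
            '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08',
            '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02',
            '2020-07-06 11:00:31'],
    'Call Duration': ['0.01178', '0.01736', '0.01923', '0.00911', '0.01007',
                      '0.01206', '0.01256', '0.01006', '0.01162', '0.00733',
                      '0.01250', '0.01013', '0.01308'],
    'ITT': ['NO', 'YES', 'NO', 'Follow up', 'YES', 'YES', 'NO',
            'Follow up', 'YES', 'YES', 'NO', 'YES', 'YES'],
})

# Convert the string columns before doing any date or numeric arithmetic.
log = pd.to_datetime(df['Log'])
duration = df['Call Duration'].astype(float)

# Build the one-row summary; assumes the frame contains a single employee.
summary = pd.DataFrame([{
    'Name': df['Name of Employee'].iloc[0],
    'Dept': df['Department'].iloc[0],
    'Team': df['Team'].iloc[0],
    'Start': log.min().date(),
    'End': log.max().date(),
    'Weeks': (log.max() - log.min()) / pd.Timedelta(days=7),
    'Total Calls': len(df),
    'Ave. Call time': duration.mean(),
    # Summing a boolean mask counts the True values.
    'Sold': int((df['ITT'] == 'YES').sum()),
    'Rejected': int((df['ITT'] == 'NO').sum()),
    'more info': int((df['ITT'] == 'Follow up').sum()),
}])

print(summary.to_string(index=False))
```

With more than one employee in the data this approach would need a loop or a `groupby`, which is where an aggregation-based answer becomes the better fit.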

Try combining named aggregation (pd.NamedAgg) with groupby:

import numpy as np

# Convert the string columns so that dates and durations can be aggregated.
df['Log'] = pd.to_datetime(df['Log'])
df['Call Duration'] = df['Call Duration'].astype(float)

df.groupby(['Name of Employee', 'Team', 'Department'])\
  .agg(Start = ('Log', 'min'),
       End = ('Log', 'max'),
       # Range of the timestamps (max - min), expressed in weeks.
       Weeks = ('Log', lambda x: np.ptp(x) / np.timedelta64(7, 'D')),
       Total_Calls = ('Log', 'count'),
       Avg_Call_Time = ('Call Duration', 'mean'),
       # Summing a boolean mask counts the matching rows.
       Sold = ('ITT', lambda x: (x == 'YES').sum()),
       Rejected = ('ITT', lambda x: (x == 'NO').sum()),
       More_info = ('ITT', lambda x: (x == 'Follow up').sum()))

Output:

                                               Start                 End      Weeks  Total_Calls  Avg_Call_Time  Sold  Rejected  More_info
Name of Employee Team Department                                                                                                          
Mark             2    21         2020-02-19 09:01:17 2020-07-06 11:00:31  19.726114           13       0.012068     7         4          2
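To move the grouped result closer to the layout requested in the question, the group keys can be turned back into ordinary columns and the values tidied afterwards. The following is a hedged sketch, not part of the original answer: the renamed columns, the rounding precision and the choice to compute `Weeks` from the aggregated `Start` and `End` columns with `pd.Timedelta` (instead of `np.ptp`) are all assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the question; every column is stored as strings.
df = pd.DataFrame({
    'Name of Employee': ['Mark'] * 13,
    'Department': ['21'] * 13,
    'Team': ['2'] * 13,
    'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31',
            '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17',
            '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08',
            '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02',
            '2020-07-06 11:00:31'],
    'Call Duration': ['0.01178', '0.01736', '0.01923', '0.00911', '0.01007',
                      '0.01206', '0.01256', '0.01006', '0.01162', '0.00733',
                      '0.01250', '0.01013', '0.01308'],
    'ITT': ['NO', 'YES', 'NO', 'Follow up', 'YES', 'YES', 'NO',
            'Follow up', 'YES', 'YES', 'NO', 'YES', 'YES'],
})
df['Log'] = pd.to_datetime(df['Log'])
df['Call Duration'] = df['Call Duration'].astype(float)

summary = (
    df.groupby(['Name of Employee', 'Department', 'Team'])
      .agg(Start=('Log', 'min'),
           End=('Log', 'max'),
           Total_Calls=('Log', 'count'),
           Avg_Call_Time=('Call Duration', 'mean'),
           Sold=('ITT', lambda x: (x == 'YES').sum()),
           Rejected=('ITT', lambda x: (x == 'NO').sum()),
           More_info=('ITT', lambda x: (x == 'Follow up').sum()))
      .reset_index()  # turn the group keys back into regular columns
      .rename(columns={'Name of Employee': 'Name', 'Department': 'Dept'})
)

# Compute the span in weeks while Start/End still carry the time of day.
summary['Weeks'] = ((summary['End'] - summary['Start']) / pd.Timedelta(days=7)).round(2)

# Keep only the calendar date for display and round the average duration.
summary['Start'] = summary['Start'].dt.date
summary['End'] = summary['End'].dt.date
summary['Avg_Call_Time'] = summary['Avg_Call_Time'].round(5)

print(summary.to_string(index=False))
```

Because the grouping is kept, the same code yields one row per employee, team and department combination if the data grows beyond a single person.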

You have a syntax error: you forgot to add a comma at the end of each key. Now you can work with this dataframe.

import pandas as pd

# Each key/value pair in the dict literal ends with a comma.
employees = {'Name': ['Mark','Mark','Mark','Mark','Mark','Mark', 'Mark','Mark','Mark','Mark','Mark','Mark','Mark'],
             'Department': ['21','21','21','21','21','21', '21','21','21','21','21','21','21'],
             'Team': ['2','2','2','2','2','2','2','2','2','2','2','2','2'],
             'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31', '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17', '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08', '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02', '2020-07-06 11:00:31'],
             'Call Duration': ['0.01178', '0.01736','0.01923','0.00911','0.01007','0.01206','0.01256','0.01006','0.01162','0.00733','0.01250','0.01013','0.01308'],
             'ITT': ['NO','YES', 'NO', 'Follow up', 'YES','YES', 'NO', 'Follow up','YES','YES', 'NO','YES','YES']
             }

df = pd.DataFrame(employees)
print(df)

Output:

              Name  Department  ... Call Duration        ITT
              Mark         21  ...       0.01178         NO
              Mark         21  ...       0.01736        YES
              Mark         21  ...       0.01923         NO
              Mark         21  ...       0.00911  Follow up
              Mark         21  ...       0.01007        YES
              Mark         21  ...       0.01206        YES
              Mark         21  ...       0.01256         NO
              Mark         21  ...       0.01006  Follow up
              Mark         21  ...       0.01162        YES
              Mark         21  ...       0.00733        YES
              Mark         21  ...       0.01250         NO
              Mark         21  ...       0.01013        YES
              Mark         21  ...       0.01308        YES

[13 rows x 6 columns]


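Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original source address. For any questions, contact: yoyou2525@163.com.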

 