Pandas group by, then merge using multiple columns

Question

I'm pretty new to Pandas, but I'm trying to analyze a dataset of employee timestamps to determine the number of unique days with timestamps per employee per week.

My initial dataframe (input1) looks like this (but much longer):

            ID          Datetime        Week/Year
0          15.0    2019-02-04 08:28:44   6/2019
1          15.0    2019-02-04 12:48:05   6/2019
2          15.0    2019-02-04 12:54:29   6/2019
3          15.0    2019-02-05 08:05:51   6/2019
4          15.0    2019-02-05 12:47:26   6/2019
5          15.0    2019-02-05 14:45:34   6/2019
6          15.0    2019-02-06 08:10:59   6/2019
7          15.0    2019-02-06 12:49:24   6/2019
8          15.0    2019-02-06 13:02:48   6/2019
9          15.0    2019-02-07 08:02:22   6/2019
10         15.0    2019-02-08 08:02:10   6/2019
11         15.0    2019-02-08 09:55:22   6/2019

I created another dataframe:

df = pd.DataFrame({
    'Timestamp': input1['Datetime'],
    'ID': input1['ID'],
    'Week/Year': input1['Week/Year'],
    # month/day/year string, used to count distinct days
    'MDY': input1['Datetime'].apply(lambda x: "%d/%d/%d" % (x.month, x.day, x.year)),
})

Then I grouped by week and employee, and got the count of unique days (MDY):

df_grouped = df.groupby(['Week/Year', 'ID']).MDY.nunique()

Week/Year   ID    MDY 
6/2019      15.0   5

The end result I'm looking for is to merge the MDY counts back into the initial dataframe by joining on Week/Year and ID. I tried a few different ways, for example:

input1.merge(df_grouped.to_frame(), left_on=['ID','Week/Year'], right_index=True)

to get something like:

           ID          Datetime        Week/Year    MDY
0          15.0    2019-02-04 08:28:44   6/2019    5
1          15.0    2019-02-04 12:48:05   6/2019    5
2          15.0    2019-02-04 12:54:29   6/2019    5
3          15.0    2019-02-05 08:05:51   6/2019    5
4          15.0    2019-02-05 12:47:26   6/2019    5
5          15.0    2019-02-05 14:45:34   6/2019    5

After the join I just end up getting NaN across the board. Is anyone able to steer me in the right direction?

Thanks.

Answer 1

This groupby

df_grouped = df.groupby(['Week/Year', 'WD: Employee ID']).MDY.nunique()

should return a Series whose index levels are Week/Year and WD: Employee ID:

Week/Year   WD: Employee ID
6/2019      15.0   5
Name: MDY, dtype: int64

However, you show its index as Week/Year, ID. Check the column names to make sure they match.
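One way to check this is to print the index level names of the grouped Series and compare them with the columns of the frame you merge into. This is only a minimal sketch: the three-row frame below is made up for illustration and uses the column names from the question.

```python
import pandas as pd

# Small stand-in for the question's data (values are illustrative only).
df = pd.DataFrame({
    "ID": [15.0, 15.0, 15.0],
    "Week/Year": ["6/2019", "6/2019", "6/2019"],
    "MDY": ["2/4/2019", "2/4/2019", "2/5/2019"],
})

df_grouped = df.groupby(["Week/Year", "ID"]).MDY.nunique()

# The index level names are the groupby keys, in the order they were passed.
index_names = list(df_grouped.index.names)
print(index_names)

# Any level name that is not a column of the target frame would break the join.
missing = [name for name in index_names if name not in df.columns]
print(missing)
```

If `missing` is not empty, the grouped result and the original frame disagree on column names, and the merge keys need to be renamed or passed explicitly.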

Next, on this merge:

input1.merge(df_grouped.to_frame(), left_on=['ID','Week/Year'], right_index=True)

Assuming df_grouped has the index shown in your example, i.e. Week/Year, ID, the order of left_on does not match the order of the index levels used by right_index. It should be:

input1.merge(df_grouped.to_frame(), left_on=['Week/Year', 'ID'], right_index=True)
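To see the corrected merge end to end, here is a small sketch under some assumptions: the five-row `input1` is a shortened, made-up version of the question's data, and `dt.normalize()` stands in for the MDY string column. The `transform("nunique")` variant at the end is an alternative that is not part of the answer above; it skips the merge entirely.

```python
import pandas as pd

# Shortened stand-in for input1 (assumed to have a datetime64 'Datetime' column).
input1 = pd.DataFrame({
    "ID": [15.0] * 5,
    "Datetime": pd.to_datetime([
        "2019-02-04 08:28:44",
        "2019-02-04 12:48:05",
        "2019-02-05 08:05:51",
        "2019-02-06 08:10:59",
        "2019-02-07 08:02:22",
    ]),
    "Week/Year": ["6/2019"] * 5,
})

# Calendar day of each timestamp; plays the role of the MDY column.
days = input1.assign(MDY=input1["Datetime"].dt.normalize())

# Unique days per (week, employee); the index levels are Week/Year, then ID.
df_grouped = days.groupby(["Week/Year", "ID"])["MDY"].nunique()

# left_on lists the columns in the same order as the index levels.
merged = input1.merge(
    df_grouped.to_frame(),
    left_on=["Week/Year", "ID"],
    right_index=True,
)
print(merged)

# Alternative without a merge: broadcast the group count back to every row.
with_transform = input1.copy()
with_transform["MDY"] = days.groupby(["Week/Year", "ID"])["MDY"].transform("nunique")
print(with_transform)
```

Both approaches attach the same per-group count to every row of `input1`. The `transform` version avoids having to line up key order at all, which can be convenient when the count is only needed as an extra column.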

Answer 1: 1 accepted, 2019-06-04 18:10:32