如何通过查看一个数据框中的日期落在另一个数据框中的日期范围内来合并Pandas数据框？

Question

I have two dataframes that have employee data as below. 我有两个具有员工数据的数据框，如下所示。 One data file has employee data including dates on which employees were sick, and the other data file has dates on which employees worked (ie presented as date ranges). 一个数据文件包含员工数据，其中包括员工生病的日期，而另一个数据文件包含员工工作的日期（即，显示为日期范围）。 I would like to combine the two files (hopefully in pandas) by looking at where the "sick day" for a particular employee falls in a "work range". 我想通过查看特定雇员的“病假”在“工作范围”内的位置来合并这两个文件（希望是在熊猫中）。 For example, in image/data below, employee 1 was sick on 11/25/2015, 12/23/2015, and 10/12/2015. 例如，在下面的图像/数据中，员工1在11/25 / 2015、12 / 23/2015和10/12/2015患病。 These fall in the "work ranges" 11/21/2015 - 11/29/2015, 12/21/2015 - 12/29/2015, and 10/9/2015 - 10/17/2015, respectively. 这些分别属于“工作范围”，分别为11/21/2015-11/29 / 2015、12 / 21/2015-12/29/2015和10/9/2015-10/17/2015。

EMPLOYEE WORK DATES DATA: 员工工作日期数据：

 ╔══════════╦════════════╦════════════╗ ║ Employee ║ datein ║ dateout ║ ╠══════════╬════════════╬════════════╣ ║ 1 ║ 11/21/2015 ║ 11/29/2015 ║ ║ 2 ║ 12/9/2015 ║ 12/14/2015 ║ ║ 3 ║ 11/10/2015 ║ 11/19/2015 ║ ║ 4 ║ 11/11/2015 ║ 11/17/2015 ║ ║ 5 ║ 11/30/2015 ║ 12/8/2015 ║ ║ 1 ║ 12/21/2015 ║ 12/29/2015 ║ ║ 2 ║ 1/7/2016 ║ 1/12/2016 ║ ║ 3 ║ 12/10/2015 ║ 12/19/2015 ║ ║ 4 ║ 12/10/2015 ║ 12/16/2015 ║ ║ 5 ║ 12/30/2015 ║ 1/7/2016 ║ ║ 1 ║ 10/9/2015 ║ 10/17/2015 ║ ║ 2 ║ 10/27/2015 ║ 11/1/2015 ║ ║ 3 ║ 9/28/2015 ║ 10/7/2015 ║ ║ 4 ║ 9/29/2015 ║ 10/5/2015 ║ ╚══════════╩════════════╩════════════╝

EMPLOYEE SICK DATES DATA: 员工病假日期数据：

 ╔══════════╦════════════╦═══════════╗ ║ Employee ║ sickDate ║ sickness ║ ╠══════════╬════════════╬═══════════╣ ║ 1 ║ 11/25/2015 ║ flu ║ ║ 10 ║ 11/21/2015 ║ hd ║ ║ 21 ║ 9/20/2015 ║ other ║ ║ 1 ║ 12/23/2015 ║ other ║ ║ 4 ║ 12/13/2015 ║ vacationx ║ ║ 7 ║ 7/21/2015 ║ cough ║ ║ 3 ║ 10/1/2015 ║ rash ║ ║ 4 ║ 10/5/2015 ║ other ║ ║ 5 ║ 1/7/2016 ║ eyex ║ ║ 2 ║ 12/12/2015 ║ tanx ║ ║ 1 ║ 10/12/2015 ║ fatiguex ║ ╚══════════╩════════════╩═══════════╝

CONSOLIDATED DATA: 合并数据：

 ╔══════════╦════════════╦════════════╦════════════╦═══════════╗ ║ Employee ║ datein ║ dateout ║ sickDate ║ sickness ║ ╠══════════╬════════════╬════════════╬════════════╬═══════════╣ ║ 1 ║ 11/21/2015 ║ 11/29/2015 ║ 11/25/2015 ║ flu ║ ║ 2 ║ 12/9/2015 ║ 12/14/2015 ║ 12/12/2015 ║ tanx ║ ║ 3 ║ 11/10/2015 ║ 11/19/2015 ║ ║ ║ ║ 4 ║ 11/11/2015 ║ 11/17/2015 ║ ║ ║ ║ 5 ║ 11/30/2015 ║ 12/8/2015 ║ ║ ║ ║ 1 ║ 12/21/2015 ║ 12/29/2015 ║ 12/23/2015 ║ other ║ ║ 2 ║ 1/7/2016 ║ 1/12/2016 ║ ║ ║ ║ 3 ║ 12/10/2015 ║ 12/19/2015 ║ ║ ║ ║ 4 ║ 12/10/2015 ║ 12/16/2015 ║ 12/13/2015 ║ vacationx ║ ║ 5 ║ 12/30/2015 ║ 1/7/2016 ║ 1/7/2016 ║ eyex ║ ║ 1 ║ 10/9/2015 ║ 10/17/2015 ║ 10/12/2015 ║ fatiguex ║ ║ 2 ║ 10/27/2015 ║ 11/1/2015 ║ ║ ║ ║ 3 ║ 9/28/2015 ║ 10/7/2015 ║ 10/1/2015 ║ rash ║ ║ 4 ║ 9/29/2015 ║ 10/5/2015 ║ 10/5/2015 ║ other ║ ╚══════════╩════════════╩════════════╩════════════╩═══════════╝

How do I do that in pandas or python? 如何在Pandas或python中做到这一点？ (Thank you for your help!) （谢谢您的帮助！）

Answer 1

You need to put this data to pd.DataFrame( ... ) as df1 and set_index('Employee') 您需要将此数据作为df1和set_index('Employee')放入pd.DataFrame( ... ) set_index('Employee')

 ╔══════════╦════════════╦════════════╗ ║ Employee ║ datein ║ dateout ║ ╠══════════╬════════════╬════════════╣ ║ 1 ║ 11/21/2015 ║ 11/29/2015 ║ ║ 2 ║ 12/9/2015 ║ 12/14/2015 ║ ║ 3 ║ 11/10/2015 ║ 11/19/2015 ║ ║ 4 ║ 11/11/2015 ║ 11/17/2015 ║ ║ 5 ║ 11/30/2015 ║ 12/8/2015 ║ ║ 1 ║ 12/21/2015 ║ 12/29/2015 ║ ║ 2 ║ 1/7/2016 ║ 1/12/2016 ║ ║ 3 ║ 12/10/2015 ║ 12/19/2015 ║ ║ 4 ║ 12/10/2015 ║ 12/16/2015 ║ ║ 5 ║ 12/30/2015 ║ 1/7/2016 ║ ║ 1 ║ 10/9/2015 ║ 10/17/2015 ║ ║ 2 ║ 10/27/2015 ║ 11/1/2015 ║ ║ 3 ║ 9/28/2015 ║ 10/7/2015 ║ ║ 4 ║ 9/29/2015 ║ 10/5/2015 ║ ╚══════════╩════════════╩════════════╝

Then put this data to pd.DataFrame( ... ) as df2 and set_index('Employee') 然后将此数据作为df2和set_index('Employee')放入pd.DataFrame( ... ) set_index('Employee')

 ╔══════════╦════════════╦═══════════╗ ║ Employee ║ sickDate ║ sickness ║ ╠══════════╬════════════╬═══════════╣ ║ 1 ║ 11/25/2015 ║ flu ║ ║ 10 ║ 11/21/2015 ║ hd ║ ║ 21 ║ 9/20/2015 ║ other ║ ║ 1 ║ 12/23/2015 ║ other ║ ║ 4 ║ 12/13/2015 ║ vacationx ║ ║ 7 ║ 7/21/2015 ║ cough ║ ║ 3 ║ 10/1/2015 ║ rash ║ ║ 4 ║ 10/5/2015 ║ other ║ ║ 5 ║ 1/7/2016 ║ eyex ║ ║ 2 ║ 12/12/2015 ║ tanx ║ ║ 1 ║ 10/12/2015 ║ fatiguex ║ ╚══════════╩════════════╩═══════════╝

Finally, df = df1.join(df2).reset_index() 最后， df = df1.join(df2).reset_index()

Answer 2

Consider an inner and outer pandas merge approach. 考虑内部和外部大熊猫合并方法。 Below assumes dates are in datetime formats which may require conversion from string objects: 下面假设日期为datetime格式，可能需要从字符串对象进行转换：

workdf['datein'] = pd.to_datetime(workdf['datein'])
workdf['dateout'] = pd.to_datetime(workdf['dateout'])
sickdf['sickDate'] = pd.to_datetime(sickdf['sickDate'])

# INNER MERGE ON BOTH DFs WHERE SICK DAYS REPEAT FOR MATCHING EMPLOYEE ROW IN WORK DAYS
mergedf = pd.merge(workdf, sickdf, on='Employee', how="inner")

# OUTER MERGE TO KEEP ALL WORK DAY RECORDS WITH FILTERED SICK DAYS DATA SET
finaldf = pd.merge(mergedf[(mergedf['sickDate'] - mergedf['datein'] >= 0) &
                           (mergedf['dateout'] - mergedf['sickDate'] >= 0)],
                   workdf, on=['Employee', 'datein', 'dateout'], how="outer")

finaldf = finaldf.sort(['Employee','datein','dateout']).reset_index(drop=True)

Result 结果

#    Employee     datein      dateout     sickDate   sickness
#0          1 2015-10-09   2015-10-17   2015-10-12   fatiguex
#1          1 2015-11-21   2015-11-29   2015-11-25        flu
#2          1 2015-12-21   2015-12-29   2015-12-23      other
#3          2 2015-10-27   2015-11-01          NaT        NaN
#4          2 2015-12-09   2015-12-14   2015-12-12       tanx
#5          2 2016-01-07   2016-01-12          NaT        NaN
#6          3 2015-09-28   2015-10-07   2015-10-01       rash
#7          3 2015-11-10   2015-11-19          NaT        NaN
#8          3 2015-12-10   2015-12-19          NaT        NaN
#9          4 2015-09-29   2015-10-05   2015-10-05      other
#10         4 2015-11-11   2015-11-17          NaT        NaN
#11         4 2015-12-10   2015-12-16   2015-12-13  vacationx
#12         5 2015-11-30   2015-12-08          NaT        NaN
#13         5 2015-12-30   2016-01-07   2016-01-07       eyex

如何通过查看一个数据框中的日期落在另一个数据框中的日期范围内来合并Pandas数据框？

问题描述

2 个解决方案

解决方案1
0 2016-01-16 22:44:58

解决方案2
0 2016-01-17 00:50:53

如何通过查看一个数据框中的日期落在另一个数据框中的日期范围内来合并Pandas数据框？

问题描述

2 个解决方案

解决方案1 0 2016-01-16 22:44:58

解决方案2 0 2016-01-17 00:50:53

解决方案1
0 2016-01-16 22:44:58

解决方案2
0 2016-01-17 00:50:53