
Filtering based on date ranges from another dataframe

I have two pandas dataframes as follows:

df1:

id  date        item
3   2015-11-23  B
3   2015-11-23  A
3   2016-05-11  C
3   2017-02-01  C
3   2018-07-12  E
4   2014-05-11  C
4   2015-02-01  C
4   2018-07-12  E

df2:

id  start       end            
3   2016-05-11  2017-08-30
4   2015-01-11  2017-08-22

I would like to cut df1 so that I only keep the rows of df1 whose date falls within the date range given in df2 for the same id:

id  date        item
3   2016-05-11  C
3   2017-02-01  C
4   2015-02-01  C

In reality, df1 and df2 have millions of rows, so I can't rely on quick fixes such as for loops. I have a rough idea of using groupby on id, but all my attempts have failed so far.

Thank you in advance!

The basic way is to build a dataframe that pairs every event in df1 with the date range for its id. You can then filter on whether each event's date falls between start and end.

# Attach the start/end range for each id to every df1 row
df3 = df1.merge(df2, how='inner', on='id')

# Keep only the rows whose date lies inside [start, end]
df3[(df3['date'] >= df3['start']) & (df3['date'] <= df3['end'])]
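
For reference, here is a self-contained sketch of the same merge-then-filter idea, run on the sample data from the question. It assumes the date columns are parsed with pd.to_datetime (string dates in ISO format would also compare correctly, but real datetimes are safer) and that df2 holds one range per id, as in the example. The names merged, in_range and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; dates are parsed so that
# comparisons are chronological rather than string-based.
df1 = pd.DataFrame({
    "id": [3, 3, 3, 3, 3, 4, 4, 4],
    "date": pd.to_datetime([
        "2015-11-23", "2015-11-23", "2016-05-11", "2017-02-01",
        "2018-07-12", "2014-05-11", "2015-02-01", "2018-07-12",
    ]),
    "item": ["B", "A", "C", "C", "E", "C", "C", "E"],
})
df2 = pd.DataFrame({
    "id": [3, 4],
    "start": pd.to_datetime(["2016-05-11", "2015-01-11"]),
    "end": pd.to_datetime(["2017-08-30", "2017-08-22"]),
})

# Attach each id's start/end to every matching df1 row.
merged = df1.merge(df2, how="inner", on="id")

# Row-wise inclusive check: start <= date <= end.
in_range = merged["date"].between(merged["start"], merged["end"])

# Drop the helper columns so the result has the same shape as df1.
result = merged.loc[in_range, ["id", "date", "item"]].reset_index(drop=True)
print(result)
```

Both steps are vectorized, so no Python-level loop is involved. One caveat: if df2 contained several ranges for the same id, the merge would produce one row per (event, range) pair, so the intermediate frame could grow considerably and an event matching two overlapping ranges would appear twice.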

