
Pandas filter dataframe based on date range and another column

I have a pandas dataframe called df1 and want to filter it using the conditions in a second dataframe, df2: for each grp_id, I want to keep only the rows from the year given in df2's year column up to the most recent year (2016), as shown in df3. This is just a subset of my data; the full set has at least 10 unique grp_id values to subset, each with a different start year.

df1

       db_id           cert_status grp_id       year   cap prov
130   IX-011  not-certified member     SD 2004-01-01  30.0   KB
131   IX-011  not-certified member     SD 2005-01-01  30.0   KB
132   IX-011  not-certified member     SD 2006-01-01  30.0   KB
133   IX-011  not-certified member     SD 2007-01-01  30.0   KB
134   IX-011  not-certified member     SD 2008-01-01  30.0   KB
135   IX-011  not-certified member     SD 2009-01-01  30.0   KB
136   IX-011  not-certified member     SD 2010-01-01  30.0   KB
137   IX-011  not-certified member     SD 2011-01-01  30.0   KB
138   IX-011  not-certified member     SD 2012-01-01  30.0   KB
139   IX-011  not-certified member     SD 2013-01-01  30.0   KB
140   IX-011  not-certified member     SD 2014-01-01  30.0   KB
141   IX-011  not-certified member     SD 2015-01-01  30.0   KB
142   IX-011  not-certified member     SD 2016-01-01  30.0   KB
208   IX-017  not-certified member     CG 2004-01-01  30.0   KB
209   IX-017  not-certified member     CG 2005-01-01  30.0   KB
210   IX-017  not-certified member     CG 2006-01-01  30.0   KB
211   IX-017  not-certified member     CG 2007-01-01  30.0   KB
212   IX-017  not-certified member     CG 2008-01-01  30.0   KB
213   IX-017  not-certified member     CG 2009-01-01  30.0   KB
214   IX-017  not-certified member     CG 2010-01-01  30.0   KB
215   IX-017  not-certified member     CG 2011-01-01  30.0   KB
216   IX-017  not-certified member     CG 2012-01-01  30.0   KB
217   IX-017  not-certified member     CG 2013-01-01  80.0   KB
218   IX-017  not-certified member     CG 2014-01-01  30.0   KB
219   IX-017  not-certified member     CG 2015-01-01  30.0   KB
220   IX-017  not-certified member     CG 2016-01-01  30.0   KB

df2

   grp_id member       year
4     SD       Y 2007-01-01
6     CG       Y 2011-01-01

df3

       db_id           cert_status grp_id       year   cap prov
133   IX-011  not-certified member     SD 2007-01-01  30.0   KB
134   IX-011  not-certified member     SD 2008-01-01  30.0   KB
135   IX-011  not-certified member     SD 2009-01-01  30.0   KB
136   IX-011  not-certified member     SD 2010-01-01  30.0   KB
137   IX-011  not-certified member     SD 2011-01-01  30.0   KB
138   IX-011  not-certified member     SD 2012-01-01  30.0   KB
139   IX-011  not-certified member     SD 2013-01-01  30.0   KB
140   IX-011  not-certified member     SD 2014-01-01  30.0   KB
141   IX-011  not-certified member     SD 2015-01-01  30.0   KB
142   IX-011  not-certified member     SD 2016-01-01  30.0   KB
215   IX-017  not-certified member     CG 2011-01-01  30.0   KB
216   IX-017  not-certified member     CG 2012-01-01  30.0   KB
217   IX-017  not-certified member     CG 2013-01-01  80.0   KB
218   IX-017  not-certified member     CG 2014-01-01  30.0   KB
219   IX-017  not-certified member     CG 2015-01-01  30.0   KB
220   IX-017  not-certified member     CG 2016-01-01  30.0   KB

What would be the easiest and quickest way to go about doing this?

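Try merge with query to filter. Joining df2 onto df1 by grp_id gives each row its group's start year (suffixed as year_2); query then keeps the rows where year >= year_2, and the final selection restores df1's original columns. Recent pandas versions raise a MergeError when on is combined with right_index=True, so the version below carries df1's index through the merge with reset_index and set_index instead:

# Attach each group's start year (year_2), then keep rows from that year onward.
# reset_index/set_index preserve df1's row labels (assumes df1's index is unnamed).
df1.reset_index().merge(df2, on='grp_id', suffixes=('', '_2'))\
   .query('year >= year_2').set_index('index').rename_axis(None)[df1.columns]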

Output:

      db_id           cert_status grp_id        year   cap prov
133  IX-011  not-certified member     SD  2007-01-01  30.0   KB
134  IX-011  not-certified member     SD  2008-01-01  30.0   KB
135  IX-011  not-certified member     SD  2009-01-01  30.0   KB
136  IX-011  not-certified member     SD  2010-01-01  30.0   KB
137  IX-011  not-certified member     SD  2011-01-01  30.0   KB
138  IX-011  not-certified member     SD  2012-01-01  30.0   KB
139  IX-011  not-certified member     SD  2013-01-01  30.0   KB
140  IX-011  not-certified member     SD  2014-01-01  30.0   KB
141  IX-011  not-certified member     SD  2015-01-01  30.0   KB
142  IX-011  not-certified member     SD  2016-01-01  30.0   KB
215  IX-017  not-certified member     CG  2011-01-01  30.0   KB
216  IX-017  not-certified member     CG  2012-01-01  30.0   KB
217  IX-017  not-certified member     CG  2013-01-01  80.0   KB
218  IX-017  not-certified member     CG  2014-01-01  30.0   KB
219  IX-017  not-certified member     CG  2015-01-01  30.0   KB
220  IX-017  not-certified member     CG  2016-01-01  30.0   KB
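
If the merge is not needed for anything else, the same filter can be sketched as a boolean mask, which leaves df1's index untouched without any reset. This is only an illustrative alternative, not part of the answer above: it assumes df2 has exactly one row per grp_id, and the sample frames below are rebuilt from the question with only the columns the filter needs. The names start_year and df3 are chosen here for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the question (only the columns the filter needs).
years = pd.to_datetime([f"{y}-01-01" for y in range(2004, 2017)])
df1 = pd.DataFrame(
    {
        "grp_id": ["SD"] * len(years) + ["CG"] * len(years),
        "year": list(years) * 2,
    },
    index=list(range(130, 143)) + list(range(208, 221)),
)
df2 = pd.DataFrame(
    {
        "grp_id": ["SD", "CG"],
        "member": ["Y", "Y"],
        "year": pd.to_datetime(["2007-01-01", "2011-01-01"]),
    },
    index=[4, 6],
)

# Look up each row's start year by grp_id.
# Assumption: grp_id is unique in df2; groups missing from df2 map to NaT.
start_year = df1["grp_id"].map(df2.set_index("grp_id")["year"])

# Comparisons against NaT are False, so rows of unmatched groups are dropped.
df3 = df1[df1["year"] >= start_year]
print(df3)
```

Because the mask is aligned on df1's own index, the surviving rows keep their original labels (133 to 142 for SD, 215 to 220 for CG). If df2 could list a grp_id more than once, it would need to be reduced to one start year per group first, for example with groupby and min.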


 