简体   繁体   中英

Filter pandas based on paired conditions

I'm having trouble to to filter a dataframe using the result of groupby condition. I already tried to use an mask with isin() but it doesn't return only the paired conditions.

Suppose i have a dataset like this below:

    Dest    Origin  DepDelay
0   TPA     IAD     8.0
1   TPA     IAD     19.0
2   BWI     IND     8.0
4   BWI     IND     34.0
5   JAX     IND     25.0
6   LAS     IND     67.0
8   MCI     IND     2.0
10  MCO     IND     6.0
11  MCO     IND     94.0
... .... ... ...
7009710 ATL MSP     30.0

I wanna filter this using the result of groupby condition, where i got all top 5 routes. To obtain the routes i used this code:

top_5_route = flights_df[flights_df['DepDelay'] > 0].groupby(['Origin', 'Dest'])['Dest'].size().nlargest(5).index.to_list()

top_5_route:

[('LAX', 'SFO'),
('DAL', 'HOU'),
('SFO', 'LAX'),
('ORD', 'LGA'),
('HOU', 'DAL')]

i wanna filter this dataframe based on this labels to get only the paired conditions for "Origin - Dest". For example, the new dataframe should contain only values where origin is LAX and the Dest is SFO and the others paired conditions.

If i use isin method the dataframe also will contain values LAX - SFO , LAX-HOU . That does not match the paired condition.

Thanks!

IIUC,你需要过滤与你的元组列表匹配的行,使用:

df.set_index(['Dest','Origin']).loc[top_5_route].reset_index()

You can create a new column called 'Route':

flights_df['Route'] = flights_df['Origin'] + '-' + flights_df['Dest']

And then group by route to get the top 5 delayed routes

top_5_route = flights_df[flights_df['DepDelay'] > 0.0].groupby('Route').size().nlargest(5)

To filter the flights_df on these routes:

flights_df[flights_df['Route'].isin(top_5_route.keys())]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM