
Subset pandas dataframe on multiple columns based on values from another dataframe

I have two dataframes:

import pandas as pd
points = pd.DataFrame({'player':['a','b','c','d','e'],'points':[2,5,3,6,1]})
matches = pd.DataFrame({'p1':['a','c','e'], 'p2':['c', 'b', 'd']})

I want to keep only those rows of matches where both p1 and p2 have more than 2 points. Right now I first merge matches with points on p1 and player, then merge the result with points again on p2 and player. After that I filter on both points columns of the resulting dataframe.

new_df = pd.merge(matches, points, how='left', left_on='p1', right_on='player')
new_df = pd.merge(new_df, points, how='left', left_on='p2', right_on='player')
new_df = new_df[(new_df.points_x > 2) & (new_df.points_y > 2)]

This gives me what I need, but is there a cleaner and more efficient way to do it?

I would avoid the joins in this case and write it like this:

scorers = points.query('points > 2').player
matches.query('p1 in @scorers and p2 in @scorers')

I think it's more readable.

It feels a little silly to benchmark such a small example, but on my machine this method runs in 2.99 ms on average, while your original method takes 4.45 ms. I have not checked whether the difference holds on larger data.
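To see how the two approaches compare on larger inputs, something like the following could be used. This is only a sketch: the sizes, the random data, and the function names with_merge and with_query are made up for illustration, and the timings will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_players, n_matches = 1_000, 10_000  # arbitrary sizes, chosen for illustration

players = np.array([f"p{i}" for i in range(n_players)])
points = pd.DataFrame({"player": players,
                       "points": rng.integers(0, 10, n_players)})
matches = pd.DataFrame({"p1": rng.choice(players, n_matches),
                        "p2": rng.choice(players, n_matches)})

def with_merge():
    # The approach from the question: two left joins, then a filter.
    df = pd.merge(matches, points, how="left", left_on="p1", right_on="player")
    df = pd.merge(df, points, how="left", left_on="p2", right_on="player")
    return df.loc[(df.points_x > 2) & (df.points_y > 2), ["p1", "p2"]]

def with_query():
    # The approach from this answer: no joins, just membership tests.
    scorers = points.query("points > 2").player
    return matches.query("p1 in @scorers and p2 in @scorers")

for fn in (with_merge, with_query):
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{fn.__name__}: {seconds * 1000:.2f} ms per run")
```

Both functions should select the same rows, so it is worth checking that before trusting any timing difference.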

I don't know whether other micro-optimizations, such as converting scorers to a set, would help here.

If you don't like the query syntax:

scorers = points[points.points > 2].player
matches[matches.p1.isin(scorers) & matches.p2.isin(scorers)]

This also performs better on the same example, taking about 1.36 ms.
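If there are more than two player columns, the same isin idea can be applied to all of them at once. A minimal sketch, assuming the dataframes from the question; the names player_cols and both_score are mine. One thing to watch out for: DataFrame.isin aligns on the index when it is given a Series, so the scorers are converted to a plain list first.

```python
import pandas as pd

points = pd.DataFrame({'player': ['a', 'b', 'c', 'd', 'e'],
                       'points': [2, 5, 3, 6, 1]})
matches = pd.DataFrame({'p1': ['a', 'c', 'e'], 'p2': ['c', 'b', 'd']})

# Players with more than 2 points, as a plain list so that
# DataFrame.isin compares values instead of aligning on the index.
scorers = points.loc[points.points > 2, 'player'].tolist()

player_cols = ['p1', 'p2']  # hypothetical: list every player column here
both_score = matches[player_cols].isin(scorers).all(axis=1)
res = matches[both_score]
print(res)
```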

As an alternative, you can construct a series mapping players to points, then use pd.Series.map on each column of matches:

s = points.set_index('player')['points']
res = matches.loc[matches.apply(lambda x: x.map(s)).gt(2).all(axis=1)]

print(res)

  p1 p2
1  c  b
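The same mapping can also be written without apply, one column at a time. The sketch below assumes the data from the question plus one extra match against a player 'z' who is not in points (that row is invented to show the behaviour). Players missing from the lookup map to NaN, and NaN > 2 is False, so such matches are dropped.

```python
import pandas as pd

points = pd.DataFrame({'player': ['a', 'b', 'c', 'd', 'e'],
                       'points': [2, 5, 3, 6, 1]})
# 'z' is a made-up player with no entry in points.
matches = pd.DataFrame({'p1': ['a', 'c', 'e', 'b'],
                        'p2': ['c', 'b', 'd', 'z']})

# Lookup series: player -> points (player names must be unique).
s = points.set_index('player')['points']

# Map each column to points and compare; unknown players become NaN,
# which compares as False.
mask = matches['p1'].map(s).gt(2) & matches['p2'].map(s).gt(2)
res = matches.loc[mask]
print(res)
```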

