
Merge 2 data frames based on matching rows of 2 columns with Pandas

I have a very important problem that needs to be solved for a project!

I have two data frames that look like this. The first DataFrame is:

Date            Winner      Loser         Tournament
2007-01-01      Luczak P.   Hrbaty D.     Grandslam
2007-01-02      Serra F.    Johansson J.  Grandslam
2007-01-02      ......      ......

The second DataFrame is:

Date            Winner      Loser          Tournament
2007-05-28      Federer R.  Russel M.      Grandslam
2007-05-28      Ascione T.  Cilic M.       Grandslam
2007-05-28      ......      ......

The two data frames have the same number of rows and describe the same matches from the same period, even though the first one starts from 2007-01-01 and the other from 2007-05-28. I checked this by looking at the Excel files (from different sources) that I imported to build the two data frames.

The problem is that the first DataFrame gives the exact date of each match, while the second DataFrame sets the date of each row to the start of the tournament rather than the date the match was played. So I cannot merge the two data frames on the Date values.

However, I know for sure that the Winner and Loser pairs are the same in both, so I want to merge the two data frames on the rows in which the winner and the loser are the same.

Does anybody know how I can do this? Thanks in advance!

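You can do it with merge_asof. By default it pairs each row of df1 with the most recent row of df2 whose Date is on or before it, restricted to rows with the same Winner and Loser. Both Date columns must be datetime (or numeric) and sorted:

import pandas as pd

df = pd.merge_asof(df1.sort_values('Date'),
                   df2.sort_values('Date'), on='Date', by=['Winner','Loser'])

Alternatively, ignore the dates and do an inner merge on the two player columns:

df = pd.merge(df1, df2, how='inner', on=['Winner','Loser'])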
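The two calls above can behave differently when the same winner beats the same loser more than once. The sketch below tries both on a small sample. The rows, the StartDate helper column and the suffix names are hypothetical and only for illustration; they are not taken from the question's files.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question's files.
# df1 carries the exact date of each match.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2007-05-28", "2007-05-29", "2007-06-26"]),
    "Winner": ["Federer R.", "Ascione T.", "Federer R."],
    "Loser": ["Russel M.", "Cilic M.", "Russel M."],
    "Tournament": ["Grandslam", "Grandslam", "Grandslam"],
})

# df2 carries the start date of the tournament each match belongs to.
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2007-05-28", "2007-05-28", "2007-06-25"]),
    "Winner": ["Federer R.", "Ascione T.", "Federer R."],
    "Loser": ["Russel M.", "Cilic M.", "Russel M."],
    "Tournament": ["Grandslam", "Grandslam", "Grandslam"],
})

# merge_asof keeps a single "Date" column (the left one), so copy the
# tournament start date into a helper column before merging.
right = df2.assign(StartDate=df2["Date"]).sort_values("Date")

# For each match in df1, take the latest df2 row for the same pair
# whose Date is on or before the match date.
asof = pd.merge_asof(
    df1.sort_values("Date"),
    right,
    on="Date",
    by=["Winner", "Loser"],
    suffixes=("_exact", "_start"),
)

# Plain inner merge on the pair only: every df1 row for a pair is
# combined with every df2 row for that pair.
plain = pd.merge(
    df1, df2, how="inner", on=["Winner", "Loser"],
    suffixes=("_exact", "_start"),
)

print(asof[["Date", "Winner", "Loser", "StartDate"]])
print(len(asof), len(plain))
```

With this sample, merge_asof returns one row per match (three rows), while the plain merge returns five rows because the repeated Federer R. / Russel M. pair is cross-joined. If every Winner and Loser pair occurs only once in your data, the two approaches should give the same matches.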
