
Merge 2 data frames based on matching rows of 2 columns with Pandas

I have a very important problem that needs to be solved for a project!

I have 2 data frames that look like the ones below. The first DataFrame is:

Date            Winner      Loser         Tournament
2007-01-01      Luczak P.   Hrbaty D.     Grandslam
2007-01-02      Serra F.    Johansson J.  Grandslam
2007-01-02      ......      ......

The second DataFrame is:

Date            Winner      Loser          Tournament
2007-05-28      Federer R.  Russel M.      Grandslam
2007-05-28      Ascione T.  Cilic M.       Grandslam
2007-05-28      ......      ......

The two data frames have the same number of rows, corresponding to the same matches from the same period, even though the first one starts from 2007-01-01 and the other from 2007-05-28. I checked this by looking at the Excel files (from different sources) that I imported to build the two data frames.

The problem is that one DataFrame (the first one) gives me the exact date of each match, while the other DataFrame (the second one) sets the date of each row to the start of the tournament rather than the exact date the match was played. So I cannot merge the two data frames based on Date values.

However, I know for sure that the Winner and Loser pair in each row is the same in both data frames, so I want to merge them on the rows where the winner and the loser are the same.

Does anybody know how I can do this? Thanks in advance!

You can do it with merge_asof:

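import pandas as pd
# Option 1: as-of merge on Date within each (Winner, Loser) pair; Date must be datetime, not strings
df = pd.merge_asof(df1.sort_values('Date'),
                   df2.sort_values('Date'), on='Date', by=['Winner', 'Loser'])
# Option 2: plain inner merge on the player pair, ignoring Date
df = pd.merge(df1, df2, how='inner', on=['Winner', 'Loser'])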
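
To see how the two options above can differ, here is a small sketch on made-up data. The rows, the rematch between the same two players, the `StartDate` rename, the suffixes and the `merge_matches` helper are all assumptions for illustration; they are not taken from the question's files. It assumes df1 holds exact match dates and df2 holds tournament start dates, as described in the question.

```python
import pandas as pd

# Hypothetical rows: df1 has exact match dates, df2 has tournament start dates.
# The third row is an invented rematch of the same Winner/Loser pair.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2007-05-28", "2007-05-29", "2007-06-26"]),
    "Winner": ["Federer R.", "Ascione T.", "Federer R."],
    "Loser": ["Russel M.", "Cilic M.", "Russel M."],
    "Tournament": ["Grandslam", "Grandslam", "Grandslam"],
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2007-05-28", "2007-05-28", "2007-06-25"]),
    "Winner": ["Federer R.", "Ascione T.", "Federer R."],
    "Loser": ["Russel M.", "Cilic M.", "Russel M."],
    "Tournament": ["Grandslam", "Grandslam", "Grandslam"],
})


def merge_matches(exact, starts):
    """Hypothetical helper: attach to each exact-date row the most recent
    tournament-start row for the same Winner and Loser."""
    exact = exact.sort_values("Date")
    # Rename so both dates survive in the result.
    starts = starts.rename(columns={"Date": "StartDate"}).sort_values("StartDate")
    return pd.merge_asof(
        exact, starts,
        left_on="Date", right_on="StartDate",  # backward search: StartDate <= Date
        by=["Winner", "Loser"],
        suffixes=("_exact", "_start"),
    )


asof = merge_matches(df1, df2)
plain = pd.merge(df1, df2, how="inner", on=["Winner", "Loser"])

print(asof[["Date", "StartDate", "Winner", "Loser"]])
print(len(asof), len(plain))
```

With this data the as-of merge keeps one row per df1 row, while the plain merge on Winner and Loser pairs every meeting of the repeated players with every other, so it returns more rows. If the same pair can meet more than once in your period, the as-of merge is probably the safer choice; merge_asof also accepts a `tolerance` argument if you want to cap how far back a match may look for its tournament start.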
