Merging or Joining Two Dataframes on Multiple Columns With Different Dates

I am trying to combine these two dataframes (df1 and df2):

    gmDate      n    pf   pa
0   2012-10-31  ATL  0    0
1   2012-10-31  BKN  0    0
2   2012-10-31  BOS  107  120
3   2012-10-31  CHA  0    0
4   2012-10-31  CHI  0    0
5   2012-10-31  CLE  94   84
6   2012-10-31  DAL  99   91
7   2012-10-31  DEN  0    0
8   2012-10-31  DET  0    0
9   2012-10-31  GS   0    0

    gmDate      t   tw  tf  ta  o   ow  of  oa
0   2012-10-30  WAS 0   0   0   CLE 1   0   0
1   2012-10-30  BOS 0   0   0   MIA 1   0   0
2   2012-10-30  DAL 1   0   0   LAL 0   0   0
3   2012-10-31  DEN 0   0   0   PHI 1   0   0
4   2012-10-31  IND 1   0   0   TOR 0   0   0
5   2012-10-31  HOU 1   0   0   DET 0   0   0
6   2012-10-31  SAC 0   0   0   CHI 1   0   0
7   2012-10-31  SA  1   0   0   NO  0   0   0
8   2012-10-31  DAL 0   0   0   UTA 1   0   0
9   2012-10-31  GS  1   0   0   PHO 0   0   0

I need pf and pa from df1 to populate tf and ta in df2 where gmDate and n in df1 match gmDate and t in df2, and to populate of and oa where gmDate and n match gmDate and o. df1 includes every day in the calendar, whether or not a team played that day, and df2 contains only the days a team played. I have not been able to get a merge or join to work.

Currently I am doing this with two separate nested for loops:

for s in range(0, len(df1)):
    for d in range(0, len(df2)):
        if df1.iloc[s,0] == df2.iloc[d,0] and df1.iloc[s,1] == df2.iloc[d,1]:
            df2.iloc[d,3] = df1.iloc[s,2]
            df2.iloc[d,4] = df1.iloc[s,3]

and then:

for s in range(0, len(df1)):
    for d in range(0, len(df2)):
        if df1.iloc[s,0] == df2.iloc[d,0] and df1.iloc[s,1] == df2.iloc[d,5]:
            df2.iloc[d,7] = df1.iloc[s,2]
            df2.iloc[d,8] = df1.iloc[s,3]

Each of them takes a VERY long time to run. df1 has a length of 29,520 and df2 has a length of 7,379.

Sorry if this is confusing. I am looking for either the best way to do this with a merge/join, or a way to keep my loops from running forever.

Thank you in advance for your help.

I think I have understood what you want. My idea is that you can do this:

You want to replace the columns tf and ta of the dataframe df2 with the columns pf and pa of the dataframe df1 where the dates coincide, that is, where (df1['gmDate'].values) == (df2['gmDate'].values).

First, select the rows and columns of df2 that you want to replace:

df2.loc[(df1['gmDate'].values)==(df2['gmDate'].values),['tf','ta']]

which are:

    tf  ta
3   0   0
4   0   0
5   0   0
6   0   0
7   0   0
8   0   0
9   0   0
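A note on why exactly these rows are selected: comparing two `.values` arrays with `==` is elementwise by position, so row i of df1 is compared only with row i of df2, the team columns play no part, and both arrays need the same length. The following is a minimal sketch of that behaviour, assuming only the gmDate columns of the ten-row samples from the question; the names `df1_dates`, `df2_dates` and `selected` are made up for illustration.

```python
import pandas as pd

# gmDate columns of the two ten-row samples shown in the question.
df1_dates = pd.Series(pd.to_datetime(["2012-10-31"] * 10))
df2_dates = pd.Series(pd.to_datetime(["2012-10-30"] * 3 + ["2012-10-31"] * 7))

# Elementwise comparison: position i in df1 against position i in df2.
mask = df1_dates.values == df2_dates.values

# Positions where the two dates happen to agree.
selected = [i for i, hit in enumerate(mask) if hit]
print(selected)  # [3, 4, 5, 6, 7, 8, 9]
```

The selected positions are 3 through 9 simply because the first three rows of df2 carry a different date than the first three rows of df1.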

and assign to them:

df1.loc[(df1['gmDate'].values)==(df2['gmDate'].values),['pf','pa']].values

which are:

array([[ 0,  0],
       [ 0,  0],
       [94, 84],
       [99, 91],
       [ 0,  0],
       [ 0,  0],
       [ 0,  0]])

Doing the same for the opposite case, you get the code:

df2.loc[(df1['gmDate'].values)==(df2['gmDate'].values),['tf','ta']]=df1.loc[(df1['gmDate'].values)==(df2['gmDate'].values),['pf','pa']].values
df2.loc[(df1['gmDate'].values)!=(df2['gmDate'].values),['of','oa']]=df1.loc[(df1['gmDate'].values)!=(df2['gmDate'].values),['pf','pa']].values

df2 Output:

    gmDate      t   tw  tf  ta  o   ow  of  oa
0   2012-10-30  WAS 0   0   0   CLE 1   0   0
1   2012-10-30  BOS 0   0   0   MIA 1   0   0
2   2012-10-30  DAL 1   0   0   LAL 0   107 120
3   2012-10-31  DEN 0   0   0   PHI 1   0   0
4   2012-10-31  IND 1   0   0   TOR 0   0   0
5   2012-10-31  HOU 1   94  84  DET 0   0   0
6   2012-10-31  SAC 0   99  91  CHI 1   0   0
7   2012-10-31  SA  1   0   0   NO  0   0   0
8   2012-10-31  DAL 0   0   0   UTA 1   0   0
9   2012-10-31  GS  1   0   0   PHO 0   0   0
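Note that in this output the scores land by row position (for example, HOU receives the 94 and 84 that belong to CLE in df1), because the mask never looks at the team columns. If the goal is to match on both gmDate and team, as the question describes, one possible alternative is two left merges, one keyed on t and one on o. This is only a sketch under stated assumptions: each (gmDate, n) pair occurs once in df1, the gmDate columns have the same dtype in both frames, and unmatched rows should keep their existing values. The frame `out`, the helper names `key`, `f_col`, `a_col` and `hit`, and the last row of the sample df2 (added so the opponent lookup has a non-zero hit) are hypothetical.

```python
import pandas as pd

# A few rows from the question's df1.
df1 = pd.DataFrame({
    "gmDate": ["2012-10-31"] * 4,
    "n": ["BOS", "CLE", "DAL", "DEN"],
    "pf": [107, 94, 99, 0],
    "pa": [120, 84, 91, 0],
})
# A few rows shaped like the question's df2 (tw/ow omitted for brevity).
# The last row is hypothetical, not taken from the question.
df2 = pd.DataFrame({
    "gmDate": ["2012-10-30", "2012-10-31", "2012-10-31", "2012-10-31"],
    "t": ["BOS", "DEN", "DAL", "SAC"],
    "tf": [0, 0, 0, 0],
    "ta": [0, 0, 0, 0],
    "o": ["MIA", "PHI", "UTA", "CLE"],
    "of": [0, 0, 0, 0],
    "oa": [0, 0, 0, 0],
})

out = df2.reset_index(drop=True)
for key, (f_col, a_col) in (("t", ("tf", "ta")), ("o", ("of", "oa"))):
    # Left merge keeps df2's row order; validate raises if df1 has
    # duplicate (gmDate, n) pairs, which would otherwise add rows.
    hit = out[["gmDate", key]].merge(
        df1,
        how="left",
        left_on=["gmDate", key],
        right_on=["gmDate", "n"],
        validate="many_to_one",
    )
    # Rows with no match come back as NaN; keep df2's existing value there.
    out[f_col] = hit["pf"].fillna(out[f_col]).astype(int)
    out[a_col] = hit["pa"].fillna(out[a_col]).astype(int)

print(out)
```

With these rows, only DAL on 2012-10-31 picks up 99 and 91 in tf and ta, and the hypothetical CLE opponent row picks up 94 and 84 in of and oa. BOS on 2012-10-30 stays at zero because df1 has no BOS row for that date. Each merge is a single pass over the keys rather than a nested loop over every pair of rows.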
