Merging or Joining Two Dataframes on Multiple Columns With Different Dates
I am trying to combine these two dataframes (df1 and df2):
df1:
       gmDate    n   pf   pa
0  2012-10-31  ATL    0    0
1  2012-10-31  BKN    0    0
2  2012-10-31  BOS  107  120
3  2012-10-31  CHA    0    0
4  2012-10-31  CHI    0    0
5  2012-10-31  CLE   94   84
6  2012-10-31  DAL   99   91
7  2012-10-31  DEN    0    0
8  2012-10-31  DET    0    0
9  2012-10-31   GS    0    0
df2:
       gmDate    t  tw  tf  ta    o  ow  of  oa
0  2012-10-30  WAS   0   0   0  CLE   1   0   0
1  2012-10-30  BOS   0   0   0  MIA   1   0   0
2  2012-10-30  DAL   1   0   0  LAL   0   0   0
3  2012-10-31  DEN   0   0   0  PHI   1   0   0
4  2012-10-31  IND   1   0   0  TOR   0   0   0
5  2012-10-31  HOU   1   0   0  DET   0   0   0
6  2012-10-31  SAC   0   0   0  CHI   1   0   0
7  2012-10-31   SA   1   0   0   NO   0   0   0
8  2012-10-31  DAL   0   0   0  UTA   1   0   0
9  2012-10-31   GS   1   0   0  PHO   0   0   0
I need pf and pa in df1 to populate tf and ta, or of and oa, in df2, based on matching gmDate and n in df1 against gmDate and t or o in df2. df1 includes every day in the calendar, whether or not a team played that day, and df2 contains only the days a team played. I have not been able to get a merge or join to work for me.
Currently I have been trying to do this by running two separate for loops:
for s in range(0, len(df1)):
    for d in range(0, len(df2)):
        # same date and same team in column t
        if df1.iloc[s,0] == df2.iloc[d,0] and df1.iloc[s,1] == df2.iloc[d,1]:
            df2.iloc[d,3] = df1.iloc[s,2]
            df2.iloc[d,4] = df1.iloc[s,3]
and then:
for s in range(0, len(df1)):
    for d in range(0, len(df2)):
        # same date and same team in column o
        if df1.iloc[s,0] == df2.iloc[d,0] and df1.iloc[s,1] == df2.iloc[d,5]:
            df2.iloc[d,7] = df1.iloc[s,2]
            df2.iloc[d,8] = df1.iloc[s,3]
Each of them takes a VERY long time to run. df1 has a length of 29,520 and df2 has a length of 7,379.
Sorry if this is too confusing. I am looking for either the best way to do this with a merge/join, or a way to keep my loops from running forever.
Thank you in advance for your help.
I think I have understood what you want. My idea is that you can do this:
You want to replace the columns tf and ta of the dataframe df2 with the columns pf and pa of the dataframe df1 in the rows where the dates coincide, that is, where (df1['gmDate'].values) == (df2['gmDate'].values).
There you select the rows and columns of df2 that you want to replace, doing:
df2.loc[(df1['gmDate'].values)==(df2['gmDate'].values),['tf','ta']]
which gives:
tf ta
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
and assign to it:
df1.loc[(df1['gmDate'].values)==(df2['gmDate'].values),['pf','pa']].values
which gives:
array([[ 0, 0],
[ 0, 0],
[94, 84],
[99, 91],
[ 0, 0],
[ 0, 0],
[ 0, 0]])
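As a side note, here is a minimal sketch of what that boolean mask computes, using the first four rows of the question's df1 and df2 as plain NumPy arrays (the names dates1, dates2, teams1 and teams2 are only for illustration). The comparison is element-wise, row i against row i, so it needs both columns to have the same length, and it looks only at the date, not at the team:

```python
import numpy as np

# First four gmDate / team values of df1 and df2 from the question.
dates1 = np.array(["2012-10-31", "2012-10-31", "2012-10-31", "2012-10-31"])
teams1 = np.array(["ATL", "BKN", "BOS", "CHA"])
dates2 = np.array(["2012-10-30", "2012-10-30", "2012-10-30", "2012-10-31"])
teams2 = np.array(["WAS", "BOS", "DAL", "DEN"])

# Row i of dates1 is compared with row i of dates2, nothing else.
date_mask = dates1 == dates2

# Adding the team condition shows which positional matches share a team too.
date_and_team_mask = date_mask & (teams1 == teams2)

print(date_mask.tolist())
print(date_and_team_mask.tolist())
```

In this sample the date mask is True only for the last row, where the teams differ (CHA in df1, DEN in df2), so it is worth checking whether matching by position is really what you need for your full data.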
Doing the same for the opposite case, you get the code:
df2.loc[(df1['gmDate'].values)==(df2['gmDate'].values),['tf','ta']]=df1.loc[(df1['gmDate'].values)==(df2['gmDate'].values),['pf','pa']].values
df2.loc[(df1['gmDate'].values)!=(df2['gmDate'].values),['of','oa']]=df1.loc[(df1['gmDate'].values)!=(df2['gmDate'].values),['pf','pa']].values
Output of df2:
gmDate t tw tf ta o ow of oa
0 2012-10-30 WAS 0 0 0 CLE 1 0 0
1 2012-10-30 BOS 0 0 0 MIA 1 0 0
2 2012-10-30 DAL 1 0 0 LAL 0 107 120
3 2012-10-31 DEN 0 0 0 PHI 1 0 0
4 2012-10-31 IND 1 0 0 TOR 0 0 0
5 2012-10-31 HOU 1 94 84 DET 0 0 0
6 2012-10-31 SAC 0 99 91 CHI 1 0 0
7 2012-10-31 SA 1 0 0 NO 0 0 0
8 2012-10-31 DAL 0 0 0 UTA 1 0 0
9 2012-10-31 GS 1 0 0 PHO 0 0 0
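Since the masks above compare the two gmDate columns position by position, they do not check the team name and they need df1 and df2 to have the same number of rows. If the goal is to match on both gmDate and team, as the question describes, a key-based left merge is one possible alternative. The following is only a sketch under some assumptions: fill_scores is a hypothetical helper name, gmDate has the same dtype in both frames, each (gmDate, n) pair appears at most once in df1, and the score columns hold integers. The df1 rows are taken from the question; the last df2 row (HOU against CLE) is made up so that the opponent side has a match.

```python
import pandas as pd

def fill_scores(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df2 with tf/ta and of/oa filled from df1 by (gmDate, team)."""
    out = df2.reset_index(drop=True)  # new frame with a 0..n-1 index
    for team_col, pf_col, pa_col in (("t", "tf", "ta"), ("o", "of", "oa")):
        # Rename df1's team column so both frames share the same merge keys.
        scores = df1.rename(columns={"n": team_col})[["gmDate", team_col, "pf", "pa"]]
        # Left merge keeps every df2 row in order; unmatched rows get NaN.
        # validate raises if df1 has duplicate (gmDate, team) keys.
        hit = out[["gmDate", team_col]].merge(
            scores, on=["gmDate", team_col], how="left", validate="many_to_one"
        )
        # Keep the existing df2 value wherever df1 had no matching row.
        out[pf_col] = hit["pf"].fillna(out[pf_col]).astype(int)
        out[pa_col] = hit["pa"].fillna(out[pa_col]).astype(int)
    return out

df1 = pd.DataFrame({
    "gmDate": ["2012-10-31", "2012-10-31", "2012-10-31"],
    "n": ["BOS", "CLE", "DAL"],
    "pf": [107, 94, 99],
    "pa": [120, 84, 91],
})
df2 = pd.DataFrame({
    "gmDate": ["2012-10-30", "2012-10-31", "2012-10-31"],
    "t": ["BOS", "DAL", "HOU"],
    "tf": [0, 0, 0],
    "ta": [0, 0, 0],
    "o": ["MIA", "UTA", "CLE"],  # HOU vs CLE is a hypothetical game
    "of": [0, 0, 0],
    "oa": [0, 0, 0],
})

result = fill_scores(df1, df2)
print(result)
```

Here DAL on 2012-10-31 picks up 99 and 91 in tf and ta, CLE as opponent picks up 94 and 84 in of and oa, and BOS on 2012-10-30 stays at 0 because df1 has no row for that date in this toy sample. Two merges replace the nested loops, so the work no longer grows with len(df1) * len(df2).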