Assign index from a different dataframe
I have two dataframes. I want to keep every observation in dataset B that also appears in dataset A, but use the index from dataset A.
dfA:
Index some_var some_var2 match_var
AB x y 12
AC x y 13
AD x y 14
dfB:
Index Match_var some_var3 some_var4
1 12 z w
2 22 z w
3 14 z w
Desired outcome:
Index some_var3 some_var4 match_var
AB z w 12
AD z w 14
The problem is that the actual data is too large to perform a merge and then drop the unneeded columns and unmatched cases: memory usage exceeds 100 GB of RAM.
I wanted to use dfC=dfB.loc[(dfB['Match_var'].isin(dfA['Match_var']))]
However, this keeps the index of dfB, while I need the one from dfA.
dfA.reset_index(inplace=True)
idx = dfB.loc[(dfB['Match_var'].isin(dfA['Match_var']))]
dfB.loc[idx, 'indexvar'] = dfA['Unnamed']
dfB.set_index(['indexvar'],inplace=True)
This also does not work for some reason: the code seems to assign the index from dfA to the wrong observations in the new dataframe.
IIUC, you can use an inner concat on the match column:
pd.concat([dfA.set_index('match_var'),dfB.set_index('Match_var')],join ='inner',axis=1)
Out[782]:
Index some_var some_var2 Index some_var3 some_var4
12 AB x y 1 z w
14 AD x y 3 z w
In order to get your output:
pd.concat([dfA.set_index('match_var')[['Index']],dfB.set_index('Match_var')[['some_var3','some_var4']]],join ='inner',axis=1).reset_index()
Out[788]:
index Index some_var3 some_var4
0 12 AB z w
1 14 AD z w
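Since memory is the concern, a possible alternative is to skip the join and relabel only the matching rows of dfB through a lookup Series. The sketch below rebuilds the toy frames from the question and makes three assumptions not confirmed by the post: dfA's labels (AB, AC, AD) are its real index, match_var is unique within dfA, and `lookup` is just an illustrative name. Whether this actually stays within memory on the full data has not been measured here.

```python
import pandas as pd

# Toy frames mirroring the question; dfA's labels are assumed to be its index.
dfA = pd.DataFrame(
    {"some_var": ["x", "x", "x"], "some_var2": ["y", "y", "y"], "match_var": [12, 13, 14]},
    index=pd.Index(["AB", "AC", "AD"], name="Index"),
)
dfB = pd.DataFrame(
    {"Match_var": [12, 22, 14], "some_var3": ["z", "z", "z"], "some_var4": ["w", "w", "w"]},
    index=pd.Index([1, 2, 3], name="Index"),
)

# Lookup from match value to dfA's index label (assumes match_var is unique in dfA).
lookup = pd.Series(dfA.index.to_numpy(), index=dfA["match_var"].to_numpy())

# Keep only the dfB rows whose key exists in dfA; copy so dfB itself is untouched.
dfC = dfB.loc[dfB["Match_var"].isin(lookup.index)].copy()

# Translate each remaining key into dfA's label by value, not by row position.
dfC.index = pd.Index(dfC["Match_var"].map(lookup), name="Index")

# Match the column name and order of the desired outcome.
dfC = dfC.rename(columns={"Match_var": "match_var"})[["some_var3", "some_var4", "match_var"]]
print(dfC)
```

Mapping by value avoids the misalignment described in the question: assigning a Series with `.loc` aligns on index labels, so dfA's labels are not matched up by match_var unless the lookup is keyed on it.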