Assigning an index from another dataframe

Question

I have two dataframes. I want to keep every observation in dataset B that also appears in dataset A, but use the index from dataset A.

dfA:

 Index      some_var    some_var2    match_var
   AB          x           y           12
   AC          x           y           13
   AD          x           y           14

dfB:

 Index   Match_var   some_var3    some_var4    
   1       12          z           w           
   2       22          z           w           
   3       14          z           w

Desired outcome:

 Index      some_var3    some_var4    match_var
   AB          z           w           12
   AD          z           w           14

The problem is that the actual data is too large to perform a merge and then drop the unneeded columns and unmatched cases. Memory usage exceeds 100GB of RAM.

I wanted to use dfC=dfB.loc[(dfB['Match_var'].isin(dfA['Match_var']))]. However, this keeps the index of dfB, while I need the one from dfA.

dfA.reset_index(inplace=True)
idx = dfB.loc[(dfB['Match_var'].isin(dfA['Match_var']))]
dfB.loc[idx, 'indexvar'] = dfA['Unnamed']
dfB.set_index(['indexvar'],inplace=True)

also does not work for some reason. The code seems to assign the index from dfA to the wrong observations in the new dataframe.

Answer 1

IIUC (if I understand correctly):

pd.concat([dfA.set_index('match_var'), dfB.set_index('Match_var')], join='inner', axis=1)
Out[782]: 
    Index some_var some_var2  Index some_var3 some_var4
12     AB        x         y      1         z         w
14     AD        x         y      3         z         w

To get your desired output:

pd.concat([dfA.set_index('match_var')[['Index']], dfB.set_index('Match_var')[['some_var3', 'some_var4']]], join='inner', axis=1).reset_index()
Out[788]: 
   index Index some_var3 some_var4
0     12    AB         z         w
1     14    AD         z         w
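
Since the question is mainly about memory, here is a minimal sketch of an alternative that avoids building a joined frame: look up dfA's index label for each row of dfB and filter on the result. It rests on assumptions not confirmed in the question: that the labels AB/AC/AD are dfA's actual index (not a column), and that match_var is unique in dfA (Series.map raises an error on a non-unique lookup index). The names lookup, new_index, and keep are illustrative. Whether this fits in memory on the real data has not been measured here.

```python
import pandas as pd

# Sample frames rebuilt from the question; the labels are assumed to be the index.
dfA = pd.DataFrame(
    {"some_var": ["x"] * 3, "some_var2": ["y"] * 3, "match_var": [12, 13, 14]},
    index=pd.Index(["AB", "AC", "AD"], name="Index"),
)
dfB = pd.DataFrame(
    {"Match_var": [12, 22, 14], "some_var3": ["z"] * 3, "some_var4": ["w"] * 3},
    index=pd.Index([1, 2, 3], name="Index"),
)

# Lookup table: match_var value -> dfA index label (assumes match_var is unique in dfA).
lookup = pd.Series(dfA.index.to_numpy(), index=dfA["match_var"].to_numpy())

# For each row of dfB, fetch the dfA label; rows with no match get NaN.
new_index = dfB["Match_var"].map(lookup)
keep = new_index.notna()

# Keep matched rows only, select the wanted columns, and swap in dfA's labels.
dfC = dfB.loc[keep, ["some_var3", "some_var4", "Match_var"]].rename(
    columns={"Match_var": "match_var"}
)
dfC.index = pd.Index(new_index[keep].to_numpy(), name="Index")

print(dfC)
```

Unlike the positional assignment attempted in the question, the lookup is keyed on the match value itself, so each dfB row receives the label of the dfA row with the same match_var regardless of row order.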
