Assign index from a different dataframe

I have two dataframes. I want to keep all cases in which an observation in dataset B is also in dataset A, but use the index from dataset A.

dfA:

 Index      some_var    some_var2    match_var
   AB          x           y           12
   AC          x           y           13
   AD          x           y           14

dfB:

 Index   Match_var   some_var3    some_var4    
   1       12          z           w           
   2       22          z           w           
   3       14          z           w    

Desired outcome:

 Index      some_var3    some_var4    match_var
   AB          z           w           12
   AD          z           w           14

The problem is that the actual data is too large to perform a merge and then drop the unneeded columns and unmatched cases. The memory usage exceeds 100GB RAM.

I wanted to use dfC=dfB.loc[(dfB['Match_var'].isin(dfA['Match_var']))]. However, this makes me keep the index of dfB, while I need the one from dfA. The following attempt:

dfA.reset_index(inplace=True)
idx = dfB.loc[(dfB['Match_var'].isin(dfA['Match_var']))]
dfB.loc[idx, 'indexvar'] = dfA['Unnamed']
dfB.set_index(['indexvar'],inplace=True)

also does not work for some reason. The code seems to assign the index from dfA to the wrong observations in the new dataframe.

IIUC:

pd.concat([dfA.set_index('match_var'),dfB.set_index('Match_var')],join ='inner',axis=1)
Out[782]: 
    Index some_var some_var2  Index some_var3 some_var4
12     AB        x         y      1         z         w
14     AD        x         y      3         z         w

In order to get your output:

pd.concat([dfA.set_index('match_var')[['Index']],dfB.set_index('Match_var')[['some_var3','some_var4']]],join ='inner',axis=1).reset_index()
Out[788]: 
   index Index some_var3 some_var4
0     12    AB         z         w
1     14    AD         z         w
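
Since the question is about memory, here is a rough sketch of an alternative that never joins dfA's other columns: build a small lookup Series from match_var to dfA's index label, filter dfB with isin, and map the labels onto the surviving rows. The toy frames and the names lookup and dfC are only for illustration, and it assumes match_var is unique in dfA. I have not measured this on large data, so treat any memory saving as a guess rather than a guarantee.

```python
import pandas as pd

# Toy data mirroring the question; dfA's labels live in its index.
dfA = pd.DataFrame(
    {"some_var": ["x"] * 3, "some_var2": ["y"] * 3, "match_var": [12, 13, 14]},
    index=pd.Index(["AB", "AC", "AD"], name="Index"),
)
dfB = pd.DataFrame(
    {"Match_var": [12, 22, 14], "some_var3": ["z"] * 3, "some_var4": ["w"] * 3},
    index=pd.Index([1, 2, 3], name="Index"),
)

# Lookup table: match_var value -> dfA index label.
# Assumption: match_var is unique in dfA; duplicate keys would make map() fail.
lookup = pd.Series(dfA.index.to_numpy(), index=dfA["match_var"].to_numpy())

# Keep only the dfB rows whose key appears in dfA.
dfC = dfB[dfB["Match_var"].isin(lookup.index)].copy()

# Replace dfB's index with the matching dfA label, looked up by key value
# (not by row position, which is what goes wrong with plain assignment).
dfC.index = pd.Index(dfC["Match_var"].map(lookup), name="Index")

# Rename and reorder columns to match the desired outcome.
dfC = dfC.rename(columns={"Match_var": "match_var"})[
    ["some_var3", "some_var4", "match_var"]
]
print(dfC)
```

The lookup goes through the key values, so it does not depend on the two frames being in the same row order.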
