Assign values on a large pandas DataFrame based on indexes stored in two other DataFrames
I have three dataframes that I am comparing. The first one holds the information I am interested in, and it is the one I want to complete. The second one contains the column with the coordinates that I want to add to my general dataframe. The third one stores the pairs of indexes from the first two dataframes whose values correspond.
It is a little confusing, so here is an example that shows it better:
Dataframe 1:

| index | n_tree |
|---|---|
| 247 | 1 |
| 248 | 2 |
Dataframe 2:

| index | coords |
|---|---|
| 1400 | (20,47) |
| 1401 | (30,85) |
Dataframe 3:

| index | index_dataframe_1 | index_dataframe_2 |
|---|---|---|
| 0 | 247 | 1401 |
My intention is for my general dataframe to contain the correct coordinate column, as follows:
| index | n_tree | coords |
|---|---|---|
| 247 | 1 | (30,85) |
I have tried to assign it with .iloc, .loc and .at, for example:
for idx, rw in dataframe_3.iterrows():
    # the coordinates live in dataframe 2, looked up by its index
    coords = dataframe_2.loc[rw.index_dataframe_2, "coords"]
    # write them into the matching row of the general dataframe (dataframe 1)
    dataframe_1.loc[int(rw.index_dataframe_1), "coords"] = coords
but I get the following error:
ValueError: Must have equal len keys and value when setting with an iterable
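The error most likely comes from .loc treating the tuple (30, 85) as an iterable of two values to spread over the selection, while the selection is a single cell. A minimal sketch of a loop that avoids it, assuming the coordinates column is created with object dtype first and each cell is then set with .at (df1, df2 and df3 are just the example frames rebuilt here):

```python
import pandas as pd

# Example data from the question, with "index" stored as the row index
df1 = pd.DataFrame({'n_tree': [1, 2]}, index=[247, 248])
df2 = pd.DataFrame({'coords': [(20, 47), (30, 85)]}, index=[1400, 1401])
df3 = pd.DataFrame({'index_dataframe_1': [247], 'index_dataframe_2': [1401]})

# Create the target column with object dtype so a single cell can hold a tuple
df1['coords'] = None

for _, row in df3.iterrows():
    # .at addresses exactly one cell, so the tuple is stored as one value
    df1.at[row['index_dataframe_1'], 'coords'] = df2.at[row['index_dataframe_2'], 'coords']

print(df1)
```

In this sketch, df1 rows without a pair in df3 keep the initial missing value (None).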
You can perform two merges:
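(df3.merge(df1, left_on='index_dataframe_1', right_index=True)    # add n_tree by matching df1's index
    .merge(df2, left_on='index_dataframe_2', right_index=True)  # add coords by matching df2's index
    [['n_tree', 'coords']]                                      # keep only the wanted columns
)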
Output:
n_tree coords
index
0 1 (30,85)
Inputs:
>>> df1
n_tree
index
247 1
248 2
>>> df2
coords
index
1400 (20,47)
1401 (30,85)
>>> df3
index_dataframe_1 index_dataframe_2
index
0 247 1401
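The two merges above return df3's index (0) instead of df1's (247), and only rows listed in df3 appear in the result. A variation that keeps df1's index and all of its rows, sketched under the assumption that each df1 index appears at most once in df3: merge df3 with df2 only, key the coordinates by index_dataframe_1, and assign that Series as a column so pandas aligns it on df1's index. The names pairs and coords_by_df1_index are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'n_tree': [1, 2]}, index=pd.Index([247, 248], name='index'))
df2 = pd.DataFrame({'coords': [(20, 47), (30, 85)]}, index=pd.Index([1400, 1401], name='index'))
df3 = pd.DataFrame({'index_dataframe_1': [247], 'index_dataframe_2': [1401]},
                   index=pd.Index([0], name='index'))

# Attach each coordinate to its df3 row, looked up through df2's index
pairs = df3.merge(df2, left_on='index_dataframe_2', right_index=True)

# Re-key the coordinates by df1's index values
coords_by_df1_index = pairs.set_index('index_dataframe_1')['coords']

# Column assignment aligns on the index: 247 gets its tuple, 248 gets NaN
df1['coords'] = coords_by_df1_index
print(df1)
```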
Use two inner joins with .merge():
(Assuming index in your dataframes is a data column rather than the row index):
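# Join df1 -> df3 on df1's 'index', then df3 -> df2 to fetch the coordinates;
# the suffixes keep df1's 'index' name and rename the others to index_y / index_z
df_out = (df1.merge(df3, left_on='index', right_on='index_dataframe_1', suffixes=('', '_y'))
          .merge(df2, left_on='index_dataframe_2', right_on='index', suffixes=('', '_z'))
         )
df_out = df_out[['index', 'n_tree', 'coords']]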
Result:
print(df_out)
index n_tree coords
0 247 1 (30,85)
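With inner joins, df1 rows that have no pair in df3 (248 here) are dropped. If they should be kept with NaN coordinates, one option, sketched under the same assumption that index is an ordinary column, is to inner-join df3 with df2 first and then left-join df1 onto that result. The intermediate name pairs is hypothetical.

```python
import pandas as pd

# 'index' is assumed to be an ordinary column in every frame
df1 = pd.DataFrame({'index': [247, 248], 'n_tree': [1, 2]})
df2 = pd.DataFrame({'index': [1400, 1401], 'coords': [(20, 47), (30, 85)]})
df3 = pd.DataFrame({'index': [0], 'index_dataframe_1': [247], 'index_dataframe_2': [1401]})

# Inner join of the lookup table with the coordinates: one row per matched pair
pairs = df3.merge(df2, left_on='index_dataframe_2', right_on='index', suffixes=('', '_2'))

# Left join keeps every df1 row; rows without a match get NaN in 'coords'
df_out = df1.merge(pairs[['index_dataframe_1', 'coords']],
                   left_on='index', right_on='index_dataframe_1', how='left')
df_out = df_out[['index', 'n_tree', 'coords']]
print(df_out)
```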
I think this could work for you:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'index': [247, 248], 'n_tree': [1, 2]}).set_index('index')
df2 = pd.DataFrame({'index': [1400, 1401], 'coords': [(20,47), (30,85)]}).set_index('index')
df3 = pd.DataFrame({'index': [0], 'index_dataframe_1': [247], 'index_dataframe_2': [1401]}).set_index('index')
# map each df1 index to the df2 index it corresponds to
mapping = dict(zip(df3.index_dataframe_1, df3.index_dataframe_2))
coords = []
for i in df1.index:
    m = mapping.get(i)  # None when this df1 row has no pair in df3
    if m is not None:
        coords.append(df2.at[m, 'coords'])
    else:
        coords.append(np.nan)
df1['coords'] = coords
print(df1)
Result:
n_tree coords
index
247 1 (30, 85)
248 2 NaN
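The Python loop can also be replaced by two Series.map lookups; this is a sketch on the same example frames, not the answer's original code. The first map translates each df1 index into the matching df2 index through df3, and the second fetches the coordinates; labels without a match become NaN. Series.map is used rather than Index.map because, according to the pandas documentation, Index.map returns a MultiIndex when the mapped values are tuples.

```python
import pandas as pd

df1 = pd.DataFrame({'n_tree': [1, 2]}, index=pd.Index([247, 248], name='index'))
df2 = pd.DataFrame({'coords': [(20, 47), (30, 85)]}, index=pd.Index([1400, 1401], name='index'))
df3 = pd.DataFrame({'index_dataframe_1': [247], 'index_dataframe_2': [1401]},
                   index=pd.Index([0], name='index'))

# df1 index -> df2 index, taken from the pairs stored in df3
df1_to_df2 = df3.set_index('index_dataframe_1')['index_dataframe_2']

# Step 1: the df2 index for every df1 row (NaN where df3 has no pair)
df2_index = df1.index.to_series().map(df1_to_df2)

# Step 2: the coordinates from df2 (NaN stays NaN)
df1['coords'] = df2_index.map(df2['coords'])
print(df1)
```

This assumes each df1 index appears at most once in df3; with duplicates the lookup Series has a non-unique index and the mapping is expected to fail.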