Python - lookup a value from another df multiple times
I have the following two dataframes:
   prod_id   land_ids
0        1      [1,2]
1        2        [1]
2        3    [2,3,4]
3        4         []
4        5      [3,4]
   land_id    land_desc
0        1      germany
1        2      austria
2        3  switzerland
3        4        italy
Basically, I want every number in the land_ids column to be looked up individually in the other df.
The result should look something like this:
   prod_id   land_ids                  list_land
0        1      [1,2]            germany austria
1        2        [1]                    germany
2        3    [2,3,4]  austria switzerland italy
3        4         []
4        5      [3,4]          switzerland italy
Preferably, the list_land column is a single string in which the land names are concatenated. But I would also be fine with getting a list as a result.
Any idea on how to do this?
Here is my code for creating the dataframes:
import pandas as pd
data_prod = {'prod_id': [1,2,3,4,5], 'land_ids': [[1,2],[1],[2,3,4],[1,3],[3,4]]}
prod_df = pd.DataFrame(data_prod)
data_land = {'land_id': [1,2,3,4], 'land_desc': ['germany', 'austria', 'switzerland', 'italy']}
land_df = pd.DataFrame(data_land)
EDIT: what do I have to add if one value of land_ids is empty?
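import pandas as pd
df1 = pd.DataFrame({"prod_id":[1,2,3,4,5],"land_ids":[[1,2],[1],[2,3,4],[1,3],[3,4]]})
# the description column must be named land_desc, since the lookups below read df2.at[..., 'land_desc']
df2 = pd.DataFrame({"land_id":[1,2,3,4],"land_desc":["germany","austria","switzerland","italy"]})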
df2 = df2.set_index('land_id', drop=True)
df1['list_land'] = df1['land_ids'].apply(lambda x: [df2.at[ids, 'land_desc'] for ids in x])
If you want list_land as a string instead, you can do it like this:
df1['list_land'] = df1['land_ids'].apply(lambda x: " ".join([df2.at[ids, 'land_desc'] for ids in x]))
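Regarding the edit in the question: an empty list should need no special handling here, because the comprehension just yields an empty list and `" ".join([])` returns an empty string. What does fail is a missing value (None or NaN instead of a list) or an id with no row in the lookup table, where `.at` raises a KeyError. Below is a minimal sketch of one way to guard against both, assuming unknown ids should simply be skipped; the helper name `lookup_lands` and the `land_map` dict are hypothetical and not part of the answer above.

```python
import pandas as pd

prod_df = pd.DataFrame({
    "prod_id": [1, 2, 3, 4, 5],
    "land_ids": [[1, 2], [1], [2, 3, 4], [], [3, 4]],  # product 4 has no lands
})
land_df = pd.DataFrame({
    "land_id": [1, 2, 3, 4],
    "land_desc": ["germany", "austria", "switzerland", "italy"],
})

# Build the lookup once as a plain dict {land_id: land_desc}
land_map = land_df.set_index("land_id")["land_desc"].to_dict()

def lookup_lands(ids):
    """Hypothetical helper: space-join the descriptions of all known ids."""
    if not isinstance(ids, (list, tuple)):  # None or NaN instead of a list
        return ""
    # ids missing from land_map are skipped (an assumption, adjust as needed)
    return " ".join(land_map[i] for i in ids if i in land_map)

prod_df["list_land"] = prod_df["land_ids"].apply(lookup_lands)
print(prod_df)
```

Building the dict once also avoids a DataFrame lookup per id, which may matter on larger frames.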
You can use the apply method:
prod_df['list_land'] = prod_df['land_ids'].apply(lambda x: [land_df.loc[land_df['land_id'] == y]['land_desc'].values[0] for y in x])
In this case, the list_land column is a list. You can use the following code if you want it to be a string.
prod_df['list_land'] = prod_df['land_ids'].apply(lambda x: ' '.join([land_df.loc[land_df['land_id'] == y]['land_desc'].values[0] for y in x]))
Maybe something like this:
import pandas as pd
df1 = pd.DataFrame({"prod_id":[1,2,3,4,5],"land_ids":[[1,2],[1],[2,3,4],[1,3],[3,4]]})
df2 = pd.DataFrame({"land_id":[1,2,3,4],"land_desc":["germany","austria","switzerland","italy"]})
list_land = []
for index, row in df1.iterrows():
    # for each id in this row's list, scan df2 for the row with the matching land_id
    list_land.append([row2.land_desc for land_id in row["land_ids"] for _, row2 in df2.iterrows() if row2.land_id == land_id])
df1["list_land"] = list_land
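The nested iterrows loop scans the whole lookup table for every id, which can get slow on larger frames. As a possible alternative, here is a sketch that avoids explicit Python loops over rows by using `Series.explode` (available from pandas 0.25 on), `Series.map`, and a groupby on the original index. The variable names `land_map`, `exploded`, and `names` are only illustrative, and the sample data uses an empty list for product 4 to show that case.

```python
import pandas as pd

df1 = pd.DataFrame({"prod_id": [1, 2, 3, 4, 5],
                    "land_ids": [[1, 2], [1], [2, 3, 4], [], [3, 4]]})
df2 = pd.DataFrame({"land_id": [1, 2, 3, 4],
                    "land_desc": ["germany", "austria", "switzerland", "italy"]})

# Lookup Series: index is land_id, values are land_desc
land_map = df2.set_index("land_id")["land_desc"]

# One row per list element, keeping the original row index;
# an empty list becomes a single NaN row
exploded = df1["land_ids"].explode()

# Translate each id to its description; NaN and unknown ids stay NaN
names = exploded.map(land_map)

# Regroup by the original row index and join the non-missing names
df1["list_land"] = names.groupby(level=0).agg(lambda s: " ".join(s.dropna()))
print(df1)
```

To get lists instead of strings, the aggregation could return `list(s.dropna())` in place of the join.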