How to merge two columns of a dataframe based on values from a column in another dataframe?
I have a dataframe called df_location:
location = {'location_id': [1,2,3,4,5,6,7,8,9,10],
'temperature_value': [20,21,22,23,24,25,26,27,28,29],
'humidity_value':[60,61,62,63,64,65,66,67,68,69]}
df_location = pd.DataFrame(location)
I have another dataframe called df_islands:
islands = {'island_id':[10,20,30,40,50,60],
'list_of_locations':[[1],[2,3],[4,5],[6,7,8],[9],[10]]}
df_islands = pd.DataFrame(islands)
Each island_id corresponds to one or more locations. As you can see, the locations are stored in a list. What I'm trying to do is search list_of_locations for each unique location and merge the result into df_location, so that each location gets the island_id it belongs to.
The final dataframe should be the following:
merged = {'location_id': [1,2,3,4,5,6,7,8,9,10],
'temperature_value': [20,21,22,23,24,25,26,27,28,29],
'humidity_value':[60,61,62,63,64,65,66,67,68,69],
'island_id':[10,20,20,30,30,40,40,40,50,60]}
df_merged = pd.DataFrame(merged)
I don't know whether there is a method or function in Python to do this. I would really appreciate it if someone could give me a solution to this problem.
The pandas .apply() method works here, called on the location_id column. It's a bit long-winded, but it works:
df_location['island_id'] = df_location['location_id'].apply(
    lambda x: [
        df_islands['island_id'][i]
        for i in df_islands.index
        if x in df_islands['list_of_locations'][i]
        # comment out the line above and use this instead if the list is stored as a string
        # if x in eval(df_islands['list_of_locations'][i])
    ][0]
)
First, we select the value we want when the if condition is True: df_islands['island_id'][i]
Then we loop over each row of df_islands by using df_islands.index
Then the if condition checks, for each row, whether the value from df_location['location_id'] is in that row's list in df_islands['list_of_locations'], and keeps the row only when it is.
Finally, since the whole expression is wrapped in square brackets, the result is a list. However, we know that there is only one value in the list, so we can extract it with [0] at the end. This assumes every location appears in exactly one island's list; a location missing from every list would raise an IndexError.
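The steps above can also be written as an ordinary function, which may be easier to read than the one-line lambda. This is only a sketch: find_island is a made-up helper name, and returning None for a location that no island lists is an assumption about the desired behaviour, not something the question specifies.

```python
import pandas as pd

df_location = pd.DataFrame({
    'location_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'temperature_value': [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
    'humidity_value': [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
})
df_islands = pd.DataFrame({
    'island_id': [10, 20, 30, 40, 50, 60],
    'list_of_locations': [[1], [2, 3], [4, 5], [6, 7, 8], [9], [10]],
})


def find_island(location_id):
    """Hypothetical helper: return the island whose list contains location_id."""
    # Walk the rows of df_islands, pairing each island with its list of locations.
    for island_id, locations in zip(df_islands['island_id'],
                                    df_islands['list_of_locations']):
        if location_id in locations:
            return island_id  # first (and, by assumption, only) match
    return None  # assumed fallback when no island lists this location


df_location['island_id'] = df_location['location_id'].apply(find_island)
```

Unlike the [0] indexing, this version does not raise an error when a location has no island; it leaves a missing value instead.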
I hope this helps; other editors are welcome to make the answer more legible!
print(df_location)
location_id temperature_value humidity_value island_id
0 1 20 60 10
1 2 21 61 20
2 3 22 62 20
3 4 23 63 30
4 5 24 64 30
5 6 25 65 40
6 7 26 66 40
7 8 27 67 40
8 9 28 68 50
9 10 29 69 60
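The code comment in this answer mentions the case where each list is stored as a string, and suggests eval for it. A possibly safer route is to convert the column once with ast.literal_eval from the standard library, which only parses Python literals. The sketch below assumes the strings look like '[2, 3]' (for example after reading a CSV file); the sample data is invented for illustration.

```python
import ast

import pandas as pd

# Assumed input: the lists arrive as text rather than as real Python lists.
df_islands = pd.DataFrame({
    'island_id': [10, 20, 30],
    'list_of_locations': ['[1]', '[2, 3]', '[4, 5]'],
})

# literal_eval parses literals only, so it cannot execute arbitrary code.
df_islands['list_of_locations'] = (
    df_islands['list_of_locations'].apply(ast.literal_eval)
)
```

After this conversion, the original `if x in df_islands['list_of_locations'][i]` line works unchanged.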
The pandas method you're looking for to expand your df_islands dataframe is .explode(column_name). From there, rename the column to location_id and then join the dataframes using pd.merge(). It performs a SQL-like join using location_id as the key.
import pandas as pd
locations = {'location_id': [1,2,3,4,5,6,7,8,9,10],
'temperature_value': [20,21,22,23,24,25,26,27,28,29],
'humidity_value':[60,61,62,63,64,65,66,67,68,69]}
df_locations = pd.DataFrame(locations)
islands = {'island_id':[10,20,30,40,50,60],
'list_of_locations':[[1],[2,3],[4,5],[6,7,8],[9],[10]]}
df_islands = pd.DataFrame(islands)
df_islands = df_islands.explode(column='list_of_locations')
df_islands.columns = ['island_id', 'location_id']
pd.merge(df_locations, df_islands)
Out[]:
location_id temperature_value humidity_value island_id
0 1 20 60 10
1 2 21 61 20
2 3 22 62 20
3 4 23 63 30
4 5 24 64 30
5 6 25 65 40
6 7 26 66 40
7 8 27 67 40
8 9 28 68 50
9 10 29 69 60
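The call above relies on pd.merge() defaults: it joins on the shared column name and does an inner join, so a location that belongs to no island would silently drop out. A variant that spells these choices out might look like the sketch below. The trimmed sample data, the location with no island, and the choice of a left join are all assumptions made for illustration.

```python
import pandas as pd

df_locations = pd.DataFrame({
    'location_id': [1, 2, 3, 4],
    'temperature_value': [20, 21, 22, 23],
})
# Made-up data: location 4 is deliberately absent from every island.
df_islands = pd.DataFrame({
    'island_id': [10, 20],
    'list_of_locations': [[1], [2, 3]],
})

exploded = (
    df_islands
    .explode('list_of_locations')                           # one row per location
    .rename(columns={'list_of_locations': 'location_id'})
    .astype({'location_id': 'int64'})                       # explode leaves object dtype
)

df_merged = pd.merge(
    df_locations,
    exploded,
    on='location_id',        # name the join key explicitly
    how='left',              # keep locations that have no island
    validate='many_to_one',  # raise if a location is listed under two islands
)
```

With a left join, unmatched locations get NaN in island_id, which also turns that column into floats.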