How to merge two columns of a dataframe based on values from a column in another dataframe?

Question

I have a dataframe called df_location:

import pandas as pd

locations = {'location_id': [1,2,3,4,5,6,7,8,9,10],
            'temperature_value': [20,21,22,23,24,25,26,27,28,29],
            'humidity_value':[60,61,62,63,64,65,66,67,68,69]}
df_location = pd.DataFrame(locations)

I have another dataframe called df_islands:

islands = {'island_id':[10,20,30,40,50,60],
          'list_of_locations':[[1],[2,3],[4,5],[6,7,8],[9],[10]]}
df_islands = pd.DataFrame(islands)

Each island_id corresponds to one or more locations. As you can see, the locations are stored in a list. I want to look up each location in list_of_locations and merge the result into df_location, so that every location_id row gets its corresponding island_id.

The final dataframe should be the following:

merged = {'location_id': [1,2,3,4,5,6,7,8,9,10],
                'temperature_value': [20,21,22,23,24,25,26,27,28,29],
                'humidity_value':[60,61,62,63,64,65,66,67,68,69],
                'island_id':[10,20,20,30,30,40,40,40,50,60]}
df_merged = pd.DataFrame(merged)

I don't know whether there is a method or function in Python to do this. I would really appreciate it if someone could give me a solution to this problem.

Answer 1

The apply() method works here, called on the location_id column. It's a bit long-winded but it works:

df_location['island_id'] = df_location['location_id'].apply(
    lambda x: [
        df_islands['island_id'][i]
        for i in df_islands.index
        if x in df_islands['list_of_locations'][i]
        # If each list is stored as a string, use this condition instead
        # (ast.literal_eval is a safer choice than eval for untrusted data):
        # if x in eval(df_islands['list_of_locations'][i])
    ][0]
)

First, we select the value we want when the if condition is True: df_islands['island_id'][i]

Then we loop over each row of df_islands by iterating over df_islands.index.

The if condition then checks, for each row, whether the current value of df_location['location_id'] is in that row's df_islands['list_of_locations'] list, and is True if it is.

Finally, since the whole expression is wrapped in square brackets, it produces a list. We know each location belongs to exactly one island, so the list contains a single value and we can extract it with [0] at the end.

I hope this helps, and I'm happy for other editors to make the answer more legible!

print(df_location)

   location_id  temperature_value  humidity_value  island_id
0            1                 20              60         10
1            2                 21              61         20
2            3                 22              62         20
3            4                 23              63         30
4            5                 24              64         30
5            6                 25              65         40
6            7                 26              66         40
7            8                 27              67         40
8            9                 28              68         50
9           10                 29              69         60
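
The list comprehension above rescans df_islands once for every location. As a possible alternative, not part of the original answer, the same lookup could be sketched with a plain dictionary and Series.map(). The name location_to_island is made up for illustration, and the sketch assumes each location appears in exactly one island's list.

```python
import pandas as pd

df_location = pd.DataFrame({
    'location_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'temperature_value': [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
    'humidity_value': [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
})
df_islands = pd.DataFrame({
    'island_id': [10, 20, 30, 40, 50, 60],
    'list_of_locations': [[1], [2, 3], [4, 5], [6, 7, 8], [9], [10]],
})

# Hypothetical helper dict: one entry per location, pointing at its island.
location_to_island = {
    loc: island
    for island, locs in zip(df_islands['island_id'], df_islands['list_of_locations'])
    for loc in locs
}

# map() looks each location_id up in the dict; an id with no entry would become NaN.
df_location['island_id'] = df_location['location_id'].map(location_to_island)
```

Building the dictionary walks df_islands only once, which may matter if either dataframe is large.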

Answer 2

The pandas method you're looking for to expand your df_islands dataframe is .explode(column_name). From there, rename the exploded column to location_id and then join the dataframes using pd.merge(). It performs a SQL-like join using location_id as the key.

import pandas as pd

locations = {'location_id': [1,2,3,4,5,6,7,8,9,10],
            'temperature_value': [20,21,22,23,24,25,26,27,28,29],
            'humidity_value':[60,61,62,63,64,65,66,67,68,69]}
df_locations = pd.DataFrame(locations)

islands = {'island_id':[10,20,30,40,50,60],
          'list_of_locations':[[1],[2,3],[4,5],[6,7,8],[9],[10]]}
df_islands = pd.DataFrame(islands)

df_islands = df_islands.explode(column='list_of_locations')

df_islands.columns = ['island_id', 'location_id']

pd.merge(df_locations, df_islands)

Out[]:
  location_id  temperature_value  humidity_value  island_id
0           1                 20              60         10
1           2                 21              61         20
2           3                 22              62         20
3           4                 23              63         30
4           5                 24              64         30
5           6                 25              65         40
6           7                 26              66         40
7           8                 27              67         40
8           9                 28              68         50
9          10                 29              69         60
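
For reference, the same steps can be written without overwriting df_islands and with the join key spelled out. This is only a sketch under a few assumptions: the intermediate name df_exploded is invented here, the astype cast assumes every list holds only integers, and how='left' is chosen so that a location without an island would be kept with a missing island_id rather than dropped.

```python
import pandas as pd

df_locations = pd.DataFrame({
    'location_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'temperature_value': [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
    'humidity_value': [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
})
df_islands = pd.DataFrame({
    'island_id': [10, 20, 30, 40, 50, 60],
    'list_of_locations': [[1], [2, 3], [4, 5], [6, 7, 8], [9], [10]],
})

# One row per (island_id, location) pair; df_islands itself is left unchanged.
df_exploded = (
    df_islands
    .explode('list_of_locations')
    .rename(columns={'list_of_locations': 'location_id'})
    .astype({'location_id': int})  # explode leaves an object column; assumes integer ids
)

# Left join on the explicit key so every row of df_locations survives.
df_merged = df_locations.merge(df_exploded, on='location_id', how='left')
```

Naming the key with on= avoids relying on pd.merge() picking up whichever column names the two dataframes happen to share.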

2 answers

Answer 1: 0 2020-07-02 12:29:43

Answer 2: 0 2020-07-02 12:32:22
