How to merge two columns of a dataframe based on values from a column in another dataframe?

I have a dataframe called df_location:

import pandas as pd

location = {'location_id': [1,2,3,4,5,6,7,8,9,10],
            'temperature_value': [20,21,22,23,24,25,26,27,28,29],
            'humidity_value':[60,61,62,63,64,65,66,67,68,69]}
df_location = pd.DataFrame(location)

I have another dataframe called df_islands:

islands = {'island_id':[10,20,30,40,50,60],
          'list_of_locations':[[1],[2,3],[4,5],[6,7,8],[9],[10]]}
df_islands = pd.DataFrame(islands)

Each island_id corresponds to one or more locations. As you can see, the locations are stored in a list. What I'm trying to do is search list_of_locations for each unique location and merge the result into df_location, so that each location row carries its corresponding island_id.

The final dataframe should be the following:

merged = {'location_id': [1,2,3,4,5,6,7,8,9,10],
                'temperature_value': [20,21,22,23,24,25,26,27,28,29],
                'humidity_value':[60,61,62,63,64,65,66,67,68,69],
                'island_id':[10,20,20,30,30,40,40,40,50,60]}
df_merged = pd.DataFrame(merged)

I don't know whether there is a method or function in Python to do so. I would really appreciate it if someone could give me a solution to this problem.

The .apply() method works here. It's a bit long-winded, but it works:

df_location['island_id'] = df_location['location_id'].apply(
    lambda x: [
        df_islands['island_id'][i]
        for i in df_islands.index
        if x in df_islands['list_of_locations'][i]
        # If the lists are stored as strings, comment out the line above
        # and use this condition instead:
        # if x in eval(df_islands['list_of_locations'][i])
    ][0]
)

First, we specify the value we want to keep when the if condition is True: df_islands['island_id'][i]

Then we loop over each row of df_islands by iterating over df_islands.index.

The if clause then checks, for each row, whether the current value from df_location['location_id'] is in that row's list in df_islands['list_of_locations'], and keeps the row's island_id only when it is.

Finally, because the whole expression is wrapped in square brackets, it is a list comprehension and produces a list. However, we know that there is only one value in the list, so we can extract it by using [0] at the end.

I hope this helps; other editors are welcome to make the answer more legible!

print(df_location)

   location_id  temperature_value  humidity_value  island_id
0            1                 20              60         10
1            2                 21              61         20
2            3                 22              62         20
3            4                 23              63         30
4            5                 24              64         30
5            6                 25              65         40
6            7                 26              66         40
7            8                 27              67         40
8            9                 28              68         50
9           10                 29              69         60
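
As a variation on the apply() approach above, the same lookup can be sketched with a plain dictionary built once from df_islands and then applied with Series.map. This is only a sketch under the assumption that every location appears in exactly one island's list; the name location_to_island is made up for illustration and is not part of the original answer.

```python
import pandas as pd

df_location = pd.DataFrame({
    'location_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'temperature_value': [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
    'humidity_value': [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
})
df_islands = pd.DataFrame({
    'island_id': [10, 20, 30, 40, 50, 60],
    'list_of_locations': [[1], [2, 3], [4, 5], [6, 7, 8], [9], [10]],
})

# Build a location_id -> island_id lookup once (hypothetical helper name).
# Assumes each location belongs to exactly one island; a location listed
# under several islands would silently keep only the last one seen.
location_to_island = {
    loc: island
    for island, locs in zip(df_islands['island_id'],
                            df_islands['list_of_locations'])
    for loc in locs
}

# Series.map looks up each location_id in the dict; ids missing from the
# dict become NaN instead of raising an IndexError as [0] would.
df_location['island_id'] = df_location['location_id'].map(location_to_island)
print(df_location)
```

Building the dictionary first avoids rescanning every row of df_islands for each location, which may matter on larger frames.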

The pandas method you're looking for to expand your df_islands dataframe is .explode(column_name). From there, rename your column to location_id and then join the dataframes using pd.merge(). It performs a SQL-like join using location_id as the key.

import pandas as pd

locations = {'location_id': [1,2,3,4,5,6,7,8,9,10],
            'temperature_value': [20,21,22,23,24,25,26,27,28,29],
            'humidity_value':[60,61,62,63,64,65,66,67,68,69]}
df_locations = pd.DataFrame(locations)

islands = {'island_id':[10,20,30,40,50,60],
          'list_of_locations':[[1],[2,3],[4,5],[6,7,8],[9],[10]]}
df_islands = pd.DataFrame(islands)

df_islands = df_islands.explode(column='list_of_locations')

df_islands.columns = ['island_id', 'location_id']

pd.merge(df_locations, df_islands)
Out[]:
  location_id  temperature_value  humidity_value  island_id
0           1                 20              60         10
1           2                 21              61         20
2           3                 22              62         20
3           4                 23              63         30
4           5                 24              64         30
5           6                 25              65         40
6           7                 26              66         40
7           8                 27              67         40
8           9                 28              68         50
9          10                 29              69         60
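
The explode-then-merge steps above can also be written as one chain that leaves df_islands untouched. The following is a sketch, not the answer's original code: the name island_lookup, the explicit cast to int64, and the how='left' and validate='one_to_one' arguments are choices added here for illustration. The cast assumes no list is empty, since an empty list explodes to NaN and cannot be cast to an integer.

```python
import pandas as pd

df_locations = pd.DataFrame({
    'location_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'temperature_value': [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
    'humidity_value': [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
})
df_islands = pd.DataFrame({
    'island_id': [10, 20, 30, 40, 50, 60],
    'list_of_locations': [[1], [2, 3], [4, 5], [6, 7, 8], [9], [10]],
})

# One row per (island_id, location_id) pair, without overwriting df_islands.
island_lookup = (
    df_islands
    .explode('list_of_locations')
    .rename(columns={'list_of_locations': 'location_id'})
    # explode leaves an object-dtype column; cast so the key dtypes match.
    .astype({'location_id': 'int64'})
)

# how='left' keeps every location row even if no island lists it;
# validate raises MergeError if a location_id is duplicated on either side.
df_merged = df_locations.merge(
    island_lookup, on='location_id', how='left', validate='one_to_one'
)
print(df_merged)
```

Renaming by column name rather than assigning to df_islands.columns keeps the step correct even if the column order changes.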


