pandas - Merge two DataFrames, overwriting and specifying which columns to keep

Question

I'm trying to merge two pandas DataFrames, although what I want may not actually be a merge.

I have two columns that match across the two frames. One column holds unique values that can be used to join. In the other column, one frame has an empty field and the other has a populated one.

I want to overwrite the empty fields while matching on the unique column, but only keep the column that is overwritten. I do not want the rest of the columns from the second DataFrame.

Hopefully the example below explains a little further:

>>> import pandas as pd
>>> animals = [{"animal" : "dog", "name" : "freddy", "food" : ""},{"animal" : "cat", "name" : "dexter", "food" : ""},{"animal" : "dog", "name" : "lou lou", "food" : ""}]
>>> foods = [{"name" : "freddy", "food" : "dog mix", "brand" : "doggys dog"},{"name" : "dexter", "food" : "fussy cat mix", "brand" : "fish fishy"},{"name" : "lou lou", "food" : "bones", "brand" : "i was a cow"}]
>>> a_pd = pd.DataFrame(animals)
>>> a_pd
  animal food     name
0    dog        freddy
1    cat        dexter
2    dog       lou lou
>>> f_pd = pd.DataFrame(foods)
>>> f_pd
         brand           food     name
0   doggys dog        dog mix   freddy
1   fish fishy  fussy cat mix   dexter
2  i was a cow          bones  lou lou
>>>
>>>
>>> animal_data = a_pd.merge(f_pd, on='name', how='left')
>>> animal_data
  animal food_x     name        brand         food_y
0    dog          freddy   doggys dog        dog mix
1    cat          dexter   fish fishy  fussy cat mix
2    dog         lou lou  i was a cow          bones
>>>

I should just have food, and I don't want brand. (Also note that this is sample data; the live data has a lot more columns.)

Desired result:

>>> animal_data
  animal        name            food
0    dog      freddy         dog mix
1    cat      dexter   fussy cat mix
2    dog     lou lou           bones

Answer 1

Use:

animal_data = a_pd.merge(f_pd, on='name', how='left', suffixes=('_x','')).drop('food_x', axis=1)

Output:

  animal     name        brand           food
0    dog   freddy   doggys dog        dog mix
1    cat   dexter   fish fishy  fussy cat mix
2    dog  lou lou  i was a cow          bones
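A self-contained sketch of the suffixes trick, rebuilt from the question's sample data. The DataFrame constructors and the extra drop of brand are additions for illustration, not part of the answer as posted. With suffixes=('_x', ''), only the left-hand copy of the clashing food column gets a suffix, so the values from f_pd keep the plain name food; dropping brand as well gives the desired result from the question.

```python
import pandas as pd

# Sample data from the question
a_pd = pd.DataFrame([
    {"animal": "dog", "name": "freddy", "food": ""},
    {"animal": "cat", "name": "dexter", "food": ""},
    {"animal": "dog", "name": "lou lou", "food": ""},
])
f_pd = pd.DataFrame([
    {"name": "freddy", "food": "dog mix", "brand": "doggys dog"},
    {"name": "dexter", "food": "fussy cat mix", "brand": "fish fishy"},
    {"name": "lou lou", "food": "bones", "brand": "i was a cow"},
])

# Only the left-hand "food" is renamed (to food_x); the right-hand one stays "food"
merged = a_pd.merge(f_pd, on="name", how="left", suffixes=("_x", ""))

# Drop the empty left-hand copy and the unwanted "brand" column in one call
animal_data = merged.drop(columns=["food_x", "brand"])
print(animal_data)
```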

Or:

a_pd[['animal','name']].merge(f_pd, how='left')

Output:

  animal     name        brand           food
0    dog   freddy   doggys dog        dog mix
1    cat   dexter   fish fishy  fussy cat mix
2    dog  lou lou  i was a cow          bones
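Because the second form passes no on argument, merge joins on every column name the two frames have in common. A small sketch that checks this on the sample data (the variable names left, common, implicit and explicit are illustrative): once a_pd is trimmed to animal and name, the only shared column is name, so the implicit and explicit joins produce the same frame. Note that brand is still carried over from f_pd in both.

```python
import pandas as pd

a_pd = pd.DataFrame({"animal": ["dog", "cat", "dog"],
                     "name": ["freddy", "dexter", "lou lou"],
                     "food": ["", "", ""]})
f_pd = pd.DataFrame({"name": ["freddy", "dexter", "lou lou"],
                     "food": ["dog mix", "fussy cat mix", "bones"],
                     "brand": ["doggys dog", "fish fishy", "i was a cow"]})

left = a_pd[["animal", "name"]]  # the empty "food" column is dropped up front

# With no `on`, merge joins on the intersection of the column names
common = left.columns.intersection(f_pd.columns)
print(list(common))  # ['name']

implicit = left.merge(f_pd, how="left")
explicit = left.merge(f_pd, on="name", how="left")
print(implicit.equals(explicit))  # True
```

Spelling out on='name' is arguably safer on the live data with many columns: any other column name the frames happen to share would otherwise silently become part of the join key.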

Answer 2

You can use update:

a_pd.set_index('name',inplace=True)
a_pd.update(f_pd.set_index('name'))
a_pd
Out[68]: 
        animal           food
name                         
freddy     dog        dog mix
dexter     cat  fussy cat mix
lou lou    dog          bones
a_pd.reset_index()
Out[69]: 
      name animal           food
0   freddy    dog        dog mix
1   dexter    cat  fussy cat mix
2  lou lou    dog          bones
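A sketch of the rules update follows: it modifies the caller in place and returns None, aligns on both index and column labels, ignores columns the caller does not have (brand here), and by default skips NaN values in the other frame. The NaN for dexter below is a hypothetical change to the sample data, added only to show that last rule.

```python
import numpy as np
import pandas as pd

# Sample data from the question, indexed by the join key
a_pd = pd.DataFrame({"animal": ["dog", "cat", "dog"],
                     "name": ["freddy", "dexter", "lou lou"],
                     "food": ["", "", ""]}).set_index("name")
f_pd = pd.DataFrame({"name": ["freddy", "dexter", "lou lou"],
                     "food": ["dog mix", np.nan, "bones"],  # NaN is hypothetical
                     "brand": ["doggys dog", "fish fishy", "i was a cow"]}).set_index("name")

# In place: rows matched by index, columns by label; "brand" is skipped
# because a_pd has no such column, and a_pd keeps its value where f_pd is NaN
a_pd.update(f_pd)
result = a_pd.reset_index()
print(result)
```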

Or use map (starting from the original a_pd, not the re-indexed one above):

a_pd['food'] = a_pd['name'].map(f_pd.set_index('name')['food'])
a_pd
Out[74]: 
  animal           food     name
0    dog        dog mix   freddy
1    cat  fussy cat mix   dexter
2    dog          bones  lou lou
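Series.map with a Series argument looks each name up in that Series' index, and names with no match come back as NaN, which would blank out whatever was already in food. A sketch that falls back to the existing value with fillna. The extra row rex and its food kibble are hypothetical, added only to show a non-matching name; as the question states, names are assumed unique in f_pd.

```python
import pandas as pd

a_pd = pd.DataFrame({
    "animal": ["dog", "cat", "dog", "dog"],
    "name": ["freddy", "dexter", "lou lou", "rex"],  # "rex" is a hypothetical extra row
    "food": ["", "", "", "kibble"],
})
f_pd = pd.DataFrame({
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["dog mix", "fussy cat mix", "bones"],
    "brand": ["doggys dog", "fish fishy", "i was a cow"],
})

# name -> food lookup Series (names assumed unique in f_pd)
food_by_name = f_pd.set_index("name")["food"]

# map() yields NaN for names missing from the lookup; fillna() aligns on the
# row index and falls back to the value already in a_pd for those rows
a_pd["food"] = a_pd["name"].map(food_by_name).fillna(a_pd["food"])
print(a_pd)
```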

Answer 3

I'd either use drop or just select the columns you want to keep:

animal_data.drop(['food_x', 'brand'], axis=1, inplace=True)

or

animal_data = animal_data[['animal', 'name', 'food']]
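Since the question mentions that the live data has many more columns, typing every column name into the selection may not scale. A sketch of one way around that, assuming a_pd's own column list is exactly what the result should keep: after the default merge, copy food_y into food, then select list(a_pd.columns).

```python
import pandas as pd

a_pd = pd.DataFrame({"animal": ["dog", "cat", "dog"],
                     "name": ["freddy", "dexter", "lou lou"],
                     "food": ["", "", ""]})
f_pd = pd.DataFrame({"name": ["freddy", "dexter", "lou lou"],
                     "food": ["dog mix", "fussy cat mix", "bones"],
                     "brand": ["doggys dog", "fish fishy", "i was a cow"]})

# Default suffixes give food_x (empty, from a_pd) and food_y (from f_pd)
animal_data = a_pd.merge(f_pd, on="name", how="left")

# Put the merged-in values back under the original column name...
animal_data["food"] = animal_data["food_y"]
# ...then keep exactly a_pd's columns in a_pd's order; food_x, food_y
# and brand are dropped without having to be listed
animal_data = animal_data[list(a_pd.columns)]
print(animal_data)
```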

Answer 4

It might be best to merge column subsets of the DataFrames that leave out the columns you don't want in the merged result. For example:

a_cols = ['animal', 'name']
f_cols = ['food', 'name']
a_pd[a_cols].merge(f_pd[f_cols], on='name', how='left')

This may be faster and may save some memory when working with very large DataFrames, since only the relevant columns are carried through the merge.
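The same idea can be wrapped in a small reusable function. The name overwrite_from and its parameters are hypothetical, not a pandas API. It drops the target columns from the left frame, merges in only the key plus those columns from the right frame, and passes validate='many_to_one' so that pandas raises a MergeError if the key is not unique on the right, which the question says it should be.

```python
import pandas as pd

def overwrite_from(left, right, key, cols):
    """Hypothetical helper: replace `cols` in `left` with values from `right`
    matched on `key`, ignoring every other column of `right`."""
    return left.drop(columns=cols).merge(
        right[[key, *cols]],       # only the join key and the wanted columns
        on=key,
        how="left",
        validate="many_to_one",    # MergeError if `key` repeats in `right`
    )

a_pd = pd.DataFrame({"animal": ["dog", "cat", "dog"],
                     "name": ["freddy", "dexter", "lou lou"],
                     "food": ["", "", ""]})
f_pd = pd.DataFrame({"name": ["freddy", "dexter", "lou lou"],
                     "food": ["dog mix", "fussy cat mix", "bones"],
                     "brand": ["doggys dog", "fish fishy", "i was a cow"]})

animal_data = overwrite_from(a_pd, f_pd, key="name", cols=["food"])
print(animal_data)
```

Unlike update, rows with no match on the right end up with NaN in the target columns instead of keeping their old value, and the overwritten columns move to the end of the frame.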


4 answers

Answer 1: score 3, accepted, 2018-09-11 14:50:55
Answer 2: score 3, 2018-09-11 14:51:07
Answer 3: score 2, 2018-09-11 14:53:25
Answer 4: score 2, 2018-09-11 14:55:54
