pandas - merge two data frames, overwrite, and specify which columns to keep
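I am trying to merge two pandas DataFrames, although what I want may not actually be a merge.
The two DataFrames have two columns in common. One holds unique values that can be used for the join; the other is empty in one DataFrame and populated in the other.
Where the unique key matches, I want to overwrite the empty field, but keep only the overwritten column. I don't want the rest of the columns from the second DataFrame.
Hopefully the following explains it further: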
>>> import pandas as pd
>>> animals = [{"animal" : "dog", "name" : "freddy", "food" : ""},{"animal" : "cat", "name" : "dexter", "food" : ""},{"animal" : "dog", "name" : "lou lou", "food" : ""}]
>>> foods = [{"name" : "freddy", "food" : "dog mix", "brand" : "doggys dog"},{"name" : "dexter", "food" : "fussy cat mix", "brand" : "fish fishy"},{"name" : "lou lou", "food" : "bones", "brand" : "i was a cow"}]
>>> a_pd = pd.DataFrame(animals)
>>> a_pd
   animal  food     name
0     dog         freddy
1     cat         dexter
2     dog        lou lou
>>> f_pd = pd.DataFrame(foods)
>>> f_pd
         brand           food     name
0   doggys dog        dog mix   freddy
1   fish fishy  fussy cat mix   dexter
2  i was a cow          bones  lou lou
>>> animal_data = a_pd.merge(f_pd, on='name', how='left')
>>> animal_data
   animal  food_x     name        brand         food_y
0     dog           freddy   doggys dog        dog mix
1     cat           dexter   fish fishy  fussy cat mix
2     dog          lou lou  i was a cow          bones
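I should end up with food, and I don't want brand (note also that this is sample data; the real data has many more columns).
Desired result:
>>> animal_data
   animal     name           food
0     dog   freddy        dog mix
1     cat   dexter  fussy cat mix
2     dog  lou lou          bones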
Use:
animal_data = a_pd.merge(f_pd, on='name', how='left', suffixes=('_x','')).drop('food_x', axis=1)
Output:
   animal     name        brand           food
0     dog   freddy   doggys dog        dog mix
1     cat   dexter   fish fishy  fussy cat mix
2     dog  lou lou  i was a cow          bones
Or:
a_pd[['animal','name']].merge(f_pd, how='left')
Output:
   animal     name        brand           food
0     dog   freddy   doggys dog        dog mix
1     cat   dexter   fish fishy  fussy cat mix
2     dog  lou lou  i was a cow          bones
You can use update:
a_pd.set_index('name', inplace=True)
a_pd.update(f_pd.set_index('name'))
a_pd
Out[68]:
         animal           food
name
freddy      dog        dog mix
dexter      cat  fussy cat mix
lou lou     dog          bones
a_pd.reset_index()
Out[69]:
      name  animal           food
0   freddy     dog        dog mix
1   dexter     cat  fussy cat mix
2  lou lou     dog          bones
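The update approach can be sketched as a self-contained script. This is only an illustration built from the question's sample data; the variable names `animals`, `foods`, and `result` are made up here. It relies on `DataFrame.update` aligning on the index, modifying the caller in place (it returns `None`), and only touching columns that already exist in the caller, which is why `brand` never appears.

```python
import pandas as pd

# Sample data from the question (variable names are illustrative).
animals = pd.DataFrame({
    "animal": ["dog", "cat", "dog"],
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["", "", ""],
})
foods = pd.DataFrame({
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["dog mix", "fussy cat mix", "bones"],
    "brand": ["doggys dog", "fish fishy", "i was a cow"],
})

# set_index returns a new frame, so `animals` itself is left untouched.
result = animals.set_index("name")

# update() works in place and returns None. It aligns on the index and
# only overwrites columns that already exist in `result`, so "brand"
# from `foods` is ignored.
result.update(foods.set_index("name"))

# Restore "name" as a regular column and put the columns in the desired order.
result = result.reset_index()[["animal", "name", "food"]]
print(result)
```

Because `update` skips missing (NaN) values in the other frame by default, a pet with no entry in `foods` would keep its original `food` value.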
Or we can use map:
a_pd['food'] = a_pd['name'].map(f_pd.set_index('name')['food'])
a_pd
Out[74]:
   animal           food     name
0     dog        dog mix   freddy
1     cat  fussy cat mix   dexter
2     dog          bones  lou lou
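One thing to watch with map is that names missing from the lookup come back as NaN. The sketch below illustrates one way to handle that, assuming you want to keep the existing value in that case. The extra pet "rex" and its food "kibble" are hypothetical additions to the question's sample data, and the names `animals`, `foods`, `lookup`, and `mapped` are made up for this example.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical pet ("rex")
# that has no matching row in `foods`.
animals = pd.DataFrame({
    "animal": ["dog", "cat", "dog", "dog"],
    "name": ["freddy", "dexter", "lou lou", "rex"],
    "food": ["", "", "", "kibble"],
})
foods = pd.DataFrame({
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["dog mix", "fussy cat mix", "bones"],
    "brand": ["doggys dog", "fish fishy", "i was a cow"],
})

# Build a name -> food lookup Series; this assumes names in `foods` are unique.
lookup = foods.set_index("name")["food"]

# Names without a match map to NaN.
mapped = animals["name"].map(lookup)

# Keep the existing value wherever the lookup had no match.
animals["food"] = mapped.fillna(animals["food"])
print(animals)
```

Only the `food` column is touched, so no extra columns from the second frame ever enter the result.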
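I would either use drop, or select only the columns to keep.
Note that with the default suffixes the merged frame has food_x and food_y, so the populated column is only called food if the merge was done with suffixes=('_x', '') as in the other answer.
animal_data.drop(['food_x', 'brand'], axis=1, inplace=True)
Or:
animal_data = animal_data[['animal', 'name', 'food']]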
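A minimal sketch of both options on the question's sample data, assuming the merge is done with an empty right-hand suffix so that the populated column keeps the name `food`. The suffix `_old` and the variable names are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (variable names are illustrative).
animals = pd.DataFrame({
    "animal": ["dog", "cat", "dog"],
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["", "", ""],
})
foods = pd.DataFrame({
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["dog mix", "fussy cat mix", "bones"],
    "brand": ["doggys dog", "fish fishy", "i was a cow"],
})

# The empty column from `animals` becomes "food_old"; the populated
# column from `foods` keeps the plain name "food".
merged = animals.merge(foods, on="name", how="left", suffixes=("_old", ""))

# Option 1: drop what you do not want.
option_drop = merged.drop(columns=["food_old", "brand"])

# Option 2: select what you do want.
option_select = merged[["animal", "name", "food"]]

print(option_drop)
```

Selecting is usually the more robust choice when the real data has many more columns, since newly added columns in the second frame cannot leak into the result.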
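It is better to merge subsets of the DataFrames that leave out the columns you don't want in the merged result. For example:
a_cols = ['animal', 'name']
f_cols = ['food', 'name']
a_pd[a_cols].merge(f_pd[f_cols], on='name', how='left')
This may be faster, and with very large DataFrames it may save some memory, because only the relevant columns are carried through the merge.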
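The same idea as a self-contained sketch on the question's sample data. The `validate="many_to_one"` argument is an optional addition that is not in the original answer: it makes pandas raise a `MergeError` if a name appears more than once in the second frame, which would otherwise silently duplicate rows. Variable names are illustrative.

```python
import pandas as pd

# Sample data from the question (variable names are illustrative).
animals = pd.DataFrame({
    "animal": ["dog", "cat", "dog"],
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["", "", ""],
})
foods = pd.DataFrame({
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["dog mix", "fussy cat mix", "bones"],
    "brand": ["doggys dog", "fish fishy", "i was a cow"],
})

a_cols = ["animal", "name"]   # leaves out the empty "food" column
f_cols = ["name", "food"]     # leaves out "brand" and anything else

result = animals[a_cols].merge(
    foods[f_cols],
    on="name",
    how="left",
    validate="many_to_one",  # optional safety check: names in `foods` must be unique
)
print(result)
```

Since neither the empty `food` column nor `brand` enters the merge, no suffixes are generated and nothing has to be dropped afterwards.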