pandas - merge two data frames, overwrite and specify which columns to keep
I'm trying to merge two pandas DataFrames, although what I want may not actually be a merge.
I have two columns in the two frames that match: one column shares unique values that can be used to join, and the other column is empty in one frame and populated in the other.
I want to overwrite the empty fields while matching on the unique fields, but only keep the column that's overwritten; I do not want the rest of the columns from the second DataFrame.
Hopefully the below will explain a little further:
>>> import pandas as pd
>>> animals = [{"animal" : "dog", "name" : "freddy", "food" : ""},{"animal" : "cat", "name" : "dexter", "food" : ""},{"animal" : "dog", "name" : "lou lou", "food" : ""}]
>>> foods = [{"name" : "freddy", "food" : "dog mix", "brand" : "doggys dog"},{"name" : "dexter", "food" : "fussy cat mix", "brand" : "fish fishy"},{"name" : "lou lou", "food" : "bones", "brand" : "i was a cow"}]
>>> a_pd = pd.DataFrame(animals)
>>> a_pd
  animal food     name
0    dog        freddy
1    cat        dexter
2    dog       lou lou
>>> f_pd = pd.DataFrame(foods)
>>> f_pd
         brand           food     name
0   doggys dog        dog mix   freddy
1   fish fishy  fussy cat mix   dexter
2  i was a cow          bones  lou lou
>>>
>>>
>>> animal_data = a_pd.merge(f_pd, on='name', how='left')
>>> animal_data
  animal food_x     name        brand         food_y
0    dog          freddy   doggys dog        dog mix
1    cat          dexter   fish fishy  fussy cat mix
2    dog         lou lou  i was a cow          bones
>>>
I should just have food, and I don't want the brand (also note that this is sample data and the live data has a lot more columns).
Desired results:
>>> animal_data
  animal     name           food
0    dog   freddy        dog mix
1    cat   dexter  fussy cat mix
2    dog  lou lou          bones
Use:
animal_data = a_pd.merge(f_pd, on='name', how='left', suffixes=('_x','')).drop('food_x', axis=1)
Output:
  animal     name        brand           food
0    dog   freddy   doggys dog        dog mix
1    cat   dexter   fish fishy  fussy cat mix
2    dog  lou lou  i was a cow          bones
Or:
a_pd[['animal','name']].merge(f_pd, how='left')
Output:
  animal     name        brand           food
0    dog   freddy   doggys dog        dog mix
1    cat   dexter   fish fishy  fussy cat mix
2    dog  lou lou  i was a cow          bones
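Both outputs above still carry the brand column, which the question asked to leave out. A minimal sketch of the suffixes approach that also drops brand, assuming the sample data from the question (the variable name merged is only illustrative):

```python
import pandas as pd

# Sample data copied from the question.
animals = [
    {"animal": "dog", "name": "freddy", "food": ""},
    {"animal": "cat", "name": "dexter", "food": ""},
    {"animal": "dog", "name": "lou lou", "food": ""},
]
foods = [
    {"name": "freddy", "food": "dog mix", "brand": "doggys dog"},
    {"name": "dexter", "food": "fussy cat mix", "brand": "fish fishy"},
    {"name": "lou lou", "food": "bones", "brand": "i was a cow"},
]
a_pd = pd.DataFrame(animals)
f_pd = pd.DataFrame(foods)

# The empty right-hand suffix keeps the populated column named "food";
# the empty left-hand column is renamed to "food_x".
merged = a_pd.merge(f_pd, on="name", how="left", suffixes=("_x", ""))

# Drop the empty left-hand column and the unwanted brand column.
animal_data = merged.drop(columns=["food_x", "brand"])
print(animal_data)
```

With many extra columns in the live data, listing each one in drop gets tedious; selecting only the needed columns from the second frame before merging is probably easier.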
You can use update:
a_pd.set_index('name',inplace=True)
a_pd.update(f_pd.set_index('name'))
a_pd
Out[68]:
        animal           food
name
freddy     dog        dog mix
dexter     cat  fussy cat mix
lou lou    dog          bones
a_pd.reset_index()
Out[69]:
      name animal           food
0   freddy    dog        dog mix
1   dexter    cat  fussy cat mix
2  lou lou    dog          bones
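The update steps above can be sketched end to end without modifying the original frame in place. This assumes the sample data from the question and that name is unique in the second frame; the variable name result is only illustrative:

```python
import pandas as pd

a_pd = pd.DataFrame({
    "animal": ["dog", "cat", "dog"],
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["", "", ""],
})
f_pd = pd.DataFrame({
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["dog mix", "fussy cat mix", "bones"],
    "brand": ["doggys dog", "fish fishy", "i was a cow"],
})

# set_index returns a new frame, so a_pd itself is left untouched.
result = a_pd.set_index("name")

# update aligns on the index and only touches columns that already exist
# in result, so "brand" is not added.
result.update(f_pd.set_index("name"))

# Restore "name" as a regular column and pick the desired column order.
result = result.reset_index()[["animal", "name", "food"]]
print(result)
```

Note that update works in place and returns None, so its result should not be assigned.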
Or use map:
a_pd.food=a_pd.name.map(f_pd.set_index('name').food)
a_pd
Out[74]:
  animal           food     name
0    dog        dog mix   freddy
1    cat  fussy cat mix   dexter
2    dog          bones  lou lou
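A self-contained sketch of the map approach, assuming the sample data from the question and unique names in the second frame. It uses bracket indexing rather than attribute access, and the variable name lookup is only illustrative:

```python
import pandas as pd

a_pd = pd.DataFrame({
    "animal": ["dog", "cat", "dog"],
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["", "", ""],
})
f_pd = pd.DataFrame({
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["dog mix", "fussy cat mix", "bones"],
    "brand": ["doggys dog", "fish fishy", "i was a cow"],
})

# Build a name -> food lookup Series from the second frame.
lookup = f_pd.set_index("name")["food"]

# Replace the empty column by looking each name up in the Series.
a_pd["food"] = a_pd["name"].map(lookup)
print(a_pd)
```

Names that are missing from the lookup come back as NaN, so this overwrites the whole food column rather than filling only the empty cells.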
I'd either try drop or just select the columns you want to keep:
animal_data.drop(['food_x', 'brand'], axis=1, inplace=True)
or
animal_data = animal_data[['animal', 'name', 'food']]
It might be best to merge subsets of the dataframes that leave out the columns you don't want in the merged dataframe. For example:
a_cols = ['animal', 'name']
f_cols = ['food', 'name']
a_pd[a_cols].merge(f_pd[f_cols], on='name', how='left')
This may be faster and may save some memory when working with extremely large dataframes, as only the relevant columns are carried forward in the merge.
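Since the live data is said to have many more columns, the left-hand column list can be built programmatically instead of typed out. A rough sketch under that assumption, using the sample data from the question; keep_from_left is a hypothetical helper name, and validate is an optional safety check rather than part of the original answer:

```python
import pandas as pd

a_pd = pd.DataFrame({
    "animal": ["dog", "cat", "dog"],
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["", "", ""],
})
f_pd = pd.DataFrame({
    "name": ["freddy", "dexter", "lou lou"],
    "food": ["dog mix", "fussy cat mix", "bones"],
    "brand": ["doggys dog", "fish fishy", "i was a cow"],
})

# Keep every left-hand column except the empty one being replaced.
keep_from_left = [col for col in a_pd.columns if col != "food"]

# Take only the join key and the populated column from the right-hand frame.
animal_data = a_pd[keep_from_left].merge(
    f_pd[["name", "food"]],
    on="name",
    how="left",
    validate="many_to_one",  # raises if a name appears more than once in f_pd
)
print(animal_data)
```

Because neither subset shares a column other than the key, no suffixes are generated and nothing needs to be dropped afterwards.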