Compare two data frames, find the common elements, and fill column value if not present
I have two data frames with some keywords in common.
For example:
df1 = {'keyword': ['Computer','Phone','Printer'],
'Price1': [1200,800,200],
'category':['first','second','first']
}
df2= {'keyword': ['Computer','Phone','Printer','chair'],
'Price2': [1200,800,200,40]
}
As you can see above, one df has a category column while the other doesn't. What I want to do is combine the two dfs and keep the common items as they are. If a keyword is present in one df ('chair', in our case) and absent from the other, I want to add the values from the df where that keyword exists and fill the categorical column (category) with a particular value, 'third' for example.
While not entirely clear, I think you want combine_first:
df2.combine_first(df1)
NB. I first converted the dictionaries to DataFrames with dfX = pd.DataFrame(dfX).
output:
Price1 Price2 category keyword
0 1200.0 1200 first Computer
1 800.0 800 second Phone
2 200.0 200 first Printer
3 NaN 40 NaN chair
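One thing worth keeping in mind: combine_first aligns rows on the index, not on the keyword column, so the result above is only right because both frames list their keywords in the same order. A minimal sketch of a keyword-aligned variant, assuming keywords are unique within each frame (the name `combined` is just illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer'],
                    'Price1': [1200, 800, 200],
                    'category': ['first', 'second', 'first']})
df2 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer', 'chair'],
                    'Price2': [1200, 800, 200, 40]})

# Use the keyword as the index so rows are matched by keyword,
# not by their position in each frame.
combined = (df2.set_index('keyword')
               .combine_first(df1.set_index('keyword'))
               .reset_index())

print(combined)
```

With this version the row order of the two inputs no longer matters.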
Alternatively, use merge:
df1.merge(df2, on='keyword', how='outer')
output:
keyword Price1 category Price2
0 Computer 1200.0 first 1200
1 Phone 800.0 second 800
2 Printer 200.0 first 200
3 chair NaN NaN 40
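To also get the 'third' category the question asks for, the merge can be followed by a fillna on the category column. This is only a sketch under the assumption that 'third' is the desired placeholder; the name `merged` is illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer'],
                    'Price1': [1200, 800, 200],
                    'category': ['first', 'second', 'first']})
df2 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer', 'chair'],
                    'Price2': [1200, 800, 200, 40]})

# Outer merge keeps keywords that appear in either frame.
merged = df1.merge(df2, on='keyword', how='outer')

# Keywords missing from df1 have no category; give them the placeholder.
merged['category'] = merged['category'].fillna('third')

print(merged)
```

Price1 stays NaN for 'chair' because only df2 has a price for it.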
Building upon mozway's answer: if the prices of the items do not vary across the DataFrames, you don't need separate Price1 and Price2 columns and can use a single Price column in both. Also, after joining the data, you can fill the remaining NAs in the category column with any word you want using fillna().
Here is the streamlined code for you:
import pandas as pd
df1 = pd.DataFrame({'keyword': ['Computer','Phone','Printer'],
'Price': [1200,800,200],
'category':['first','second','first']
})
df2 = pd.DataFrame({'keyword': ['Computer','Phone','Printer','chair'],
'Price': [1200,800,200,40]
})
df_combined = df1.combine_first(df2)
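# Arbitrarily sets the word for unknown categories
keyword = "third"
# Assign the result back: fillna(..., inplace=True) on a selected column
# may not update df_combined in recent pandas versions
df_combined["category"] = df_combined["category"].fillna(keyword)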
And this is its output:
Price category keyword
0 1200.0 first Computer
1 800.0 second Phone
2 200.0 first Printer
3 40.0 third chair
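The single Price column only makes sense if the two frames really agree on price for the keywords they share. A quick check, sketched here with hypothetical names (`both`, `mismatches`, `prices_agree`) and assuming keywords are unique in each frame:

```python
import pandas as pd

df1 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer'],
                    'Price': [1200, 800, 200],
                    'category': ['first', 'second', 'first']})
df2 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer', 'chair'],
                    'Price': [1200, 800, 200, 40]})

# Inner merge keeps only keywords present in both frames; the suffixes
# tell the two Price columns apart.
both = df1.merge(df2, on='keyword', how='inner', suffixes=('_1', '_2'))

# Rows where the two frames disagree on price.
mismatches = both[both['Price_1'] != both['Price_2']]
prices_agree = mismatches.empty

print(prices_agree)
```

If `mismatches` is not empty, keeping separate Price1 and Price2 columns as in the first answer is probably the safer choice.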