Compare two data frames, find the common elements, and fill a column value if not present

So I have two data frames that share some common keywords.

For example:

df1 = {'keyword': ['Computer', 'Phone', 'Printer'],
       'Price1': [1200, 800, 200],
       'category': ['first', 'second', 'first']
       }


df2 = {'keyword': ['Computer', 'Phone', 'Printer', 'chair'],
       'Price2': [1200, 800, 200, 40]
       }

As you can see above, one df has a category feature and the other doesn't. I want to combine the two dfs and keep the common items as they are. If a keyword is present in one df but absent from the other ('chair', in our case), I want to add its values from the df where that keyword exists and fill the categorical feature (category) with a particular value, for example 'third'.

While the question is not entirely clear, I think you want combine_first:

df2.combine_first(df1)

NB. I first converted the dictionaries to DataFrames with dfX = pd.DataFrame(dfX).

output:

   Price1  Price2 category   keyword
0  1200.0    1200    first  Computer
1   800.0     800   second     Phone
2   200.0     200    first   Printer
3     NaN      40      NaN     chair
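combine_first aligns rows on the index, not on keyword, so the result above is correct only because the shared keywords happen to sit at the same row positions in both frames. A minimal sketch that matches rows by keyword instead, by making keyword the index before combining. The frames are rebuilt from the question's data, and df2's rows are deliberately shuffled (an assumption for illustration) to show that alignment no longer depends on row order:

```python
import pandas as pd

df1 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer'],
                    'Price1': [1200, 800, 200],
                    'category': ['first', 'second', 'first']})
# Same data as the question's df2, but with the rows in a different order
df2 = pd.DataFrame({'keyword': ['chair', 'Printer', 'Phone', 'Computer'],
                    'Price2': [40, 200, 800, 1200]})

# Use keyword as the index so combine_first matches rows by keyword,
# then turn the index back into a regular column
combined = (df2.set_index('keyword')
               .combine_first(df1.set_index('keyword'))
               .reset_index())
print(combined)
```

The row order of the result may differ from the output shown above, so look rows up by keyword rather than by position.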

Alternatively, use merge:

df1.merge(df2, on='keyword', how='outer')

output:

    keyword  Price1 category  Price2
0  Computer  1200.0    first    1200
1     Phone   800.0   second     800
2   Printer   200.0    first     200
3     chair     NaN      NaN      40
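To finish what the question asks for, the outer merge can be followed by fillna on the category column. A minimal sketch, assuming 'third' is the desired label for keywords that appear only in df2. It also passes indicator=True, a standard merge parameter that adds a _merge column recording whether each keyword came from the left frame, the right frame, or both:

```python
import pandas as pd

df1 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer'],
                    'Price1': [1200, 800, 200],
                    'category': ['first', 'second', 'first']})
df2 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer', 'chair'],
                    'Price2': [1200, 800, 200, 40]})

# Outer join keeps keywords found in either frame; _merge records the source
merged = df1.merge(df2, on='keyword', how='outer', indicator=True)

# Keywords that exist only in df2 have no category, so give them a default
merged['category'] = merged['category'].fillna('third')
print(merged)
```

For 'chair', _merge is 'right_only'; for the three shared keywords it is 'both'. Drop the column with merged.drop(columns='_merge') once it is no longer needed.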

Building on mozway's answer: if an item's price is the same in both DataFrames, you don't need separate Price1 and Price2 columns. Also, after joining the data, you can fill the remaining NAs in the category column with any word you want using fillna().

Here is the streamlined code:

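import pandas as pd


df1 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer'],
                    'Price': [1200, 800, 200],
                    'category': ['first', 'second', 'first']
                    })


df2 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer', 'chair'],
                    'Price': [1200, 800, 200, 40]
                    })

# Rows are aligned on the index; df2 supplies the extra 'chair' row
df_combined = df1.combine_first(df2)

# Arbitrarily chosen label for unknown categories
default_category = "third"

# Assign the result back instead of calling fillna(..., inplace=True) on a
# column selection, which is chained assignment and may not update df_combined
# in recent pandas versions
df_combined["category"] = df_combined["category"].fillna(default_category)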

And this is its output:

    Price category   keyword
0  1200.0    first  Computer
1   800.0   second     Phone
2   200.0    first   Printer
3    40.0    third     chair
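The single Price column relies on the assumption that the two frames never disagree on a price. When they do, df1.combine_first(df2) silently keeps df1's value. A minimal sketch that checks this assumption before collapsing the two price columns. It matches rows by keyword with an outer merge, and the names price_conflicts and result are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer'],
                    'Price': [1200, 800, 200],
                    'category': ['first', 'second', 'first']})
df2 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer', 'chair'],
                    'Price': [1200, 800, 200, 40]})

# Match rows by keyword; suffixes keep both price columns side by side
merged = df1.merge(df2, on='keyword', how='outer', suffixes=('_1', '_2'))

# Rows where both frames have a price but the prices differ
both = merged['Price_1'].notna() & merged['Price_2'].notna()
price_conflicts = merged[both & (merged['Price_1'] != merged['Price_2'])]
if not price_conflicts.empty:
    raise ValueError(f"Prices differ for: {list(price_conflicts['keyword'])}")

# Safe to collapse: take df1's price where present, otherwise df2's
merged['Price'] = merged['Price_1'].fillna(merged['Price_2'])
merged['category'] = merged['category'].fillna('third')
result = merged[['keyword', 'Price', 'category']]
print(result)
```

Raising an error is one design choice. Depending on the data, you might instead log the conflicting keywords or keep both price columns for those rows.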
