

How do I merge column values from one dataframe to another if they are not present in another using pandas

I have two different Excel files which I read using pd.read_excel. The first Excel file is a kind of master file with a lot of columns. Showing only the relevant columns, df1:

                                        Company Name Excel Company ID
0                                    cleverbridge AG      IQ109133656
1  BT España, Compañía de Servicios Globales de T...        IQ3806173
2                                   Technoserv Group       IQ40333012
3                                    Blue Media S.A.       IQ50008102
4            zeb.rolfes.schierenbeck.associates gmbh       IQ30413992

The second Excel file is basically an output file, which looks like this, df2:

    company_id found_keywords  no_of_url                                       company_name
0  IQ137156215      insurance         15                         Zühlke Technology Group AG
1    IQ3806173      insurance         15  BT España, Compañía de Servicios Globales de T...
2   IQ40333012      insurance          4                                   Technoserv Group
3   IQ51614192      insurance         15                             Octo Telematics S.p.A.

I want this output Excel file (df2) to also include the company IDs and company names from df1 that are not already part of df2. Something like this, df2:

    company_id found_keywords  no_of_url                                       company_name
0  IQ137156215      insurance         15                         Zühlke Technology Group AG
1    IQ3806173      insurance         15  BT España, Compañía de Servicios Globales de T...
2   IQ40333012      insurance          4                                   Technoserv Group
3   IQ51614192      insurance         15                             Octo Telematics S.p.A.
4   IQ30413992            NaN        NaN            zeb.rolfes.schierenbeck.associates gmbh

I tried several ways of achieving this using pd.merge as well as np.where, and I even tried reindexing based on columns, but nothing worked out. What exactly do I need to do so that it works as expected? Please help me out. Thanks!

EDIT:

Using pd.merge:

df2.merge(df, right_on='company_id', left_on='Excel Company ID', how='outer')

which gave an output with [220 rows x 31 columns].

Your expected output is unclear. If you use pd.merge with how='outer' and indicator=True, you will have:

df1 = df1.rename(columns={'Company Name': 'company_name', 'Excel Company ID': 'company_id'})
out = df2.merge(df1, on=['company_id', 'company_name'], how='outer', indicator=True)

Output:

>>> out
    company_id found_keywords  no_of_url                                       company_name      _merge
0  IQ137156215      insurance       15.0                         Zühlke Technology Group AG   left_only
1    IQ3806173      insurance       15.0  BT España, Compañía de Servicios Globales de T...        both
2   IQ40333012      insurance        4.0                                   Technoserv Group        both
3   IQ51614192      insurance       15.0                             Octo Telematics S.p.A.   left_only
4  IQ109133656            NaN        NaN                                    cleverbridge AG  right_only
5   IQ50008102            NaN        NaN                                    Blue Media S.A.  right_only
6   IQ30413992            NaN        NaN            zeb.rolfes.schierenbeck.associates gmbh  right_only

Check the last column, _merge. If a row has right_only, it means that its company_id and company_name are not found in df2.
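As a follow-up to the _merge check, here is a minimal sketch of one way to turn the merged frame into the extended df2. It assumes the sample data from the question (the truncated company name is kept exactly as displayed, so it is a stand-in for the real value), and the names missing and result are hypothetical. The cast to the nullable Int64 dtype is optional and only there so that no_of_url is shown as 15 rather than 15.0.

```python
import pandas as pd

# Sample frames rebuilt from the question; the truncated name is a stand-in.
df1 = pd.DataFrame({
    "Company Name": [
        "cleverbridge AG",
        "BT España, Compañía de Servicios Globales de T...",
        "Technoserv Group",
        "Blue Media S.A.",
        "zeb.rolfes.schierenbeck.associates gmbh",
    ],
    "Excel Company ID": [
        "IQ109133656", "IQ3806173", "IQ40333012", "IQ50008102", "IQ30413992",
    ],
})
df2 = pd.DataFrame({
    "company_id": ["IQ137156215", "IQ3806173", "IQ40333012", "IQ51614192"],
    "found_keywords": ["insurance"] * 4,
    "no_of_url": [15, 15, 4, 15],
    "company_name": [
        "Zühlke Technology Group AG",
        "BT España, Compañía de Servicios Globales de T...",
        "Technoserv Group",
        "Octo Telematics S.p.A.",
    ],
})

# Align the column names, then do the outer merge with an indicator column.
df1 = df1.rename(columns={"Company Name": "company_name",
                          "Excel Company ID": "company_id"})
out = df2.merge(df1, on=["company_id", "company_name"],
                how="outer", indicator=True)

# Rows that exist only in df1, i.e. the companies missing from df2.
missing = out.loc[out["_merge"] == "right_only"].drop(columns="_merge")

# The outer merge already holds every df2 row plus the missing df1 rows,
# so dropping the indicator column gives the extended df2.
result = out.drop(columns="_merge")

# Optional: nullable integers keep whole numbers alongside missing values.
result["no_of_url"] = result["no_of_url"].astype("Int64")

print(missing)
print(result)
```

With this sample data, all three df1 companies that are absent from df2 are added, not only the single row shown in the question's expected output. Row order may differ between pandas versions, so sort explicitly if the order matters.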
