Compare values of one column of dataframe in another dataframe

I have 2 dataframes. df1 is

   DATE
2020-05-20
2020-05-21

and df2 is

ID    NAME    DATE
1     abc     2020-05-20
2     bcd     2020-05-20
3     ggg     2020-05-25
4     jhg     2020-05-26

I want to compare the values of df1 with df2. For example: take the first value of df1, i.e. 2020-05-20, find it in df2, and show the matching rows as a filtered subset.
My code is

for index,row in df1.iterrows():
    x = row['DATE']
    if x == df2['DATE']:
        print('Found')
        new = df2[df2['DATE'] == x]
        print(new)
    else:
        print('Not Found')

But I am getting the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

x == df2['DATE'] is a pd.Series (of Booleans), not a single value. You have to reduce that to a single Boolean value in order to evaluate it in a condition.

You can use either .any() or .all(), depending on what you need. I assumed you need .any() here.

for index,row in df1.iterrows():
    x = row['DATE']
    if (x == df2['DATE']).any():
        print('Found')
        new = df2[df2['DATE'] == x]
        print(new)
    else:
        print('Not Found')
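To see why the original condition fails, it can help to look at what the comparison actually returns. The following is a minimal sketch that rebuilds df2 from the question, assuming the dates are stored as plain strings; the names mask, has_match and all_match are only illustrative.

```python
import pandas as pd

# df2 rebuilt from the question; DATE is assumed to hold plain strings
df2 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'NAME': ['abc', 'bcd', 'ggg', 'jhg'],
    'DATE': ['2020-05-20', '2020-05-20', '2020-05-25', '2020-05-26'],
})

x = '2020-05-20'

# Element-wise comparison: one Boolean per row of df2, not a single True/False
mask = df2['DATE'] == x
print(mask.tolist())    # [True, True, False, False]

# Reduce the Series to a single Boolean before using it in an `if`
has_match = bool(mask.any())   # True: at least one row matches
all_match = bool(mask.all())   # False: not every row matches
print(has_match, all_match)
```

Since mask is already the row filter, df2[mask] gives the same subset as df2[df2['DATE'] == x] in the loop above.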

Also see here for a pure pandas solution for this.
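The linked answer is not reproduced here, so the following is only one possible loop-free approach and may differ from what the link showed. It uses Series.isin to filter df2 in a single step, assuming both DATE columns hold strings in the same format; matched and missing are illustrative names.

```python
import pandas as pd

# Frames rebuilt from the question; dates are assumed to be plain strings
df1 = pd.DataFrame({'DATE': ['2020-05-20', '2020-05-21']})
df2 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'NAME': ['abc', 'bcd', 'ggg', 'jhg'],
    'DATE': ['2020-05-20', '2020-05-20', '2020-05-25', '2020-05-26'],
})

# Rows of df2 whose DATE appears anywhere in df1
matched = df2[df2['DATE'].isin(df1['DATE'])]
print(matched)

# Dates from df1 that have no row in df2 (the 'Not Found' cases)
missing = df1.loc[~df1['DATE'].isin(df2['DATE']), 'DATE'].tolist()
print(missing)   # ['2020-05-21']
```

Unlike the loop, this returns all matching rows in one frame rather than one subset per date.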

You can create one extra column in df1 and use np.where to fill it.

import numpy as np
df1['Match'] = np.where(df1.DATE.isin(df2.DATE),'Found', 'Not Found')

This can also be done as a merge, which I think makes it a bit clearer, as it's only one line with no branching. You can also add the validate parameter to make sure that each key is unique in either the left or right dataset:

import pandas

df1 = pandas.DataFrame(['2020-05-20', '2020-05-21'], columns=['DATE'])
df2 = pandas.DataFrame({'Name': ['abc', 'bcd', 'ggg', 'jgh'], 
                        'DATE': ['2020-05-20', '2020-05-20', '2020-05-25', '2020-05-26']})

# Left join: every date in df1 is kept; Name is NaN where df2 has no matching date
df3 = df1.merge(right=df2, on='DATE', how='left')
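As a sketch of the validate idea mentioned above, the merge can be asked to check key uniqueness and to report which rows matched. The frames are the ones from this answer. The choice of validate='one_to_many' is an assumption that fits this data (dates are unique in df1 but repeated in df2), and indicator=True is an extra option not used in the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(['2020-05-20', '2020-05-21'], columns=['DATE'])
df2 = pd.DataFrame({'Name': ['abc', 'bcd', 'ggg', 'jgh'],
                    'DATE': ['2020-05-20', '2020-05-20', '2020-05-25', '2020-05-26']})

# validate='one_to_many' checks that DATE is unique in df1 (the left frame);
# indicator=True adds a '_merge' column: 'both' if found in df2, else 'left_only'
df3 = df1.merge(df2, on='DATE', how='left',
                validate='one_to_many', indicator=True)
print(df3)

# A stricter check fails here, because 2020-05-20 appears twice in df2
one_to_one_error = None
try:
    df1.merge(df2, on='DATE', how='left', validate='one_to_one')
except pd.errors.MergeError as exc:
    one_to_one_error = str(exc)
print(one_to_one_error)
```

Filtering df3 on the _merge column then separates the found dates from the ones with no match.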
