Compare values of one column of a dataframe in another dataframe
I have two dataframes. df1 is
DATE
2020-05-20
2020-05-21
and df2 is
ID NAME DATE
1 abc 2020-05-20
2 bcd 2020-05-20
3 ggg 2020-05-25
4 jhg 2020-05-26
I want to compare the values of df1 with df2. For example: take the first value of df1, i.e. 2020-05-20, find it in df2, filter on it, show the output, and keep the filtered rows as a subset.
My code is:
for index, row in df1.iterrows():
    x = row['DATE']
    if x == df2['DATE']:
        print('Found')
        new = df2[df2['DATE'] == x]
        print(new)
    else:
        print('Not Found')
But I am getting the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
x == df2['DATE'] is a pd.Series (of Booleans), not a single value. You have to reduce it to a single Boolean value in order to evaluate it in a condition.
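A minimal sketch, using the sample data from the question (with DATE kept as strings), of why the if fails: comparing a scalar with a column gives a Boolean Series, and calling bool() on a Series with more than one element raises the same ValueError that the if statement triggers. .any() and .all() each collapse the Series to a single Boolean.

```python
import pandas as pd

# Sample data from the question (DATE stored as plain strings)
df2 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'NAME': ['abc', 'bcd', 'ggg', 'jhg'],
                    'DATE': ['2020-05-20', '2020-05-20', '2020-05-25', '2020-05-26']})

x = '2020-05-20'
mask = x == df2['DATE']   # element-wise comparison -> Series of bools
print(mask.tolist())      # one bool per row of df2

try:
    bool(mask)            # this is what `if mask:` does implicitly
except ValueError as err:
    print('ValueError:', err)

print(mask.any())         # True if at least one row matches
print(mask.all())         # True only if every row matches
```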
You can use either .any() or .all(), depending on what you need: .any() is True if at least one row matches, while .all() is True only if every row matches. I assumed you need .any() here.
for index, row in df1.iterrows():
    x = row['DATE']
    if (x == df2['DATE']).any():
        print('Found')
        new = df2[df2['DATE'] == x]
        print(new)
    else:
        print('Not Found')
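A self-contained version of the corrected loop, assuming the two dataframes from the question with DATE stored as strings. Instead of only printing, it keeps each filtered subset in a dictionary keyed by date; the name subsets is purely illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'DATE': ['2020-05-20', '2020-05-21']})
df2 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'NAME': ['abc', 'bcd', 'ggg', 'jhg'],
                    'DATE': ['2020-05-20', '2020-05-20', '2020-05-25', '2020-05-26']})

subsets = {}  # hypothetical container: date -> rows of df2 with that date
for _, row in df1.iterrows():
    x = row['DATE']
    mask = df2['DATE'] == x       # Boolean Series, one entry per df2 row
    if mask.any():                # reduce to a single bool for the `if`
        print('Found', x)
        subsets[x] = df2[mask]    # keep the filtered rows
        print(subsets[x])
    else:
        print('Not Found', x)
```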
Also see here for a pure pandas solution.
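The linked solution is not reproduced on this page. One vectorised alternative (an assumption, not necessarily what the link shows) is to filter df2 with Series.isin in a single step: it returns every df2 row whose DATE appears anywhere in df1, without a Python loop.

```python
import pandas as pd

df1 = pd.DataFrame({'DATE': ['2020-05-20', '2020-05-21']})
df2 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'NAME': ['abc', 'bcd', 'ggg', 'jhg'],
                    'DATE': ['2020-05-20', '2020-05-20', '2020-05-25', '2020-05-26']})

# Keep only the rows of df2 whose DATE occurs in df1['DATE']
matched = df2[df2['DATE'].isin(df1['DATE'])]
print(matched)
```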
You can create one extra column in df1 and use np.where to fill it.
import numpy as np
df1['Match'] = np.where(df1.DATE.isin(df2.DATE),'Found', 'Not Found')
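A runnable version of the np.where answer on the question's sample data (DATE assumed to be strings in both frames). isin produces one Boolean per row of df1, and np.where maps True/False to the two labels.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'DATE': ['2020-05-20', '2020-05-21']})
df2 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'NAME': ['abc', 'bcd', 'ggg', 'jhg'],
                    'DATE': ['2020-05-20', '2020-05-20', '2020-05-25', '2020-05-26']})

# One bool per df1 row: does this date appear anywhere in df2?
df1['Match'] = np.where(df1.DATE.isin(df2.DATE), 'Found', 'Not Found')
print(df1)
```

This labels the dates in df1 but does not return the matching rows of df2; the isin filter or the merge covers that part.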
This can also be done as a merge, which I think makes it a bit clearer, as it is a single line with no branching. You can also add the validate parameter to make sure that the merge keys are unique in the left dataset, the right dataset, or both:
import pandas
df1 = pandas.DataFrame(['2020-05-20', '2020-05-21'], columns=['DATE'])
df2 = pandas.DataFrame({'ID': [1, 2, 3, 4],
                        'NAME': ['abc', 'bcd', 'ggg', 'jhg'],
                        'DATE': ['2020-05-20', '2020-05-20', '2020-05-25', '2020-05-26']})
# Left merge: every DATE in df1 is kept; dates with no match in df2 get NaN in ID/NAME
df3 = df1.merge(right=df2, on='DATE', how='left')
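A sketch of the merge with the validate and indicator parameters of DataFrame.merge, on the question's sample data. validate='one_to_many' checks that the merge keys are unique in the left frame (df1) and raises pandas.errors.MergeError otherwise; 'one_to_one' fails here because 2020-05-20 appears twice in df2. indicator=True adds a _merge column that reads both for matched dates and left_only for dates missing from df2. The name found is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'DATE': ['2020-05-20', '2020-05-21']})
df2 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'NAME': ['abc', 'bcd', 'ggg', 'jhg'],
                    'DATE': ['2020-05-20', '2020-05-20', '2020-05-25', '2020-05-26']})

# Left merge keeps every df1 date; validate checks keys are unique in df1
df3 = df1.merge(df2, on='DATE', how='left',
                validate='one_to_many', indicator=True)
print(df3)

# Only the matched rows (the subset the question asks for)
found = df3[df3['_merge'] == 'both'].drop(columns='_merge')
print(found)

# one_to_one fails: '2020-05-20' is duplicated in df2
try:
    df1.merge(df2, on='DATE', how='left', validate='one_to_one')
except pd.errors.MergeError as err:
    print('MergeError:', err)
```

The _merge column gives the same Found/Not Found information as the np.where column, but comes out of the merge itself.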