For every row in a pandas DataFrame, determine if a column value exists in another column
I have a pandas DataFrame like this:
import pandas as pd
df = pd.DataFrame({'category': ['A', 'B', 'C', 'A'], 'category_pred': [['A'], ['B', 'D'], ['A', 'B', 'C'], ['D']]})
print(df)
  category category_pred
0        A           [A]
1        B        [B, D]
2        C     [A, B, C]
3        A           [D]
I would like to have an output like this:
  category category_pred  count
0        A           [A]      1
1        B        [B, D]      1
2        C     [A, B, C]      1
3        A           [D]      0
That is, for every row, determine whether the value in 'category' appears in 'category_pred'. Note that 'category_pred' can contain multiple values.
I can do it with a for-loop like this one, but it is really slow.
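df['count'] = 0  # create the column first so non-matching rows stay 0
for i in df.index:
    if df.category[i] in df.category_pred[i]:
        df.loc[i, 'count'] = 1  # .loc avoids chained assignment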
I am looking for an efficient way to do this operation. Thanks!
You can make use of the DataFrame's apply method.
df['count'] = df.apply(lambda x: 1 if x.category in x.category_pred else 0, axis = 1)
This will add the new column with 1 where 'category' is found in 'category_pred' and 0 otherwise.
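As a possible alternative (not part of the original answer): apply with axis=1 still calls a Python function once per row, so a plain list comprehension over the two columns zipped together is often faster on large frames. The following is a minimal sketch, assuming the same df as in the question; the names count_apply and count_zip are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'C', 'A'],
    'category_pred': [['A'], ['B', 'D'], ['A', 'B', 'C'], ['D']],
})

# Row-wise apply, as in the answer above.
count_apply = df.apply(
    lambda x: 1 if x.category in x.category_pred else 0, axis=1
)

# Alternative (assumption: order of rows is all that matters):
# walk both columns in parallel and test list membership directly.
count_zip = [
    int(cat in preds)
    for cat, preds in zip(df['category'], df['category_pred'])
]

df['count'] = count_zip
print(df)
```

Both approaches should give the same column; which one is faster depends on the size of the data, so it is worth timing them on your own frame.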