For each row in a Pandas DataFrame, determine whether one column's value is present in another column

Question

I have a pandas DataFrame like this:

import pandas as pd

df = pd.DataFrame({'category': ['A', 'B', 'C', 'A'], 'category_pred': [['A'], ['B', 'D'], ['A', 'B', 'C'], ['D']]})
print(df)

  category category_pred
0        A           [A]
1        B        [B, D]
2        C     [A, B, C]
3        A           [D]

I would like to have an output like this:

  category category_pred  count
0        A           [A]      1
1        B        [B, D]      1
2        C     [A, B, C]      1
3        A           [D]      0

That is, for every row, determine whether the value in 'category' appears in 'category_pred'. Note that 'category_pred' can contain multiple values.

I can do it with a for-loop like this one, but it is really slow.

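df['count'] = 0  # create the column first so every row has a default
for i in df.index:
    if df.category[i] in df.category_pred[i]:
        df.loc[i, 'count'] = 1  # .loc avoids chained assignment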

I am looking for an efficient way to do this operation. Thanks!

Answer 1

You can use the DataFrame's apply method with axis=1, which calls the function once per row.

df['count'] = df.apply(lambda x: 1 if x.category in x.category_pred else 0, axis = 1)

This adds the new 'count' column you want.
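As a possible alternative to apply, a plain list comprehension over the two columns avoids building a row object for every row and is often faster on large frames. The sketch below is not part of the original answer; it assumes every entry of 'category_pred' is a list (or another container that supports `in`).

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'C', 'A'],
    'category_pred': [['A'], ['B', 'D'], ['A', 'B', 'C'], ['D']],
})

# Pair each category with its list of predictions and test membership row by row.
# int(...) turns the True/False result into 1/0, matching the desired output.
df['count'] = [
    int(cat in preds)
    for cat, preds in zip(df['category'], df['category_pred'])
]

print(df)
```

Whether this is actually faster than apply depends on the data, so it is worth timing both on a representative sample.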

Answer 1
1 · accepted · 2015-09-16 20:47:46
