Check whether a column's values are in the values of another numpy array column in pandas

Question

I have a pandas dataframe:

import pandas as pd
dt = pd.DataFrame({'id' : ['a', 'a', 'a', 'b', 'b'],
                   'col_a': [1,2,3,1,2],
                   'col_b': [2,2,[2,3],4,[2,3]]})

I would like to create a column that indicates whether the value in col_a is in col_b.

The output dataframe should look like this:

dt = pd.DataFrame({'id' : ['a', 'a', 'a', 'b', 'b'],
                   'col_a': [1,2,3,1,2],
                   'col_b': [2,2,[2,3],4,[2,3]],
                   'exists': [0,1,1,0,1]})

How could I do that?

Answer 1

You can use:

dt["exists"] = dt.col_a.isin(dt.col_b.explode()).astype(int)

explode the list-containing column and check whether each col_a value isin it. Lastly, cast to int.

This gives:

>>> dt
  id  col_a   col_b  exists
0  a      1       2       0
1  a      2       2       1
2  a      3  [2, 3]       1
3  b      1       4       0
4  b      2  [2, 3]       1
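Note that isin compares each col_a value against the whole exploded column, not just its own row. The sketch below uses hypothetical data (not from the question) chosen so that this pooled check and a row-wise check disagree; the row-wise reference is a plain zip comprehension added only for comparison.

```python
import pandas as pd

# Hypothetical data: 4 is not in row 0's list, but it does occur in row 1's list
df = pd.DataFrame({'col_a': [4, 5],
                   'col_b': [[1, 2, 3], [4, 5]]})

# explode gives each list element its own row (index labels repeat)
exploded = df['col_b'].explode()

# isin tests every col_a value against ALL exploded values at once
pooled = df['col_a'].isin(exploded).astype(int).tolist()

# Row-wise reference: each col_a value against its own row's list only
rowwise = [int(a in b) for a, b in zip(df['col_a'], df['col_b'])]

print(pooled)   # [1, 1]
print(rowwise)  # [0, 1]
```

On the question's data both checks happen to give the same result, which is why the difference is easy to miss.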

If a row-by-row comparison is required, you can use:

dt["exists"] = dt.col_a.eq(dt.col_b.explode()).groupby(level=0).any().astype(int)

This checks equality row by row, and if any of the (grouped) exploded values gives True, we say the value exists.
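A step-by-step sketch of the same one-liner on the question's data, with the intermediate objects named (the names exploded, matches and exists are only for illustration). The key point is that explode keeps each element's original row label, so eq aligns every col_a value only with the elements from its own row.

```python
import pandas as pd

dt = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b'],
                   'col_a': [1, 2, 3, 1, 2],
                   'col_b': [2, 2, [2, 3], 4, [2, 3]]})

# Step 1: explode; scalar cells pass through, list cells become several rows
# that all keep the original label (rows 2 and 4 appear twice)
exploded = dt['col_b'].explode()

# Step 2: eq aligns on the index, so each col_a value is compared
# only with the elements that came from its own row
matches = dt['col_a'].eq(exploded)

# Step 3: collapse the repeated labels back to one row each
exists = matches.groupby(level=0).any().astype(int)
print(exists.tolist())  # [0, 1, 1, 0, 1]
```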

Answer 2

Solutions if you need to test values per row (that is, each col_a value is compared only with col_b in the same row, not with all values of col_b):

You can use a custom function with an if-else expression:

# Parentheses let the conditional expression span several lines
f = lambda x: (x['col_a'] in x['col_b']
               if isinstance(x['col_b'], list)
               else x['col_a'] == x['col_b'])
dt['e'] = dt.apply(f, axis=1).astype(int)
print(dt)
  id  col_a   col_b  exists  e
0  a      1       2       0  0
1  a      2       2       1  1
2  a      3  [2, 3]       1  1
3  b      1       4       0  0
4  b      2  [2, 3]       1  1
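The title mentions numpy array cells, while the lambda above only recognises lists. A minimal sketch, assuming col_b may hold lists, tuples or 1-D NumPy arrays; contains_value is a hypothetical helper, and the conversion of the question's lists to arrays is only there to simulate such a column.

```python
import numpy as np
import pandas as pd

def contains_value(a, b):
    """Hypothetical helper: membership test when b is a list, tuple or
    1-D NumPy array, plain equality when b is a scalar."""
    if isinstance(b, (list, tuple, np.ndarray)):
        return a in b  # for an ndarray, `in` checks (b == a).any()
    return bool(a == b)

dt = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b'],
                   'col_a': [1, 2, 3, 1, 2],
                   'col_b': [2, 2, [2, 3], 4, [2, 3]]})

# Assumption: simulate a numpy-array column by converting list cells
# to arrays; scalar cells are left unchanged
dt['col_b'] = dt['col_b'].map(lambda v: np.array(v) if isinstance(v, list) else v)

dt['e'] = dt.apply(lambda r: contains_value(r['col_a'], r['col_b']), axis=1).astype(int)
print(dt['e'].tolist())  # [0, 1, 1, 0, 1]
```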

Or use DataFrame.explode, compare both columns, and then test for at least one True per index value:

# .any(level=0) was removed in pandas 2.0; groupby(level=0).any() does the same
dt['e'] = dt.explode('col_b').eval('col_a == col_b').groupby(level=0).any().astype(int)
print(dt)
  id  col_a   col_b  exists  e
0  a      1       2       0  0
1  a      2       2       1  1
2  a      3  [2, 3]       1  1
3  b      1       4       0  0
4  b      2  [2, 3]       1  1
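An alternative not taken from either answer: a plain list comprehension over zip does the same row-wise test without apply or a temporary exploded frame, and is typically faster than apply(axis=1) on larger frames. This sketch assumes, as in the question, that each col_b cell is either a scalar or a list.

```python
import pandas as pd

dt = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b'],
                   'col_a': [1, 2, 3, 1, 2],
                   'col_b': [2, 2, [2, 3], 4, [2, 3]]})

# Pair each col_a value with its own col_b cell;
# a scalar cell is treated as a one-element list
dt['e'] = [int(a in (b if isinstance(b, list) else [b]))
           for a, b in zip(dt['col_a'], dt['col_b'])]
print(dt['e'].tolist())  # [0, 1, 1, 0, 1]
```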

2 answers

Answer 1
4 votes, 2021-04-19 08:18:47

Answer 2
1 vote, accepted, 2021-04-19 08:17:32