Check if values of a column are in values of another numpy array column in pandas

I have a pandas DataFrame:

import pandas as pd
dt = pd.DataFrame({'id' : ['a', 'a', 'a', 'b', 'b'],
                   'col_a': [1,2,3,1,2],
                   'col_b': [2,2,[2,3],4,[2,3]]})

I would like to create a column that indicates whether the values of col_a are in col_b.

The output DataFrame should look like this:

dt = pd.DataFrame({'id' : ['a', 'a', 'a', 'b', 'b'],
                   'col_a': [1,2,3,1,2],
                   'col_b': [2,2,[2,3],4,[2,3]],
                   'exists': [0,1,1,0,1]})

How could I do that?

You can use:

dt["exists"] = dt.col_a.isin(dt.col_b.explode()).astype(int)

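This explodes the list-containing column, checks whether each col_a value isin the exploded values, and lastly casts the result to int. Note that the membership test runs against all values of col_b, not only the value on the same row.

to get: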

>>> dt
  id  col_a   col_b  exists
0  a      1       2       0
1  a      2       2       1
2  a      3  [2, 3]       1
3  b      1       4       0
4  b      2  [2, 3]       1 
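
To see where the column-wide check and a per-row check can disagree, here is a minimal sketch on made-up data (the frame `demo` and its values are hypothetical, not from the question). The value 3 is missing from its own row's col_b but appears in another row's list:

```python
import pandas as pd

# Hypothetical data: 3 is absent from row 0's col_b but present in row 1's list.
demo = pd.DataFrame({'col_a': [3, 1],
                     'col_b': [2, [2, 3]]})

# Column-wide: is each col_a value anywhere among all exploded col_b values?
pooled = demo.col_a.isin(demo.col_b.explode()).astype(int)

# Per row: does col_a match any exploded col_b value carrying the same index label?
rowwise = demo.col_a.eq(demo.col_b.explode()).groupby(level=0).any().astype(int)

print(pooled.tolist())   # [1, 0]
print(rowwise.tolist())  # [0, 0]
```

On the question's sample data both interpretations happen to give the same result, so which one is wanted depends on the intended meaning of "exists".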

If a row-by-row comparison is required, you can use:

dt["exists"] = dt.col_a.eq(dt.col_b.explode()).groupby(level=0).any().astype(int)

which checks equality by row: the exploded values keep the index label of the row they came from, so grouping by that label (level=0) and taking any gives True when at least one exploded value of the row equals col_a. In that case we say it exists.
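
The one-liner can be unpacked into its three steps. This is only an illustrative breakdown on the question's data; the intermediate names `exploded` and `matches` are introduced here for readability:

```python
import pandas as pd

dt = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b'],
                   'col_a': [1, 2, 3, 1, 2],
                   'col_b': [2, 2, [2, 3], 4, [2, 3]]})

# Step 1: one row per list element; the original index label is repeated.
exploded = dt.col_b.explode()
print(exploded.index.tolist())   # [0, 1, 2, 2, 3, 4, 4]

# Step 2: compare; col_a is aligned to the repeated index labels.
matches = dt.col_a.eq(exploded)

# Step 3: collapse back to one value per original row.
exists = matches.groupby(level=0).any().astype(int)
print(exists.tolist())           # [0, 1, 1, 0, 1]
```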

Solutions if you need to test values per row (that is, not each value of col_a against all values of col_b):

You can use a custom function with an if-else expression:

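# Parentheses are needed so the conditional expression can span several lines.
# If col_b holds a list, test membership; otherwise test plain equality.
f = lambda x: (x['col_a'] in x['col_b']
               if isinstance(x['col_b'], list)
               else x['col_a'] == x['col_b'])
dt['e'] = dt.apply(f, axis=1).astype(int)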
print (dt)
  id  col_a   col_b  exists  e
0  a      1       2       0  0
1  a      2       2       1  1
2  a      3  [2, 3]       1  1
3  b      1       4       0  0
4  b      2  [2, 3]       1  1
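
The isinstance check above only recognizes Python lists. If some cells hold numpy arrays instead (the title mentions a numpy array column, though the sample data uses lists), a variant along these lines might be needed. This is a sketch under that assumption; the helper `row_contains` and the frame `df` are hypothetical names, not part of the original answer:

```python
import numpy as np
import pandas as pd

def row_contains(value, container):
    """Hypothetical helper: membership for list-likes, equality for scalars."""
    if isinstance(container, (list, tuple, set, np.ndarray)):
        return value in container
    return value == container

# Made-up data mixing a scalar, a numpy array, and a list in col_b.
df = pd.DataFrame({'col_a': [1, 2, 3],
                   'col_b': [2, np.array([2, 3]), [2, 3]]})

# Iterate the two columns pairwise instead of using apply(axis=1).
df['e'] = [int(row_contains(a, b)) for a, b in zip(df['col_a'], df['col_b'])]
print(df['e'].tolist())  # [0, 1, 1]
```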

Or use DataFrame.explode, compare both columns, and then test for at least one True per index value:

# Series.any(level=0) was removed in pandas 2.0; group by the index level instead.
dt['e'] = dt.explode('col_b').eval('col_a == col_b').groupby(level=0).any().astype(int)
print (dt)
  id  col_a   col_b  exists  e
0  a      1       2       0  0
1  a      2       2       1  1
2  a      3  [2, 3]       1  1
3  b      1       4       0  0
4  b      2  [2, 3]       1  1
