Python: efficiently check whether values in a list are in another list

Question

I have a DataFrame, user_df, with roughly 500,000 rows in the following format:

|  id  |  other_ids   |
|------|--------------|
|  1   |['abc', 'efg']|
|  2   |['bbb']       |
|  3   |['ccc', 'ddd']|

I also have a list, other_ids_that_clicked, containing roughly 5,000 other IDs:

 ['abc', 'efg', 'ccc']

I want to flag the rows of user_df by adding another column to the DataFrame that marks whether any value in user_df["other_ids"] appears in other_ids_that_clicked, like this:

|  id  |  other_ids   |  clicked  |
|------|--------------|-----------|
|  1   |['abc', 'efg']|     1     |
|  2   |['bbb']       |     0     |
|  3   |['ccc', 'ddd']|     1     |

The way I am checking this now is by looping over other_ids_that_clicked for every row in user_df.

def otheridInList(row):
    # Return 1 if any clicked ID appears in this row's other_ids, else 0.
    for other_id in other_ids_that_clicked:
        if other_id in row['other_ids']:
            return 1
    return 0

This takes forever, so I am looking for suggestions on a better way to do it.

Thanks!

Answer 1

You can actually speed this up. Pull the column out, convert it into its own DataFrame, and run the check with df.isin:

import pandas as pd

l = ['abc', 'efg', 'ccc']
df['clicked'] = pd.DataFrame(df.other_ids.tolist()).isin(l).any(axis=1).astype(int)

   id   other_ids  clicked
0   1  [abc, efg]        1
1   2       [bbb]        0
2   3  [ccc, ddd]        1

Details

First, convert other_ids to a list of lists:

i = df.other_ids.tolist()

i
[['abc', 'efg'], ['bbb'], ['ccc', 'ddd']]

Now load it into a new DataFrame. Rows with fewer IDs are padded with None:

j = pd.DataFrame(i)

j
     0     1
0  abc   efg
1  bbb  None
2  ccc   ddd

Perform the check with isin:

k = j.isin(l)

k
       0      1
0   True   True
1  False  False
2   True  False

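clicked can then be computed by using df.any to check whether each row contains at least one True. The result is converted to integers. The axis is passed by keyword because recent pandas versions no longer accept it positionally.

k.any(axis=1).astype(int)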

0    1
1    0
2    1
dtype: int64
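
As a variation on the same flatten-then-isin idea, here is a minimal sketch that uses Series.explode instead of building a padded DataFrame. This is not the answerer's code; it assumes pandas 0.25 or later (where explode is available), and the sample df below is just the three rows from the question.

```python
import pandas as pd

# Sample data copied from the question (assumed shape of user_df).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_ids": [["abc", "efg"], ["bbb"], ["ccc", "ddd"]],
})
l = ["abc", "efg", "ccc"]

# One row per single ID; the original index repeats for multi-ID rows.
exploded = df["other_ids"].explode()

# Test each ID for membership, then collapse back to one flag per original row.
df["clicked"] = exploded.isin(l).groupby(level=0).any().astype(int)
print(df)
```

This may be worth trying when a few rows hold very long lists, since the padded DataFrame gets as wide as the longest list.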

Answer 2

Using set. A row is flagged when removing the clicked IDs from its set of other_ids changes that set:

df['New']=(df.other_ids.apply(set)!=(df.other_ids.apply(set)-set(l))).astype(int)
df
Out[114]: 
   id   other_ids  New
0   1  [abc, efg]    1
1   2       [bbb]    0
2   3  [ccc, ddd]    1
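
The set idea can also be written with set.isdisjoint, which stops at the first shared ID. The following is a hedged sketch rather than the answerer's code; the column name clicked and the sample df are taken from the question, and clicked_ids is a name introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question (assumed shape of user_df).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_ids": [["abc", "efg"], ["bbb"], ["ccc", "ddd"]],
})
l = ["abc", "efg", "ccc"]

# Build the lookup set once; membership checks are O(1) on average.
clicked_ids = set(l)

# A row is flagged when its IDs share at least one element with clicked_ids.
df["clicked"] = [int(not clicked_ids.isdisjoint(ids)) for ids in df["other_ids"]]
print(df)
```

Converting the 5,000-item list to a set once avoids rescanning it for every one of the 500,000 rows, which is what makes the original loop slow.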

Answer 1
5, accepted, 2017-12-18 18:52:13

Answer 2
3, 2017-12-18 19:03:20
