Compare two lists and add a new column with the result

Question

Compare two lists and add a new column with the findKBs values that are missing

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [['10', '20', '30', '40'], ['50', '60', '70', '80']],
                   'B': [['a', 'b'], ['c', 'd']]})
findKBs = ['10', '90']


                  A       B
0  [10, 20, 30, 40]  [a, b]
1  [50, 60, 70, 80]  [c, d]

This is the desired result: column C holds the findKBs values that do not appear in that row's list in A.

                  A       B         C
0  [10, 20, 30, 40]  [a, b]      [90]
1  [50, 60, 70, 80]  [c, d]  [10, 90]

Thanks in advance.

Answer 1

We can use np.isin:

df['C'] = [find_kb[~np.isin(find_kb, a)] 
           for a, find_kb in zip(df['A'], np.array([findKBs] * len(df)))]
print(df) 
                  A       B         C
0  [10, 20, 30, 40]  [a, b]      [90]
1  [50, 60, 70, 80]  [c, d]  [10, 90]
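
To see what the mask does for a single row, here is a minimal sketch using the first row of A from the question. np.isin(find_kbs, row_a) marks which entries of find_kbs occur in row_a, and ~ inverts that mask. The names find_kbs, row_a, mask and missing are illustrative only.

```python
import numpy as np

find_kbs = np.array(['10', '90'])
row_a = ['10', '20', '30', '40']   # first row of column A in the question

mask = np.isin(find_kbs, row_a)    # True where the KB is already in row_a
missing = find_kbs[~mask]          # keep only the KBs that are absent

print(mask.tolist())      # [True, False]
print(missing.tolist())   # ['90']
```

Note that this approach stores NumPy arrays, not Python lists, in column C. Call .tolist() on each result if plain lists are needed.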

Or we can use filter:

df['C'] = [list(filter(lambda val: val not in a, find_kb))
           for a, find_kb in zip(df['A'],[findKBs] * len(df))]

# Equivalent, using Series.map:
# df['C'] = df['A'].map(lambda list_a: list(filter(lambda val: val not in list_a, findKBs)))
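
If the nested lambda is hard to follow, the same filtering can be written as a plain list comprehension. This is a sketch of an equivalent formulation, not part of the original answer, and it was not included in the timings.

```python
import pandas as pd

df = pd.DataFrame({'A': [['10', '20', '30', '40'], ['50', '60', '70', '80']],
                   'B': [['a', 'b'], ['c', 'd']]})
findKBs = ['10', '90']

# For each row, keep the KBs that do not appear in that row's list.
df['C'] = [[kb for kb in findKBs if kb not in a] for a in df['A']]

print(df['C'].tolist())   # [['90'], ['10', '90']]
```

Like filter, this keeps the original order of findKBs and produces Python lists.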

filter is harder to read but faster in these measurements on the two-row frame:

%%timeit
df['C'] = [list(filter(lambda val: val not in a, find_kb))
           for a, find_kb in zip(df['A'],[findKBs] * len(df))]

194 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)



%%timeit
df['C'] = [find[~np.isin(find, a)] for a, find in zip(df['A'], np.array([findKBs] * len(df)))]
334 µs ± 38.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
df['C'] = df['A'].map(lambda x: np.setdiff1d(findKBs,x))
534 µs ± 17.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
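
These numbers come from a two-row frame, so the ranking may shift with more rows or longer lists. A rough way to rerun the comparison outside IPython is sketched below. The frame size, the function names and the repeat count are assumptions chosen for illustration, and with_isin builds the findKBs array once rather than once per row.

```python
import timeit

import numpy as np
import pandas as pd

findKBs = ['10', '90']
# Hypothetical larger frame: the two rows from the question repeated 500 times.
df = pd.DataFrame({'A': [['10', '20', '30', '40'], ['50', '60', '70', '80']] * 500})

def with_filter():
    return [list(filter(lambda val: val not in a, findKBs)) for a in df['A']]

def with_isin():
    kbs = np.array(findKBs)
    return [kbs[~np.isin(kbs, a)] for a in df['A']]

def with_setdiff1d():
    return df['A'].map(lambda a: np.setdiff1d(findKBs, a))

# Time each approach over 10 calls and report the average per call.
for func in (with_filter, with_isin, with_setdiff1d):
    seconds = timeit.timeit(func, number=10)
    print(f"{func.__name__}: {seconds / 10 * 1e3:.2f} ms per call")
```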

Answer 2

You can use np.setdiff1d here.

df['C'] = df['A'].map(lambda x: np.setdiff1d(findKBs,x))

                  A       B         C
0  [10, 20, 30, 40]  [a, b]      [90]
1  [50, 60, 70, 80]  [c, d]  [10, 90]

To avoid the lambda, you can use functools.partial here.

from functools import partial
diff = partial(np.setdiff1d, findKBs)

df['C'] = df['A'].map(diff)
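
One thing to keep in mind with np.setdiff1d: it returns the sorted, unique values of its first argument that are not in the second, as a NumPy array. The small sketch below uses an illustrative findKBs-style input with a duplicate and a different order to make that visible.

```python
import numpy as np

find_kbs = ['90', '10', '90']        # unsorted, with a duplicate (illustrative input)
row_a = ['50', '60', '70', '80']     # second row of column A in the question

result = np.setdiff1d(find_kbs, row_a)

print(type(result).__name__)   # ndarray
print(result.tolist())         # ['10', '90']
```

Because the values are strings, the sort is lexicographic, so '100' would sort before '90'.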

Answer 3

Subtract from a set:

df['C'] = (set(findKBs) - df.A.map(set)).map(list)
df
Out[253]: 
                  A       B         C
0  [10, 20, 30, 40]  [a, b]      [90]
1  [50, 60, 70, 80]  [c, d]  [10, 90]
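
Sets are unordered, so the order of the items in each list in C is not guaranteed with this approach. A more explicit per-row variant is sketched below. It is an alternative formulation rather than the answer's exact code, and it uses sorted() so the output order is deterministic.

```python
import pandas as pd

df = pd.DataFrame({'A': [['10', '20', '30', '40'], ['50', '60', '70', '80']],
                   'B': [['a', 'b'], ['c', 'd']]})
findKBs = ['10', '90']

kb_set = set(findKBs)
# Explicit per-row set difference; sorted() returns a list in a fixed order.
df['C'] = df['A'].map(lambda a: sorted(kb_set - set(a)))

print(df['C'].tolist())   # [['90'], ['10', '90']]
```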


Answer 1
4 accepted 2020-07-20 15:41:30

Answer 2
4 2020-07-20 15:49:03

Answer 3
2 2020-07-20 15:52:59
