Remove elements from a list that occur in another list and return their indices
Given a list, for example:
my_list = ['a', 'd', 'a', 'd', 'c','e']
words_2_remove = ['a', 'c']
The output should be:
my_list = ['d', 'd', 'e']
loc = [0, 2, 4]
I am currently using this:
loc = []
for word in my_list:
    if word in words_2_remove:
        loc.append( my_list.index(word) )
        my_list.remove(word)
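For reference, here is a trace of the loop above on the example input. Calling `remove()` on the list being iterated shifts the later elements one position left, so the loop skips the element after each removal, and `index()` searches the already-shortened list. The filtered list happens to come out right here only because every skipped element was one to keep anyway:

```python
my_list = ['a', 'd', 'a', 'd', 'c', 'e']
words_2_remove = ['a', 'c']

loc = []
for word in my_list:
    if word in words_2_remove:
        # index() returns the first match in the list as it is *now*, after earlier removals
        loc.append(my_list.index(word))
        # Removing while iterating shifts later elements left, so the next element is skipped
        my_list.remove(word)

print(my_list)  # ['d', 'd', 'e']
print(loc)      # [0, 1, 2], not the expected [0, 2, 4]
```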
Is there a better option?
Use two list comprehensions:
my_list =['a', 'd', 'a', 'd', 'c','e']
words_2_remove = ['a', 'c']
loc = [i for i, x in enumerate(my_list) if x in words_2_remove]
my_list = [x for x in my_list if x not in words_2_remove]
print(my_list) # ['d', 'd', 'e']
print(loc) # [0, 2, 4]
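A variant of the two comprehensions, assuming the elements are hashable (true for strings): converting `words_2_remove` to a set makes each membership test O(1) on average instead of a scan of the list, which matters when the list of words to remove is long. The slice assignment `my_list[:] = ...` updates the list in place, so other references to the same list object also see the change:

```python
my_list = ['a', 'd', 'a', 'd', 'c', 'e']
words_2_remove = ['a', 'c']

# Assumption: elements are hashable, so a set can be used for fast membership tests
remove_set = set(words_2_remove)

# Compute the indices first, while my_list still has its original contents
loc = [i for i, x in enumerate(my_list) if x in remove_set]
# Replace the contents in place rather than rebinding the name
my_list[:] = [x for x in my_list if x not in remove_set]

print(my_list)  # ['d', 'd', 'e']
print(loc)      # [0, 2, 4]
```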
For larger arrays, NumPy is more efficient:
import numpy as np
my_list = np.array(['a', 'd', 'a', 'd', 'c', 'e'])
words_2_remove = np.array(['a', 'c'])
# True for the elements to keep, i.e. those NOT in words_2_remove
mask = np.isin(my_list, words_2_remove, invert=True)
# mask will be [False True False True False True]
loc = np.where(~mask)[0]
print(loc)  # [0 2 4]
print(my_list[mask])  # ['d' 'd' 'e']
Getting the complement of the loc indices is also easy:
print(np.where(mask)[0])  # [1 3 5]
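As a sanity check, the removed indices and their complement partition the full index range: every position appears in exactly one of them. This sketch verifies that with `np.setdiff1d` and `np.arange`:

```python
import numpy as np

my_list = np.array(['a', 'd', 'a', 'd', 'c', 'e'])
words_2_remove = np.array(['a', 'c'])
mask = np.isin(my_list, words_2_remove, invert=True)

loc = np.where(~mask)[0]       # indices of removed elements
keep_idx = np.where(mask)[0]   # indices of kept elements (the complement)

all_idx = np.arange(len(my_list))
# The complement of loc within all indices is exactly keep_idx
assert np.array_equal(np.setdiff1d(all_idx, loc), keep_idx)
# Together they cover every index exactly once
assert np.array_equal(np.sort(np.concatenate([loc, keep_idx])), all_idx)

print(keep_idx.tolist())  # [1, 3, 5]
```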
Timings:
Compared against the list-comprehension version from @Austin's answer.
For the original array:
my_list = np.array(['a', 'd', 'a', 'd', 'c','e'])
words_2_remove = np.array(['a', 'c'])
%%timeit
mask = np.isin(my_list, words_2_remove, invert=True)
loc = np.where(~mask)[0]
>>> 11 µs ± 53.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
my_list =['a', 'd', 'a', 'd', 'c','e']
words_2_remove = ['a', 'c']
%%timeit
loc = [i for i, x in enumerate(my_list) if x in words_2_remove]
res = [x for x in my_list if x not in words_2_remove]
>>> 1.31 µs ± 7.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
For large arrays (n = 10 ** 3, i.e. 6,000 elements):
n = 10 ** 3
my_list = np.array(['a', 'd', 'a', 'd', 'c','e'] * n)
words_2_remove = np.array(['a', 'c'])
%%timeit
mask = np.isin(my_list, words_2_remove, invert=True)
loc = np.where(~mask)[0]
>>> 114 µs ± 906 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
my_list =['a', 'd', 'a', 'd', 'c','e'] * n
words_2_remove = ['a', 'c']
%%timeit
loc = [i for i, x in enumerate(my_list) if x in words_2_remove]
res = [x for x in my_list if x not in words_2_remove]
>>> 841 µs ± 677 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
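`%%timeit` is an IPython/Jupyter cell magic. In a plain Python script, a similar comparison can be sketched with the standard `timeit` module. Unlike the NumPy timing above, this version's `with_numpy` also builds the filtered array, so both functions produce the same two results; the function names are illustrative, and the absolute numbers will depend on the machine and library versions:

```python
import timeit

import numpy as np

n = 10 ** 3
base = ['a', 'd', 'a', 'd', 'c', 'e'] * n
words_2_remove = ['a', 'c']
arr = np.array(base)
arr_remove = np.array(words_2_remove)


def with_lists():
    # Two list comprehensions, as in @Austin's answer
    loc = [i for i, x in enumerate(base) if x in words_2_remove]
    res = [x for x in base if x not in words_2_remove]
    return loc, res


def with_numpy():
    # Boolean mask plus np.where, as in the NumPy answer
    mask = np.isin(arr, arr_remove, invert=True)
    loc = np.where(~mask)[0]
    return loc, arr[mask]


# Both approaches must agree before their timings are worth comparing
list_loc, list_res = with_lists()
np_loc, np_res = with_numpy()
assert list_loc == np_loc.tolist() and list_res == np_res.tolist()

for fn in (with_lists, with_numpy):
    # Best of 5 repeats of 100 calls each, reported per call in microseconds
    best = min(timeit.repeat(fn, number=100, repeat=5)) / 100
    print(f"{fn.__name__}: {best * 1e6:.1f} µs per call")
```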
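Choose based on your use case: in these timings the list comprehensions are roughly 8x faster on the 6-element list, while NumPy is roughly 7x faster on the 6,000-element one.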
Further reading:
np.isin in the NumPy documentation
Converting a boolean mask array to indices: How to turn a boolean array into index array in numpy
np.where in the NumPy documentation
More on indexing with NumPy: https://docs.scipy.org/doc/numpy-1.15.1/reference/arrays.indexing.html
Using a list comprehension and enumerate:
loc = [idx for idx, item in enumerate(my_list) if item in words_2_remove]
my_list = [i for i in my_list if i not in words_2_remove]
Or using filter:
my_list = list(filter(lambda x: x not in words_2_remove, my_list))
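`filter` only produces the kept items. To get both the kept items and the removed indices from a single membership pass, one option (a different technique from the answer above) is to build one boolean mask and feed it to `itertools.compress`, which yields the items of its first argument whose selector is truthy. The sketch assumes hashable elements so `words_2_remove` can be a set:

```python
from itertools import compress

my_list = ['a', 'd', 'a', 'd', 'c', 'e']
words_2_remove = {'a', 'c'}  # assumption: a set, since the elements are hashable strings

# One boolean per element: True means "remove"
remove_mask = [x in words_2_remove for x in my_list]

# compress(data, selectors) keeps the items of data whose selector is truthy
loc = list(compress(range(len(my_list)), remove_mask))
kept = list(compress(my_list, (not m for m in remove_mask)))

print(loc)   # [0, 2, 4]
print(kept)  # ['d', 'd', 'e']
```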
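Expanded into a single explicit loop:
loc = []
new_my_list = []
for idx, item in enumerate(my_list):
    if item in words_2_remove:
        loc.append(idx)  # record the index of each removed item
    else:
        new_my_list.append(item)  # keep everything else, in order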
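The explicit loop can be wrapped in a reusable function. `split_by_membership` is a hypothetical name chosen for illustration, and the sketch assumes hashable elements so that `to_remove` can be turned into a set (type hints need Python 3.9+):

```python
from collections.abc import Hashable, Iterable, Sequence


def split_by_membership(
    items: Sequence[Hashable], to_remove: Iterable[Hashable]
) -> tuple[list[Hashable], list[int]]:
    """Hypothetical helper: return (kept items, indices of removed items) in one pass."""
    remove_set = set(to_remove)  # assumes hashable elements
    kept: list[Hashable] = []
    loc: list[int] = []
    for idx, item in enumerate(items):
        if item in remove_set:
            loc.append(idx)   # index in the original sequence
        else:
            kept.append(item)  # preserves the original order
    return kept, loc


my_list, loc = split_by_membership(['a', 'd', 'a', 'd', 'c', 'e'], ['a', 'c'])
print(my_list)  # ['d', 'd', 'e']
print(loc)      # [0, 2, 4]
```

Returning a new list instead of mutating the input avoids the index-shifting problem of removing items while iterating.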