Filter a list of strings so that none contains any string from another list as a substring
I have the following code to select the values which are not contained in another list.
import re
isbn = ["1111", "2222", "3333", "4444", "5555"]
sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666", "k7 7777", "k8 8888", "k9 1111"]
for x in isbn:
    for i in sku:
        if x not in i:
            print(i)
The expected outcome should look like this:
k6 6666
k7 7777
k8 8888
But instead I get every value that fails to match at least one ISBN, not only those that match none. How can I get the expected outcome shown above?
You should be using any() within your loop. In fact, you can achieve this with the following list comprehension:
>>> list_1 = ["1111","2222","3333","4444","5555"]
>>> list_2 = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666", "k7 7777", "k8 8888" ,"k9 1111"]
>>> [x for x in list_2 if not any( y in x for y in list_1)]
['k6 6666', 'k7 7777', 'k8 8888']
Here any() returns True if any string in list_1 is present as a substring of the current element x of list_2. As soon as it finds a match, it short-circuits the iteration (without checking for other matches) and returns True.
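The short-circuit behaviour can be observed directly. The following is only a small sketch: contains_logged is a hypothetical helper introduced here so that every substring test leaves a trace in a list.

```python
def contains_logged(needle, haystack, log):
    """Substring test that records each needle it was asked about (hypothetical helper)."""
    log.append(needle)
    return needle in haystack

list_1 = ["1111", "2222", "3333", "4444", "5555"]

# A string that matches the second entry: any() stops right after "2222".
checked = []
found = any(contains_logged(y, "k2 2222", checked) for y in list_1)
print(found, checked)

# A string that matches nothing: every entry has to be tested.
checked_miss = []
found_miss = any(contains_logged(y, "k6 6666", checked_miss) for y in list_1)
print(found_miss, len(checked_miss))
```

In the first call only "1111" and "2222" are tested; the remaining three entries are never looked at.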
If you would rather not use any(), you can get the same result with the following for loop:
for x in list_2:
    for y in list_1:
        if y in x:
            break
    else:
        print(x)
which will print your desired output:
k6 6666
k7 7777
k8 8888
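The for loop relies on Python's for/else: the else clause belongs to the inner for, not to the if, and runs only when that loop finishes without hitting break. If a list is more useful than printed lines, the same logic could be wrapped in a function. This is a sketch only; the function name filter_without_substrings and its parameter names are made up for this example.

```python
def filter_without_substrings(candidates, forbidden):
    """Return the candidates that contain none of the forbidden substrings."""
    kept = []
    for x in candidates:
        for y in forbidden:
            if y in x:
                break  # a match was found: the else clause below is skipped
        else:
            # Reached only when the inner loop ended without break,
            # i.e. no forbidden string occurs in x.
            kept.append(x)
    return kept

list_1 = ["1111", "2222", "3333", "4444", "5555"]
list_2 = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555",
          "k6 6666", "k7 7777", "k8 8888", "k9 1111"]

result = filter_without_substrings(list_2, list_1)
print(result)
```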
You need to test all values in isbn before you can conclude that none of them match. Rather than looping over isbn first, loop over sku and test each value against each of the isbn values; the any() function makes that easier and more efficient:
for value in sku:
    if not any(i in value for i in isbn):
        print(value)
More efficient still would be to split out the ISBN portion and test against a set:
isbn_set = set(isbn)
for value in sku:
    isbn_part = value.partition(' ')[-1]  # everything after the first space
    if isbn_part not in isbn_set:
        print(value)
This avoids looping over isbn altogether; set membership testing takes O(1) constant time on average, so for N SKUs and M ISBN values this makes an O(N) loop (vs. an O(NM) loop with any()).
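To get a feel for the O(N) vs. O(NM) difference, here is a rough timing sketch. The data is synthetic, and the function names with_any and with_set as well as the "sku <number>" format are assumptions made only for this illustration; actual timings depend on your machine and your data.

```python
import timeit

# Synthetic data (an assumption for illustration): 200 ISBNs, 400 SKUs.
isbn = [str(n) for n in range(1000, 1200)]
sku = ["sku {}".format(n) for n in range(1000, 1400)]
isbn_set = set(isbn)

def with_any():
    # Up to len(isbn) substring tests per SKU: O(N*M) overall.
    return [v for v in sku if not any(i in v for i in isbn)]

def with_set():
    # One hash lookup per SKU: O(N) overall.
    return [v for v in sku if v.partition(" ")[-1] not in isbn_set]

# Both approaches agree on this data.
assert with_any() == with_set()

print("any():", timeit.timeit(with_any, number=20))
print("set:  ", timeit.timeit(with_set, number=20))
```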
Either version can be converted to a list comprehension to produce a list of the unmatched values; the preferred set version then becomes:
isbn_set = set(isbn)
not_matched = [value for value in sku if value.partition(' ')[-1] not in isbn_set]
Demo of the latter:
>>> isbn = ["1111","2222","3333","4444","5555"]
>>> sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666", "k7 7777", "k8 8888" ,"k9 1111"]
>>> isbn_set = set(isbn)
>>> [value for value in sku if value.partition(' ')[-1] not in isbn_set]
['k6 6666', 'k7 7777', 'k8 8888']
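One thing to keep in mind: the set version is not an exact drop-in for the any() version. It compares the whole text after the first space for equality, whereas any() looks for the ISBN anywhere in the string. A small sketch with made-up data (the value "k2 x2222y" is hypothetical) shows where the two differ:

```python
isbn = ["1111", "2222"]
sku = ["k1 1111", "k2 x2222y", "k3 3333"]

# Substring semantics: "2222" occurs inside "x2222y", so that SKU is dropped.
by_substring = [v for v in sku if not any(i in v for i in isbn)]

# Exact-match semantics: "x2222y" is not a member of the set, so that SKU is kept.
isbn_set = set(isbn)
by_exact = [v for v in sku if v.partition(" ")[-1] not in isbn_set]

print(by_substring)  # ['k3 3333']
print(by_exact)      # ['k2 x2222y', 'k3 3333']
```

For data shaped like the question's, where the part after the space is exactly the ISBN, both give the same answer.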
If you remove the matches from a set, the leftover set is what you are after:
skus = set(sku)
for x in isbn:
    skus -= {i for i in skus if x in i}
isbn = ["1111", "2222", "3333", "4444", "5555"]
sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666",
       "k7 7777", "k8 8888", "k9 1111"]
skus = set(sku)
for x in isbn:
    skus -= {i for i in skus if x in i}
print(skus)
{'k6 6666', 'k7 7777', 'k8 8888'}
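Note that a set has no defined order and drops duplicates, so the elements may be printed in a different order from the one shown. If the original order of sku matters, one possible follow-up is to filter the original list against the leftover set. This is a sketch; the name ordered is introduced here for illustration.

```python
isbn = ["1111", "2222", "3333", "4444", "5555"]
sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666",
       "k7 7777", "k8 8888", "k9 1111"]

# Same set-difference approach as above.
skus = set(sku)
for x in isbn:
    skus -= {i for i in skus if x in i}

# Walk the original list so that the result keeps the input order.
ordered = [s for s in sku if s in skus]
print(ordered)  # ['k6 6666', 'k7 7777', 'k8 8888']
```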