Filter a list of strings so that none contains any string from another list as a substring

I have the following code to select the values that are not contained in another list.

import re
isbn  = ["1111","2222","3333","4444","5555"]
sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666", "k7 7777", "k8 8888" ,"k9 1111"]

for x in isbn:
    for i in sku:
        if x not in i:
            print (i)

The expected outcome should look like this:

k6 6666
k7 7777
k8 8888

But I get all unmatched values. How can I get the expected outcome shown above?

You should be using any within your loop. In fact, you can achieve this with the list comprehension below:

>>> list_1  = ["1111","2222","3333","4444","5555"]
>>> list_2 = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666", "k7 7777", "k8 8888" ,"k9 1111"]

>>> [x for x in list_2 if not any( y in x for y in list_1)]
['k6 6666', 'k7 7777', 'k8 8888']

Here, any will return True if any of the strings in list_1 is present as a substring of x, the current element of list_2. As soon as it finds a match, it short-circuits the iteration (without checking for other matches) and returns True.
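To make the short-circuiting visible, here is a minimal sketch. The contains helper and the checked list are hypothetical names introduced only for this illustration; they are not part of the original answer.

```python
# Hypothetical helper (an assumption for illustration): records every
# substring test that any() actually performs.
checked = []

def contains(needle, haystack):
    checked.append(needle)
    return needle in haystack

list_1 = ["1111", "2222", "3333", "4444", "5555"]

# "k2 2222" matches the second entry, so any() stops after two checks
# and never looks at "3333", "4444" or "5555".
found = any(contains(y, "k2 2222") for y in list_1)
print(found, checked)  # True ['1111', '2222']
```

Note that the generator expression is what allows the early exit; passing a fully built list to any() would evaluate every test first.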

If you are not interested in using any, you can get the same result with the for loop below:

for x in list_2:
    for y in list_1:
        if y in x:
            break
    else:
        print(x)

which will print your desired output:

k6 6666
k7 7777
k8 8888
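The loop above relies on the else clause of a for loop, which runs only when the loop finishes without hitting break. A small sketch of the same mechanism, wrapped in a hypothetical first_match function (the name and return convention are assumptions made for this example):

```python
def first_match(value, needles):
    """Return the first needle found in value, or None if none is found."""
    for needle in needles:
        if needle in value:
            break        # a match: the else branch below is skipped
    else:
        return None      # loop ran to completion, so nothing matched
    return needle

list_1 = ["1111", "2222", "3333", "4444", "5555"]
print(first_match("k9 1111", list_1))  # 1111
print(first_match("k6 6666", list_1))  # None
```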

You need to test all values in isbn before you can conclude that none of them match.

Rather than loop over isbn first, loop over sku and test each value against every isbn value; the any() function makes that easier and more efficient:

for value in sku:
    if not any(i in value for i in isbn):
        print(value)

More efficient still would be to split out the ISBN portion and test it against a set:

isbn_set = set(isbn)
for value in sku:
    isbn_part = value.partition(' ')[-1]  # everything after the first space
    if isbn_part not in isbn_set:
        print(value)

This avoids looping over isbn altogether; set membership testing takes O(1) constant time on average, so for N SKUs and M ISBN values this is an O(N) loop (versus an O(NM) loop with any()).
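One caveat worth keeping in mind: the set version tests for an exact match on the text after the first space, whereas the any() version tests for a substring anywhere in the value. On the sample data the two agree, but they can differ on other inputs. The data below is made up purely to show the difference:

```python
# Made-up data (an assumption for illustration, not from the question).
isbn = ["1111", "2222"]
sku = ["k1 1111", "k2 x2222", "k3 3333 used"]

isbn_set = set(isbn)

# Substring test: drops a SKU if any ISBN occurs anywhere inside it.
by_substring = [v for v in sku if not any(i in v for i in isbn)]

# Set test: drops a SKU only if the text after the first space
# is exactly equal to one of the ISBN values.
by_set = [v for v in sku if v.partition(" ")[-1] not in isbn_set]

print(by_substring)  # ['k3 3333 used']
print(by_set)        # ['k2 x2222', 'k3 3333 used']
```

If every SKU is known to have the form "key ISBN", the set version is the better fit; otherwise the substring test is the safer choice.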

Either version can be converted to a list comprehension to produce a list of the unmatched values; the preferred set version then becomes:

isbn_set = set(isbn)
not_matched = [value for value in sku if value.partition(' ')[-1] not in isbn_set]

Demo of the latter:

>>> isbn  = ["1111","2222","3333","4444","5555"]
>>> sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666", "k7 7777", "k8 8888" ,"k9 1111"]
>>> isbn_set = set(isbn)
>>> [value for value in sku if value.partition(' ')[-1] not in isbn_set]
['k6 6666', 'k7 7777', 'k8 8888']

If you remove the matches from a set, the leftover set is what you are after:

Code:

skus = set(sku)
for x in isbn:
    skus -= {i for i in skus if x in i}

Test code:

isbn = ["1111", "2222", "3333", "4444", "5555"]
sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666",
       "k7 7777", "k8 8888", "k9 1111"]

skus = set(sku)
for x in isbn:
    skus -= {i for i in skus if x in i}
print(skus)

Results (a set is unordered, so the elements may print in a different order):

{'k6 6666', 'k7 7777', 'k8 8888'}
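Because a set does not preserve order, the result may need to be put back into the order of the original list. A minimal sketch, assuming that the original order of sku matters; the ordered name is introduced only for this example:

```python
isbn = ["1111", "2222", "3333", "4444", "5555"]
sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666",
       "k7 7777", "k8 8888", "k9 1111"]

# Same set-subtraction approach as above.
skus = set(sku)
for x in isbn:
    skus -= {i for i in skus if x in i}

# Walk the original list so the survivors come out in their original order.
ordered = [value for value in sku if value in skus]
print(ordered)  # ['k6 6666', 'k7 7777', 'k8 8888']
```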

