
Python: Check if list item does (not) contain any of other list items

I have this problem where I want to remove a list element if it contains 'illegal' characters. The legal characters are specified in multiple lists. They are formed like this, where alpha stands for the alphabet (a-z + A-Z), digit stands for digits (0-9) and punct stands for punctuation (sort of).

import string

alpha = list(string.ascii_letters)
digit = list(string.digits)
punct = list(string.punctuation)

This way I can specify something as an illegal character if it doesn't appear in one of these lists.

After that I have a list containing elements:

Input = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]

I want to filter out the elements containing illegal characters. So this is the result I want to get (doesn't need to be ordered):

var = ["Amuu2", "Q1BFt", "mgF)`", "Y9^^M", "W0PD7"]

EDIT:

I have tried (and all variants of it):

for InItem in Input:
    if any(AlItem in InItem for AlItem in alpha+digit+punct):
        FilInput.append(InItem)

where a new list is created with only the filtered elements, but the problem here is that the elements get added when they contain at least one legal character. For example: "ZR°p" got added, because it contains a Z, an R and a p.

I also tried:

for InItem in Input:
    if not any(AlItem in InItem for AlItem in alpha+digit+punct):

but after that, I couldn't figure out how to remove the element. Oh, and to make it extra difficult: it would be nice if it were reasonably fast, because it needs to be done millions of times. But it needs to work first.

Define a set of legal characters. Then apply a list comprehension.

>>> allowed = set(string.ascii_letters + string.digits + string.punctuation)
>>> inp = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]
>>> [x for x in inp if all(c in allowed for c in x)]
['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']
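
As a possible variation on the snippet above, the per-character check can be handed to the set itself: `set.issuperset` accepts any iterable, so a string is treated as a sequence of characters. This is only a sketch reusing the `allowed` and `inp` names from the answer; `filtered` is a name introduced here for illustration.

```python
import string

# Same data as in the answer above.
allowed = set(string.ascii_letters + string.digits + string.punctuation)
inp = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]

# Keep a string only if every one of its characters is in `allowed`.
filtered = [x for x in inp if allowed.issuperset(x)]
print(filtered)
# ['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']
```

Both forms keep an empty string, since an empty string contains no illegal characters.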

You can use a list comprehension and use all to check that every character matches your criteria:

>>> [element for element in Input if all(c in alpha + digit + punct for c in element)]
['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']

Your code

As you mentioned, you append words as soon as any character is a correct one. You need to check that all of them are correct:

filtered_words = []
for word in words:
    if all(char in alpha+digit+punct for char in word):
        filtered_words.append(word)

print(filtered_words)
# ['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']

You could also check that there isn't a single incorrect character:

filtered_words = []
for word in words:
    if not any(char not in alpha+digit+punct for char in word):
        filtered_words.append(word)

print(filtered_words)

It's much less readable though.
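
To see why the two forms agree while the original `any` check does not, here is a small sketch on the single word "ZR°p" from the question. The names `legal`, `has_legal`, `only_legal` and `no_illegal` are made up for this illustration.

```python
import string

legal = set(string.ascii_letters + string.digits + string.punctuation)
word = "ZR°p"

# The question's check: true as soon as one character is legal.
has_legal = any(c in legal for c in word)

# The intended check: true only if every character is legal.
only_legal = all(c in legal for c in word)

# The double-negative form: equivalent to only_legal.
no_illegal = not any(c not in legal for c in word)

print(has_legal, only_legal, no_illegal)
# True False False
```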

For efficiency, you shouldn't concatenate lists during each iteration with alpha+digit+punct. You should do it once and for all, before any loop. It's also a good idea to create a set out of those lists, because char in set is much faster than char in list when there are many allowed characters.
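
A rough way to check this on your own machine is a `timeit` comparison. The sketch below assumes the same lists and words as in the question; the function names `filter_with_lists` and `filter_with_set` are invented for the comparison, and the timings will vary between systems, so no numbers are shown.

```python
import string
import timeit

alpha = list(string.ascii_letters)
digit = list(string.digits)
punct = list(string.punctuation)
words = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]

def filter_with_lists(words):
    # Rebuilds the combined list and scans it linearly for every character.
    return [w for w in words if all(c in alpha + digit + punct for c in w)]

# Built once, before any loop.
allowed = set(alpha + digit + punct)

def filter_with_set(words):
    # Constant-time membership test per character.
    return [w for w in words if all(c in allowed for c in w)]

# Both variants must agree before timing them.
assert filter_with_lists(words) == filter_with_set(words)

print("lists:", timeit.timeit(lambda: filter_with_lists(words), number=1000))
print("set:  ", timeit.timeit(lambda: filter_with_set(words), number=1000))
```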

Finally, you could use a list comprehension to avoid the for loop. If you do all this, you end up with @timgeb's solution :)

Alternative with regex

You can create a regex pattern from your lists and see which words match:

# encoding: utf-8
import string
import re

alpha = list(string.ascii_letters)
digit = list(string.digits)
punct = list(string.punctuation)

words = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]

allowed_pattern = re.compile(
    '^[' +
    ''.join(
        re.escape(char) for char in (
            alpha +
            digit +
            punct)) +
    ']+$')
# ^[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^_\`\{\|\}\~]+$

print([word for word in words if allowed_pattern.match(word)])
# ['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']

You could also write:

print(list(filter(allowed_pattern.match, words)))
# ['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']

re.compile will probably require more time than simply initializing a set, but the filtering itself might then be faster.
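
Whether the regex actually wins is best measured rather than assumed. The following is a sketch of such a measurement, not a benchmark result. It uses `pattern.fullmatch` instead of the `^...$` anchors (a substitution made here, which also rejects a trailing newline), and the names `legal_chars`, `filter_with_regex` and `filter_with_set` are hypothetical.

```python
import re
import string
import timeit

legal_chars = string.ascii_letters + string.digits + string.punctuation
words = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]

# One-time setup for each approach.
allowed = set(legal_chars)
pattern = re.compile('[' + re.escape(legal_chars) + ']+')

def filter_with_regex(words):
    # fullmatch requires the whole string to consist of legal characters.
    return [w for w in words if pattern.fullmatch(w)]

def filter_with_set(words):
    return [w for w in words if allowed.issuperset(w)]

# Both variants must agree on this input before timing them.
assert filter_with_regex(words) == filter_with_set(words)

print("regex:", timeit.timeit(lambda: filter_with_regex(words), number=10000))
print("set:  ", timeit.timeit(lambda: filter_with_set(words), number=10000))
```

Note that the two differ on empty strings: the `+` in the pattern rejects them, while the set check accepts them.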

This is not an efficient solution to your problem, but it can be useful for learning how to loop over a list, its characters, etc.

# coding=utf-8
import string

# Aux var
result =[]
new_elem = ""

# lists with legal characters
alpha = list(string.ascii_letters)
digit = list(string.digits)
punct = list(string.punctuation)

# Input strings
Input = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]

# Loop all elements of the list and each char of them
for elem in Input:
    ## check each char 
    for char in elem:
        if char in alpha:
            #print 'is ascii'
            new_elem += char
        elif char in digit:
            #print 'is digit'
            new_elem += char
        elif char in punct:
            #print 'is punct'
            new_elem += char
        else:
            new_elem = ""
            break
    ## Add to result list
    if new_elem != "":
        result.append(new_elem)
        new_elem = ""

print(result)
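
For learning purposes, the same nested loop can also be sketched with Python's `for ... else` construct, which avoids rebuilding each string character by character. This is an alternative phrasing, not the code above; unlike that code it would also keep an empty string, since the inner loop then never hits `break`.

```python
import string

alpha = list(string.ascii_letters)
digit = list(string.digits)
punct = list(string.punctuation)

Input = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]

result = []
for elem in Input:
    for char in elem:
        if char not in alpha and char not in digit and char not in punct:
            # Illegal character found: stop checking this element.
            break
    else:
        # Runs only if the inner loop finished without break.
        result.append(elem)

print(result)
# ['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']
```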
