I have this problem where I want to remove a list element if it contains 'illegal' characters. The legal characters are specified in multiple lists. They are formed like this, where alpha stands for the alphabet (a-z + A-Z), digit stands for digits (0-9) and punct stands for punctuation (sort of).
alpha = list(string.ascii_letters)
digit = list(string.digits)
punct = list(string.punctuation)
This way I can specify something as an illegal character if it doesn't appear in one of these lists.
After that I have a list containing elements:
Input = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]
I want to filter out the elements containing illegal characters. So this is the result I want to get (doesn't need to be ordered):
var = ["Amuu2", "Q1BFt", "mgF)`", "Y9^^M", "W0PD7"]
EDIT:
I have tried (and all variants of it):
for InItem in Input:
    if any(AlItem in InItem for AlItem in alpha+digit+punct):
        FilInput.append(InItem)
where a new list is created with only the filtered elements, but the problem here is that the elements get added when they contain at least one legal character. For example, "ZR°p" got added because it contains a Z, an R and a p.
I also tried:
for InItem in Input:
    if not any(AlItem in InItem for AlItem in alpha+digit+punct):
but after that, I couldn't figure out how to remove the element. Oh, and a little tip, to make it extra difficult, it would be nice if it were a little bit fast because it needs to be done millions of times. But it needs to be working first.
Define a set of legal characters. Then apply a list comprehension.
>>> allowed = set(string.ascii_letters + string.digits + string.punctuation)
>>> inp = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]
>>> [x for x in inp if all(c in allowed for c in x)]
['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']
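Since the filtering has to run millions of times, it may help to build the set once and wrap the comprehension in a function. The following is a minimal sketch of that idea; the names ALLOWED and keep_legal are hypothetical and not part of the original answer.

```python
import string

# Build the set of allowed characters once, outside any loop.
ALLOWED = frozenset(string.ascii_letters + string.digits + string.punctuation)


def keep_legal(items, allowed=ALLOWED):
    """Return the items that consist only of allowed characters."""
    return [item for item in items if all(ch in allowed for ch in item)]


inp = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]
result = keep_legal(inp)
print(result)
```

Note that an empty string passes this check, because all() over an empty sequence is True. If empty strings should be dropped, add a truthiness test to the condition.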
You can use a list comprehension and check with all if all characters match your criteria:
>>> [element for element in Input if all(c in alpha + digit + punct for c in element)]
['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']
As you mentioned, you append words as soon as any character is a correct one. You need to check that they are all correct:
# words is the input list (called Input in the question)
filtered_words = []
for word in words:
    if all(char in alpha+digit+punct for char in word):
        filtered_words.append(word)
print(filtered_words)
# ['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']
You could also check that there's not a single character which isn't correct:
filtered_words = []
for word in words:
    if not any(char not in alpha+digit+punct for char in word):
        filtered_words.append(word)
print(filtered_words)
It's much less readable though.
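To see why any gives the wrong answer while the two forms above agree with each other, here is a small sketch that evaluates all three checks on the problematic word "ZR°p". The variable names are made up for illustration.

```python
import string

allowed = set(string.ascii_letters + string.digits + string.punctuation)
word = "ZR°p"

# True as soon as one character is legal: this is the original bug.
has_some_legal = any(ch in allowed for ch in word)

# True only if every character is legal.
all_legal = all(ch in allowed for ch in word)

# Same meaning as all_legal, phrased as "no illegal character exists".
no_illegal = not any(ch not in allowed for ch in word)

print(has_some_legal, all_legal, no_illegal)
```

The first check accepts the word because of Z, R and p, while the other two reject it because of the degree sign.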
For efficiency, you shouldn't concatenate lists during each iteration with alpha+digit+punct. You should do it once and for all, before any loop. It's also a good idea to create a set out of those lists, because char in set is much faster than char in list when there are many allowed characters.
Finally, you could use a list comprehension to avoid the for loop. If you do all this, you end up with @timgeb's solution :)
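Once the allowed characters live in a set, another way to express the same test is the set's issuperset method, which accepts any iterable, including a string. This is a sketch of a variation rather than anything proposed in the answers; whether it is actually faster on your data is something you would need to measure.

```python
import string

# Concatenate once and build the set once, before any loop.
allowed = frozenset(string.ascii_letters + string.digits + string.punctuation)

words = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]

# A word is legal if every one of its characters is in the allowed set.
filtered_words = [word for word in words if allowed.issuperset(word)]
print(filtered_words)
```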
You can create a regex pattern from your lists and see which words match:
# encoding: utf-8
import string
import re
alpha = list(string.ascii_letters)
digit = list(string.digits)
punct = list(string.punctuation)
words = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]
allowed_pattern = re.compile(
    '^[' +
    ''.join(
        re.escape(char) for char in (
            alpha +
            digit +
            punct)) +
    ']+$')
# ^[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^_\`\{\|\}\~]+$
print([word for word in words if allowed_pattern.match(word)])
# ['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']
You could also write:
print(list(filter(allowed_pattern.match, words)))
# ['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']
re.compile will probably require more time than simply initializing a set, but the filtering might be faster then.
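Whether the regex really beats the set depends on the Python version and the data, so it is worth measuring. The following is a rough sketch using timeit; the function names by_set and by_regex are hypothetical, and it uses fullmatch instead of the ^...$ anchors, which is a small deviation from the pattern above.

```python
import re
import string
import timeit

chars = string.ascii_letters + string.digits + string.punctuation
allowed = frozenset(chars)
# re.escape keeps characters such as ], \, ^ and - literal inside the class.
pattern = re.compile('[' + re.escape(chars) + ']+')

words = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]


def by_set(items):
    return [w for w in items if all(c in allowed for c in w)]


def by_regex(items):
    return [w for w in items if pattern.fullmatch(w)]


# Both approaches should agree before their timings are compared.
assert by_set(words) == by_regex(words)

for fn in (by_set, by_regex):
    seconds = timeit.timeit(lambda: fn(words), number=10_000)
    print(f"{fn.__name__}: {seconds:.3f} s for 10,000 runs")
```

One difference to keep in mind: the regex requires at least one character, so it rejects an empty string, while the set-based check accepts it.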
This is not an efficient solution to your problem, but it can be interesting for learning how to loop over a list, over the characters of a string, etc.
# coding=utf-8
import string
# Aux var
result = []
new_elem = ""
# lists with legal characters
alpha = list(string.ascii_letters)
digit = list(string.digits)
punct = list(string.punctuation)
# Input strings
Input = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]
# Loop all elements of the list and each char of them
for elem in Input:
    ## check each char
    for char in elem:
        if char in alpha:
            #print 'is ascii'
            new_elem += char
        elif char in digit:
            #print 'is digit'
            new_elem += char
        elif char in punct:
            #print 'is punct'
            new_elem += char
        else:
            # illegal char found: discard what was collected and stop
            new_elem = ""
            break
    ## Add to result list
    if new_elem != "":
        result.append(new_elem)
    new_elem = ""
print(result)