I am having a hard time doing data analysis on a large text that contains many non-alphanumeric characters. I tried using
string = filter(str.isalnum, string)
but my text also contains "@", which I want to keep. How do I make an exception for a character like "@"?
It is easier to use a regular expression:
import re
string = re.sub(r"[^A-Za-z0-9@]", "", string)
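If the set of characters to keep may change, the character class can be built from a string of extras. This is only a minimal sketch: the helper name keep_alnum_and and its extra_chars parameter are made up for illustration and are not part of the answer above. re.escape is used so that characters with a special meaning inside [...], such as "]" or "-", are treated literally.

```python
import re

def keep_alnum_and(text, extra_chars="@"):
    # Hypothetical helper: keep ASCII letters, digits and anything in extra_chars.
    # re.escape guards characters like "]" or "-" that are special inside [...].
    pattern = "[^A-Za-z0-9" + re.escape(extra_chars) + "]"
    return re.sub(pattern, "", text)

print(keep_alnum_and("user.name@example.com!"))        # username@examplecom
print(keep_alnum_and("user.name@example.com!", "@."))  # user.name@example.com
```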
You can use re.sub. Since \w already matches digits, the \d in the character class is redundant and can be dropped:
re.sub(r'[^\w\s@]', '', string)
Example:
>>> re.sub(r'[^\w\s@]', '', 'This is @ string 123 *$^%')
'This is @ string 123 '
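One thing to watch with \w: in Python 3 it matches the underscore and, for str patterns, non-ASCII letters as well. The sketch below compares it with an explicit ASCII class and with the re.ASCII flag; the sample text is made up for illustration.

```python
import re

text = "snake_case caf\u00e9 @ 123 *$^%"

# \w is Unicode-aware by default and also matches "_"
unicode_aware = re.sub(r"[^\w\s@]", "", text)
# An explicit class keeps only ASCII letters and digits
ascii_only = re.sub(r"[^A-Za-z0-9\s@]", "", text)
# re.ASCII limits \w to [a-zA-Z0-9_], so "_" is still kept
ascii_flag = re.sub(r"[^\w\s@]", "", text, flags=re.ASCII)

print(repr(unicode_aware))  # 'snake_case café @ 123 '
print(repr(ascii_only))     # 'snakecase caf @ 123 '
print(repr(ascii_flag))     # 'snake_case caf @ 123 '
```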
You could use a lambda function to specify your allowed characters. Note that in Python 3 filter returns a filter object, which is an iterator over the kept values, so you will have to stitch it back into a string:
string = "?filter_@->me3!"
extra_chars = "@!"
filtered_object = filter(lambda c: c.isalnum() or c in extra_chars, string)
string = "".join(filtered_object)
print(string)
Gives:
filter@me3!
One way to do this would be to create a function that returns True or False depending on whether an input character is valid.
import string
valid_characters = string.ascii_letters + string.digits + '@'
def is_valid_character(character):
    return character in valid_characters
# Instead of using `filter`, we `join` all characters in the input string
# for which `is_valid_character` returns `True`.
def get_valid_characters(string):
    return "".join(char for char in string if is_valid_character(char))
Some example output:
>>> print(valid_characters)
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789@
>>> get_valid_characters("!Hello_#world?")
'Helloworld'
>>> get_valid_characters("user@example")
'user@example'
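A simpler way to write it would be to use a regex. With an explicit ASCII character class (\w would also keep underscores and non-ASCII letters), this accomplishes the same thing:
import re
def get_valid_characters(string):
    return re.sub(r"[^A-Za-z0-9@]", "", string)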
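Since the question mentions a large text, another option that none of the answers above use is str.translate with a deletion table. This is a sketch under assumptions: the names ALLOWED and strip_disallowed are made up here, and whether it is actually faster than re.sub on your data is something you would have to time yourself.

```python
import string

# Characters to keep: ASCII letters, digits and "@"
ALLOWED = set(string.ascii_letters + string.digits + "@")

def strip_disallowed(text):
    # Map every disallowed character that occurs in the text to None,
    # which tells str.translate to delete it.
    table = {ord(ch): None for ch in set(text) if ch not in ALLOWED}
    return text.translate(table)

print(strip_disallowed("!Hello_#world?"))  # Helloworld
print(strip_disallowed("user@example"))    # user@example
```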