Python Removing non-alphabetical characters with exceptions

Question

I am having a hard time doing data analysis on a large text that has lots of non-alphabetical characters. I tried using

string = filter(str.isalnum, string)

but I also have "@" in my text that I want to keep. How do I make an exception for a character like "@"?

Answer 1

It is easier with a regular expression:

import re
string = re.sub("[^A-Za-z0-9@]", "", string)
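A minimal sketch wrapping that substitution in a reusable function; the name strip_except_at is hypothetical. Because the character class spells out A-Za-z0-9, anything outside ASCII letters and digits (for example "é") is removed as well, and the pattern is compiled once so the allowed set lives in one named place.

```python
import re

# Hypothetical helper: keep ASCII letters, ASCII digits and "@"; drop everything else.
_NOT_ALLOWED = re.compile(r"[^A-Za-z0-9@]")

def strip_except_at(text):
    return _NOT_ALLOWED.sub("", text)

print(strip_except_at("Mail me @ home, ok?"))  # prints Mailme@homeok
print(strip_except_at("café"))                 # prints caf (é is not ASCII)
```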

Answer 2

You can use re.sub:

import re
string = re.sub(r'[^\w\s\d@]', '', string)

Example:

>>> import re
>>> re.sub(r'[^\w\s\d@]', '', 'This is @ string 123 *$^%')
'This is @ string 123 '
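Two properties of this character class are worth knowing, shown in the sketch below (the pattern and sample names are only for illustration): \w already matches digits, so \d inside the class is redundant, and \w also matches the underscore and, for str patterns, Unicode letters such as "é", so those survive. Whitespace is kept because of \s, which is why the result above ends with a space.

```python
import re

with_d = re.compile(r"[^\w\s\d@]")
without_d = re.compile(r"[^\w\s@]")

sample = "snake_case café @ 123 *$^%"

# Dropping \d changes nothing, because \w already covers the digits.
print(with_d.sub("", sample) == without_d.sub("", sample))  # prints True

# Underscores, accented letters and spaces are all kept.
print(repr(with_d.sub("", sample)))  # prints 'snake_case café @ 123 '
```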

Answer 3

You could use a lambda function to specify your allowed characters. Also note that filter returns a filter object, which is an iterator over the kept characters, so you will have to stitch them back into a string:

string = "?filter_@->me3!"

extra_chars = "@!"

filtered_object = filter(lambda c: c.isalnum() or c in extra_chars, string)

string = "".join(filtered_object)

print(string)

Gives:

filter@me3!
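The same idea can be packaged as a small function that takes the extra characters as a parameter; keep_chars is a hypothetical name used only for illustration. Because str.isalnum is Unicode-aware, accented letters such as "é" pass the test too, unlike an ASCII-only character class.

```python
def keep_chars(text, extra_chars=""):
    """Return text with only alphanumeric characters and those in extra_chars."""
    # filter() yields the kept characters lazily; join() stitches them into a str.
    return "".join(filter(lambda c: c.isalnum() or c in extra_chars, text))

print(keep_chars("?filter_@->me3!", "@!"))  # prints filter@me3!
print(keep_chars("café@home!", "@"))        # prints café@home
```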

Answer 4

One way to do this would be to create a function that returns True if an input character is valid and False otherwise.

import string

valid_characters = string.ascii_letters + string.digits + '@'

def is_valid_character(character):
    return character in valid_characters

# Instead of using `filter`, `join` only the characters of the input
# for which `is_valid_character` returns `True`. The parameter is named
# `text` so it does not shadow the imported `string` module.
def get_valid_characters(text):
    return "".join(char for char in text if is_valid_character(char))

Some example output:

>>> print(valid_characters)
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789@

>>> get_valid_characters("!Hello_#world?")
'Helloworld'

>>> get_valid_characters("user@example")
'user@example'
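Checking membership in a string scans it character by character, while a frozenset gives constant-time lookups on average. This variant is a sketch of that design choice, not part of the original answer, and VALID_CHARACTERS is a hypothetical name; with only 63 allowed characters the difference may be small, so treat it as an option rather than a guaranteed speed-up.

```python
import string

# Build the allowed set once: 26 + 26 letters, 10 digits and "@".
VALID_CHARACTERS = frozenset(string.ascii_letters + string.digits + "@")

def get_valid_characters(text):
    # Set membership instead of scanning a string for every character.
    return "".join(char for char in text if char in VALID_CHARACTERS)

print(get_valid_characters("!Hello_#world?"))  # prints Helloworld
print(get_valid_characters("user@example"))    # prints user@example
```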

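A shorter way to write it is with a regular expression. Spelling out A-Za-z0-9 (rather than using \w, which also matches the underscore and non-ASCII letters) makes it behave exactly like the function above:

import re

def get_valid_characters(text):
    return re.sub(r"[^A-Za-z0-9@]", "", text)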
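One way to check that the regular-expression version agrees with the generator version is to run both over the same inputs. The sketch below redefines both under hypothetical names (by_generator, by_regex) so it runs on its own, and it also shows why \w would not work here: with r"[^\w@]" the underscore survives.

```python
import re
import string

valid_characters = string.ascii_letters + string.digits + "@"

def by_generator(text):
    return "".join(char for char in text if char in valid_characters)

def by_regex(text):
    return re.sub(r"[^A-Za-z0-9@]", "", text)

samples = ["!Hello_#world?", "user@example", "a-b c_d 42!"]
for s in samples:
    # Both approaches should produce identical output for every sample.
    print(by_generator(s) == by_regex(s))  # prints True for each sample

# With \w in the class the underscore is kept, so the results differ.
print(re.sub(r"[^\w@]", "", "!Hello_#world?"))  # prints Hello_world
```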

Question

4 answers

solution1 4 2019-12-09 22:04:24
solution2 2 2019-12-09 22:07:08
solution3 1 2019-12-09 22:09:49
solution4 1 2019-12-09 22:10:59