
Remove non-ASCII and special characters from a string in Python

I need help with some code: I want to remove non-ASCII and special characters from a string.

   s = "Bjørn 10.2.3"

I want the output to have both the special characters and the non-ASCII characters removed, like so:

  >>> Bjrn 1023

I know how to do it when it's only non-ASCII characters or only special characters, but I'm not sure how to do it when it's both.

What I have so far

For special characters

s = re.sub("[\"\'.]", "", special_character_string)

And for non-ASCII characters

encode = non_ascii_string.encode("ascii", "ignore")
s = encode.decode()

It all comes down to which characters you want to remove, but the more important thing to focus on is the algorithm. One solution is to iterate over your string and keep each character only if it is "valid", that is, only if it appears in a list of valid characters.

import string

# Collect all your valid characters.
# Assumption: ASCII letters, digits and the space; adjust this to your needs.
valids = string.ascii_letters + string.digits + " "

# Iterate over each character in your string
final_string = ""
original_string = "Bjørn 10.2.3"
for character in original_string:
    # Keep the character only if it is valid
    if character in valids:
        final_string += character

# Your final string contains only your valid characters
print(final_string)
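The same whitelist idea can be written more compactly. The sketch below is only one way to do it: the name `keep_valid` and the choice of valid characters (ASCII letters, digits and the space) are assumptions made for illustration, not something from the question. A set makes each membership check cheap, and `str.join` avoids building the result with repeated `+=`.

```python
import string

# Assumption: "valid" means ASCII letters, digits, and the space character.
VALID_CHARS = set(string.ascii_letters + string.digits + " ")


def keep_valid(text, valid=VALID_CHARS):
    """Return text with every character not in `valid` dropped."""
    return "".join(ch for ch in text if ch in valid)


print(keep_valid("Bjørn 10.2.3"))  # Bjrn 1023
```

To allow other characters, such as a hyphen, add them to `VALID_CHARS`.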

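You can try using a simple regex and .replace() -

import re

my_string = "Bjørn 10.2.3"
# Use A-Za-z rather than A-z: the range A-z also matches [ \ ] ^ _ and `
# The trailing .replace() collapses any double space left behind into one
new_string = re.sub('[^A-Za-z0-9 -]', '', my_string).replace("  ", " ")
print(new_string)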

Output:

Bjrn 1023
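The two steps from the question can also be chained, since the first handles non-ASCII characters and the second handles special characters. This is a minimal sketch under two assumptions: the function name `strip_non_ascii_and_punctuation` is made up for illustration, and "special characters" is taken to mean everything in `string.punctuation`, which is broader than the three characters in the question's regex.

```python
import string


def strip_non_ascii_and_punctuation(text):
    # Step 1: drop every character that cannot be encoded as ASCII (e.g. "ø").
    ascii_only = text.encode("ascii", "ignore").decode("ascii")
    # Step 2: delete all ASCII punctuation using a translation table.
    # Assumption: "special characters" means string.punctuation.
    return ascii_only.translate(str.maketrans("", "", string.punctuation))


print(strip_non_ascii_and_punctuation("Bjørn 10.2.3"))  # Bjrn 1023
```

If only a few specific characters should go, pass just those as the third argument to `str.maketrans` instead of `string.punctuation`.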
