I need help with some code. I want to remove non-ASCII and special characters from a string.
s = "Bjørn 10.2.3"
I want the output to have both the special characters and the non-ASCII characters removed, like so:
>>> Bjrn 1023
I know how to do it when it's only non-ASCII or only special characters, but I'm not sure how it's done when it's both.
What I have so far
For special characters
s = re.sub("[\"\'.]", "", special_character_string)
And for non-ASCII:
encode = non_ascii_string.encode("ascii", "ignore")
s = encode.decode()
It all comes down to which characters you want to remove, but the more important thing to focus on is the algorithm. One solution is to iterate over your string and check that each character is "valid" by comparing it against a collection of valid characters, keeping only the ones that match.
import string
# Make a collection of all your valid characters
# (assumed here: ASCII letters, digits, and the space; adjust as needed)
valids = set(string.ascii_letters + string.digits + " ")
# Iterate over each character in your string
final_string = ""
original_string = "Bjørn 10.2.3"
for character in original_string:
    # Keep the character only if it is valid
    if character in valids:
        final_string += character
# Your final string contains only your valid characters
print(final_string)
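A more compact variant of the same idea, as a sketch: keep the allowed characters in a set and filter with a generator expression. The helper name `keep_valid` and the choice of allowed characters (ASCII letters, digits, and the space) are assumptions for illustration, not something given in the question.

```python
import string

# Assumed whitelist: ASCII letters, digits, and the space character.
VALID_CHARS = frozenset(string.ascii_letters + string.digits + " ")


def keep_valid(text, valids=VALID_CHARS):
    """Return text with every character that is not in valids removed."""
    return "".join(ch for ch in text if ch in valids)


print(keep_valid("Bjørn 10.2.3"))  # Bjrn 1023
```

Using a set makes each membership check constant time, and `"".join(...)` avoids building the result with repeated string concatenation.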
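You can try using a simple regex and .replace():
import re
my_string = "Bjørn 10.2.3"
# Remove everything except ASCII letters, digits, spaces and hyphens,
# then collapse any double spaces left behind
new_string = re.sub('[^A-Za-z0-9 -]', '', my_string).replace("  ", " ")
print(new_string)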
Output:
Bjrn 1023
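The two snippets from the question can also simply be chained: drop the non-ASCII characters first, then strip the special characters from what is left. A rough sketch, where the function name `strip_non_ascii_and_special` and the default set of special characters (taken from the question's own regex) are placeholders to adapt:

```python
import re


def strip_non_ascii_and_special(text, special="\"'."):
    """Drop non-ASCII characters, then remove every character listed in special."""
    # Step 1: encoding with errors="ignore" silently discards non-ASCII characters.
    ascii_only = text.encode("ascii", "ignore").decode("ascii")
    if not special:
        # Nothing else to remove; an empty character class would be an invalid regex.
        return ascii_only
    # Step 2: build a character class from the special characters and remove them.
    pattern = "[" + re.escape(special) + "]"
    return re.sub(pattern, "", ascii_only)


print(strip_non_ascii_and_special("Bjørn 10.2.3"))  # Bjrn 1023
```

This is a blacklist for the special characters, so anything not listed in `special` (for example `!` or `_`) is kept; a whitelist regex is the better fit if only letters, digits and spaces should survive.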