Remove non-ascii and special characters from a string Python

Question

I need help with some code: I want to remove non-ASCII and special characters from a string.

   s = "Bjørn 10.2.3"

I want the output to have both the special characters and the non-ASCII characters removed, like so:

  >>> Bjrn 1023

I know how to do it when it's only non-ASCII characters or only special characters, but I'm not sure how to do it when it's both.

What I have so far

For special characters

s = re.sub("[\"\'.]", "", special_character_string)

And for non-ASCII

encode = non_ascii_string.encode("ascii", "ignore")
s = encode.decode()

Answer 1

It all comes down to which characters you want to remove, but the more important thing to focus on is the algorithm. One solution is to iterate over your string and keep only the characters that are "valid", by checking each character against a list of valid characters.

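import string

# Make a collection of all your valid characters
# (ASCII letters, digits and space are one possible choice; adjust as needed)
valids = set(string.ascii_letters + string.digits + " ")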

# Iterate for each character in your string
final_string = ""
original_string = "Bjørn 10.2.3"
for character in original_string:
    # Keep the character only if it is valid
    if character in valids:
        final_string += character

# Your final string contains only your valid characters
print(final_string)
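
As a rough sketch, the same whitelist idea can be written more compactly with a set and str.join, which also avoids building the string one character at a time. The function name keep_valid and the choice of valid characters (ASCII letters, digits and the space) are assumptions made for illustration, not something the question specifies.

```python
import string

# Assumption: "valid" means ASCII letters, digits and the space character.
VALID_CHARS = set(string.ascii_letters + string.digits + " ")


def keep_valid(text, valid=VALID_CHARS):
    """Return text with every character that is not in `valid` dropped."""
    return "".join(ch for ch in text if ch in valid)


print(keep_valid("Bjørn 10.2.3"))  # Bjrn 1023
```

Passing a different set as the second argument changes what counts as valid without touching the loop.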

Answer 2

You can try using a simple regex and .replace():

import re

my_string = "Bjørn 10.2.3"
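# A-Za-z rather than A-z: the A-z range also matches [ \ ] ^ _ and `
# .replace collapses double spaces left behind by removed characters
new_string = re.sub('[^A-Za-z0-9 -]', '', my_string).replace("  ", " ")
print(new_string)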

Output:

Bjrn 1023
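
For comparison, here is a minimal sketch that chains the two steps from the question: the ASCII encode/decode round trip first, then a regex for the remaining special characters. The function name strip_non_ascii_and_special is hypothetical, and treating everything except ASCII letters, digits and spaces as "special" is an assumption; widen the character class if you want to keep more.

```python
import re


def strip_non_ascii_and_special(text):
    # Step 1: drop every character that cannot be encoded as ASCII (e.g. "ø").
    ascii_only = text.encode("ascii", "ignore").decode("ascii")
    # Step 2: drop everything that is not an ASCII letter, digit or space.
    # Assumption: this is what "special characters" means here.
    return re.sub(r"[^A-Za-z0-9 ]", "", ascii_only)


print(strip_non_ascii_and_special("Bjørn 10.2.3"))  # Bjrn 1023
```

With this particular character class the regex alone would already remove the non-ASCII characters, so step 1 only matters if the class is later widened, for example to \w, which matches Unicode letters such as "ø" on Python 3 strings.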

2 answers

solution1
1 2021-03-13 10:31:46

solution2
1 ACCEPTED 2021-03-13 10:41:20
