Remove symbols from string but keep whitespaces

Question

I need to remove special characters from a string, but I also need to keep whitespace. This is my code so far:

from unidecode import unidecode
import re

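def cleanstr(string):
    # Decode byte strings so unidecode receives text
    if isinstance(string, bytes):
        string = string.decode('utf-8')
    # Transliterate to ASCII, e.g. 'é' -> 'e'
    string = unidecode(string)
    # Remove every run of characters that is not an ASCII letter or digit
    string = re.sub('[^A-Za-z0-9]+', '', string)
    return string

print(cleanstr("She's my friend Adélaïde"))
>> ShesmyfriendAdelaide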

The expected result should be "Shes my friend Adelaide".

Answer 1

Without regular expressions

import string

sentence = "vg583$%#jgv f_vrefg fh4ufrh4 %# dhejrfh #"

# Characters to keep: ASCII letters, digits and the space character
allowed = set(string.ascii_letters + string.digits + ' ')

print("".join([s for s in sentence if s in allowed]))

Output

'vg583jgv fvrefg fh4ufrh4  dhejrfh '
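Keeping whitespace means the result can contain a double space (where "%#" sat between two spaces) and a trailing space (left by the final " #"). If that matters, one option, added here as a small sketch rather than part of the original answer, is to normalize the spacing afterwards: str.split() with no argument splits on runs of whitespace and discards leading/trailing whitespace, so " ".join() rebuilds the text with single spaces.

```python
import string

sentence = "vg583$%#jgv f_vrefg fh4ufrh4 %# dhejrfh #"
allowed = set(string.ascii_letters + string.digits + ' ')

# Same filter as above: keep only letters, digits and spaces
kept = "".join(c for c in sentence if c in allowed)

# Collapse runs of whitespace to one space and trim both ends
tidy = " ".join(kept.split())
print(repr(tidy))  # 'vg583jgv fvrefg fh4ufrh4 dhejrfh'
```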

I admit this can't handle Unicode at the moment, so you may need to tweak it a bit.

I think your final solution (in case you do want to deal with Unicode) should look like this:

u''.join([transform_char(c) for c in your_unicode_string if condition_met(c)])
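One way to fill in that template, as a sketch: transform_char, condition_met and clean below are hypothetical helpers, and the standard-library unicodedata module stands in for unidecode so the snippet runs without extra packages. transform_char decomposes a character with NFKD (so 'é' becomes 'e' plus a combining accent) and drops whatever is not ASCII; condition_met keeps letters, digits and whitespace.

```python
import unicodedata

def transform_char(c):
    # Decompose accented letters (e.g. 'é' -> 'e' + combining acute accent),
    # then drop anything that is not ASCII, such as the combining accent.
    return unicodedata.normalize('NFKD', c).encode('ascii', 'ignore').decode('ascii')

def condition_met(c):
    # Keep letters, digits and whitespace; drop punctuation and symbols.
    return c.isalnum() or c.isspace()

def clean(your_unicode_string):
    return u''.join([transform_char(c) for c in your_unicode_string if condition_met(c)])

print(clean(u"She's my friend Adélaïde"))  # Shes my friend Adelaide
```

Unlike unidecode, NFKD has no ASCII fallback for letters that do not decompose (for example 'ß' or CJK characters), so this sketch drops them instead of transliterating them.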

Answer 2

[^A-Za-z0-9]+

Here you're matching characters that are not A-Z, a-z or 0-9.

You replace these characters with the empty string; that is, you remove them.

Because the class is negated, anything you add to it is no longer matched, so it is kept. If you want to keep other characters, simply add them to this list!
\s means whitespace, so:

[^A-Za-z0-9\s]+
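Putting that pattern back into the question's function gives roughly the following sketch. Two assumptions: to_ascii is a hypothetical stand-in for unidecode built on the standard-library unicodedata module (so the snippet is self-contained; in the original code you would keep calling unidecode), and the pattern is written as a raw string so Python does not treat \s as a string escape (recent Python versions warn about '\s' in a normal string literal).

```python
import re
import unicodedata

def to_ascii(text):
    # Stand-in for unidecode: decompose accented letters (NFKD) and drop
    # anything that does not fit in ASCII, e.g. the combining accents.
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')

def cleanstr(text):
    # Decode byte strings first, as in the question
    if isinstance(text, bytes):
        text = text.decode('utf-8')
    text = to_ascii(text)
    # \s inside the negated class means whitespace is kept, not removed
    return re.sub(r'[^A-Za-z0-9\s]+', '', text)

print(cleanstr("She's my friend Adélaïde"))  # Shes my friend Adelaide
```

Note that \s also keeps tabs and newlines; if only the space character should survive, put a literal space in the class instead: r'[^A-Za-z0-9 ]+'.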

Remove symbols from string but keep whitespaces

Question

2 answers

solution1
0 ACCEPTED 2017-03-08 16:08:38

solution2
0 2017-03-08 17:02:36
