Finding special-character smileys in a string

Question

I'm currently writing a program in Python that has to handle smileys such as :) , :( , :-) and :-( . When a smiley is followed by special characters or punctuation, those trailing characters should be removed. For example, Hi, this is good :)# should become Hi, this is good :) .

I have created a regex pattern for re.sub, but I couldn't include the :-) smiley in my re.compile pattern. The - is being treated as a range.

re.sub(r"[^a-zA-Z0-9:):D)]+", " " , words) works fine, but I need to add the :-) smiley to the regex.

Answer 1

One approach is to use the following pattern:

(:\)|:\(|:-\)|:-\()[^A-Za-z0-9]+

This matches and captures a smiley face, then matches one or more non-alphanumeric characters immediately afterwards. The replacement is just the captured smiley face, so the trailing non-alphanumeric characters are removed.

import re

text = "Hi, this is good :)#"  # renamed from input to avoid shadowing the built-in
output = re.sub(r"(:\)|:\(|:-\)|:-\()[^A-Za-z0-9]+", r"\1", text)  # raw string so \1 is a backreference
print(output)

Hi, this is good :)
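One caveat worth checking with the pattern above: [^A-Za-z0-9]+ also matches whitespace, so a space after the trailing punctuation is removed too. The sketch below compares the answer's pattern with a hypothetical variant (not part of the original answer) that adds \s to the negated class so that whitespace survives. The names greedy and keeps_space are made up for illustration.

```python
import re

# Pattern from the answer above: a captured smiley followed by non-alphanumerics.
greedy = re.compile(r"(:\)|:\(|:-\)|:-\()[^A-Za-z0-9]+")
# Hypothetical variant: \s added to the negated class so whitespace is left alone.
keeps_space = re.compile(r"(:\)|:\(|:-\)|:-\()[^A-Za-z0-9\s]+")

text = "good :)# and more"
greedy_result = greedy.sub(r"\1", text)
spaced_result = keeps_space.sub(r"\1", text)

print(greedy_result)  # good :)and more
print(spaced_result)  # good :) and more
```

Which behaviour is wanted depends on the input; for a smiley at the very end of the string, as in the question's example, both patterns give the same result.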

Answer 2

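The [^a-zA-Z0-9:):D)] pattern is erroneous because it is a character class, which matches individual characters, not sequences of characters. Listing :) or :D inside it only adds the single characters : , ) and D to the set. You need to add an alternative to this regex that will match the smiley character sequences.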
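A quick way to see this is to run the question's character class on input that contains stray colons and parentheses but no smileys. This is only an illustrative sketch; the sample strings and variable names are made up.

```python
import re

# The question's class, unchanged. Duplicates inside [...] are harmless, so it
# means "anything except ASCII letters, digits, ':' and ')'".
char_class = re.compile(r"[^a-zA-Z0-9:):D)]+")

# Stray ')' and ':' survive even though they are not part of a smiley.
stray_result = char_class.sub(" ", "a) b: c#")
# '(' was never listed in the class, so the :( smiley is broken apart.
sad_result = char_class.sub(" ", "sad :(")

print(repr(stray_result))  # 'a) b: c '
print(repr(sad_result))    # 'sad : '
```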

To remove any punctuation other than a certain list of smileys you may use

re.sub(r"(:-?[()D])|[^A-Za-z0-9\s]", r"\1" , s)

Or, in Python 3.4 and older, where re.sub raises an error if the replacement references a group that did not participate in the match:

re.sub(r"(:-?[()D])|[^A-Za-z0-9\s]", lambda x: x.group(1) if x.group(1) else "", s)

If you also need to keep commas, add a comma to the negated character class:

re.sub(r"(:-?[()D])|[^A-Za-z0-9,\s]", r"\1" , s)
                               ^

Details

(:-?[()D]) - matches and captures into Group 1 a : , then an optional - , and then a single char from the character class: ( , ) or D (this captures the smileys like :-) , :-( , :) , :( , :-D , :D )
[^A-Za-z0-9,\s] - matches any char other than an ASCII letter, digit, comma or whitespace. To make it fully Unicode-aware, replace it with (?:[^\w\s,]|_) .
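To illustrate the difference between the ASCII class and the Unicode-aware alternative described above, here is a minimal sketch. The helper strip_punctuation and the sample sentence are hypothetical and not part of the original answer. A callable replacement is used so that the code behaves the same on old and new Python versions.

```python
import re

SMILEY = r"(:-?[()D])"
ascii_pattern = re.compile(SMILEY + r"|[^A-Za-z0-9,\s]")
unicode_pattern = re.compile(SMILEY + r"|(?:[^\w\s,]|_)")

def strip_punctuation(pattern, text):
    # Keep the smiley when group 1 matched; otherwise drop the matched character.
    return pattern.sub(lambda m: m.group(1) or "", text)

sample = "Café, très_bien :-)!"
ascii_result = strip_punctuation(ascii_pattern, sample)
unicode_result = strip_punctuation(unicode_pattern, sample)

print(ascii_result)    # Caf, trsbien :-)
print(unicode_result)  # Café, trèsbien :-)
```

The ASCII class drops the accented letters because they are outside A-Za-z, while \w on a Python 3 str treats them as word characters. The extra |_ alternative is there because \w also matches the underscore.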

Python 3.5+ demo:

import re
s = "Hi, this is good :)#"
print( re.sub(r"(:-?[()D])|[^A-Za-z0-9,\s]", r"\1" , s) )
# => Hi, this is good :)

Python 3.4 and older demo:

import re
s = "Hi, this is good :)#"
print( re.sub(r"(:-?[()D])|[^A-Za-z0-9,\s]", lambda x: x.group(1) if x.group(1) else "", s) )
# => Hi, this is good :)

Answer 3

You can use \ to escape special characters (here, the - inside the character class). Try:

re.sub(r"[^a-zA-Z0-9:):D:\-))]+", " " , words)

solution1
1 2019-03-05 07:21:31

solution2
1 2019-03-05 07:35:31

solution3
0 2019-03-05 07:19:36
