I'm currently writing a program in Python where I need to handle smileys such as :), :(, :-) and :-(. If a smiley is followed by special characters or punctuation, those characters should be removed. For example, Hi, this is good :)# should become Hi, this is good :).
I have created a regex pattern for re.sub, but I couldn't include the smiley :-) in it, because the - is treated as a range inside the character class.
re.sub(r"[^a-zA-Z0-9:):D)]+", " " , words)
This works fine, but I need to add the :-) smiley to the regex.
One approach is to use the following pattern:
(:\)|:\(|:-\)|:-\()[^A-Za-z0-9]+
This matches and captures a smiley, then matches one or more non-alphanumeric characters immediately after it. The replacement is just the captured smiley, which removes the trailing non-alphanumeric characters.
import re
text = "Hi, this is good :)#"
# r"\1" must be a raw string; a plain "\1" is the control character chr(1)
output = re.sub(r"(:\)|:\(|:-\)|:-\()[^A-Za-z0-9]+", r"\1", text)
print(output)
# => Hi, this is good :)
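One detail worth checking in the replacement argument: in a non-raw Python string literal, "\1" is an octal escape for the control character chr(1), not a backreference. The sketch below contrasts the two spellings; the names SMILEY_THEN_JUNK, wrong and right are illustrative only and not part of the original answer.

```python
import re

# Pattern from the answer: a captured smiley followed by non-alphanumerics.
SMILEY_THEN_JUNK = r"(:\)|:\(|:-\)|:-\()[^A-Za-z0-9]+"
text = "Hi, this is good :)#"

# Non-raw "\1" is the single character chr(1), so the smiley is lost.
wrong = re.sub(SMILEY_THEN_JUNK, "\1", text)

# Raw r"\1" reaches the regex engine as a backreference to group 1.
right = re.sub(SMILEY_THEN_JUNK, r"\1", text)

print(repr(wrong))  # 'Hi, this is good \x01'
print(repr(right))  # 'Hi, this is good :)'
```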
The [^a-zA-Z0-9:):D)] pattern is erroneous because a character class matches single characters, not sequences of characters. You need to add an alternative to this regex that matches the smiley character sequences.
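To illustrate why the character class is not enough, here is a small sketch using the pattern from the question. Because the class only whitelists the individual characters : and ), a reversed sequence such as ): survives just like a real smiley. The variable names are made up for this example.

```python
import re

# Pattern from the question: it whitelists single characters, not smileys.
CHAR_CLASS = r"[^a-zA-Z0-9:):D)]+"

# A real smiley is kept, and the trailing "#" becomes a space.
kept_smiley = re.sub(CHAR_CLASS, " ", "good :)#")

# A reversed ")" + ":" pair is kept as well, since both chars are whitelisted.
kept_debris = re.sub(CHAR_CLASS, " ", "odd ): text")

print(repr(kept_smiley))  # 'good :) '
print(repr(kept_debris))  # 'odd ): text'
```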
To remove any punctuation other than a certain list of smileys you may use
re.sub(r"(:-?[()D])|[^A-Za-z0-9\s]", r"\1" , s)
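Or, in Python 3.4 and older, where re.sub raises an error if the replacement references a group that did not participate in the match, use a callable replacement:
re.sub(r"(:-?[()D])|[^A-Za-z0-9\s]", lambda x: x.group(1) if x.group(1) else "", s)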
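The reason a callable helps can be seen by inspecting the matches directly: whenever the punctuation alternative matches, Group 1 is None. From Python 3.5 on, such an unmatched group in a replacement template is substituted with an empty string, while a callable behaves the same on every version. A minimal sketch, where PATTERN and keep_smiley are hypothetical names:

```python
import re

PATTERN = re.compile(r"(:-?[()D])|[^A-Za-z0-9\s]")

def keep_smiley(match):
    # group(1) is None when the punctuation alternative matched.
    return match.group(1) or ""

sample = "ok :-) #"
groups = [m.group(1) for m in PATTERN.finditer(sample)]
cleaned = PATTERN.sub(keep_smiley, sample)

print(groups)         # [':-)', None]
print(repr(cleaned))  # 'ok :-) '
```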
If you really need to avoid removing commas, add , to the negated character class:
re.sub(r"(:-?[()D])|[^A-Za-z0-9,\s]", r"\1", s)
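The effect of adding characters to the negated class can be compared side by side. The helper below is only a sketch: clean and its keep parameter are invented for illustration, and re.escape is used so that characters such as - or ] stay literal inside the class.

```python
import re

def clean(text, keep=""):
    # `keep` lists extra literal characters that should survive the cleanup.
    pattern = r"(:-?[()D])|[^A-Za-z0-9\s" + re.escape(keep) + "]"
    # A callable replacement avoids relying on unmatched-group handling.
    return re.sub(pattern, lambda m: m.group(1) or "", text)

s = "Hi, this is good :)#"
without_comma = clean(s)
with_comma = clean(s, keep=",")

print(without_comma)  # Hi this is good :)
print(with_comma)     # Hi, this is good :)
```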
Details
(:-?[()D]) - matches and captures into Group 1 a :, then an optional -, and then a single char from the character class: (, ) or D (this captures smileys like :-), :-(, :), :(, :-D, :D)
[^A-Za-z0-9,\s] - matches any char but an ASCII letter, digit, comma and whitespace. To make it fully Unicode aware, replace it with (?:[^\w\s,]|_).
See the Python 3.5+ demo:
import re
s = "Hi, this is good :)#"
print( re.sub(r"(:-?[()D])|[^A-Za-z0-9,\s]", r"\1" , s) )
# => Hi, this is good :)
See this demo for Python 3.4 and older:
import re
s = "Hi, this is good :)#"
print( re.sub(r"(:-?[()D])|[^A-Za-z0-9,\s]", lambda x: x.group(1) if x.group(1) else "", s) )
# => Hi, this is good :)
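The Unicode-aware variant mentioned in the details is not shown in the demos, so here is a rough comparison. It assumes Python 3 str patterns, where \w matches Unicode letters by default; ASCII_ONLY, UNICODE_AWARE and strip_punct are names introduced just for this sketch.

```python
import re

ASCII_ONLY = re.compile(r"(:-?[()D])|[^A-Za-z0-9,\s]")
# \w also matches "_", so the underscore is removed by its own alternative.
UNICODE_AWARE = re.compile(r"(:-?[()D])|(?:[^\w\s,]|_)")

def strip_punct(pattern, text):
    # Keep captured smileys, drop every other matched character.
    return pattern.sub(lambda m: m.group(1) or "", text)

sample = "Caf\u00e9, tr\u00e8s_bien :-D!!"
ascii_result = strip_punct(ASCII_ONLY, sample)
unicode_result = strip_punct(UNICODE_AWARE, sample)

print(ascii_result)    # accented letters are stripped
print(unicode_result)  # accented letters are kept
```

The ASCII pattern treats accented letters as punctuation and removes them, while the Unicode-aware one keeps them; both remove the underscore and the trailing exclamation marks.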
You can use \ to escape special characters. Try:
re.sub(r"[^a-zA-Z0-9:):D:\-))]+", " ", words)
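As a sketch of what the escape changes: an unescaped - between : and ) is parsed as a range from : down to ), which is reversed and rejected by the re module. Escaping the hyphen, or placing it last in the class, makes it literal. Note that this is still a character class, so it whitelists the characters :, - and ) individually rather than the smiley as a unit. The variable names are illustrative.

```python
import re

words = "Hi, this is good :-)#"

# Unescaped: ":-)" inside the class is read as a (reversed) range.
try:
    re.compile(r"[^a-zA-Z0-9:-)]")
    range_error = None
except re.error as exc:
    range_error = str(exc)

# Escaped hyphen, or a hyphen placed last in the class, is a literal "-".
escaped = re.sub(r"[^a-zA-Z0-9:)D\-]+", " ", words)
trailing = re.sub(r"[^a-zA-Z0-9:)D-]+", " ", words)

print(range_error is not None)  # True
print(repr(escaped))            # 'Hi this is good :-) '
print(escaped == trailing)      # True
```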