Remove special characters, apostrophes and unwanted spaces from a string using re.sub in Python
import re
def removePunctuation(text):
    return re.sub(r'[ \W,_,]+', ' ', text.lower()).lstrip()
print(removePunctuation("Hi, 'you!'"))
print(removePunctuation(" No's under_score!"))
I want this result:
hi you
nos under score
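For reference, here is a small check of what the attempt above actually returns. The function name `remove_punctuation_attempt` is made up for this sketch; the pattern is the one from the question. Two things go wrong: `lstrip()` leaves the trailing space in place, and the apostrophe is turned into a space instead of being dropped.

```python
import re

def remove_punctuation_attempt(text):
    # Same pattern as the attempt above: every run of spaces, non-word
    # characters, commas and underscores is replaced by a single space.
    return re.sub(r'[ \W,_,]+', ' ', text.lower()).lstrip()

# repr() makes the trailing space visible.
print(repr(remove_punctuation_attempt("Hi, 'you!'")))          # 'hi you '
print(repr(remove_punctuation_attempt(" No's under_score!")))  # 'no s under score '
```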
You may try this:
def removePunctuation(text):
    return re.sub(r'^\s+|\s+$|[^A-Za-z\d\s]', '', text.lower())
Or:
It seems you want to replace every underscore with a space and every other special character with an empty string.
>>> re.sub(r'^\s+|\s+$|[^A-Za-z\d\s]', '', " No's under_score!".lower().replace('_', ' '))
'nos under score'
>>> re.sub(r'^\s+|\s+$|[^A-Za-z\d\s]', '', " Hi, 'you!'".lower().replace('_', ' '))
'hi you'
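The two steps from the session above can be wrapped into one function. This is only a sketch; the name `remove_punctuation` and the intermediate variable `cleaned` are chosen here for illustration.

```python
import re

def remove_punctuation(text):
    # Lowercase first, then turn underscores into spaces.
    cleaned = text.lower().replace('_', ' ')
    # Drop leading/trailing whitespace and every character that is not
    # an ASCII letter, a digit or whitespace.
    return re.sub(r'^\s+|\s+$|[^A-Za-z\d\s]', '', cleaned)

print(remove_punctuation(" Hi, 'you!'"))         # hi you
print(remove_punctuation(" No's under_score!"))  # nos under score
```

Note that this pattern does not collapse repeated spaces, and whitespace that only becomes trailing after punctuation is removed survives: "hi !" comes back as 'hi ' with the space intact. If that matters, calling strip() on the result is one way around it.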
Regex is a wonderful string manipulation tool, but in Python it can be overkill, and this example is one such case. Python has well-crafted string methods that can do wonders without regex; for this example, str.translate and unicode.translate are ideal.
For Python 2.X:
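def removePunctuation(text):
    from string import punctuation
    # Python 2 str only: with None as the table, translate() deletes the listed characters.
    # split() is needed so that join() works on words rather than on single characters.
    return ' '.join(text.translate(None, punctuation).split())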
For Unicode and Python 3.X:
def removePunctuationU(text):
    from string import punctuation
    # Map each punctuation code point to None so that translate() deletes it
    return u' '.join(text.translate({ord(c): None for c in punctuation}).split())
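One caveat with the translate versions as written: the underscore is part of string.punctuation, so it is deleted rather than replaced, and nothing is lowercased. " No's under_score!" therefore comes out as "Nos underscore". A possible Python 3 variant that matches the requested output is sketched below; the names `_TABLE` and `remove_punctuation_translate` are invented for this example.

```python
from string import punctuation

# Delete every ASCII punctuation character (None means "delete" in a
# translation table), except the underscore, which becomes a space.
_TABLE = {ord(c): None for c in punctuation}
_TABLE[ord('_')] = ' '

def remove_punctuation_translate(text):
    # split() + join() trims both ends and collapses runs of whitespace.
    return ' '.join(text.lower().translate(_TABLE).split())

print(remove_punctuation_translate(" Hi, 'you!'"))         # hi you
print(remove_punctuation_translate(" No's under_score!"))  # nos under score
```

Unlike the regex version, this only strips the ASCII punctuation listed in string.punctuation, so other symbols (for example typographic quotes) pass through unchanged.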