
Remove special characters, apostrophes, and unwanted spaces from a string using re.sub in Python

import re
def removePunctuation(text):
    return re.sub(r'[ \W,_,]+', ' ', text.lower()).lstrip()

print(removePunctuation("Hi, 'you!'"))
print(removePunctuation(" No's under_score!"))

I want this result:

hi you
nos under score 

You may try this:

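import re
def removePunctuation(text):
    # Trim leading/trailing whitespace; delete anything that is not a letter, digit, or whitespace.
    return re.sub(r'^\s+|\s+$|[^A-Za-z\d\s]', '', text.lower())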

Or:

It seems you want to replace every underscore with a space and every other special character with an empty string.

>>> re.sub(r'^\s+|\s+$|[^A-Za-z\d\s]', '', " No's under_score!".lower().replace('_', ' '))
'nos under score'
>>> re.sub(r'^\s+|\s+$|[^A-Za-z\d\s]', '', " Hi, 'you!'".lower().replace('_', ' '))
'hi you'
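The two steps above (underscores to spaces, then deleting other special characters) can be wrapped into one function. This is a minimal sketch for Python 3, not the answerer's exact code: the name `remove_punctuation` is made up for illustration, and collapsing repeated whitespace into a single space is an added assumption about what "unwanted space" means.

```python
import re

def remove_punctuation(text):
    """Lowercase, turn underscores into spaces, drop other special characters."""
    cleaned = text.lower().replace('_', ' ')
    # Delete every character that is not a lowercase letter, digit, or whitespace.
    cleaned = re.sub(r'[^a-z\d\s]', '', cleaned)
    # Assumption: runs of whitespace should shrink to one space, with both ends trimmed.
    return re.sub(r'\s+', ' ', cleaned).strip()

print(remove_punctuation("Hi, 'you!'"))
print(remove_punctuation(" No's under_score!"))
```

Doing the whitespace cleanup as a separate pass keeps the character class simple and also handles double spaces left behind where punctuation stood between two spaces.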

Regex is a wonderful string-manipulation tool, but in Python it is sometimes overkill, and this example is one such case. Python has well-designed string methods that can do the job without regex; for this example, str.translate and unicode.translate are ideal.

For Python 2.X:

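def removePunctuation(text):
    from string import punctuation
    # Python 2 only: str.translate(None, chars) deletes chars; split/join normalizes the spaces.
    return ' '.join(text.translate(None, punctuation).split())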

For Unicode and Python 3.X:

def removePunctuationU(text):
    from string import punctuation
    return u' '.join(text.translate({ord(c): None for c in punctuation}).split())
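Note that the translate versions above delete the underscore along with the rest of the punctuation and do not lowercase the text, so "under_score" becomes "underscore". A possible Python 3 variant that matches the output requested in the question is sketched below. The names `_TABLE` and `remove_punctuation_translate` are hypothetical, and only the ASCII characters in `string.punctuation` are handled.

```python
from string import punctuation

# Three-argument str.maketrans: map '_' to ' ' and delete every other
# ASCII punctuation character (the third argument lists characters to drop).
_TABLE = str.maketrans('_', ' ', punctuation.replace('_', ''))

def remove_punctuation_translate(text):
    # Lowercase, apply the table, then split/join to normalize whitespace.
    return ' '.join(text.lower().translate(_TABLE).split())

print(remove_punctuation_translate("Hi, 'you!'"))
print(remove_punctuation_translate(" No's under_score!"))
```

Building the table once at module level avoids recreating it on every call.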


 