How to remove all but the 26 letters, and . , ( ) ' " ? ! from a string in Python?
I have:
string = 'Here it is, your gif! am a bot. [^(Report an issue)] ❤ that bot,I ❤ ur mom **YEET** 😎 ,GOTTEM!"'
and I tried:
string = re.sub(r'\W+', ' ', string)
and that gives me:
'Here it is your gif am a bot Report an issue that bot I ur mom YEET GOTTEM'
But I would like this:
'Here it is, your gif! am a bot. (Report an issue) that bot,I ur mom YEET ,GOTTEM!"'
Just the 26 letters, no numbers, and only the most commonly used symbols, namely this group: .,()'"?!
Make a character class of the things you want to accept (using []) and invert it (using a leading ^, so that it becomes [^stuff]):
string = re.sub(r'[^a-zA-Z.,()\'"?! ]+', '', string)
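For anyone who wants to try this, here is a minimal runnable sketch of the negated character class applied to the string from the question. The names NOT_ALLOWED, sample, cleaned and allowed are only illustrative and do not come from the answer. Note that deleting characters that sat between two spaces leaves doubled spaces behind.

```python
import re
import string as string_module

# Characters to keep: ASCII letters, space, and . , ( ) ' " ? !
# The leading ^ negates the class, so the pattern matches runs of
# every character that is NOT in that set.
NOT_ALLOWED = re.compile(r'[^a-zA-Z.,()\'"?! ]+')

sample = 'Here it is, your gif! am a bot. [^(Report an issue)] ❤ that bot,I ❤ ur mom **YEET** 😎 ,GOTTEM!"'

# Deleting the matches leaves only allowed characters behind.
cleaned = NOT_ALLOWED.sub('', sample)
print(cleaned)

# Sanity check: nothing outside the allowed set survives.
allowed = set(string_module.ascii_letters + '.,()\'"?! ')
print(all(ch in allowed for ch in cleaned))
```

The digits, brackets, asterisks and emoji are gone, but spots such as the gap after "(Report an issue)" now contain two spaces, because the removed symbol had a space on each side.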
Use this for your regex instead: [^a-zA-Z?!.,()\'" ]+
The brackets define a set of characters to match, and the caret at the front negates that set, so the pattern matches every character except the ones listed inside.
Thus leaving you with:
pattern = r'[^a-zA-Z?!.,()\'" ]+'
string = re.sub(pattern, ' ', string)
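Replacing each disallowed run with a space can leave several spaces in a row, since the spaces around a removed symbol are kept as well. A possible follow-up, sketched here under the assumption that single spaces are wanted, is to collapse those runs afterwards. The helper name clean is hypothetical and not part of the answer.

```python
import re

def clean(text):
    """Hypothetical helper: keep letters, spaces and . , ( ) ' " ? ! only."""
    # Step 1: replace each run of disallowed characters with a space.
    text = re.sub(r'[^a-zA-Z?!.,()\'" ]+', ' ', text)
    # Step 2: collapse the runs of spaces that step 1 can leave behind,
    # then trim any space left at either end.
    return re.sub(r' {2,}', ' ', text).strip()

string = 'Here it is, your gif! am a bot. [^(Report an issue)] ❤ that bot,I ❤ ur mom **YEET** 😎 ,GOTTEM!"'
print(clean(string))
```

On the string from the question this should print exactly the text that was asked for, with single spaces throughout.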