
How to remove all but the 26 letters, and . , ( ) ' " ? ! from a string in Python?

I have:

string = 'Here it is, your gif! am a bot. [^(Report an issue)] ❤ that bot,I ❤ ur mom **YEET** 😎 ,GOTTEM!"'

and I try:

string = re.sub(r'\W+', ' ', string)

and that gives me:

'Here it is your gif am a bot Report an issue that bot I ur mom YEET GOTTEM'  

But I would like this:

'Here it is, your gif! am a bot. (Report an issue) that bot,I ur mom YEET ,GOTTEM!"'

Just the 26 letters, no numbers, and only the most commonly used symbols from this group: .,()'"?!

Build a character class of the things you want to keep (using []) and invert it (a leading ^ turns it into [^stuff]):

string = re.sub(r'[^a-zA-Z.,()\'"?! ]+', '', string)
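One detail worth checking: deleting a symbol that had a space on each side leaves two spaces behind, while the desired output in the question has single spaces. A minimal sketch, assuming you also want those runs collapsed (the second substitution and the names text, stripped and cleaned are additions for illustration, not part of the answer above):

```python
import re

text = 'Here it is, your gif! am a bot. [^(Report an issue)] ❤ that bot,I ❤ ur mom **YEET** 😎 ,GOTTEM!"'

# Delete every run of characters outside the allowed set.
stripped = re.sub(r'[^a-zA-Z.,()\'"?! ]+', '', text)

# Removing a symbol surrounded by spaces leaves a double space,
# so collapse any run of two or more spaces into one.
cleaned = re.sub(r' {2,}', ' ', stripped)

print(cleaned)
# Here it is, your gif! am a bot. (Report an issue) that bot,I ur mom YEET ,GOTTEM!"
```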

Use this for your regex instead: [^a-zA-Z?!.,()'" ]+

The brackets define a set of characters to match, and the caret at the front negates that set, so the class matches any character that is not listed inside.

That leaves you with:

pattern = r'[^a-zA-Z?!.,()\'" ]+'
string = re.sub(pattern, ' ', string)
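If the set of allowed symbols may change, the negated class can be built from a plain string instead of being typed by hand. This is only a sketch: keep_only and its symbols parameter are hypothetical names introduced here, and it deletes disallowed characters rather than replacing them with a space.

```python
import re

def keep_only(text, symbols=".,()'\"?! "):
    """Drop every character that is not an ASCII letter or listed in `symbols`."""
    # re.escape guards against characters that are special inside a
    # character class, such as ']' , '^' or '-', if they are ever added.
    pattern = "[^a-zA-Z" + re.escape(symbols) + "]+"
    return re.sub(pattern, "", text)

print(keep_only("abc123 **YEET**, ok?"))
# abc YEET, ok?
```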
