
Replace more than one pattern in Python

I have reviewed various links, but all of them showed how to replace multiple words in one pass. Instead of words, I want to replace patterns, e.g. in this text:

RT @amrightnow: "The Real Trump" Trump About You" Watch Make #1 https:\\/\\/t.co\\/j58e8aacrE #tcot #pjnet #1A #2A #Tru mp #trump2016 https:\\/\\/t.co\…

When I run the following two commands on the above text, I get the desired output:

result = re.sub(r"http\S+","",sent)
result1 = re.sub(r"@\S+","",result)

This way I remove all the URLs and @ handles from the tweet. The output looks something like this:

>>> result1
'RT  "The Real Trump" Trump About You" Watch Make #1  #tcot #pjnet #1A #2A #Tru mp #trump2016 '

Could someone let me know the best way to do this? I will be reading tweets from a file. I want to read each tweet and replace these handles and URLs with blanks.

You need the regex "or" operator, which is the pipe |:

re.sub(r"http\S+|@\S+","",sent)
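As a quick sanity check, the sketch below compares the two-pass version from the question with the single-pass alternation on the sample tweet. The variable names `two_pass` and `one_pass` are only illustrative, and the equivalence is shown for this input only; in general the two approaches can differ if removing one pattern creates a new match for the other.

```python
import re

# Sample tweet from the question; each literal backslash is written as \\
# so the string holds the "\/" sequences seen in the raw text.
sent = ('RT @amrightnow: "The Real Trump" Trump About You" Watch Make #1 '
        'https:\\/\\/t.co\\/j58e8aacrE #tcot #pjnet #1A #2A #Tru mp '
        '#trump2016 https:\\/\\/t.co\u2026')

# Two passes, as in the question: URLs first, then @ handles.
two_pass = re.sub(r"@\S+", "", re.sub(r"http\S+", "", sent))

# One pass: the pipe matches either alternative at each position.
one_pass = re.sub(r"http\S+|@\S+", "", sent)

print(one_pass == two_pass)
print(repr(one_pass))
```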

If you have a long list of patterns that you want to remove, a common trick is to use join to build the regular expression:

# Raw strings keep \S as a regex escape instead of an invalid string escape.
to_match = [r'http\S+',
            r'@\S+',
            r'something_else_you_might_want_to_remove']

re.sub('|'.join(to_match), '', sent)
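Since the question mentions reading tweets from a file, here is a minimal sketch of how the joined pattern might be applied line by line. The helper name `clean`, the sample lines, and the use of `io.StringIO` as a stand-in for a real file are all assumptions made for illustration.

```python
import io
import re

# Patterns to strip, written as raw strings.
to_match = [r"http\S+", r"@\S+"]

# Compile once so the same pattern object is reused for every tweet.
pattern = re.compile("|".join(to_match))

def clean(tweet):
    """Return the tweet with every match of `pattern` removed."""
    return pattern.sub("", tweet)

# Stand-in for an open file of tweets, one per line (hypothetical data).
tweets_file = io.StringIO(
    "RT @someone: hello https://example.com/a\n"
    "no handles or links here\n"
)

cleaned = [clean(line.rstrip("\n")) for line in tweets_file]
print(cleaned)
```

With a real file, `open(path, encoding="utf-8")` could replace the `io.StringIO` object. If some entries in the list are meant as literal text rather than regex syntax, passing them through `re.escape` before joining avoids accidental metacharacters.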

You can use an "or" pattern by separating the patterns with |:

import re

# Literal backslashes are written as \\ so Python 3 does not warn about invalid escapes.
s = 'RT @amrightnow: "The Real Trump" Trump About You" Watch Make #1 https:\\/\\/t.co\\/j58e8aacrE #tcot #pjnet #1A #2A #Tru mp #trump2016 https:\\/\\/t.co\u2026'
result = re.sub(r"http\S+|@\S+", "", s)
print(result)

Output:

RT  "The Real Trump" Trump About You" Watch Make #1  #tcot #pjnet #1A #2A #Tru mp #trump2016

See the subsection on '|' in the regular expression syntax documentation.
