Python Regex using a wildcard to match the beginning of a string and replacing the entire string

I'm trying to match the beginning of a word and then replace the entire word with something. Below is what I'm trying to do.

add23khh234 > REMOVED
add2asdf675 > REMOVED

Below is the regex statement I'm using.

string_reg = re.sub(ur'add*', 'REMOVED', string_reg)

But this code gives me the following.

add23khh234 > REMOVED23khh234
add2asdf675 > REMOVED2asdf675  

add* is ad followed by d*. From the documentation:

'*'

Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match 'a', 'ab', or 'a' followed by any number of 'b's.

So it matches ad, add, adddddd, and so on. In add23khh234 or add2asdf675 (or similar strings) it therefore matches only the leading add, not the whole word.
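To make the behaviour of add* concrete, here is a minimal sketch. It assumes Python 3, so it uses the r'' prefix instead of the Python 2-only ur'' prefix from the question; the sample string is made up for illustration.

```python
import re

# Hypothetical sample text, chosen to show several match lengths.
sample = "ad add adddd add23khh234"

# The * applies only to the final d, so every match is "ad" plus zero or more "d".
matches = re.findall(r"add*", sample)
print(matches)  # ['ad', 'add', 'adddd', 'add']

# Only the leading "add" is replaced; the rest of the word is left in place.
replaced = re.sub(r"add*", "REMOVED", "add23khh234")
print(replaced)  # REMOVED23khh234
```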

You should use .+? or .*? here (not .*, which is greedy). Try something like this:

string_reg = re.sub(ur'add.+? ', 'REMOVED ', string_reg)

Demo:

>>> import re
>>> string_reg = """\
... add23khh234 > REMOVED23khh234
... add2asdf675 > REMOVED2asdf675"""

>>> string_reg = re.sub(ur'add.+? ', 'REMOVED ', string_reg)
>>> print string_reg
REMOVED > REMOVED23khh234
REMOVED > REMOVED2asdf675
>>> 
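The difference between the lazy and greedy forms can be sketched as follows. This is an illustration in Python 3 with a made-up input string, not part of the original answer. It also shows a limitation of the pattern: because it ends with a literal space, a word at the very end of the string is not matched.

```python
import re

# Hypothetical input with two target words on one line.
text = "add23khh234 > x add2asdf675 > y"

# Lazy: .+? stops at the first space after each "add...".
lazy = re.sub(r"add.+? ", "REMOVED ", text)
print(lazy)  # REMOVED > x REMOVED > y

# Greedy: .+ runs to the last space on the line, swallowing everything between.
greedy = re.sub(r"add.+ ", "REMOVED ", text)
print(greedy)  # REMOVED y

# No trailing space, so the pattern does not match and nothing is replaced.
no_space = re.sub(r"add.+? ", "REMOVED ", "add23khh234")
print(no_space)  # add23khh234
```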

Try this:

string_reg = re.sub(ur'^add.*', 'REMOVED', string_reg)

If you have multiple patterns on one line:

string_reg=re.sub("add[^ ]+","REMOVED",string_reg)
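A short sketch of how these two patterns differ, assuming Python 3 and made-up input strings. The anchored form replaces everything from the start of the line, while the character-class form replaces each run of non-space characters that begins with add, including one that starts in the middle of a word.

```python
import re

text = "add23khh234 and add2asdf675"

# ^add.* matches from the start of the string to the end of the line.
whole_line = re.sub(r"^add.*", "REMOVED", text)
print(whole_line)  # REMOVED

# add[^ ]+ matches each "add" followed by one or more non-space characters.
per_word = re.sub(r"add[^ ]+", "REMOVED", text)
print(per_word)  # REMOVED and REMOVED

# There is no word boundary, so a match can begin inside a longer word.
inside_word = re.sub(r"add[^ ]+", "REMOVED", "xxxadd2axxx")
print(inside_word)  # xxxREMOVED

# With several lines, re.MULTILINE lets ^ match at the start of each line.
multi = re.sub(r"^add.*", "REMOVED", "add1 x\nadd2 y", flags=re.MULTILINE)
print(multi)
```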

Short answer

\badd\w*

A quantifier such as * is applied to the previous token or subpattern. For example, the regex you're using, add*, matches a literal ad followed by any number of subsequent d characters.

Meeting your criteria

  • You need to match add at the beginning of a word, so use a word boundary, \b.
  • Then you also need to match the rest of the word in order to replace it. \w is shorthand for [a-zA-Z0-9_], which matches one word character, and that is what you need to repeat any number of times with *.

Code

import re

string_reg = 'add23khh234 ... add2asdf675 ... xxxadd2axxx'

string_reg = re.sub(ur'\badd\w*', 'REMOVED', string_reg)
print(string_reg)

Output

REMOVED ... REMOVED ... xxxadd2axxx

ideone demo
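The same idea can be wrapped in a small helper for any prefix. This is only a sketch in Python 3: the function name replace_words_starting_with and its parameters are hypothetical, and it assumes the prefix starts with a word character so that \b behaves as described above. Note that in Python 3, \w on str patterns also matches non-ASCII word characters unless re.ASCII is passed.

```python
import re

def replace_words_starting_with(text, prefix, replacement="REMOVED"):
    """Replace every whole word that begins with `prefix` (hypothetical helper)."""
    # re.escape keeps any regex metacharacters in the prefix literal.
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*")
    # A function replacement inserts the text literally, without backslash processing.
    return pattern.sub(lambda _match: replacement, text)

result = replace_words_starting_with(
    "add23khh234 ... add2asdf675 ... xxxadd2axxx", "add"
)
print(result)  # REMOVED ... REMOVED ... xxxadd2axxx
```

The last word is left alone because its add is preceded by another word character, so there is no word boundary in front of it.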
