Python Regex using a wildcard to match the beginning of a string and replacing the entire string
I'm trying to match the beginning of a word and then replace the entire word with something. Below is what I'm trying to do.
add23khh234 > REMOVED
add2asdf675 > REMOVED
Below is the regex statement I'm using.
string_reg = re.sub(ur'add*', 'REMOVED', string_reg)
But this code gives me the following.
add23khh234 > REMOVED23khh234
add2asdf675 > REMOVED2asdf675
add* is ad followed by d*: the * applies only to the last d, not to the whole word. From the documentation:
'*'
Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match 'a', 'ab', or 'a' followed by any number of 'b's.
So it matches ad, add, or adddddd... But it matches neither add23khh234 nor add2asdf675 in full (or anything like them). It only matches the leading add, which is why the rest of the word is left behind after the substitution.
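To make the point above concrete, here is a small sketch showing which part of each string add* actually covers. It is written for Python 3, so it uses the r'' prefix rather than the ur'' prefix from the question; the sample list is only for illustration.

```python
import re

# Strings to probe; the last one is taken from the question.
samples = ["ad", "add", "adddddd", "add23khh234"]

# findall returns the exact text covered by each match of the pattern.
matched = [re.findall(r'add*', s) for s in samples]

# sub replaces only the matched portion, so the tail of the word survives.
result = re.sub(r'add*', 'REMOVED', 'add23khh234')

print(matched)  # [['ad'], ['add'], ['adddddd'], ['add']]
print(result)   # REMOVED23khh234
```

Only the leading add is consumed in the last sample, which reproduces the output the question complains about.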
You should use .+? or .*? here (not .*, which is greedy). Try something like this:
string_reg = re.sub(ur'add.+? ', 'REMOVED ', string_reg)
Demo:
>>> import re
>>> string_reg = """\
... add23khh234 > REMOVED23khh234
... add2asdf675 > REMOVED2asdf675"""
>>> string_reg = re.sub(ur'add.+? ', 'REMOVED ', string_reg)
>>> print string_reg
REMOVED > REMOVED23khh234
REMOVED > REMOVED2asdf675
>>>
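As a rough comparison of the lazy and greedy forms mentioned above, the following Python 3 sketch runs both against one line containing two matching words. The sample text is made up for illustration. Note that both patterns rely on a trailing space, so a word at the very end of the string is not replaced.

```python
import re

# Hypothetical sample line with two words that start with "add".
text = "add23khh234 and add2asdf675 end"

# Lazy: .+? stops at the first space it can, so each word is replaced separately.
lazy = re.sub(r'add.+? ', 'REMOVED ', text)

# Greedy: .* runs to the last space on the line and swallows everything before it.
greedy = re.sub(r'add.* ', 'REMOVED ', text)

# Caveat: with no trailing space there is nothing for the final ' ' to match.
no_trailing_space = re.sub(r'add.+? ', 'REMOVED ', 'add23khh234')

print(lazy)               # REMOVED and REMOVED end
print(greedy)             # REMOVED end
print(no_trailing_space)  # add23khh234
```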
Try this:
string_reg = re.sub(ur'^add.*', 'REMOVED', string_reg)
If you have multiple patterns on one line:
string_reg=re.sub("add[^ ]+","REMOVED",string_reg)
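The two patterns above behave quite differently on multi-line input, which the sketch below tries to illustrate in Python 3. The sample text and the use of re.MULTILINE are assumptions added for the example; the original answer does not mention the flag.

```python
import re

# Hypothetical two-line input; each line starts with a word beginning with "add".
text = "add23khh234 keep\nadd2asdf675 keep"

# Without flags, ^ anchors only at the start of the whole string,
# and .* consumes the rest of that first line.
first_only = re.sub(r'^add.*', 'REMOVED', text)

# With re.MULTILINE, ^ also matches right after each newline.
every_line = re.sub(r'^add.*', 'REMOVED', text, flags=re.MULTILINE)

# add[^ ]+ stops at the next space, so only the word itself is replaced.
words_only = re.sub(r'add[^ ]+', 'REMOVED', text)

print(repr(first_only))  # 'REMOVED\nadd2asdf675 keep'
print(repr(every_line))  # 'REMOVED\nREMOVED'
print(repr(words_only))  # 'REMOVED keep\nREMOVED keep'
```

One thing to keep in mind: [^ ] also matches a newline, so a word that ends a line would pull in the start of the next line. Using \S+ instead of [^ ]+ avoids that.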
Short answer
\badd\w*
A quantifier such as * is applied to the previous token or subpattern. For example, the regex you're using, add*, matches a literal ad followed by any number of subsequent d.
Meeting your criteria
You want add at the beginning of a word, so use a word boundary: \b
\w is a shorthand for [a-zA-Z0-9_] (in ASCII mode; with Unicode matching it also covers other letters and digits), which matches 1 word character, and that's what you need to repeat any number of times with *.
Code
import re
string_reg = 'add23khh234 ... add2asdf675 ... xxxadd2axxx'
string_reg = re.sub(r'\badd\w*', 'REMOVED', string_reg)
print(string_reg)
Output
REMOVED ... REMOVED ... xxxadd2axxx
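If the prefix is not always add, the same idea can be wrapped in a small helper. This is only a sketch for Python 3: the function name replace_words_starting_with and its parameters are hypothetical, and it assumes the prefix begins with a word character so that \b behaves as described above.

```python
import re

def replace_words_starting_with(text, prefix, replacement="REMOVED"):
    """Replace every whole word that begins with `prefix` (hypothetical helper)."""
    # re.escape keeps characters such as '.' or '+' in the prefix literal.
    pattern = r'\b' + re.escape(prefix) + r'\w*'
    # A function as the replacement avoids backslash interpretation in `replacement`.
    return re.sub(pattern, lambda match: replacement, text)

cleaned = replace_words_starting_with(
    'add23khh234 ... add2asdf675 ... xxxadd2axxx', 'add')
print(cleaned)  # REMOVED ... REMOVED ... xxxadd2axxx
```

The last word is left alone because add appears in the middle of it, so there is no word boundary in front of the prefix.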