
Python - Regex find all matches in string and replace

I have a problem replacing parts of a string using regex, and I can't seem to get it to work.

import re

string = "<font x=''>test</font> <font y=''>test2</font> <font z=''>test3</font>"
if re.search("(<font .*?>)", string, re.IGNORECASE):
    r = re.compile(r"<font (?P<name>.*?)>.*?</font>", re.IGNORECASE)
    string = r.sub(r'', string)

For some reason the regex deletes the entire string, leaving ''. It should return test test2 test3.

Here it is:

>>> import re
>>> string = "<font x=''>test</font> <font y=''>test2</font> <font z=''>test3</font>"
>>> if re.search("(<font .*?>)", string, re.IGNORECASE):
...     r = re.compile(r"</?font.*?>", re.IGNORECASE)
...     string = r.sub(r'', string)
... 
>>> string
'test test2 test3'
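
As a rough illustration of why the pattern from the question left no text behind, the sketch below runs both patterns on the same input. The helper name `strip_font_tags` is hypothetical and only used here for demonstration. Note that `sub` simply returns the input unchanged when nothing matches, so the `re.search` guard is optional.

```python
import re

html = "<font x=''>test</font> <font y=''>test2</font> <font z=''>test3</font>"

# Pattern from the question: it spans the whole element, inner text included,
# so substituting '' removes the text together with the tags.
whole_element = re.compile(r"<font (?P<name>.*?)>.*?</font>", re.IGNORECASE)

# Pattern from the answer: it matches only an opening or a closing font tag.
tag_only = re.compile(r"</?font.*?>", re.IGNORECASE)


def strip_font_tags(text):
    """Hypothetical helper: drop font tags but keep the text between them."""
    return tag_only.sub("", text)


print(repr(whole_element.sub("", html)))  # '  ' (only the two separating spaces survive)
print(repr(strip_font_tags(html)))        # 'test test2 test3'
```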


Pattern Explanation:

  • </?font.*?> matches every opening and closing font tag. The ? after the / makes the preceding character, /, optional.
  • .*? performs the shortest possible match. * is greedy by default and consumes as many characters as it can; the ? after it makes the quantifier lazy, so it stops at the first > it reaches.
  • > matches the > symbol literally.
  • re.IGNORECASE is the case-insensitive flag, so FONT and Font match as well.
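
The difference between the greedy and the lazy quantifier is easy to check on a single element. This is a minimal sketch; the variable names are chosen only for illustration.

```python
import re

sample = "<font x=''>test</font>"

# Greedy .* runs to the last > in the string, swallowing the text and the closing tag.
greedy = re.findall(r"</?font.*>", sample)

# Lazy .*? stops at the first >, so each tag is matched on its own.
lazy = re.findall(r"</?font.*?>", sample)

print(greedy)  # ["<font x=''>test</font>"]
print(lazy)    # ["<font x=''>", '</font>']

# Without re.IGNORECASE an upper-case tag is not matched at all.
upper = "<FONT>a</FONT>"
print(re.findall(r"</?font.*?>", upper))                 # []
print(re.findall(r"</?font.*?>", upper, re.IGNORECASE))  # ['<FONT>', '</FONT>']
```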
>([^<]*)<\/

You can use it as:

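import re

y = "<font x=''>test</font> <font y=''>test2</font> <font z=''>test3</font>"
# Collect every run of text that sits between a > and the next </
x = re.findall(r"(?<=>)([^<]*)(?=<\/)", y)
result = " ".join(x)  # avoid shadowing the built-in str
print(result)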
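
For comparison, the capturing-group pattern and the lookaround pattern can be run side by side. This is only a sketch; the variable names are illustrative, and the / is left unescaped because Python does not require escaping it.

```python
import re

html = "<font x=''>test</font> <font y=''>test2</font> <font z=''>test3</font>"

# Capturing-group form: with one group, findall returns only what the group captured.
with_group = re.findall(r">([^<]*)</", html)

# Lookaround form: the > and </ are asserted but not consumed by the match.
with_lookaround = re.findall(r"(?<=>)[^<]*(?=</)", html)

print(with_group)            # ['test', 'test2', 'test3']
print(with_lookaround)       # ['test', 'test2', 'test3']
print(" ".join(with_group))  # test test2 test3
```

Both forms only pick up text that is immediately followed by a closing tag, so text outside any element (such as the spaces between the font elements here) is skipped.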

See demo:

http://regex101.com/r/xT7yD8/6


 