Python regex, avoid skipping brackets
I want to replace every match of a regex with '*', but only when the match is outside of <>. The whole point is to not interfere with the HTML tags.
I use this to do the replacement:
re.sub(r'SOMEREGEX(?=[^>]*(<|$))', '*', line)
However, I ran into this problem: if my regex is:
f.*k
then this:
fzzzzzzzzz<HTMLTAG>zzzzzzzk
would become a single '*', which I don't want, because the match runs straight through the tag. How do I overcome this problem?
Constraints:
- All brackets are matched
- No nested brackets
- SOMEREGEX is provided by the user. I prefer not changing that.
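For reference, the behavior described above can be reproduced with a short script. This is only a sketch: the helper name mask is made up for illustration, and it simply appends the lookahead from the question to whatever pattern is passed in.

```python
import re

def mask(user_regex, line):
    # Hypothetical helper: the user's pattern followed by the
    # lookahead from the question.
    return re.sub(user_regex + r'(?=[^>]*(<|$))', '*', line)

# A simple pattern leaves the tags alone:
print(mask(r'b', '<b>bob</b>'))                    # <b>*o*</b>

# A greedy pattern swallows the tag, because only the END of the
# match is checked against the lookahead:
print(mask(r'f.*k', 'fzzzzzzzzz<HTMLTAG>zzzzzzzk'))  # *
```

The lookahead only constrains where the match ends, so nothing stops .* from consuming a whole tag in the middle of the match.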
You could try replacing the . character ("any character at all") with the character class [^<>], which matches any character except the angle brackets < and >. This would give the regex f[^<>]*k. This would match facebook but not face<b>book.
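A quick check of that suggestion, as a sketch only. The helper name mask_outside_tags is hypothetical, and it reuses the lookahead from the question. Note that this approach does change the user-supplied pattern, which the question would prefer to avoid.

```python
import re

def mask_outside_tags(pattern, line):
    # Hypothetical helper: pattern plus the question's lookahead.
    return re.sub(pattern + r'(?=[^>]*(<|$))', '*', line)

narrow = r'f[^<>]*k'   # '.' replaced by the class [^<>]

print(mask_outside_tags(narrow, 'facebook'))       # *
print(mask_outside_tags(narrow, 'face<b>book'))    # face<b>book
print(mask_outside_tags(narrow, 'fzzzzzzzzz<HTMLTAG>zzzzzzzk'))
# fzzzzzzzzz<HTMLTAG>zzzzzzzk
```

Because [^<>] cannot step over an angle bracket, a match can no longer start in one text segment and end in another.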
There are still things that can go wrong with this, though. Have you considered using a proper HTML parser instead of regular expressions? BeautifulSoup is easy, tasty and fun.
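If a full parser feels like too much, a middle ground is to split the line on tags first and run the user's regex only on the text pieces. To be clear, this is a different technique from BeautifulSoup, not a substitute for real HTML parsing; it leans on the question's constraints (brackets matched, no nesting). The names TAG and mask_text_only are made up for this sketch.

```python
import re

# The capturing group makes re.split keep the tags in the result.
TAG = re.compile(r'(<[^<>]*>)')

def mask_text_only(user_regex, line, repl='*'):
    parts = TAG.split(line)
    # With one capturing group, text sits at even indexes
    # and tags at odd indexes.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(user_regex, repl, parts[i])
    return ''.join(parts)

print(mask_text_only(r'f.*k', 'fzzzzzzzzz<HTMLTAG>zzzzzzzk'))
# fzzzzzzzzz<HTMLTAG>zzzzzzzk
print(mask_text_only(r'f.*k', 'a fork <b>fk</b> frisk'))
# a * <b>*</b> *
```

The user-supplied pattern is applied unchanged, and a match can never span a tag because the regex never sees one.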
Search between a closing angle bracket (or the start of the line) and the next opening angle bracket (or the end of the line):
re.sub(r'(^|>)f[^<]*k(<|$)', r'\1*\2', line)
The \1 and \2 are required to put back the angle brackets that the pattern may have consumed from line.
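A small sketch of how that pattern behaves. The function name mask_whole_segment is hypothetical; the pattern is the one given above. One thing to be aware of: as written, it only fires when the match fills the entire stretch of text between two tags.

```python
import re

PATTERN = r'(^|>)f[^<]*k(<|$)'

def mask_whole_segment(line):
    # \1 and \2 re-insert whatever the two groups consumed
    # ('>' / '<', or nothing at the start / end of the line).
    return re.sub(PATTERN, r'\1*\2', line)

print(mask_whole_segment('<b>fork</b>'))   # <b>*</b>
print(mask_whole_segment('facebook'))      # *
print(mask_whole_segment('fzzzzzzzzz<HTMLTAG>zzzzzzzk'))
# fzzzzzzzzz<HTMLTAG>zzzzzzzk

# Limitation: text before the 'f' inside the same segment
# prevents a match.
print(mask_whole_segment('a fork'))        # a fork
```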