Python regex: avoid skipping over brackets

Question

I want to replace every match of a regex with '*', but only if the match is outside of <>. The whole point is to not interfere with the HTML tags.

I use this to replace:

re.sub(r'SOMEREGEX(?=[^>]*(<|$))', '*', line)

However, I ran into this problem: if my regex is:

f.*k

Then this:

fzzzzzzzzz<HTMLTAG>zzzzzzzk

would become a single '*', which I don't want. How do I overcome this problem?
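The behavior described above can be reproduced with a minimal sketch, assuming f.*k is plugged in for SOMEREGEX (the variable names here are illustrative only):

```python
import re

user_regex = r'f.*k'  # stands in for SOMEREGEX
line = 'fzzzzzzzzz<HTMLTAG>zzzzzzzk'

# The greedy .* runs straight through the tag, and the lookahead
# succeeds at the end of the line, so the whole line is replaced.
result = re.sub(user_regex + r'(?=[^>]*(<|$))', '*', line)
print(result)  # *
```

The lookahead only checks where the match ends; it says nothing about what the match itself consumed along the way.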

Constraints:

- All brackets are matched

- No nested brackets

- SOMEREGEX is provided by the user. I prefer not changing that.

Answer 1

You could try replacing the . character ("any character at all") with the character class [^<>], which matches any character except the angle brackets < and >. This would give the regex f[^<>]*k. This would match facebook but not face<b>book.

There are still things that can go wrong with this, though. Have you considered using a proper HTML parser instead of regular expressions? BeautifulSoup is easy, tasty and fun.
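A quick check of this suggestion, combined with the lookahead from the question. This is a sketch only; `mask` is a hypothetical helper name, and it does require editing the user's regex by hand:

```python
import re

# f.*k with "." narrowed to "any character except angle brackets",
# followed by the question's lookahead that rejects matches inside a tag.
PATTERN = re.compile(r'f[^<>]*k(?=[^>]*(<|$))')

def mask(line):
    """Replace each match that sits outside <...> with '*'."""
    return PATTERN.sub('*', line)

print(mask('facebook'))                     # *
print(mask('face<b>book'))                  # face<b>book
print(mask('fzzzzzzzzz<HTMLTAG>zzzzzzzk'))  # fzzzzzzzzz<HTMLTAG>zzzzzzzk
```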
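Short of a full parser, one alternative that leaves SOMEREGEX untouched is to split the line on tags and substitute only in the text pieces. This is not the approach of either answer; it is a minimal sketch that relies on the question's constraints (matched, non-nested brackets), and `mask_outside_tags` is a hypothetical helper name:

```python
import re

# The capturing group makes re.split keep the tags in its result.
TAG = re.compile(r'(<[^<>]*>)')

def mask_outside_tags(pattern, line, repl='*'):
    """Apply pattern only to the text between tags, never to the tags."""
    parts = TAG.split(line)
    # With one capturing group, even indices hold text and odd indices hold tags.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(pattern, repl, parts[i])
    return ''.join(parts)

print(mask_outside_tags(r'f.*k', 'fzzzzzzzzz<HTMLTAG>zzzzzzzk'))  # unchanged
print(mask_outside_tags(r'f.*k', 'facebook <b>fork</b>'))         # * <b>*</b>
```

Because each text piece is handled separately, a match can never span a tag, whatever the user's pattern contains.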

Answer 2

Search between a closing angle bracket and the next opening one:

re.sub(r'(^|>)f[^<]*k(<|$)', r'\1*\2', line)

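The \1 and \2 are required to put back the angle brackets that the pattern may have consumed from line. Note that this pattern only matches when the text between two tags (or between a tag and the start or end of the line) begins with f and ends with k.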
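The substitution above can be tried on a few inputs. This is a sketch only; `mask_segments` is a hypothetical wrapper name, and the last input shows the limitation that the match must fill the whole text segment:

```python
import re

def mask_segments(line):
    """Apply this answer's substitution, restoring the captured delimiters."""
    return re.sub(r'(^|>)f[^<]*k(<|$)', r'\1*\2', line)

print(mask_segments('facebook'))                     # *
print(mask_segments('<b>fork</b>'))                  # <b>*</b>
print(mask_segments('fzzzzzzzzz<HTMLTAG>zzzzzzzk'))  # unchanged
print(mask_segments('<i>my fork</i>'))               # unchanged: "my " precedes the f
```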

Answer 1: 2, accepted, 2012-06-15 23:08:36

Answer 2: 0, 2012-06-15 23:37:25
