Python regex, avoid skipping brackets

I want to replace every match of a regex with '*', but only where the match is outside of <>. The whole point is to not interfere with the HTML tags.

I use this to do the replacement:

re.sub(r'SOMEREGEX(?=[^>]*(<|$))', '*', line)

However, I ran into this problem: if my regex is:

f.*k

Then this:

fzzzzzzzzz<HTMLTAG>zzzzzzzk

Would become a single '*', which I don't want. How do I overcome this problem?

Constraints:

- All brackets are matched

- No nested brackets

- SOMEREGEX is provided by the user. I would prefer not to change it.
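The behaviour described in the question can be reproduced with a short script. This is only a minimal sketch: the variable names are chosen for illustration, and f.*k stands in for the user-supplied SOMEREGEX.

```python
import re

# The lookahead only checks that the END of the match lies outside a tag.
# It does not stop the match itself from running straight through a tag.
pattern = r'f.*k(?=[^>]*(<|$))'
line = 'fzzzzzzzzz<HTMLTAG>zzzzzzzk'

result = re.sub(pattern, '*', line)
print(result)  # prints: *
```

The greedy .* consumes the tag along with the text on both sides of it, so the whole line collapses into one '*'.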

You could try replacing the . character ("any character at all") with the character class [^<>], which matches any character except the angle brackets < and >. This would give the regex f[^<>]*k. This would match facebook but not face<b>book.
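A quick check of that suggestion, combined with the lookahead from the question so that text inside a tag is also left alone. The function name censor is made up for this sketch, and the pattern is hard-coded rather than user-supplied.

```python
import re


def censor(line):
    # [^<>]* keeps the match from crossing a tag boundary;
    # the lookahead rejects matches that end inside a tag.
    return re.sub(r'f[^<>]*k(?=[^>]*(<|$))', '*', line)


print(censor('facebook'))     # prints: *
print(censor('face<b>book'))  # prints: face<b>book
print(censor('<fork>fork'))   # prints: <fork>*
```

Note that this means editing the regex itself, which the question would prefer to avoid when SOMEREGEX comes from the user.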

There are still things that can go wrong with this, though. Have you considered using a proper HTML parser instead of regular expressions? BeautifulSoup is easy, tasty and fun.
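To give an idea of the parser route, here is a rough sketch. It uses html.parser from the standard library rather than BeautifulSoup, purely so that it runs without installing anything. The names TextCensor and censor_text are invented for this example. It rebuilds the markup piece by piece and applies the regex only to text nodes. As written it drops comments and doctype declarations and lowercases closing tag names, so treat it as a starting point, not a finished tool.

```python
import re
from html.parser import HTMLParser


class TextCensor(HTMLParser):
    """Rebuilds the input, applying a regex only to text between tags."""

    def __init__(self, pattern):
        # convert_charrefs=False passes entities such as &amp; through as-is
        super().__init__(convert_charrefs=False)
        self.regex = re.compile(pattern)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as written
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        # Only text outside of tags ever reaches this method
        self.out.append(self.regex.sub('*', data))

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)


def censor_text(pattern, html):
    parser = TextCensor(pattern)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


print(censor_text(r'f.*k', 'fzzzzzzzzz<HTMLTAG>zzzzzzzk'))
# prints: fzzzzzzzzz<HTMLTAG>zzzzzzzk
print(censor_text(r'f.*k', '<a href="fork">fork</a>'))
# prints: <a href="fork">*</a>
```

The user's regex is passed in unchanged, which fits the constraint in the question.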

Search between a closing angle bracket and the next opening one:

re.sub(r'(^|>)f[^<]*k(<|$)', r'\1*\2', line)

The \1 and \2 are required to put back the angle brackets that the pattern may have consumed from line.
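To see what this pattern does and does not catch, here is a small sketch. The first function wraps the substitution above. The second, sub_outside_tags, is not from this answer: it is a hypothetical generalisation that splits the line on tags and runs an arbitrary user regex on the text in between, relying on the question's constraints that brackets are matched and never nested.

```python
import re


def replace_whole_runs(line):
    # The pattern from the answer: it only fires when the text between
    # two tags (or a line edge) starts with f and ends with k.
    return re.sub(r'(^|>)f[^<]*k(<|$)', r'\1*\2', line)


# One complete tag; assumes matched, non-nested brackets.
TAG = re.compile(r'(<[^<>]*>)')


def sub_outside_tags(user_regex, repl, line):
    # Because TAG has a capturing group, re.split keeps the tags:
    # even indexes hold text, odd indexes hold tags.
    parts = TAG.split(line)
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(user_regex, repl, parts[i])
    return ''.join(parts)


print(replace_whole_runs('<b>fork</b>'))  # prints: <b>*</b>
print(replace_whole_runs('a fork b'))     # prints: a fork b
print(sub_outside_tags(r'f.*k', '*', 'a fork<b class="fork">fork</b>'))
# prints: a *<b class="fork">*</b>
```

With the split approach the user's regex is applied to each text run separately, so anchors such as ^ and $ in it would match at tag boundaries as well as at the ends of the line.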
