Python regex: how to match strings that DO NOT contain an *exact* sentence?
I want to filter out messages from a log file that contain, e.g., the sentence This is message 12345. Ignore.
If I were using grep, I could simply pass the sentence and use the -v switch, for example:
grep -v "This is message 12345\. Ignore\." data.log
The thing is, I have to do this in Python. Something like:
import re
with open("data.log") as f:
    data = f.read()
# This will select all lines that match the given sentence
# (re.MULTILINE makes $ match at the end of every line, not just the last one)
re.findall(r".*This is message 12345\. Ignore\..*$", data, re.MULTILINE)
# HERE --> I would like to select lines that DO NOT match that sentence
# ???
I've tried to use the (?...) and [^...] syntax (see here), but I couldn't get it right.
Any ideas?
Use a negative lookahead assertion like this:
re.findall(r"^(?!.*This is message 12345\. Ignore\.).*$", data, re.MULTILINE)
and also enable the m modifier (re.MULTILINE in Python), so that ^ and $ match the start and the end of each line.
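To make the lookahead approach concrete, here is a minimal sketch run against a small in-memory sample. The sample lines and the names `sentence`, `pattern`, and `kept` are assumptions made for illustration; they are not from the question. Note that ^ sits outside the lookahead here, so the assertion is only evaluated at the start of each line.

```python
import re

# Hypothetical sample log contents, for illustration only.
data = "\n".join([
    "Starting up",
    "This is message 12345. Ignore.",
    "prefix This is message 12345. Ignore. suffix",
    "This is message 12346. Ignore.",
    "Shutting down",
])

# re.escape builds the literal pattern, so the dots need no manual escaping.
sentence = re.escape("This is message 12345. Ignore.")

# ^ anchors at each line start (re.MULTILINE); the negative lookahead rejects
# the line if the sentence occurs anywhere in it; .*$ then matches the line.
pattern = re.compile(rf"^(?!.*{sentence}).*$", re.MULTILINE)

kept = pattern.findall(data)
print(kept)
```

Only the two lines containing the exact sentence are dropped; the near-miss with 12346 is kept.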
A simpler method to consider is to convert this to a positive matching problem:
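A minimal sketch of that idea, assuming the log is processed line by line: search each line for the unwanted sentence with an ordinary positive pattern and keep the line only when the search fails. The names `UNWANTED`, `keep_lines`, and `sample` are hypothetical and chosen for this example.

```python
import re

# Compile the positive pattern once; re.escape avoids hand-escaping the dots.
UNWANTED = re.compile(re.escape("This is message 12345. Ignore."))


def keep_lines(lines):
    """Yield only the lines in which the unwanted sentence does not occur."""
    for line in lines:
        if not UNWANTED.search(line):
            yield line


# Hypothetical sample lines standing in for the contents of data.log.
sample = [
    "Starting up\n",
    "This is message 12345. Ignore.\n",
    "This is message 12346. Ignore.\n",
    "Shutting down\n",
]

kept = list(keep_lines(sample))
print("".join(kept), end="")
```

Because a file object iterates over its lines, `keep_lines(f)` should work the same way inside `with open("data.log") as f:`. For a fixed literal sentence like this one, the plain test `sentence not in line` would also do the job without any regex.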
In general, negative matches with regexes get quite complicated. It is usually easier and more efficient to use a positive match to find the things you don't want, and then exclude those things with programming logic.