
Python regex: how to match strings that DO NOT contain an *exact* sentence?

I want to filter out messages from a log file that contain, for example, the sentence "This is message 12345. Ignore."

If I were using grep, I could simply pass the sentence and use the -v switch, for example:

grep -v "This is message 12345\. Ignore\." data.log

The thing is, I have to do this in Python. Something like:

import re
with open("data.log") as f:
    data = f.read()
# This selects all lines that contain the given sentence
# (raw string for the escapes; re.M so that $ matches at the end of every line)
re.findall(r".*This is message 12345\. Ignore\..*$", data, re.M)

# HERE --> I would like to select lines that DO NOT match that sentence
# ???

I've tried to use the (?...) and [^...] syntax (see here), but I couldn't get it right.

Any ideas?

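Use a negative lookahead assertion, anchored at the start of the line, like this:

re.findall(r"^(?!.*This is message 12345\. Ignore\.).*$", data, re.M)

The m modifier (re.M, also spelled re.MULTILINE) is required so that ^ and $ match the start and the end of each line rather than of the whole string. The ^ has to sit outside the lookahead: if the lookahead is tried at every position, it only rejects a match starting at the first character of an unwanted line, and the rest of that line still matches from the second character on.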
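
As a rough illustration of how the lookahead behaves, the snippet below runs a form of the pattern with ^ placed before the lookahead on a small in-memory sample. The sample text and the names sample, pattern and kept are made up for this sketch; real data would come from the log file.

```python
import re

# Hypothetical sample standing in for the contents of data.log.
sample = (
    "first line\n"
    "This is message 12345. Ignore.\n"
    "prefix This is message 12345. Ignore. suffix\n"
    "This is message 12345, keep\n"
    "last line"
)

# With re.M, ^ matches at the start of every line. The negative lookahead
# then rejects the line if the sentence occurs anywhere in it.
pattern = re.compile(r"^(?!.*This is message 12345\. Ignore\.).*$", re.M)

kept = pattern.findall(sample)
print(kept)
# ['first line', 'This is message 12345, keep', 'last line']
```

Note that blank lines also satisfy this pattern and come back as empty strings, so the result may need a further filter if those are not wanted.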

A simpler method to consider is to convert this to a positive matching problem:

  • Go through the file line by line.
  • Run a positive regex match on each line, and if it matches, discard the line.
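
The two steps above could be sketched roughly as follows. The function name lines_without and the in-memory sample are hypothetical, and building the pattern with re.escape (so that the sentence is matched literally) is just one option among several.

```python
import io
import re

SENTENCE = "This is message 12345. Ignore."
# re.escape escapes the dots, so the sentence is matched literally.
unwanted = re.compile(re.escape(SENTENCE))

def lines_without(lines, pattern):
    """Return the lines in which `pattern` does not match anywhere."""
    return [line for line in lines if not pattern.search(line)]

# Hypothetical stand-in for open("data.log"); a real file object
# can be passed in the same way and is read one line at a time.
log = io.StringIO(
    "first line\n"
    "This is message 12345. Ignore.\n"
    "This is message 12345, keep\n"
)

kept = lines_without(log, unwanted)
print(kept)
# ['first line\n', 'This is message 12345, keep\n']
```

Each kept line retains its trailing newline, which is convenient when writing the result straight back to another file.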

In general, negative matches with regexes get quite complicated. It is usually easier and more efficient to use a positive match to find the things you don't want, and then exclude those things with programming logic.

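Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish one, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.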
