Matching patterns in Python

Question

I have a directory "/pcap_test" which contains several log files. Each file contains lines like these:

Pkt: 1 (358 bytes), LIFE: 1, App: itunes (INTO), State: TERMINATED, Stack: /ETH/IP/UDP/itunes, Error: None

Pkt: 2 (69 bytes), LIFE: 2, App: zynga (INTO), State: INSPECTING, Stack: /ETH/IP/UDP, Error: None

Pkt: 3 (149 bytes), LIFE: 2, App: pizzeria (INTO), State: TERMINATED, Stack: /ETH/IP/UDP/pizzeria, Error: None

In this case I want the output to be the second line, because the value of "App" does not appear in "Stack".

I wrote a small Python script to iterate through the directory, open each file and print its contents:

import os
list = os.listdir("/home/test/Downloads/pcap_test")
print(list)
for infile in list:
    infile = os.path.join("/home/test/Downloads/pcap_test", infile)
    if os.path.isfile(infile):
        str = open(infile, 'r').read()
        print(str)

I managed to get the output using grep, but I am unable to do the same in the Python script. It's something like:

grep -vP 'App: ([^, ]*) \(INTO\).*Stack: .*\1.*$' xyz.pcap.log | grep -P 'App: ([^, ]*) \(INTO\)'

Since I already have the file contents in a variable named "str", I want to use that, rather than the individual log files, to get the output.

Any help in this regard will be highly appreciated.

Answer 1

First, I'd advise against variable names like str, as that's the name of Python's built-in string type.
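As a small illustration of why this matters, here is a minimal sketch. The function name shadowing_breaks_builtin is made up for this example. Once a name like str is bound to a string, calling str(...) in that scope no longer reaches the built-in type.

```python
def shadowing_breaks_builtin():
    """Return True if calling str() fails after the name is rebound."""
    str = "contents of a log file"  # this local name now hides the built-in type
    try:
        str(42)  # with the built-in, this would return "42"
    except TypeError:
        return True  # a plain string object is not callable
    return False


print(shadowing_breaks_builtin())
```

The same applies to list in the question's script; names such as contents or filenames avoid the problem.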

Since grep is a command-line regular expression tool, and since you already have a working regular expression, all you need to do is learn to use Python's re module.

What's a little difficult is capturing grep's -v behaviour. I suggest reading the file line by line and printing a line only if it does not match your first regular expression but does match the second, like so:

import re  # at the top of the script, next to "import os"

# inside the "for infile in list:" loop
if os.path.isfile(infile):
    with open(infile, 'r') as logFile:  # closes the file automatically when the block ends
        for line in logFile:  # read logFile one line at a time
            firstReMatch = re.search(r'App: ([^, ]*) \(INTO\).*Stack: .*\1.*$', line)  # App name reappears after "Stack:"
            secondReMatch = re.search(r'App: ([^, ]*) \(INTO\)', line)  # line has an App entry at all
            if secondReMatch and not firstReMatch:  # "not" reproduces grep -v
                print(line.rstrip('\n'))  # the line already ends with a newline

This uses re.search() rather than re.match(): re.match() only matches at the start of the string, and each line in your data starts with "Pkt:" rather than "App:".
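To check the approach against the three sample lines from the question, here is a minimal self-contained sketch. The function name find_mismatched_lines and the in-memory sample list are assumptions made for illustration; a real script would pass in the lines of each log file instead. It relies on re.search(), because the sample lines begin with "Pkt:" and re.match() would only try the start of each line.

```python
import re

# The same two patterns as the grep pipeline in the question.
APP_IN_STACK = re.compile(r'App: ([^, ]*) \(INTO\).*Stack: .*\1.*$')
HAS_APP = re.compile(r'App: ([^, ]*) \(INTO\)')


def find_mismatched_lines(lines):
    """Return the lines whose App name does not appear in their Stack."""
    result = []
    for line in lines:
        # search() scans the whole line; match() would only try position 0.
        if HAS_APP.search(line) and not APP_IN_STACK.search(line):
            result.append(line)
    return result


# Sample data copied from the question.
sample = [
    "Pkt: 1 (358 bytes), LIFE: 1, App: itunes (INTO), State: TERMINATED, Stack: /ETH/IP/UDP/itunes, Error: None",
    "Pkt: 2 (69 bytes), LIFE: 2, App: zynga (INTO), State: INSPECTING, Stack: /ETH/IP/UDP, Error: None",
    "Pkt: 3 (149 bytes), LIFE: 2, App: pizzeria (INTO), State: TERMINATED, Stack: /ETH/IP/UDP/pizzeria, Error: None",
]

for line in find_mismatched_lines(sample):
    print(line)
```

For this sample only the second line (App: zynga) is printed, since "zynga" does not occur after "Stack:".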

Solution 1
0 2013-02-19 07:23:43
