
Matching pattern in Python

I have a directory "/pcap_test" that contains several log files. Each file contains lines like:

Pkt: 1 (358 bytes), LIFE: 1, App: itunes (INTO), State: TERMINATED, Stack: /ETH/IP/UDP/itunes, Error: None

Pkt: 2 (69 bytes), LIFE: 2, App: zynga (INTO), State: INSPECTING, Stack: /ETH/IP/UDP, Error: None

Pkt: 3 (149 bytes), LIFE: 2, App: pizzeria (INTO), State: TERMINATED, Stack: /ETH/IP/UDP/pizzeria, Error: None

In this case I want the output to be the second line, because the value of "App" (zynga) does not appear in "Stack".
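To pin the criterion down, here is a minimal sketch that checks it with plain string splitting instead of a regular expression. It assumes every line uses the comma-separated "Key: value" layout shown above, and the helper name app_missing_from_stack is hypothetical. Unlike the grep command, it does not require the "(INTO)" marker.

```python
def app_missing_from_stack(line):
    """Return True if the App name does not occur anywhere in the Stack field.

    Hypothetical helper; assumes fields look like 'Key: value' separated by ', '.
    """
    fields = {}
    for part in line.strip().split(", "):
        key, sep, value = part.partition(": ")
        if sep:  # skip fragments that are not 'Key: value' pairs
            fields[key] = value
    if "App" not in fields or "Stack" not in fields:
        return False
    app_name = fields["App"].split(" ")[0]  # "zynga (INTO)" -> "zynga"
    return app_name not in fields["Stack"]  # substring test, like grep's .*\1.*
```

For the three sample lines this returns False, True, False, so only the zynga line qualifies.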

I wrote a small Python script to iterate through the directory, open each file, and print its contents:

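import os

# list the directory entries (note: "list" and "str" below shadow Python built-ins)
list = os.listdir("/home/test/Downloads/pcap_test")
print(list)
for infile in list:
    infile = os.path.join("/home/test/Downloads/pcap_test", infile)
    # read and print regular files only, skipping subdirectories
    if os.path.isfile(infile):
        str = open(infile, 'r').read()
        print(str)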

I managed to get the output I want with grep, but I can't work out how to do the same in the Python script. The grep command is:

grep -vP 'App: ([^, ]*) \(INTO\).*Stack: .*\1.*$' xyz.pcap.log | grep -P 'App: ([^, ]*) \(INTO\)'

Since I already have each file's contents in the variable "str", I want to use that, rather than the individual log files, to get the output.

Any help in this regard will be highly appreciated.

First, I'd advise against variable names like str, since str is the name of Python's built-in string type and assigning to it hides the built-in. The same goes for list.

Since grep is a command-line regular expression tool and you already have a working regular expression, all you need to do is learn to use Python's re module.
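As a quick check that the Perl-style patterns from the grep command carry over unchanged, including the \1 backreference, here is a sketch that applies both of them with re.search to the three sample lines from the question. The names APP_IN_STACK, APP_LINE and samples are illustrative only.

```python
import re

# Same patterns as the grep command. APP_IN_STACK matches lines whose App value
# reappears after "Stack:" (via the \1 backreference); APP_LINE matches any App line.
APP_IN_STACK = re.compile(r'App: ([^, ]*) \(INTO\).*Stack: .*\1.*$')
APP_LINE = re.compile(r'App: ([^, ]*) \(INTO\)')

samples = [
    "Pkt: 1 (358 bytes), LIFE: 1, App: itunes (INTO), State: TERMINATED, Stack: /ETH/IP/UDP/itunes, Error: None",
    "Pkt: 2 (69 bytes), LIFE: 2, App: zynga (INTO), State: INSPECTING, Stack: /ETH/IP/UDP, Error: None",
    "Pkt: 3 (149 bytes), LIFE: 2, App: pizzeria (INTO), State: TERMINATED, Stack: /ETH/IP/UDP/pizzeria, Error: None",
]

results = []
for line in samples:
    app = APP_LINE.search(line).group(1)  # captured App name: the text \1 refers to
    results.append((app, bool(APP_IN_STACK.search(line))))
print(results)
# [('itunes', True), ('zynga', False), ('pizzeria', True)]
```

Only zynga fails the first pattern, which is exactly the line grep -v keeps.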

What's a little trickier is reproducing grep's -v (inverted match) behaviour. I suggest reading the file line by line and printing the line only if it does not match your first regular expression but does match the second, like so:

import os
import re

log_dir = "/home/test/Downloads/pcap_test"
for name in os.listdir(log_dir):
    infile = os.path.join(log_dir, name)
    if os.path.isfile(infile):
        with open(infile, 'r') as logFile:  # this closes the file automatically when the block ends
            for line in logFile:  # read logFile one line at a time
                firstReMatch = re.search(r'App: ([^, ]*) \(INTO\).*Stack: .*\1.*$', line)  # does this line match your first regex?
                secondReMatch = re.search(r'App: ([^, ]*) \(INTO\)', line)  # does this line match your second regex?
                if secondReMatch and not firstReMatch:  # "not" reproduces grep's inverted match
                    print(line, end='')  # line already ends with a newline

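Note that re.match() only matches at the beginning of the string, whereas grep matches anywhere in the line, so re.search() is the equivalent of grep here. For lines like these, which begin with "Pkt:", re.match() would never find "App:".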
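Since the question asks to work on contents that have already been read into a single string, here is a minimal sketch that applies the same two tests to a whole string, one line at a time, via splitlines(). The function name filter_log_text and its parameter name contents are hypothetical; taking the text as a parameter avoids reusing the name str.

```python
import re

APP_IN_STACK = re.compile(r'App: ([^, ]*) \(INTO\).*Stack: .*\1.*$')
APP_LINE = re.compile(r'App: ([^, ]*) \(INTO\)')


def filter_log_text(contents):
    """Return the lines of contents whose App name is absent from Stack.

    Mirrors: grep -vP '<APP_IN_STACK>' | grep -P '<APP_LINE>'
    """
    return [
        line
        for line in contents.splitlines()
        if APP_LINE.search(line) and not APP_IN_STACK.search(line)
    ]
```

Inside the question's loop, print("\n".join(filter_log_text(str))) would then print the matching lines of each file.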
