Given this kind of input:
.-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
|
| server = 127.0.0.1/502
| os = ???
| dist = 0
| params = none
| raw_sig = 4:64+0:0:0:32768,0:::0
|
`----
.-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
|
| server = 127.0.0.1/502
| os = ???
| dist = 0
| params = none
| raw_sig = 4:64+0:0:0:32768,0:::0
|
`----
...
I'm trying to use a regex to get the value of every os field in the output (there will be hundreds).
I've tried this:
import os, subprocess, re
dir = '/home/user/Documents/ics-passif-asset-enumeration/pcap/'
for filename in os.listdir(dir):
    inp = '...'
    match = re.match( r'(.*)os(.*)\n(.*)', inp )
    print match.group(1)
But match is a NoneType. I've never really played with regex before and I'm a bit lost.
Edit:
The expected output is a list of all the os values. In this case it would be:
???
???
I hope this is what you are looking for
>>> import re
>>> string = """.-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
... |
... | server = 127.0.0.1/502
... | os = ???
... | dist = 0
... | params = none
... | raw_sig = 4:64+0:0:0:32768,0:::0
... |
... `----"""
>>> match = re.match( r'(.*)os\s*=(.*?)\n', string, re.DOTALL)
>>> match.group(2)
' ???'
Changes made
re.DOTALL: This flag makes . match newline characters as well, which is required because the input spans multiple lines.
os\s*=(.*?)
\s*= : The = and the spaces before it are moved out of the capture group, since we are not interested in them.
(.*?) : The ? makes the group non-greedy, so that it matches only up to the end of the first line.
match.group(2): The value is in the second capture group, not the first.
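To see the effect of the flag in isolation, here is a minimal sketch that runs the original pattern and the corrected one against a shortened record. The name `sample` and the trimmed record are assumptions made for illustration; the layout follows the input in the question.

```python
import re

# Shortened record, same layout as the input in the question (illustrative).
sample = (
    ".-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-\n"
    "|\n"
    "| server = 127.0.0.1/502\n"
    "| os = ???\n"
    "| dist = 0\n"
    "`----"
)

# re.match only tries at position 0, and without re.DOTALL the leading (.*)
# cannot cross a newline, so "os" on the fourth line is never reached.
no_flag = re.match(r'(.*)os(.*)\n(.*)', sample)

# With re.DOTALL the leading (.*) may span lines and reach the "os" line.
with_flag = re.match(r'(.*)os\s*=(.*?)\n', sample, re.DOTALL)

print(no_flag)             # None
print(with_flag.group(2))  # ' ???'
```

The first call returning None is the NoneType reported in the question.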
A better and shorter solution
You can use re.findall() with a slightly different regex:
os\s*=(.*)
Test
>>> string = """.-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
... |
... | server = 127.0.0.1/502
... | os = ???
... | dist = 0
... | params = none
... | raw_sig = 4:64+0:0:0:32768,0:::0
... |
... `----
...
... .-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
... |
... | server = 127.0.0.1/502
... | os = ???
... | dist = 0
... | params = none
... | raw_sig = 4:64+0:0:0:32768,0:::0
... |
... `----
... ..."""
>>> re.findall(r"os\s*=(.*)", string)
[' ???', ' ???']
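The values above keep the space that follows the = sign. As a small variation, sketched here rather than taken from the answer, the pattern can consume that whitespace itself. The `text` sample and the second os value ("Linux 3.x") are made up for illustration.

```python
import re

# Illustrative input; "Linux 3.x" is a hypothetical value, not from the question.
text = (
    "| server = 127.0.0.1/502\n"
    "| os = ???\n"
    "| dist = 0\n"
    "| os = Linux 3.x\n"
)

# [ \t]* eats spaces and tabs after "=" but, unlike \s*, never a newline,
# so an empty value cannot cause the following line to be captured instead.
values = re.findall(r"os\s*=[ \t]*(.*)", text)

print(values)  # ['???', 'Linux 3.x']
```

Calling .strip() on each result of the original pattern would give the same values.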
re.findall will return a list of results! Fantastic! Assuming the format of your input is pretty consistent, this should work like a charm:
>>> inp = '''
... .-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
... |
... | server = 127.0.0.1/502
... | os = ???
... | dist = 0
... | params = none
... | raw_sig = 4:64+0:0:0:32768,0:::0
... |
... `----
...
... .-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
... |
... | server = 127.0.0.1/502
... | os = ???
... | dist = 0
... | params = none
... | raw_sig = 4:64+0:0:0:32768,0:::0
... |
... `----
... ...
... '''
>>> re.findall(r'^\| os\s+= (.*)$', inp, flags=re.MULTILINE)
['???', '???']
I agree with the idea that the format should be strict to ensure that the string won't appear somewhere else. If this all came from a script then the strictness shouldn't be a problem (you'd hope). If it was via manual entry... I'd be surprised.
To make the dot operator (.) match newline characters, add a flag to the match call:
match = re.match( r'(.*)os(.*)\n(.*)', inp, flags=re.DOTALL )
If I understand what you want, and assuming your input is what you copied here (multiline, with multiple entries), this regex should do it, with the g and m modifiers to match all occurrences and to let ^ and $ match the start and end of each line respectively:
^\|\s*os\s*=\s*(.*)$
You may try using the findall() method:
for filename in os.listdir(dir):
    inp = '...'
    match = re.findall('os(.*)\n', inp)
    print match
As @Tensibai says, you're probably best off using ^ and $ to match the start and end of the line, and a very specific pattern (as he gives) to make sure that the string "os" is not matched somewhere else, such as within a hostname.
To directly find all of the matching "os = " lines, use re.findall( r'^\|\s*os\s*=\s*(.*)$', inp, re.MULTILINE ), which returns a list of the matching os values.
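The hostname concern can be illustrated with a short sketch. The helper name `extract_os_values`, the constant `OS_LINE`, and the server name containing "os=" are all hypothetical, chosen only to show how a loose pattern and a line-anchored one differ.

```python
import re

# Anchored to a whole line: start of line, a literal "|", then "os = value".
OS_LINE = re.compile(r'^\|\s*os\s*=\s*(.*)$', re.MULTILINE)

def extract_os_values(text):
    """Return the value of every '| os = ...' line found in text."""
    return [value.strip() for value in OS_LINE.findall(text)]

# Hypothetical record whose server name happens to contain "os=".
record = """.-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
|
| server = host-os=1.example/502
| os = ???
| dist = 0
`----
"""

print(extract_os_values(record))           # ['???']
print(re.findall(r'os\s*=(.*)', record))   # ['1.example/502', ' ???']
```

The loose pattern picks up the fragment of the server name, while the anchored one returns only the real os field.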