
Regex with end of line in group

Given this kind of input:

.-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
|
| server   = 127.0.0.1/502
| os       = ???
| dist     = 0
| params   = none
| raw_sig  = 4:64+0:0:0:32768,0:::0
|
`----

.-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
|
| server   = 127.0.0.1/502
| os       = ???
| dist     = 0
| params   = none
| raw_sig  = 4:64+0:0:0:32768,0:::0
|
`----
...

I'm trying to use a regex to get the value of every os field in the output (there will be hundreds).

I've tried this:

import os, subprocess, re

dir = '/home/user/Documents/ics-passif-asset-enumeration/pcap/'

for filename in os.listdir(dir):
    inp = '...'
    match = re.match( r'(.*)os(.*)\n(.*)', inp  )
    print(match.group(1))

But match is None (a NoneType). I have never really worked with regexes before and I'm a bit lost.

Edit:

The expected output is a list of all the os values. In this case it would be:

???
???

I hope this is what you are looking for

>>> import re
>>> string = """.-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
... |
... | server   = 127.0.0.1/502
... | os       = ???
... | dist     = 0
... | params   = none
... | raw_sig  = 4:64+0:0:0:32768,0:::0
... |
... `----"""
>>> match = re.match( r'(.*)os\s*=(.*?)\n', string, re.DOTALL)
>>> match.group(2)
' ???'

Changes made

  • re.DOTALL This flag makes . match newlines as well, which lets the leading (.*) span the lines that come before the os line in a multiline input.

  • os\s*=(.*?)

    • \s*= The = and the spaces before it are kept outside the capture group, since we are not interested in them.

    • (.*?) The ? makes it non-greedy, so that it matches only up to the end of the first line.

  • match.group(2) The value is in the second capture group, not the first.
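
To make the effect of the flag concrete, here is a minimal sketch that runs the same pattern with and without re.DOTALL on a shortened copy of the sample. The variable names are only for illustration, and the final .strip() is an addition to drop the leading space from the captured value.

```python
import re

sample = """.-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
|
| server   = 127.0.0.1/502
| os       = ???
| dist     = 0
`----"""

pattern = r'(.*)os\s*=(.*?)\n'

# Without re.DOTALL, "." stops at the first newline, and re.match only
# tries at the start of the string, so the "os" line is never reached.
no_flag = re.match(pattern, sample)

# With re.DOTALL, the leading (.*) can run across lines up to "os".
with_flag = re.match(pattern, sample, re.DOTALL)

os_value = with_flag.group(2).strip()
print(no_flag)
print(os_value)
```

This only ever returns one value per call, which is why findall is the more natural fit when the input holds many entries.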


A better and shorter solution

You can use re.findall() with a slightly different regex:

os\s*=(.*)

Test

>>> string = """.-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
... |
... | server   = 127.0.0.1/502
... | os       = ???
... | dist     = 0
... | params   = none
... | raw_sig  = 4:64+0:0:0:32768,0:::0
... |
... `----
... 
... .-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
... |
... | server   = 127.0.0.1/502
... | os       = ???
... | dist     = 0
... | params   = none
... | raw_sig  = 4:64+0:0:0:32768,0:::0
... |
... `----
... ..."""

>>> re.findall(r"os\s*=(.*)", string)
[' ???', ' ???']
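
The captured values keep the space that follows the = sign. If that is unwanted, one option (a sketch, not the only way) is to strip each result after the fact. The shortened input and the variable names below are made up for the example.

```python
import re

text = (
    "| server   = 127.0.0.1/502\n"
    "| os       = ???\n"
    "| dist     = 0\n"
    "| server   = 127.0.0.1/502\n"
    "| os       = ???\n"
)

# findall returns the contents of the single capture group for each match.
raw_values = re.findall(r"os\s*=(.*)", text)

# Strip surrounding whitespace from every captured value.
os_values = [value.strip() for value in raw_values]
print(os_values)
```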

re.findall will return a list of results! Fantastic! Assuming the format of your input is pretty consistent, this should work like a charm:

>>> inp = '''
... .-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
... |
... | server   = 127.0.0.1/502
... | os       = ???
... | dist     = 0
... | params   = none
... | raw_sig  = 4:64+0:0:0:32768,0:::0
... |
... `----
... 
... .-[ 127.0.0.1/44963 -> 127.0.0.1/502 (syn+ack) ]-
... |
... | server   = 127.0.0.1/502
... | os       = ???
... | dist     = 0
... | params   = none
... | raw_sig  = 4:64+0:0:0:32768,0:::0
... |
... `----
... ...
... '''
>>> re.findall(r'^\| os\s+= (.*)$', inp, flags=re.MULTILINE)
['???', '???']

I agree with the idea that the format should be strict to ensure that the string won't appear somewhere else. If this all came from a script then the strictness shouldn't be a problem (you'd hope). If it was via manual entry... I'd be surprised.
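
One detail worth checking in a strict pattern like this: a bare | in a regex means alternation, so the literal pipe at the start of each line has to be written as \|. The sketch below compares the two spellings on a shortened two-line sample; the variable names are only for illustration.

```python
import re

sample = "| os       = ???\n| dist     = 0\n"

# An unescaped "|" splits the pattern into two alternatives:
# a bare "^", or " os\s+= (.*)$". The bare "^" branch matches the
# empty string at line starts, which adds empty results.
unescaped = re.findall(r'^| os\s+= (.*)$', sample, flags=re.MULTILINE)

# An escaped "\|" matches the literal pipe that starts each line.
escaped = re.findall(r'^\| os\s+= (.*)$', sample, flags=re.MULTILINE)

print(unescaped)  # includes empty strings alongside the value
print(escaped)
```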

To make the dot operator (.) match newlines, add a flag to the match call:

match = re.match( r'(.*)os(.*)\n(.*)', inp, flags=re.DOTALL  )
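
A caveat with this approach, shown as a small sketch on a made-up, shortened input: once . matches newlines, the greedy (.*) after os also runs across lines, so the second group holds more than the value and needs trimming. The "header line" text and the variable names are placeholders, not part of the original output.

```python
import re

sample = "header line\n| os       = ???\n| dist     = 0\n`----"

match = re.match(r'(.*)os(.*)\n(.*)', sample, flags=re.DOTALL)

# The match now succeeds, but group(2) is greedy and spans several lines.
print(match is not None)
print(repr(match.group(2)))

# Keep only the first captured line, then trim "=" and spaces around the value.
os_value = match.group(2).splitlines()[0].strip(" =")
print(os_value)
```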

If I understand what you want, and assuming your input is what you pasted here (multiline, with multiple entries), this regex should do it. Use the g and m modifiers so that it matches every entry and so that ^ and $ match the start and end of each line, respectively:

^\|\s*os\s*=\s*(.*)$


You may try using the findall() method:

for filename in os.listdir(dir):
    inp = '...'
    # findall returns a list with the text captured after every "os"
    match = re.findall('os(.*)\n', inp)
    print(match)

As @Tensibai says, you're probably best off using ^ and $ to match the start and end of the line, together with a very specific pattern (like the one he gives), to make sure that the string "os" is not matched somewhere else, for example inside a hostname.

To directly find all of the matching "os = " lines, use re.findall( r'^\|\s*os\s*=\s*(.*)$', inp, re.MULTILINE ), which returns a list of the matching os values.
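
As a rough illustration of why the anchored pattern is safer, the sketch below uses a hypothetical server value, localhost/502, which happens to contain the letters "os". That value is invented for the example and does not come from the original output.

```python
import re

# Hypothetical entry: the server field contains "os" inside "localhost".
text = (
    "| server   = localhost/502\n"
    "| os       = ???\n"
    "| dist     = 0\n"
)

# Loose pattern: matches "os" anywhere, including inside "localhost".
loose = re.findall(r'os(.*)\n', text)

# Strict pattern: "os" must be the field name at the start of a line.
strict = re.findall(r'^\|\s*os\s*=\s*(.*)$', text, re.MULTILINE)

print(loose)
print(strict)
```

The loose pattern picks up a fragment of the server line in addition to the os line, while the strict one returns only the os value.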
