Regex to get word before and after given word

Can someone please help with a regex pattern for the strings below in Python? I have a .log file, and from lines like the ones below I need to extract the user and the IP address.

I want a regex that gets me one word before "from" and one word after "from".

Failed password for root from 123.183.209.132 port 39706 ssh2

I want root and 123.183.209.132 from the line above.

Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2

I want packer and 13.82.211.217 from the line above.

reverse mapping checking getaddrinfo for undefined.datagroup.ua
[93.183.207.5] failed - POSSIBLE BREAK-IN ATTEMPT!

reverse mapping checking getaddrinfo for nsg-static-226.127.71.182.airtel.in [182.71.127.226] failed - POSSIBLE BREAK-IN ATTEMPT!

reverse mapping checking getaddrinfo for 179.185.44.168.static.gvt.net.br [179.185.44.168] failed - POSSIBLE BREAK-IN ATTEMPT!

From lines like these I want the host name and the IP, e.g. undefined.datagroup.ua and 93.183.207.5 (a new regex is fine).

My code so far:

import re

def parse(filename, date=None):
    try:
        # string = 'Failed password for ([a-z]*|[a-z]* [a-z]* [a-z]*) from '
        # Raw string so that \. reaches the regex engine unchanged.
        string = r'Failed password for ([a-z]*|[a-z]* [a-z]* [a-z]*) from [0-9]+(?:\.[0-9]+){3}'
        # string_sub = 'for (?<user>[a-zA-Z\.]+).*?(?<ip>(?:\d{1,3}\.){3}\d{1,3})'
        # string_re = re.compile(r"^[^ ]+ - (C[^ ]*) \[([^ ]+)").match
        match_list = []
        with open(filename, 'r') as file:
            for line in file:
                for match in re.finditer(string, line, re.S):
                    match_text = match.group()
                    user_ip = re.search(r'Failed password for .*?(\w+) from (\d+(?:\.\d+){3})', match_text)
                    user, ip = user_ip.groups()
                    # Print inside the loop so every match is reported, not only the last one.
                    print(user, ip)
    except KeyError as e:
        msg = "key %s is missing" % str(e)
        return msg
    except Exception as e:
        return str(e)

I'm stuck on the regex.

Regex may be overkill for your use case... Did you try simpler things, like this, for instance:

s1 = "Failed password for root from 123.183.209.132 port 39706 ssh2"
s2 = "Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2"

parsed = s1.split('from',1)
user = parsed[0].split()[-1]
ip = parsed[1].split()[0]

print(f'User is {user} and IP is {ip}')
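The snippet above only parses s1. As a rough sketch of the same split-based idea applied to any line, the helper below (the name split_user_ip is made up for illustration) returns None when a line has no " from " in it, which is the case for the "reverse mapping" lines in the question:

```python
def split_user_ip(line):
    """Return (user, ip) using plain string splitting, or None if the line does not fit."""
    # partition() splits on the first " from " and reports whether it was found.
    before, sep, after = line.partition(' from ')
    if not sep:
        return None
    left = before.split()
    right = after.split()
    if not left or not right:
        return None
    # Last word before "from" is the user, first word after it is the IP.
    return left[-1], right[0]


s1 = "Failed password for root from 123.183.209.132 port 39706 ssh2"
s2 = "Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2"

for s in (s1, s2):
    print(split_user_ip(s))
```

This does not check that the word after "from" actually looks like an IP address; it simply trusts the log format.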

If I understand correctly, you basically want the word (the username) after "for" and the IP address on that line? If that's the case, how about:

for (?P<user>[a-zA-Z\.]+).*?(?P<ip>(?:\d{1,3}\.){3}\d{1,3})

https://regex101.com/r/aojbyS/1 . Note that Python's re module spells named groups (?P<name>...); the (?<name>...) spelling is rejected there. Granted, this is a shorthand pattern for an IP; to make it more correct you should use a proper IPv4 regex.
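As an alternative to a long "proper" IPv4 regex, one option is to keep the loose pattern and let the standard library's ipaddress module reject impossible values such as 999.1.1.1. This is only a sketch; the names LOOSE_IP and find_valid_ipv4 are invented for the example:

```python
import ipaddress
import re

# Loose dotted-quad pattern: four groups of 1-3 digits, no range check.
LOOSE_IP = re.compile(r'\d{1,3}(?:\.\d{1,3}){3}')


def find_valid_ipv4(text):
    """Return the first dotted quad in text that ipaddress accepts, else None."""
    for candidate in LOOSE_IP.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            # Looked like an IP but is out of range; try the next candidate.
            continue
        return candidate
    return None


print(find_valid_ipv4("Failed password for root from 123.183.209.132 port 39706 ssh2"))
```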

Additionally, in your question, you don't say what should be captured from the following, which might modify the above regex.

Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2.
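For that line, the pattern above would capture "invalid" as the user, while the question asks for "packer". A possible adjustment, assuming the only extra words that can appear are the literal "invalid user " prefix seen in the sample (FAILED_RE and failed_login are hypothetical names):

```python
import re

# Named groups use Python's (?P<name>...) spelling.
# Assumption: the only optional prefix before the user name is "invalid user ".
FAILED_RE = re.compile(
    r'Failed password for (?:invalid user )?(?P<user>\S+) '
    r'from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})'
)


def failed_login(line):
    """Return (user, ip) for a 'Failed password' line, or None if it does not match."""
    m = FAILED_RE.search(line)
    if m is None:
        return None
    return m.group('user'), m.group('ip')


print(failed_login("Failed password for root from 123.183.209.132 port 39706 ssh2"))
print(failed_login("Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2"))
```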

import re

inp = [
    'Failed password for root from 123.183.209.132 port 39706 ssh2',
    'Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2',
    '''reverse mapping checking getaddrinfo for undefined.datagroup.ua
[93.183.207.5] failed - POSSIBLE BREAK-IN ATTEMPT!''',
]
for s in inp:
    result = re.search(r'(?:Failed password|reverse mapping.+?) for .*?([\w.]+)\s+(?:from |\[)(\d+(?:\.\d+){3})', s)
    print(result.groups())

Output:

('root', '123.183.209.132')
('packer', '13.82.211.217')
('undefined.datagroup.ua', '93.183.207.5')

Explanation:

(?:                     # non capture group
    Failed password     # literally
  |                     # OR
    reverse mapping     # literally
    .+?                 # 1 or more any character, not greedy
)                       # end group
 for                    # literally
 .*?                    # 0 or more any character
 ([\w.]+)               # group 1, 1 or more word character or dot
 \s+                    # 1 or more spaces
 (?:from |\[)           # non capture group, from OR opening square bracket
(\d+(?:\.\d+){3})       # group 2, IP
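The breakdown above maps almost directly onto Python's re.VERBOSE mode, where whitespace in the pattern is ignored and literal spaces have to be escaped. The sketch below is one way to write it; LOG_RE and extract are made-up names. One deliberate change from the pattern above: "-" is added to the name character class, because with [\w.]+ alone a host such as nsg-static-226.127.71.182.airtel.in would be captured only from the last hyphen onward.

```python
import re

LOG_RE = re.compile(r'''
    (?:Failed\ password | reverse\ mapping.+?)  # either kind of log line
    \ for\                                      # literal " for "
    .*?                                         # skip words such as "invalid user"
    ([\w.-]+)                                   # group 1: user or host name
    \s+                                         # spaces or a line break
    (?:from\  | \[)                             # "from " or an opening bracket
    (\d+(?:\.\d+){3})                           # group 2: dotted-quad IP
''', re.VERBOSE)


def extract(lines):
    """Yield (name, ip) for every line that LOG_RE matches."""
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            yield m.groups()


sample = [
    'Failed password for root from 123.183.209.132 port 39706 ssh2',
    'Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2',
    'reverse mapping checking getaddrinfo for nsg-static-226.127.71.182.airtel.in '
    '[182.71.127.226] failed - POSSIBLE BREAK-IN ATTEMPT!',
]
for name, ip in extract(sample):
    print(name, ip)
```

Since extract accepts any iterable of strings, an open file object can be passed in directly to scan a log line by line.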
