
Python Regex Grouping finditer

Input: 146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622

Expected output:

example_dict = {"host":"146.204.224.152", "user_name":"feest6811","time":"21/Jun/2019:15:45:24 -0700",
"request":"POST /incentivize HTTP/1.1"}

My code works when I match each group individually, e.g.:

for item in re.finditer('(?P<host>\d*\.\d*\.\d*.\d*)',logdata):
        print(item.groupdict())

Output: {'host': '146.204.224.152'}

But I get no output when I combine all the groups into a single pattern. Below is my code:

for item in re.finditer('(?P<host>\d*\.\d*\.\d*.\d*)(?P<user_name>(?<=-\s)[\w]+\d)(?P<time>(?<=\[).+(?=]))(?P<request>(?<=").+(?="))',logdata):
           print(item.groupdict())

I might simplify your regex pattern and just use re.findall here:

import re

inp = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
matches = re.findall(r'(\d+\.\d+\.\d+\.\d+) - (\S+) \[(.*?)\] "(.*?)"', inp)
print(matches)

This will generate a list of tuples containing the four captured terms you want:

[('146.204.224.152', 'feest6811', '21/Jun/2019:15:45:24 -0700', 'POST /incentivize HTTP/1.1')]
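
If a dictionary like the one in the question is wanted instead of a tuple, one option is to pair each tuple with a list of key names. This is only a sketch: the `keys` list and the `example_dict` name are copied from the question's expected output and are not something `re.findall` provides.

```python
import re

inp = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
pattern = r'(\d+\.\d+\.\d+\.\d+) - (\S+) \[(.*?)\] "(.*?)"'

# Key names copied from the expected output in the question.
keys = ["host", "user_name", "time", "request"]

# re.findall returns one tuple per match; zip each tuple with the keys.
dicts = [dict(zip(keys, groups)) for groups in re.findall(pattern, inp)]
example_dict = dicts[0]
print(example_dict)
```

Named groups with `groupdict()` achieve the same result without the separate key list; the zip approach just keeps the pattern itself short.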

If you paste two regular expressions back-to-back, they will only match text back-to-back. For example, if you combine a and b, the regular expression ab will match the text ab, but not acb.
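
The a/b example can be checked directly. The snippet below is a small illustration; the variable names `a`, `b` and `padded` are made up for this sketch.

```python
import re

# Two patterns that each work on their own.
a = r'a'
b = r'b'

# Concatenated, they only match when "a" is immediately followed by "b".
print(re.search(a + b, 'ab'))    # a match object
print(re.search(a + b, 'acb'))   # None

# Padding between the two parts lets the combined pattern skip the gap.
padded = a + r'.*?' + b
print(re.search(padded, 'acb').group())
```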

Your combined regex suffers from this problem; you have melded together regular expressions which apparently work fine in isolation, but they don't match immediately adjacent strings, so you have to add some padding to cover the intervening substrings in the input.
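
One way to see the gaps is to run the four sub-patterns from the question separately and print where each one matches. This is a diagnostic sketch only; the `parts` dictionary and the `spans` and `combined` names are introduced here for illustration.

```python
import re

logdata = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'

# The four sub-patterns from the question, tried one at a time.
parts = {
    'host': r'\d*\.\d*\.\d*.\d*',
    'user_name': r'(?<=-\s)[\w]+\d',
    'time': r'(?<=\[).+(?=])',
    'request': r'(?<=").+(?=")',
}

# Record the start and end offsets of each individual match.
spans = {name: re.search(pat, logdata).span() for name, pat in parts.items()}
for name, (start, end) in spans.items():
    print(name, start, end, repr(logdata[start:end]))

# Concatenating the parts with nothing in between finds no match at all.
combined = ''.join(parts.values())
print(re.search(combined, logdata))
```

Each match starts a few characters after the previous one ends, and those uncovered characters are exactly what the padding has to consume.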

Here's a slightly refactored version with adaptations to add padding, and also a few routine fixes to avoid common regex beginner mistakes.

import re

# logdata is the log text from the question.
for item in re.finditer(r'''
        (?P<host>\d+\.\d+\.\d+\.\d+)
        (?:[-\s]+)
        (?P<user_name>\w+\d)
        (?:[^[]+\[)
        (?P<time>[^]]+)
        (?:\][^]"]+")
        (?P<request>[^"]+)''',
        logdata, re.VERBOSE):
    print(item.groupdict())

Demo: https://ideone.com/BsNLG7
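
For a log with several entries, the same idea might be wrapped in a small helper. The sketch below is based on the pattern above, with the last dot of the host escaped; `LOG_PATTERN` and `parse_log` are hypothetical names, and the second log line is made up purely to show that each entry yields its own dict.

```python
import re

# Verbose pattern with padding between the named groups, compiled once.
LOG_PATTERN = re.compile(r'''
    (?P<host>\d+\.\d+\.\d+\.\d+)   # dotted-quad host
    [-\s]+                         # padding between host and user name
    (?P<user_name>\w+\d)           # user name ending in a digit
    [^[]+\[                        # padding up to the opening bracket
    (?P<time>[^]]+)                # everything up to the closing bracket
    \][^]"]+"                      # padding up to the opening quote
    (?P<request>[^"]+)             # everything up to the closing quote
    ''', re.VERBOSE)


def parse_log(logdata):
    """Return one dict of named groups per matching log entry."""
    return [m.groupdict() for m in LOG_PATTERN.finditer(logdata)]


# First line: the sample from the question. Second line: made up.
logdata = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622\n'
    '10.0.0.1 - user42 [21/Jun/2019:15:45:25 -0700] "GET /index HTTP/1.1" 200 100\n'
)

entries = parse_log(logdata)
for entry in entries:
    print(entry)
```

Note that `\w+\d` only accepts user names that end in a digit; if the log can contain other names (or a bare `-`), the more permissive `\S+` from the findall answer may be a better fit.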


 