Counting occurrences of Regex Matches in Python

I have a log checker that uses a regex to find the IP address in each line of a log file. I want to find those lines and count how many times each IP occurs.

The goal is to generate statistics from the events, based on their IP. Example:

WARNING - 192.168.1.1 TIMING OUT    
WARNING - 192.168.1.5 TIMING OUT    
WARNING - 192.168.1.1 TIMING OUT    
WARNING - 192.168.1.5 TIMING OUT    
WARNING - 192.168.1.1 TIMING OUT    
WARNING - 10.1.1.1 TIMING OUT    
WARNING - 10.72.3.1 TIMING OUT    

192.168.1.1 - 3 EVENTS    
192.168.1.5 - 2 EVENTS    
10.1.1.1 - 1 EVENT    
10.72.3.1 - 1 EVENT

And so on. I'm a Python novice, so I'm still learning what is best suited to this purpose. At the moment I open the log file and run a for loop that uses the regex pattern to find the IPs in each line, but from there I'm a bit lost. Cheers.

You could use re.findall here to capture all IP address events, then use a dictionary to tally the number of occurrences:

import re

inp = """WARNING - 192.168.1.1 TIMING OUT
WARNING - 192.168.1.5 TIMING OUT    
WARNING - 192.168.1.1 TIMING OUT    
WARNING - 192.168.1.5 TIMING OUT    
WARNING - 192.168.1.1 TIMING OUT    
WARNING - 10.1.1.1 TIMING OUT    
WARNING - 10.72.3.1 TIMING OUT"""

# Capture the IP address that follows each "WARNING - " prefix.
matches = re.findall(r'\bWARNING - (\b\d+\.\d+\.\d+\.\d+\b)', inp)
d = {}

for elem in matches:
    # dict.get returns 0 for an IP not seen yet, so no try/except is needed.
    d[elem] = d.get(elem, 0) + 1

print(d)

This prints (the key order may differ depending on the Python version):

{'10.1.1.1': 1, '192.168.1.5': 2, '10.72.3.1': 1, '192.168.1.1': 3}
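The manual tally above can also be written with collections.Counter from the standard library, which counts hashable items directly. This is only a minimal sketch and not part of the original answer: the names IP_PATTERN, count_ips and sample are chosen here for illustration, and the pattern is the one used above.

```python
import re
from collections import Counter

# Same pattern as in the answer above: capture the IP after "WARNING - ".
IP_PATTERN = re.compile(r'\bWARNING - (\b\d+\.\d+\.\d+\.\d+\b)')

def count_ips(text: str) -> Counter:
    """Return a Counter mapping each captured IP to its number of occurrences."""
    return Counter(IP_PATTERN.findall(text))

# Shortened sample input (illustrative, based on the question's log lines).
sample = """WARNING - 192.168.1.1 TIMING OUT
WARNING - 192.168.1.5 TIMING OUT
WARNING - 192.168.1.1 TIMING OUT
WARNING - 10.1.1.1 TIMING OUT"""

counts = count_ips(sample)
# most_common() lists (ip, count) pairs, most frequent first.
print(counts.most_common())
```

Counter is a dict subclass, so the result can be used wherever the plain dictionary d was used.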

Below is a modified variant of my answer at https://stackoverflow.com/a/64220148/6632736.

It is assumed that the log is in a file, which is read line by line.

#!/usr/bin/python
import re

def increment(ips: dict, line: str):
    match = re.match(r'^.+?\s+-\s+(?P<ip>\d{1,3}(\.\d{1,3}){3})\s.*$', line)
    if match:
        ip = match.group('ip')
        if ip not in ips:
            ips[ip] = 0
        ips[ip] += 1

def parse_log_file(log: str) -> dict:
    ips = dict()
    with open(log, 'r') as file:
        for line in file:
            increment(ips, line)
    return ips

# log is the path to the log file ("events.log" is only a placeholder path):
log = "events.log"
for key, value in parse_log_file(log).items():
    print(key, ":", value)
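To print the counts in the format the question asks for (for example "192.168.1.1 - 3 EVENTS"), the dictionary returned by parse_log_file could be sorted and formatted. The helper below is a sketch under assumptions: format_report is a hypothetical name, and ordering by descending count is a guess, since the question does not specify an order.

```python
def format_report(ips: dict) -> list:
    """Build 'IP - N EVENT(S)' lines, most frequent IP first."""
    lines = []
    # sorted() is stable, so IPs with equal counts keep their insertion order.
    for ip, count in sorted(ips.items(), key=lambda item: item[1], reverse=True):
        # Use the singular form for a single event, as in the question.
        noun = "EVENT" if count == 1 else "EVENTS"
        lines.append(f"{ip} - {count} {noun}")
    return lines

# Example counts taken from the question's sample log.
example = {"192.168.1.1": 3, "192.168.1.5": 2, "10.1.1.1": 1, "10.72.3.1": 1}
for line in format_report(example):
    print(line)
```

With a real log, the example dictionary would be replaced by the result of parse_log_file.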
