
Iterate through a log file line by line looking for an IP address pattern with RegEx, using the extend method to add each IP to the list ips

I am working on a script that iterates through a log file line by line, uses a regex to look for an IP address pattern, and then uses the extend method to add each IP to the list ips. I believe I have everything right up to the iteration step, where I then try to print(ips), as you can see in the script.

    import urllib.request
    import json
    import datetime
    import os
    import re 
    import azuremaps

I have created 3 empty lists to store the data found:

     ips = []
     unique_ips = []
     toJson = []

The log file I have opened:

    file = open('logs/access.log', 'r')

This might be where I have messed up: I am trying to use regex to iterate through the log file line by line to get IP addresses, and then use the extend method to store them in the list ips. I would like this code to be less than 5 lines.

    pattern = re.compiler(r'(\d{1,3}\.\d{1,3}\.\d{1,3} \.\d{1,3})')
    for line in file:
    ips.extend(pattern.search(line)[0])
    print(ips)

A new list is then populated with all duplicates removed.

    unique_ips = list(set(ips))

Before I move forward I need to validate my lists. However, when I type print(ips) in the terminal I get bash: syntax error near unexpected token 'ips'

    #print(ips)
    #print(len(ips))
    #print(len(unique_ips))
    #print(unique_ips)

Is there any reason to do it line-by-line?

Assuming the access.log file contains:

43.53.250.2
65.66.66.69
noise234.85.98.12something
whatever65.66.66.69

I think you can try this:

import re

with open('logs/access.log', 'r') as file:
    file = file.read()

pattern = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
ips = pattern.findall(file)
unique_ips = list(set(ips))

print(unique_ips)
['234.85.98.12', '65.66.66.69', '43.53.250.2']
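
Note that a set does not preserve order, so the printed list may come out in a different order than shown. If the first-seen order matters, a minimal sketch using dict.fromkeys could look like the following. The sample text is the assumed access.log content from above, inlined as a string so the example runs without a file; the name sample_log is only for illustration.

```python
import re

# Sample content mirroring the assumed access.log above
# (inlined so the sketch runs without a file on disk).
sample_log = """43.53.250.2
65.66.66.69
noise234.85.98.12something
whatever65.66.66.69
"""

pattern = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
ips = pattern.findall(sample_log)

# dict keys keep insertion order (Python 3.7+), so duplicates are dropped
# while the order of first appearance is preserved.
unique_ips = list(dict.fromkeys(ips))

print(unique_ips)
# ['43.53.250.2', '65.66.66.69', '234.85.98.12']
```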

Notes regarding your code:

  • I believe it should be re.compile instead of re.compiler
  • It looks like there's an extra space in your regex string
  • I think you can just use append, since you're adding one element at a time anyway (calling extend with a single string adds its characters one by one).
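
If you do want to keep the line-by-line loop, the notes above can be combined into a sketch like the one below. It swaps search(line)[0] for findall, because search returns None on a line with no match (and subscripting None raises TypeError), while findall always returns a list, which is what extend expects. The helper name extract_ips and the in-memory sample lines are assumptions for illustration, not part of the original script.

```python
import re

IP_PATTERN = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def extract_ips(lines):
    """Collect every IP-like string from an iterable of lines (hypothetical helper)."""
    ips = []
    for line in lines:
        # findall returns a list (possibly empty), so extend adds whole
        # addresses and lines without a match are skipped safely.
        ips.extend(IP_PATTERN.findall(line))
    return ips

# Stand-in for the open log file; a file object yields lines the same way.
sample_lines = [
    "43.53.250.2\n",
    "no address on this line\n",
    "whatever65.66.66.69\n",
]

ips = extract_ips(sample_lines)
unique_ips = list(set(ips))
print(ips)
```

With the real log, the same helper should work on the file object directly, for example by opening 'logs/access.log' in a with block and passing the file to extract_ips. Keep in mind that this pattern only checks the shape of an address, so something like 999.999.999.999 would also match.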
