How to parse for specific unique values from a JSON lines file with Python and store into an array
Parse Log lines and save unique IPs as JSON blobs
I am trying to parse a text file containing SSH logs. A sample of the log lines looks like this:
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.173] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
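The log lines will contain duplicate IPs.
My use case is to parse each log line, collect the unique IP addresses, and generate a JSON blob as output. The final JSON file should contain only the unique IP addresses.
What are the possible ways to achieve this with Python?
I think the ipaddress module from the Python standard library is a good fit for this question.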
import sys
import json
import re
from ipaddress import IPv4Address, AddressValueError

# Loose pattern for dotted quads; IPv4Address does the real validation below
ipre = re.compile(r'\d+\.\d+\.\d+\.\d+')

with open(sys.argv[1]) as fin:
    data = fin.read()

ips = []
for ip in ipre.findall(data):
    # Validate found IP addresses
    try:
        ips.append(str(IPv4Address(ip)))
    except AddressValueError as e:
        print(f"IP address '{ip}' is invalid: {e}")

# Deduplicate with a set; sort so the output order is stable
print(json.dumps(sorted(set(ips))))
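This gives you the unique, validated IP addresses as a JSON-formatted list.
The first line of the input file below contains an invalid IP address, to show how validation behaves.
Input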
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,500.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.173] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
Output
IP address '500.188.86.172' is invalid: Octet 500 (> 255) not permitted in '500.188.86.172'
["5.188.86.172", "5.188.86.173"]
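The question asks for a final JSON file, while the script above prints to stdout. A minimal sketch of writing the result to a file could look like the following; the function names `extract_unique_ips` and `write_ips_json` and the `out_path` parameter are hypothetical and not part of the original answer.

```python
import json
import re
from ipaddress import IPv4Address, AddressValueError

IP_RE = re.compile(r'\d+\.\d+\.\d+\.\d+')


def extract_unique_ips(text):
    """Return the valid, unique IPv4 addresses found in text, sorted as strings."""
    unique = set()
    for candidate in IP_RE.findall(text):
        try:
            unique.add(str(IPv4Address(candidate)))
        except AddressValueError:
            # Skip anything that only looks like an address, e.g. 500.188.86.172
            continue
    return sorted(unique)


def write_ips_json(text, out_path):
    """Write the unique addresses to out_path as a JSON array (hypothetical helper)."""
    with open(out_path, "w") as fout:
        json.dump(extract_unique_ips(text), fout)
```

Unlike the script above, this sketch silently drops invalid addresses instead of reporting them; keep the `print` in the `except` branch if you want to see them.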
Have you tried reading the IPs into a set first, and only then building the JSON from it? By definition, a set holds only unique values.
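The set-based idea can be sketched as a generator that reads the log line by line and uses a set to remember which addresses it has already seen, so a large file does not have to be loaded into memory at once. The name `iter_unique_ips` is illustrative, and this sketch does no validation of the matched strings.

```python
import re

IP_RE = re.compile(r'\d+\.\d+\.\d+\.\d+')


def iter_unique_ips(lines):
    """Yield each dotted-quad string the first time it appears in lines."""
    seen = set()
    for line in lines:
        for ip in IP_RE.findall(line):
            if ip not in seen:
                seen.add(ip)
                yield ip
```

Because an open file object iterates over its lines, it can be passed in directly, for example `list(iter_unique_ips(fin))` inside a `with open(...) as fin:` block. The addresses come out in first-seen order.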
A simple solution without regular expressions:
logs = """
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.173] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
"""
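logs_to_list = logs.split("\n")
all_ips = []
for log in logs_to_list:
    for item in log.split():
        # The transport token looks like "HoneyPotSSHTransport,666,5.188.86.172]"
        if "HoneyPotSSHTransport" in item:
            # Third comma-separated field is the IP; drop the trailing bracket
            all_ips.append(item.split(",")[2].replace("]", ""))
print(set(all_ips))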
You will get (the ordering of a set may vary):
Output: {'5.188.86.172', '5.188.86.173'}
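Note that a set cannot be passed to `json.dumps` directly; doing so raises a `TypeError`. A minimal sketch of turning the set into JSON follows, either as one array or as one object per address. The `"ip"` key is an assumption, since the question does not specify what the blob should look like.

```python
import json
from ipaddress import IPv4Address

all_ips = {'5.188.86.172', '5.188.86.173'}

# Sort numerically by address value rather than as plain strings
ordered = sorted(all_ips, key=IPv4Address)

# One JSON array holding every unique address
as_array = json.dumps(ordered)

# One JSON object per address; the "ip" key is hypothetical
as_blobs = [json.dumps({"ip": ip}) for ip in ordered]

print(as_array)
for blob in as_blobs:
    print(blob)
```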