I am trying to parse text files containing SSH logs. Example log lines look like:
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.173] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
There may be duplicate IPs across the log lines.
My use case is to parse each log line, extract the unique IP addresses, and generate a JSON blob as output. The final JSON file should contain results only for unique IP addresses.
What are the possible ways I can achieve this using Python?
I thought it would be neat to answer this question using the ipaddress module from the Python standard library, which validates each candidate address:
import sys
import json
import re
from ipaddress import IPv4Address, AddressValueError

# Rough pattern: four dot-separated digit groups; validation happens below
ipre = re.compile(r'\d+\.\d+\.\d+\.\d+')

with open(sys.argv[1]) as fin:
    data = fin.read()

ips = []
for ip in ipre.findall(data):
    # Validate found IP addresses
    try:
        ips.append(str(IPv4Address(ip)))
    except AddressValueError as e:
        print(f"IP address '{ip}' is invalid: {e}")

print(json.dumps(list(set(ips))))
This gives you a deduplicated list of validated IP addresses, serialized as a JSON list.
The first line in the input file below demonstrates an invalid IP address.
Input
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,500.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.173] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
Output
IP address '500.188.86.172' is invalid: Octet 500 (> 255) not permitted in '500.188.86.172'
["5.188.86.172", "5.188.86.173"]
Did you try reading the IPs into a set and then building the JSON from that? A set contains only unique values by definition.
To extract the IP address from each line, you can use the split() method or a regular expression. Then add each IP address to a set; a set cannot contain duplicates. Finally, convert the set into a list and save it as a JSON file.
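A minimal sketch of that split-and-set approach, using sample lines inlined as a list (in practice you would read them from your log file):

```python
import json

log_lines = [
    "6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded",
    "6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.173] login attempt [b'root'/b'admin'] succeeded",
    "6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded",
]

unique_ips = set()
for line in log_lines:
    for token in line.split():
        # The transport token looks like: HoneyPotSSHTransport,666,5.188.86.172]
        if token.startswith("HoneyPotSSHTransport,"):
            unique_ips.add(token.split(",")[2].rstrip("]"))

# A set is not JSON-serializable, so convert to a list first
blob = json.dumps(sorted(unique_ips))
print(blob)  # ["5.188.86.172", "5.188.86.173"]
```

Sorting before serializing is optional; it just makes the output deterministic.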
A simple solution without regex:
logs = """
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.173] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
"""
logs_to_list = logs.split("\n")
all_ips = []
for log in logs_to_list:
    for item in log.split():
        if "HoneyPotSSHTransport" in item:
            # Token looks like HoneyPotSSHTransport,666,5.188.86.172]
            all_ips.append(item.split(",")[2].replace("]", ""))
print(set(all_ips))
And you will get:
Output: {'5.188.86.172', '5.188.86.173'}
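Note that the result above is a Python set, which json.dumps cannot serialize directly. To get the JSON output the question asks for, convert the set to a list first; a short sketch:

```python
import json

# Example values as collected by the loop above
all_ips = ["5.188.86.172", "5.188.86.173", "5.188.86.172"]

# json.dumps raises TypeError on a set, so deduplicate then convert to a list
unique = sorted(set(all_ips))
print(json.dumps(unique))  # ["5.188.86.172", "5.188.86.173"]
```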