
Parse Log lines and save unique IPs as JSON blobs

I am trying to parse text files containing SSH logs. Example log lines look like:

6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.173] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded

The log lines will contain duplicate IPs.

My use case is to parse each log line, extract the unique IP addresses, and generate a JSON blob as output. The final JSON file should contain only the unique IP addresses.

What are the possible ways I can achieve this using Python?

I thought it would be neat to answer this question using the ipaddress module from the Python standard library.

import sys
import json
import re

from ipaddress import IPv4Address, AddressValueError

# Loose dotted-quad pattern; IPv4Address below does the real validation
ipre = re.compile(r'\d+\.\d+\.\d+\.\d+')

with open(sys.argv[1]) as fin:
    data = fin.read()

ips = []

for ip in ipre.findall(data):

    # Validate found IP addresses
    try:
        ips.append(str(IPv4Address(ip)))
    except AddressValueError as e:
        print(f"IP address '{ip}' is invalid: {e}")

# A set removes duplicates before dumping to JSON
print(json.dumps(list(set(ips))))

This gives you a list of unique, validated IP addresses, printed as a JSON array.

The first line in the input file demonstrates an invalid IP address.

Input

6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,500.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.173] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded

Output

IP address '500.188.86.172' is invalid: Octet 500 (> 255) not permitted in '500.188.86.172'
["5.188.86.172", "5.188.86.173"]

Did you try to read the IPs into a set and then build the JSON from it? A set has unique values by definition.

To extract the IP address from each line, you can use the split() method or a regular expression. Then add each IP address to a set; a set contains no duplicates. Finally, convert the set into a list and save it as a JSON file. A sketch of this approach follows.
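
A minimal sketch of that approach, assuming the log format from the question; the input path ssh.log and output filename unique_ips.json are placeholders:

import re
import json

ip_pattern = re.compile(r'\d+\.\d+\.\d+\.\d+')
unique_ips = set()

# Collect every IP-looking token into a set, which drops duplicates
with open("ssh.log") as fin:
    for line in fin:
        unique_ips.update(ip_pattern.findall(line))

# Convert the set to a list and save it as a JSON file
with open("unique_ips.json", "w") as fout:
    json.dump(sorted(unique_ips), fout, indent=2)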

A simple solution without regex:

logs = """
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.173] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
6T08:07:19.052699Z [SSHService b'ssh-userauth' on HoneyPotSSHTransport,666,5.188.86.172] login attempt [b'root'/b'admin'] succeeded
"""

# Split the log text into individual lines
logs_to_list = logs.split("\n")
all_ips = []
for log in logs_to_list:
    for item in log.split():
        if "HoneyPotSSHTransport" in item:
            # The IP is the third comma-separated field of the
            # "HoneyPotSSHTransport,666,5.188.86.172]" token
            all_ips.append(item.split(",")[2].replace("]", ""))

# A set removes the duplicates
print(set(all_ips))

And you will get:

Output: {'5.188.86.172', '5.188.86.173'}
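
Note that this prints a Python set, not JSON. To get the JSON blob the question asks for, a small sketch reusing all_ips from above:

import json

# Turn the set of unique IPs into a JSON array string
print(json.dumps(sorted(set(all_ips))))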
