I'm new to Python and have been going through some tutorials on log parsing with regular expressions. With the code below I can parse a log and create a file listing the remote IPs that connect to the server. What I'm missing is the piece that eliminates duplicate IPs in the output file. Thanks.
import re
import sys

infile = open("/var/log/user.log","r")
outfile = open("/var/log/intruders.txt","w")

pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
regexp = re.compile(pattern, re.VERBOSE)

for line in infile:
    result = regexp.search(line)
    if result:
        outfile.write("%s\n" % (result.group()))

infile.close()
outfile.close()
You can keep the results seen so far in a set() and write out only those that have not been seen yet. This logic is easy to add to your existing code:
import re
import sys

seen = set()  # IP addresses already written to the output file

infile = open("/var/log/user.log","r")
outfile = open("/var/log/intruders.txt","w")

pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
regexp = re.compile(pattern, re.VERBOSE)

for line in infile:
    mo = regexp.search(line)
    if mo is not None:
        ip_addr = mo.group()
        # Write each address only the first time it appears
        if ip_addr not in seen:
            seen.add(ip_addr)
            outfile.write("%s\n" % ip_addr)

infile.close()
outfile.close()
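As a possible variation, the same set-based idea can be wrapped in a small generator so it can be tried on any list of lines without touching /var/log. This is only a sketch: the names IP_PATTERN, unique_ips and write_unique_ips are made up for illustration and are not part of the code above. It also uses findall instead of search, on the assumption that you want every address on a line and not only the first one, and it opens the files in a with block so they are closed even if an error occurs.

```python
import re

# Same pattern as above. It matches dotted quads but does not check
# that each number is in the range 0-255.
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def unique_ips(lines):
    """Yield each IP-like string the first time it appears in lines."""
    seen = set()
    for line in lines:
        # findall returns every match on the line, not just the first
        for ip_addr in IP_PATTERN.findall(line):
            if ip_addr not in seen:
                seen.add(ip_addr)
                yield ip_addr


def write_unique_ips(in_path, out_path):
    """Copy the unique IPs found in in_path to out_path, one per line."""
    with open(in_path, "r") as infile, open(out_path, "w") as outfile:
        for ip_addr in unique_ips(infile):
            outfile.write(ip_addr + "\n")
```

With the paths from the question, the call would be write_unique_ips("/var/log/user.log", "/var/log/intruders.txt"). The addresses come out in the order they were first seen, which matches the behaviour of the loop above.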