This is my first question here at Stack Overflow, and I am really looking forward to being part of this community. I am new to programming, and Python was the first language most people recommended.
Anyway, I have a log file which looks like this:
"No.","Time","Source","Destination","Protocol","Info"
"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."
"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"
"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."
"5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
I want to parse the log file using Python and produce one line per source IP, like this:
From IP 135.13.216.191 Protocol Count: (IMF 1) (SMTP 38) (TCP 24) (Total: 63)
I would really like some help on which path to take to tackle this problem: should I use lists and loop through them, or dictionaries/tuples?
Thanks in advance for your help!
You can parse the file using the csv module:
import csv
with open('logfile.txt') as logfile:
    for row in csv.reader(logfile):
        no, time, source, dest, protocol, info = row
        # do stuff with these
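As a rough illustration of what `csv.reader` hands back, here is a minimal Python 3 sketch run on a few lines taken from the question. The `SAMPLE` string and the `read_rows` helper are hypothetical names introduced only for this example, and skipping the header with `next()` is an assumption about what you want to do with that first row.

```python
import csv
import io

# A few lines copied from the question, held in memory instead of a file.
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Info"
"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"
'''

def read_rows(logfile):
    """Return the data rows of an open CSV log, without the header row."""
    reader = csv.reader(logfile)
    next(reader, None)  # skip the header row, if there is one
    return [row for row in reader if row]

rows = read_rows(io.StringIO(SAMPLE))
for no, time, source, dest, protocol, info in rows:
    # The csv module has already removed the surrounding quotes.
    print(source, protocol)
```

With a real file you would pass the object returned by `open('logfile.txt', newline='')` instead of the `io.StringIO` wrapper.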
I can't quite tell what you're asking, but I think you want:
import csv
from collections import defaultdict
# A dictionary whose values are by default (a
# dictionary whose values are by default 0)
bySource = defaultdict(lambda: defaultdict(lambda: 0))
with open('logfile.txt') as logfile:
    for row in csv.DictReader(logfile):
        bySource[row["Source"]][row["Protocol"]] += 1
for source, protocols in bySource.iteritems():
    protocols['Total'] = sum(protocols.values())
    print "From IP %s Protocol Count: %s" % (
        source,
        ' '.join("(%s: %d)" % item for item in protocols.iteritems())
    )
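For readers on Python 3, where `iteritems()` and the `print` statement no longer exist, the same idea might be sketched as follows. This is only one possible variant: it swaps the nested `defaultdict` for `collections.Counter`, and the helper names `count_protocols` and `format_line`, as well as the in-memory `SAMPLE` data, are assumptions made for the example. The output layout imitates the line shown in the question.

```python
import csv
import io
from collections import Counter, defaultdict

# The five data rows from the question, held in memory for the demo.
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Info"
"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."
"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"
"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."
"5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
'''

def count_protocols(logfile):
    """Map each source IP to a Counter of the protocols it sent."""
    by_source = defaultdict(Counter)
    for row in csv.DictReader(logfile):
        by_source[row["Source"]][row["Protocol"]] += 1
    return by_source

def format_line(source, counts):
    """Build one summary line, with protocols in alphabetical order."""
    parts = " ".join("(%s %d)" % (proto, n) for proto, n in sorted(counts.items()))
    return "From IP %s Protocol Count: %s (Total: %d)" % (
        source, parts, sum(counts.values()))

by_source = count_protocols(io.StringIO(SAMPLE))
for source in sorted(by_source):
    print(format_line(source, by_source[source]))
```

Keeping the total out of the counter (it is computed in `format_line`) avoids mixing a "Total" key in with the protocol names.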
I would begin by first reading the file into a list:
contents = []
with open("file_path") as f:
    contents = f.readlines()
Then you can split each line into a list of its own:
# strip() removes the trailing newline, so [1:-1] drops both outer quotes
ips = [l.strip()[1:-1].split('","') for l in contents]
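To make the slice-and-split step concrete, here is a small Python 3 sketch on one line from the question. It also shows why the trailing newline that `readlines()` keeps needs to be stripped first: without that, `[1:-1]` removes the newline instead of the closing quote. The variable names are only for illustration.

```python
# One line as readlines() would return it, including the trailing newline.
line = '"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"\n'

# Strip the newline, drop the outer quotes, then split on the inner '","'.
fields = line.strip()[1:-1].split('","')

# Without strip(), the slice removes the newline and the last quote survives.
unstripped = line[1:-1].split('","')

print(fields[2], fields[4])   # source IP and protocol
print(repr(unstripped[-1]))   # last field still ends with a quote
```

This approach assumes no field ever contains the sequence `","` itself; the csv module handles such cases more robustly.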
We can then map these into a dict:
sourceIps = {}
for ip in ips:
    try:
        sourceIps[ip[2]].append(ip)
    except KeyError:
        # first row seen for this source IP
        sourceIps[ip[2]] = [ip]
And finally print out the result:
for ip, stuff in sourceIps.iteritems():
    print "From {0} ... ".format(ip, ...)
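The final print is left elided above. One possible way to turn each group of rows into per-protocol counts is sketched below in Python 3. This is not necessarily what the answer had in mind: the `summarize` helper and the sample `source_ips` data are hypothetical, and the sketch assumes the protocol sits at index 4 of each row, as in the question's log.

```python
from collections import Counter

def summarize(source_ips):
    """Map each source IP to a Counter of the protocols seen in its rows."""
    # Each row is a list of six fields; index 4 is the protocol column.
    return {ip: Counter(row[4] for row in rows) for ip, rows in source_ips.items()}

# Hypothetical grouped data, in the same shape as sourceIps above.
source_ips = {
    "172.16.113.168": [
        ["3", "0.019849", "172.16.113.168", "172.16.112.50", "TCP", "21582 > telnet [ACK]"],
        ["4", "0.530125", "172.16.113.168", "172.16.112.50", "TELNET", "Telnet Data ..."],
    ],
}

summary = summarize(source_ips)
for ip, counts in summary.items():
    print("From {0}: {1} (Total: {2})".format(ip, dict(counts), sum(counts.values())))
```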
First you'll want to read in the text file
# Open the file
file = open('log_file.csv')
# readlines() will return the data as a list of strings, one for each line
log_data = file.readlines()
# close the log file
file.close()
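As an alternative to calling `close()` by hand, a `with` block closes the file automatically, even if an error occurs while reading. The Python 3 sketch below assumes a hypothetical helper named `read_log_lines`, and writes a small temporary file only so that the example runs on its own.

```python
import os
import tempfile

def read_log_lines(path):
    """Return the lines of the file at `path`; the file is closed on exit."""
    with open(path) as log_file:
        return log_file.readlines()

# Create a throwaway two-line log so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write('"No.","Time","Source","Destination","Protocol","Info"\n')
    tmp.write('"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."\n')
    demo_path = tmp.name

log_data = read_log_lines(demo_path)
os.remove(demo_path)  # clean up the temporary file
print(len(log_data), "lines read")
```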
Set up a dictionary to hold your results
results = {}
Now iterate over your data, one line at a time, and record the protocol in the dictionary
for entry in log_data[1:]:  # skip the header row
    # Split on commas and remove the quotes around each field
    entry_data = [field.strip('"') for field in entry.strip().split(',')]
    # We are going to have a separate entry for each source ip
    # If we haven't already seen this ip, we need to make an entry for it
    if entry_data[2] not in results:
        results[entry_data[2]] = {'total':0}
    # Now check to see if we've seen the protocol for this ip before
    # If we haven't, add a new entry set to 0
    if entry_data[4] not in results[entry_data[2]]:
        results[entry_data[2]][entry_data[4]] = 0
    # Now we increment the count for this protocol
    results[entry_data[2]][entry_data[4]] += 1
    # And we increment the total count
    results[entry_data[2]]['total'] += 1
Once you've counted everything, just iterate over your counts and print out the results
for ip in results:
    # Here we're printing a string with placeholders. The {0} and {1} will be filled
    # in by the call to format
    print "from: IP {0} Protocol Count: {1}".format(
        ip,
        # And finally create the value for the protocol counts with another format call
        # The square brackets with the for statement inside create a list with one entry
        # for each protocol
        # We use ' '.join to join each of the counts with a space
        ' '.join(["({0}: {1})".format(protocol, results[ip][protocol]) for protocol in results[ip]]))