Python parsing log file for IP address and Protocol

This is my first question here at Stack Overflow, and I am really looking forward to being part of this community. I am new to programming, and Python was the first language most people recommended to me.

Anyway, I have a log file which looks like this:

"No.","Time","Source","Destination","Protocol","Info"
"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..." 
"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..." 
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]" 
"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..." 
"5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."

I want to parse the log file with Python so that the result looks like this:

From IP 135.13.216.191 Protocol Count: (IMF 1) (SMTP 38) (TCP 24) (Total: 63)

I would really like some help on which approach to take to tackle this problem: should I use lists and loop through them, or dictionaries/tuples?

Thanks in advance for your help!

You can parse the file using the csv module:

import csv

with open('logfile.txt') as logfile:
     for row in csv.reader(logfile):
         no, time, source, dest, protocol, info = row
         # do stuff with these
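
To see what the reader yields, here is a minimal sketch that runs the same kind of loop over a few lines of the sample log, held in memory with io.StringIO instead of a file. The names SAMPLE and parse_rows are made up for illustration, and skipping the header row with next(reader) is an assumption about how you would want to handle it.

```python
import csv
import io

# A few lines of the log from the question, held in memory for the demo.
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Info"
"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"
'''

def parse_rows(text):
    """Return (source, protocol) pairs, skipping the header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row (assumed to be the first line)
    pairs = []
    for no, time, source, dest, protocol, info in reader:
        pairs.append((source, protocol))
    return pairs

print(parse_rows(SAMPLE))
```

The csv module removes the surrounding double quotes for you, so the values come back as plain strings.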

I can't quite tell what you're asking, but I think you want:

import csv
from collections import defaultdict

# A dictionary whose values are by default (a
# dictionary whose values are by default 0)
bySource = defaultdict(lambda: defaultdict(lambda: 0))

with open('logfile.txt') as logfile:
    # DictReader uses the header row as keys, so columns can be read by name
    for row in csv.DictReader(logfile):
        bySource[row["Source"]][row["Protocol"]] += 1

# Python 3: items() replaces iteritems(), and print is a function
for source, protocols in bySource.items():
    protocols['Total'] = sum(protocols.values())

    print("From IP %s Protocol Count: %s" % (
        source,
        ' '.join("(%s: %d)" % item for item in protocols.items())
    ))
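
If the output should match the format in the question exactly, with the protocols in alphabetical order and the total last, one possible sketch uses collections.Counter. The function names count_by_source and format_line are hypothetical, and the rows are passed in as plain dictionaries (shaped like what csv.DictReader yields) so that the example runs without a file.

```python
from collections import Counter, defaultdict

def count_by_source(rows):
    """Map each source IP to a Counter of the protocols it sent."""
    by_source = defaultdict(Counter)
    for row in rows:
        by_source[row["Source"]][row["Protocol"]] += 1
    return by_source

def format_line(source, protocols):
    """Build one report line: protocols sorted by name, total last."""
    parts = ["(%s %d)" % (name, n) for name, n in sorted(protocols.items())]
    parts.append("(Total: %d)" % sum(protocols.values()))
    return "From IP %s Protocol Count: %s" % (source, " ".join(parts))

# Hypothetical rows, shaped like csv.DictReader output for the sample log.
rows = [
    {"Source": "172.16.113.168", "Protocol": "TCP"},
    {"Source": "172.16.113.168", "Protocol": "TELNET"},
    {"Source": "172.16.112.50", "Protocol": "TELNET"},
]
counts = count_by_source(rows)
for source in sorted(counts):
    print(format_line(source, counts[source]))
```

Computing the total separately keeps it out of the per-protocol counts, so it cannot be mistaken for a protocol named "Total".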

I would begin by reading the file into a list:

contents = []
with open("file_path") as f:
    contents = f.readlines()

Then you can split each line into a list of its own:

# strip() removes the trailing newline first, so [1:-1] drops both outer quotes
ips = [l.strip()[1:-1].split('","') for l in contents]
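
As a quick check of this split, here is a sketch applied to one line of the sample log. Lines returned by readlines() end with a newline, so the example strips whitespace before slicing off the outer quotes. The helper name split_line is hypothetical, and the approach assumes that no field contains the sequence `","` itself.

```python
def split_line(line):
    """Split one quoted, comma-separated log line into its fields."""
    # strip() drops the trailing newline (and any stray spaces) first,
    # then [1:-1] removes the opening and closing double quotes.
    return line.strip()[1:-1].split('","')

# One line from the question's log, including a trailing space and newline.
line = '"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]" \n'
fields = split_line(line)
print(fields[2], fields[4])
```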

We can then map these into a dict:

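sourceIps = {}
for ip in ips:
    # ip[2] is the "Source" column; group every row under its source address
    try:
        sourceIps[ip[2]].append(ip)
    except KeyError:
        # First time we see this source address
        sourceIps[ip[2]] = [ip]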

And finally print out the result:

for ip, stuff in sourceIps.items():
    print("From {0} ... ".format(ip, ...))

First you'll want to read in the text file:

# Open the file
file = open('log_file.csv')
# readlines() will return the data as a list of strings, one for each line
log_data = file.readlines()
# close the log file
file.close()

Set up a dictionary to hold your results:

results = {}

Now iterate over your data, one line at a time, and record the protocol in the dictionary:

# Skip the header row, which is the first line of the file
for entry in log_data[1:]:
    # Split on commas and strip the surrounding double quotes from each field
    entry_data = [field.strip('"') for field in entry.strip().split(',')]
    # We are going to have a separate entry for each source ip
    # If we haven't already seen this ip, we need to make an entry for it
    if entry_data[2] not in results:
        results[entry_data[2]] = {'total': 0}
    # Now check to see if we've seen the protocol for this ip before
    # If we haven't, add a new entry set to 0
    if entry_data[4] not in results[entry_data[2]]:
        results[entry_data[2]][entry_data[4]] = 0
    # Now we increment the count for this protocol
    results[entry_data[2]][entry_data[4]] += 1
    # And we increment the total count
    results[entry_data[2]]['total'] += 1
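
To make the shape of the resulting dictionary concrete, here is a compact sketch of the same counting idea run on four lines from the question's log. The function name count_protocols is hypothetical; it uses setdefault and get in place of the explicit membership checks, and it assumes the header row has already been removed and that the quotes should be stripped from each field.

```python
def count_protocols(lines):
    """Count protocols per source IP from raw quoted CSV lines."""
    results = {}
    for line in lines:
        # Split on commas, then strip the double quotes around each field.
        fields = [field.strip('"') for field in line.strip().split(',')]
        source, protocol = fields[2], fields[4]
        # Create the per-IP dictionary the first time this source appears.
        per_ip = results.setdefault(source, {'total': 0})
        per_ip[protocol] = per_ip.get(protocol, 0) + 1
        per_ip['total'] += 1
    return results

# Data lines from the question's log (header row left out).
sample = [
    '"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."\n',
    '"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."\n',
    '"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"\n',
    '"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."\n',
]
results = count_protocols(sample)
print(results['172.16.113.168'])
```

Each source IP maps to its own small dictionary holding one counter per protocol plus a running 'total'.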

Once you've counted everything, just iterate over your counts and print out the results:

for ip in results:
    # Here we're printing a string with placeholders. The {0} and {1} will be filled
    # in by the call to format
    print("from: IP {0} Protocol Count: {1}".format(
        ip,
        # And finally create the value for the protocol counts with another format call
        # The square brackets with the for statement inside create a list with one entry
        # for each protocol recorded for this ip (including the 'total' key)
        # We use ' '.join to join each of the counts with a space
        ' '.join(["({0}: {1})".format(protocol, results[ip][protocol]) for protocol in results[ip]])))
