
Parsing large log file - Python

I have a firewall log file which looks like this:

"No.","Time","Source","Destination","Protocol","Info"
"1","0.000000","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."
"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK] Seq=2Ack=2 Win=32120 Len=0"
"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."
"5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
"6","0.549962","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK] Seq=3 Ack=3 Win=32120 Len=0"
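Each record is ordinary quoted CSV, so Python's built-in `csv` module can split it without stripping quotes by hand, and it keeps commas inside a quoted field intact. A minimal sketch: `row3` is copied from the sample above, while `tricky` is a made-up row (not from the real log) whose Info field contains a comma; `split_record` is a hypothetical helper name.

```python
import csv

# Row 3 of the sample log, plus a hypothetical row whose Info field contains a comma
row3 = '"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK] Seq=2Ack=2 Win=32120 Len=0"'
tricky = '"7","0.600000","10.0.0.1","10.0.0.2","TCP","Len=0, flags set"'

def split_record(line):
    """Split one CSV log line into its six fields, honouring the quotes."""
    return next(csv.reader([line]))

_, _, source, destination, protocol, info = split_record(row3)
print(source, destination, protocol)

# Removing quotes and splitting on ',' yields 7 pieces once Info contains a comma
naive = tricky.replace('"', '').split(',')
print(len(naive), len(split_record(tricky)))
```

The column positions (Source at index 2, Destination at 3, Protocol at 4) are the same either way; the difference only shows up on rows whose Info text happens to contain a comma.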

I want to be able to run the script by name (I am using Linux), e.g.

log1.py logfile.csv

(the name of the program followed by the name of the log file), and get the following output:

$ log1.py logfile.csv
Source IP        Destination IP   Protocol  Count
0.0.0.0          255.255.255.255  BOOTP     20
0.1.125.174      131.84.1.31      TCP       2
192.168.1.1      172.168.1.2      TCP       100
(............lots more here .....................)
Oracle_89:a5:9f  3com_9c:b2:54    ARP       14
Total: 649787
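The first mode is essentially "count how many rows share each (source, destination, protocol) triple". A minimal sketch using `collections.Counter`, assuming the log has a header row with the column names `Source`, `Destination` and `Protocol` exactly as in the sample; the toy input is the first five sample rows, and `count_flows`/`print_table` are hypothetical names. "Total" is taken to be the sum of all counts.

```python
import csv
import io
from collections import Counter

# First five rows of the sample log from the question, used as toy input
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Info"
"1","0.000000","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."
"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK] Seq=2Ack=2 Win=32120 Len=0"
"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."
"5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
'''

def count_flows(lines):
    """Count packets per (source, destination, protocol) triple."""
    reader = csv.DictReader(lines)  # column names come from the header row
    return Counter((r["Source"], r["Destination"], r["Protocol"]) for r in reader)

def print_table(counts):
    """Print one sorted row per triple, then the total number of packets."""
    print(f"{'Source IP':<18}{'Destination IP':<18}{'Protocol':<10}Count")
    for (src, dst, proto), n in sorted(counts.items()):
        print(f"{src:<18}{dst:<18}{proto:<10}{n}")
    print(f"Total: {sum(counts.values())}")

counts = count_flows(io.StringIO(SAMPLE))
print_table(counts)
```

For a real file, pass an open handle instead: `with open("logfile.csv", newline="") as f: counts = count_flows(f)`. Sorting is plain string sorting, which also copes with non-IP endpoints such as `Oracle_89:a5:9f` that a numeric IP sort would reject.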

Another very useful feature would be to run the program with just a source IP address. In that case I would like the output to look something like the following:

$ log1.py 172.16.112.50 logfile.csv

Source IP        Destination IP   Protocol  Count
172.16.112.50    135.13.216.191   IMF       4
                                  SMTP      53
                                  TCP       43
                                  TELNET    35
(............lots more here .....................)
                 172.16.112.194   SMTP      7
                                  TCP       42
                                  TELNET    3745
Total: 38369
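The source-only mode can be read as "keep the triples whose source matches, then group them by destination". A minimal sketch, assuming the per-triple counts are already available as a dict or `Counter` keyed by `(src, dst, proto)`; the toy counts reuse numbers quoted in this question, and `by_destination`/`print_grouped` are hypothetical names.

```python
from collections import Counter, defaultdict

def by_destination(counts, source):
    """Regroup {(src, dst, proto): n} into {dst: {proto: n}} for one source IP."""
    grouped = defaultdict(dict)
    for (src, dst, proto), n in counts.items():
        if src == source:
            grouped[dst][proto] = n
    return grouped

def print_grouped(source, grouped):
    """Print the source and each destination only once, then return the total."""
    print(f"{'Source IP':<17}{'Destination IP':<17}{'Protocol':<10}Count")
    total = 0
    show_src = True
    for dst in sorted(grouped):
        show_dst = True
        for proto in sorted(grouped[dst]):
            n = grouped[dst][proto]
            total += n
            src_col = source if show_src else ""
            dst_col = dst if show_dst else ""
            print(f"{src_col:<17}{dst_col:<17}{proto:<10}{n}")
            show_src = show_dst = False
    print(f"Total: {total}")
    return total

# Toy counts built from numbers quoted in the question
counts = Counter({
    ("172.16.112.50", "135.13.216.191", "IMF"): 4,
    ("172.16.112.50", "135.13.216.191", "SMTP"): 53,
    ("172.16.112.50", "202.77.162.213", "TCP"): 30,
    ("172.16.112.50", "202.77.162.213", "TELNET"): 41,
    ("172.16.113.168", "172.16.112.50", "TELNET"): 2,
})
grouped = by_destination(counts, "172.16.112.50")
total = print_grouped("172.16.112.50", grouped)
```

The row for source `172.16.113.168` is dropped by the filter, so the printed total covers only the four matching triples.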

Finally, I would like to be able to specify both the source IP address and the destination IP address and get the following output:

$ log1.py 172.16.112.50 202.77.162.213 packets.csv
Source IP        Destination IP   Protocol  Count
172.16.112.50    202.77.162.213   ICMP      1
                                  Portmap   5
                                  RSH       9
                                  SADMIND   1
                                  TCP       30
                                  TELNET    41
Total: 87
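All three invocations share one pattern: zero, one or two IP addresses followed by the log file name. A minimal sketch of a `log1.py` skeleton built on that reading, assuming the header names `Source`, `Destination` and `Protocol` from the sample; `parse_args`, `count_flows`, `select` and `main` are hypothetical names, and it prints one flat row per match rather than the grouped layout above.

```python
import csv
import re
import sys
from collections import Counter

IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")  # dotted-quad format check only
USAGE = "usage: log1.py [SOURCE_IP [DESTINATION_IP]] LOGFILE"

def parse_args(argv):
    """Split argv (without the program name) into (src, dst, logfile); None means any."""
    if not 1 <= len(argv) <= 3:
        raise ValueError(USAGE)
    *ips, path = argv
    if not all(IPV4.match(ip) for ip in ips):
        raise ValueError(USAGE)
    ips += [None] * (2 - len(ips))  # pad the missing filters with None
    return ips[0], ips[1], path

def count_flows(path):
    """Count packets per (source, destination, protocol) in a CSV log with a header row."""
    with open(path, newline="") as f:
        return Counter((r["Source"], r["Destination"], r["Protocol"])
                       for r in csv.DictReader(f))

def select(counts, src=None, dst=None):
    """Keep only the triples that match src and/or dst."""
    return {key: n for key, n in counts.items()
            if (src is None or key[0] == src) and (dst is None or key[1] == dst)}

def main(argv):
    try:
        src, dst, path = parse_args(argv)
    except ValueError as err:
        print(err, file=sys.stderr)
        return 2
    rows = select(count_flows(path), src, dst)
    print(f"{'Source IP':<18}{'Destination IP':<18}{'Protocol':<10}Count")
    for (s, d, p), n in sorted(rows.items()):
        print(f"{s:<18}{d:<18}{p:<10}{n}")
    print(f"Total: {sum(rows.values())}")
    return 0
```

To run it by name as in the question, add `#!/usr/bin/env python3` as the first line and `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))` at the end, make it executable with `chmod +x log1.py`, and call it as `./log1.py 172.16.112.50 logfile.csv` (or plain `log1.py ...` once its directory is on `PATH`). Because the regular expression only accepts dotted-quad addresses, non-IP endpoints such as `Oracle_89:a5:9f` can appear in the output but cannot be used as filters in this sketch.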

I am a junior systems administrator and don't have much programming experience (just HTML). I have started learning, but I have been stuck on this problem for the past three days. Here is what I have so far:

import csv
import re

# Check whether a string looks like a dotted-quad IP address (format only; 999.1.1.1 still passes)
def ip_validation(ip_address):
    return re.match(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$', ip_address) is not None

# Read the log and return a sorted list of (source IP, destination IP, protocol) tuples
def filereader(file_name):
    itlist = []
    with open(file_name, newline='') as file_dump:
        reader = csv.reader(file_dump)  # strips the quotes and respects commas inside quoted fields
        next(reader, None)              # skip the "No.","Time",... header row
        for fields in reader:
            if len(fields) < 5:         # ignore blank or truncated lines
                continue
            src_ip = fields[2]          # Source IP
            dst_ip = fields[3]          # Destination IP
            prot = fields[4]            # Protocol
            itlist.append((src_ip, dst_ip, prot))
    itlist.sort()                       # sort once, after every line has been read
    return itlist
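Because `filereader` sorts its tuples, identical (source, destination, protocol) triples end up next to each other, so `itertools.groupby` can count them in a single pass. A minimal sketch, assuming `itlist` is the sorted list of tuples that `filereader` produces; the four tuples are taken from the sample rows, and `count_sorted` is a hypothetical name.

```python
from itertools import groupby

def count_sorted(itlist):
    """Turn a sorted list of (src, dst, proto) tuples into [src, dst, proto, count] rows."""
    # groupby only merges *adjacent* equal items, so the input must already be sorted
    return [[src, dst, proto, sum(1 for _ in group)]
            for (src, dst, proto), group in groupby(itlist)]

itlist = sorted([
    ("172.16.113.168", "172.16.112.50", "TELNET"),
    ("172.16.112.50", "172.16.113.168", "TELNET"),
    ("172.16.113.168", "172.16.112.50", "TCP"),
    ("172.16.113.168", "172.16.112.50", "TELNET"),
])
rows = count_sorted(itlist)
for row in rows:
    print(row)
```

If the list were not sorted first, the two TELNET tuples from `172.16.113.168` would only be merged when they happened to be adjacent.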

Parse the log file and build a list of lists, where each sub-list contains (Source IP, Destination IP, Protocol, Count).

Then all you have to do is apply a filter function to this outer list. Reply if you need further clarification.
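Taken literally, this means each command-line mode is one `filter` call with a different predicate over the `[source, destination, protocol, count]` rows. A minimal sketch with hypothetical names (`flows`, `select`, `total`); the rows reuse counts quoted in the question and serve only as toy input.

```python
# Toy rows in the [src, dst, proto, count] shape, built from numbers quoted in the question
flows = [
    ["172.16.112.50", "202.77.162.213", "ICMP", 1],
    ["172.16.112.50", "202.77.162.213", "TCP", 30],
    ["172.16.112.50", "135.13.216.191", "SMTP", 53],
    ["172.16.112.50", "172.16.112.194", "SMTP", 7],
    ["172.16.113.168", "172.16.112.50", "TELNET", 2],
]

def select(flows, src=None, dst=None):
    """Return the rows whose source and/or destination match; None means 'any'."""
    return list(filter(
        lambda row: (src is None or row[0] == src) and (dst is None or row[1] == dst),
        flows,
    ))

def total(rows):
    """Sum the count column, i.e. the value printed on the 'Total:' line."""
    return sum(row[3] for row in rows)

from_src = select(flows, src="172.16.112.50")
pair = select(flows, src="172.16.112.50", dst="202.77.162.213")
print(len(from_src), total(from_src))
print(len(pair), total(pair))
```

A list comprehension such as `[row for row in flows if row[0] == src]` does the same job and is often considered more idiomatic Python; `filter` is used here only because it matches the wording of the answer.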
