Python: counting the unique occurrences of a string in a file
I'm trying to count the unique IP addresses in an Apache log file using Python 3.3.1. The thing is, I don't think it is counting everything correctly.
Here is my code:
import argparse
import os
import sys
from collections import Counter

#
# This function counts the unique IP addresses in the logfile
#
def print_unique_ip(logfile):
    IPset = set()
    for line in logfile:
        head, sep, tail = line.partition(" ")
        if(len(head) > 1):
            IPset.update(head)
    print(len(IPset))
    return

#
# This is the main function of the program
#
def main():
    parser = argparse.ArgumentParser(description="An Apache log file processor")
    parser.add_argument('-l', '--log-file', help='This is the log file to work on', required=True)
    parser.add_argument('-n', help='Displays the number of unique IP addresses', action='store_true')
    parser.add_argument('-t', help='Displays top T IP addresses', type=int)
    parser.add_argument('-v', help='Displays the number of visits of an IP address')
    arguments = parser.parse_args()
    if(os.path.isfile(arguments.log_file)):
        logfile = open(arguments.log_file)
    else:
        print('The file <', arguments.log_file, '> does not exist')
        sys.exit(1)  # sys.exit must be called; a bare "sys.exit" does nothing
    if(arguments.n == True):
        print_unique_ip(logfile)
    if(arguments.t):
        print_top_n_ip(arguments.t, logfile)
    if(arguments.v):
        number_of_ocurrences(arguments.v, logfile)
    return

if __name__ == '__main__':
    main()
I have left out everything else.
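The script already imports `Counter`, and its `-t` and `-v` options need per-IP counts rather than only the set of distinct IPs. A `collections.Counter` built in a single pass over the file can answer all three options. This is a sketch under assumptions: `count_ips` is a hypothetical helper, not part of the original code, and the client IP is taken to be the first space-separated field of each line, as the original code assumes. Building the counts once also sidesteps a pitfall in `main()`: `logfile` is a file object that can only be iterated once, so if both `-n` and `-t` are given, the second function would see no lines.

```python
from collections import Counter


def count_ips(lines):
    """Hypothetical helper: count how many lines each client IP (first field) appears on."""
    counts = Counter()
    for line in lines:
        head = line.partition(" ")[0]
        if len(head) > 1:  # skip blank lines, as the original code does
            counts[head] += 1
    return counts


# Made-up sample lines in Apache common log format.
sample = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326\n',
    '10.0.0.2 - - [10/Oct/2000:13:55:37 -0700] "GET / HTTP/1.0" 200 2326\n',
    '10.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /a HTTP/1.0" 404 209\n',
]
counts = count_ips(sample)
print(len(counts))            # -n: number of unique IPs -> 2
print(counts.most_common(1))  # -t 1: [('10.0.0.1', 2)]
print(counts['10.0.0.2'])     # -v 10.0.0.2: 1
```

In `main()` this would be called once, e.g. `counts = count_ips(logfile)`, after which each option reads from `counts` instead of re-reading the file. A `Counter` returns 0 for an IP that never appears, so the `-v` case needs no special handling.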
When I run it, I get:
$ python3 assig4.py -l apache_short.log -n
12
But I know that there are more than 12 unique IPs in the file.
It doesn't seem to be giving me the right result. What I am trying to do is read the file line by line; when I find an IP address, I put it into a set, since a set only keeps unique elements, and then I print out the length of that set.
IPset.update(head)
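Bug. This will not do what you're expecting. `set.update` treats its argument as an iterable and adds each of its elements, so passing a string adds every individual character rather than the whole address. You want to `add` each IP to your set instead. Examples make it clearest: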
>>> s1 = set()
>>> s2 = set()
>>> s1.add('11.22.33.44')
>>> s2.update('11.22.33.44')
>>> s1
{'11.22.33.44'}
>>> s2
{'1', '3', '2', '4', '.'}
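Applied to the function from the question, the fix is a one-word change: call `add` instead of `update`. The sketch below takes the lines as any iterable so it can be tried without a log file; the `count_unique_ips` name and the sample lines are hypothetical, and, like the original, it assumes the client IP is the first space-separated field of each line (as in Apache's common and combined log formats).

```python
def count_unique_ips(lines):
    """Return the number of distinct first fields (client IPs) in lines."""
    ips = set()
    for line in lines:
        head, sep, tail = line.partition(" ")
        if len(head) > 1:
            ips.add(head)  # add() stores the whole string as one element
    return len(ips)


def print_unique_ip(logfile):
    # Same interface as the question's function: print the count.
    print(count_unique_ips(logfile))


# Made-up sample lines in Apache common log format.
sample = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326\n',
    '10.0.0.2 - - [10/Oct/2000:13:55:37 -0700] "GET / HTTP/1.0" 200 2326\n',
    '10.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /a HTTP/1.0" 404 209\n',
]
print(count_unique_ips(sample))  # 2
```

With `update` instead of `add`, the same sample gives 4, the number of distinct characters (`'1'`, `'0'`, `'.'`, `'2'`) across both addresses, which is why the original count comes out far too small.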