Counting the number of unique items in a dictionary

Question

My program reads in a large log file. It then searches each line for the IP and the TIME (whatever is in the brackets).

5.63.145.71 - - [30/Jun/2013:08:04:46 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"
5.63.145.71 - - [30/Jun/2013:08:04:49 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"
5.63.145.71 - - [30/Jun/2013:08:04:51 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"

I want to read the whole file and summarize the entries as follows:

Num 3 IP 5.63.145.1 TIME [30/Jun/2013:08:04:46 -0500]

That is: number of entries, IP, TIME and DATE.

What I have so far:

import re


x = open("logssss.txt")

dic={}


for line in x:
    m = re.search(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",line).group().split()
    c = re.search(r"\[(.+)\]",line).group().split()
    for i in range(len(m)):
        try:
            dic[m[i]] += 1 
        except:
            dic[m[i]] = 1
        k = dic.keys()
for i in range(len(k)):
    print dic[k[i]], k[i]

The above code displays correctly now! Thanks.

6 199.21.99.83
1 5.63.145.71

EDIT: How can I add c to my output now? The timestamps will obviously differ between entries, but is it possible to show just one of them on the same line as each IP?

Answer 1

Move your print statement outside of the main loop:

import re
x = open("logssss.txt")

dic={}


for line in x:
    m = re.search(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",line).group().split()
    c = re.search(r"\[(.+)\]",line).group().split()
    for i in range(len(m)):
        try:
            dic[m[i]] += 1 
        except:
            dic[m[i]] = 1

for k,v in dic.iteritems(): #or items if Python 3.X
    print k, v

As a tip, you could take advantage of Python's Counter class to replace your try/except block:

from collections import Counter
dic = Counter()
for line in x:
    m = re.search(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",line).group().split()
    c = re.search(r"\[(.+)\]",line).group().split()
    for i in range(len(m)):
        dic[m[i]] += 1

for k,v in dic.iteritems(): #or items if Python 3.X
    print k, v
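One thing the snippets above leave implicit: re.search returns None when a line contains no IP address, so calling .group() on the result raises AttributeError. Below is a minimal Python 3 sketch of the Counter approach that skips such lines. The count_ips helper and the in-memory sample_lines list are illustrative names, not part of the original post.

```python
import re
from collections import Counter

IP_RE = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")

def count_ips(lines):
    """Count how many lines mention each IP (first IP per line only)."""
    counts = Counter()
    for line in lines:
        match = IP_RE.search(line)
        if match is None:  # blank or malformed line: nothing to count
            continue
        counts[match.group()] += 1
    return counts

# Sample data standing in for the log file (an assumption for illustration).
sample_lines = [
    '5.63.145.71 - - [30/Jun/2013:08:04:46 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
    '5.63.145.71 - - [30/Jun/2013:08:04:49 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
    "",  # a line with no IP is skipped instead of raising AttributeError
]

for ip, n in count_ips(sample_lines).items():
    print(ip, n)
```

With a real log, the open file object can be passed directly, since iterating over it yields lines.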

Based on your comment, I would just use a dictionary of lists. The count for each IP address can then be taken from the length of its list:

dic = {}
for line in x:
    m = re.search(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",line).group().split()
    c = re.search(r"\[(.+)\]",line).group().split()
    for i in range(len(m)):
        dic.setdefault(m[i], []).append(c)

for k,v in dic.iteritems(): #or items if Python 3.X
    print k, len(v), v
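If only one timestamp per IP is needed, as the EDIT in the question suggests, storing every timestamp is optional. Here is a hedged Python 3 sketch that keeps a running count plus the first timestamp seen. The names summarize and sample_lines are hypothetical, and the output format is modeled on the one requested in the question.

```python
import re

IP_RE = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")
TIME_RE = re.compile(r"\[(.+?)\]")

def summarize(lines):
    """Map each IP to [count, first timestamp seen (brackets included)]."""
    summary = {}
    for line in lines:
        ip = IP_RE.search(line)
        ts = TIME_RE.search(line)
        if ip is None or ts is None:
            continue  # skip lines missing either field
        # setdefault only stores the timestamp the first time an IP appears
        entry = summary.setdefault(ip.group(), [0, ts.group()])
        entry[0] += 1
    return summary

# Sample data copied from the question (stands in for the real file).
sample_lines = [
    '5.63.145.71 - - [30/Jun/2013:08:04:46 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
    '5.63.145.71 - - [30/Jun/2013:08:04:49 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
    '5.63.145.71 - - [30/Jun/2013:08:04:51 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
]

for ip, (count, first_seen) in summarize(sample_lines).items():
    print(f"Num {count} IP {ip} TIME {first_seen}")
# prints: Num 3 IP 5.63.145.71 TIME [30/Jun/2013:08:04:46 -0500]
```

Keeping two values per IP instead of a full list also keeps memory use flat on a large log, at the cost of discarding the later timestamps.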

Answer 2

You could use a Counter, which is built for this kind of tallying and keeps the loop shorter:

from collections import Counter
cnt = Counter()
for line in x:
    m = re.search(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",line).group().split()
    cnt.update(m)

Then do the printing outside the main loop:

for k,v in cnt.iteritems():
    print k, v
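For a large log it can help to list the busiest addresses first. Counter.most_common() returns (item, count) pairs sorted from highest to lowest count. A small Python 3 sketch follows; the ips list is made-up input standing in for the addresses extracted by the loop above, with counts that mirror the output shown in the question.

```python
from collections import Counter

# Made-up input: the IPs pulled out of each log line, in file order.
ips = ["199.21.99.83"] * 6 + ["5.63.145.71"]

cnt = Counter()
cnt.update(ips)  # update() tallies every element of the iterable

for ip, n in cnt.most_common():
    print(n, ip)
# prints:
# 6 199.21.99.83
# 1 5.63.145.71
```

Passing a number, as in cnt.most_common(10), limits the result to that many top entries.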

To include c, a defaultdict would be more appropriate:

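from collections import defaultdict

entries = defaultdict(list)  # maps each IP to the list of its timestamps
for line in x:
    # [0] takes the first whitespace-separated token of each match
    m = re.search(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",line).group().split()[0]
    # note: for c this keeps "[30/Jun/2013:08:04:46" and drops the "-0500]" part
    c = re.search(r"\[(.+)\]",line).group().split()[0]
    entries[m].append(c)

for k,v in entries.iteritems():
    print k, len(v), v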

It is my understanding that there is only one IP and one date per line, hence the [0] to take the first and only occurrence.
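If each line really holds exactly one IP and one bracketed timestamp, a single pattern with named groups can pull out both and keep the full timestamp, UTC offset included. The following is a Python 3 sketch under that assumption, and it further assumes the IP sits at the very start of the line, as in the sample from the question. LINE_RE, group_times_by_ip and sample_lines are illustrative names.

```python
import re
from collections import defaultdict

# Assumption: the IP opens the line and the first [...] group is the timestamp.
LINE_RE = re.compile(
    r"^(?P<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})\b.*?\[(?P<time>[^\]]+)\]"
)

def group_times_by_ip(lines):
    """Return {ip: [timestamp, ...]} for every line that matches LINE_RE."""
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines that do not fit the pattern are ignored
            grouped[match.group("ip")].append(match.group("time"))
    return grouped

# Sample data copied from the question (stands in for the real file).
sample_lines = [
    '5.63.145.71 - - [30/Jun/2013:08:04:46 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
    '5.63.145.71 - - [30/Jun/2013:08:04:49 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
    '5.63.145.71 - - [30/Jun/2013:08:04:51 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
]

for ip, times in group_times_by_ip(sample_lines).items():
    print(ip, len(times), times[0])
# prints: 5.63.145.71 3 30/Jun/2013:08:04:46 -0500
```

Here the timestamps are stored without the surrounding brackets, because the named group captures only what is between them.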

2 answers

Answer 1: 3, accepted, 2013-07-17 17:44:16

Answer 2: 2, 2013-07-17 17:49:10
