
How to count occurrences of dictionary key and sum values and print them?

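I am calculating the bandwidth used per domain, and I also need to know how many times each domain was hit. I can already calculate the bandwidth, but I'm not sure how to count how often each domain occurs in the logs. Any ideas would help a lot. Thanks in advance.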

Code:

import os
from collections import defaultdict

merged_logs = []
bandwidth_by_domain = defaultdict(int)  # domain -> summed bytes (avoids shadowing the builtin dict)

path = ["/var/logs/"]

# Read every log file in each directory into one list of lines
for i in path:
    for filename in os.listdir(i):
        with open(os.path.join(i, filename), 'r') as filedata:
            merged_logs += filedata.readlines()

for line in merged_logs:
    line_split = line.split(" ")
    start = "CONNECT "
    end = " -"
    # The requested URL sits between "CONNECT " and the first " -"
    domain_str = line[line.find(start) + len(start):line.find(end)]
    if domain_str.find("/") > 0:
        # Strip the path, e.g. "www.xyz.com/index.php" -> "www.xyz.com"
        domain = domain_str.split("/")[0]
    else:
        domain = domain_str

    # The byte count is the 4th field in 10-field lines and the 5th in 11-field lines
    if len(line_split) == 10:
        bandwidth = line_split[3]
    elif len(line_split) == 11:
        bandwidth = line_split[4]
    else:
        continue  # unrecognized layout: skip instead of reusing the previous value

    # defaultdict(int) starts missing keys at 0, so no "not in" check is needed
    bandwidth_by_domain[domain] += int(bandwidth)

for key, value in bandwidth_by_domain.items():
    print(key, (value * 2) // (1024 * 1024))
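One way to also count hits (a sketch, not the asker's code) is to keep a second mapping next to the bandwidth one and add 1 for every line seen for a domain; `collections.Counter` is a dict subclass intended for this kind of tally. The `(domain, bytes)` pairs below are copied by hand from the sample log lines instead of being parsed, and the function name `tally` is hypothetical.

```python
from collections import Counter, defaultdict


def tally(pairs):
    """Sum bytes and count hits per domain from (domain, bytes) pairs."""
    bandwidth = defaultdict(int)  # domain -> total bytes
    hits = Counter()              # domain -> number of log lines
    for domain, nbytes in pairs:
        bandwidth[domain] += int(nbytes)
        hits[domain] += 1
    return bandwidth, hits


# (domain, bytes) pairs taken by hand from the sample log lines (503 line omitted)
pairs = [
    ("www.google.com", "1001803"),
    ("www.google.com", "1001085"),
    ("www.google.com", "1000182"),
    ("www.xyz.com", "1006183"),
    ("www.xyz.com", "1091083"),
    ("www.xyz.com", "2091803"),
    ("www.xyz.com", "2091083"),
]

bandwidth, hits = tally(pairs)
for domain in bandwidth:
    print(domain, bandwidth[domain], hits[domain])
```

Both mappings are keyed by the same domain string, so the hit count and the byte total for a domain can be read side by side when printing.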

A sample log file under /var/logs contains the following lines:

1569935790.563 1010 192.168.10.3 TCP_TUNNEL/200 1001803 CONNECT www.google.com:443 - HIER_DIRECT/www.google.com - 192.168.100.3
1569935790.563 1010 192.168.10.3 TCP_TUNNEL/200 1001085 CONNECT www.google.com:443 - HIER_DIRECT/www.google.com - 192.168.100.3
1569935790.563 1010 192.168.10.3 TCP_TUNNEL/200 1000182 CONNECT www.google.com:443 - HIER_DIRECT/www.google.com - 192.168.100.3
1569935790.563 1010 192.168.10.3 TCP_TUNNEL/200 1006183 CONNECT www.xyz.com/index.php - HIER_DIRECT/www.xyz.com - 192.168.100.3
1569935790.563 1010 192.168.10.3 TCP_TUNNEL/200 1091083 CONNECT www.xyz.com/index.php - HIER_DIRECT/www.xyz.com - 192.168.100.3
1569935790.563 1010 192.168.10.3 TCP_TUNNEL/200 2091803 CONNECT www.xyz.com/index.php - HIER_DIRECT/www.xyz.com - 192.168.100.3
1569935790.563 1010 192.168.10.3 TCP_TUNNEL/200 2091083 CONNECT www.xyz.com/index.php - HIER_DIRECT/www.xyz.com - 192.168.100.3
59375 192.168.10.3 TAG_NONE/503 10 CONNECT www.google.com - HIER_NONE/- - 192.168.100.3
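In every sample line above, the byte count is the field immediately before `CONNECT` (index 4 in the 11-field lines, index 3 in the 10-field line) and the requested URL is the field immediately after it. A sketch that relies on that observation (an assumption drawn only from these eight lines, not from any log-format documentation) can locate the `CONNECT` token instead of branching on the field count. The helper name `parse_line` is hypothetical, and it also strips a `:port` suffix, assuming host names never contain `:` or `/`.

```python
def parse_line(line):
    """Return (host, bytes) for a log line, or None if it has no CONNECT field.

    Assumes, as in the sample lines, that the byte count is the field just
    before "CONNECT" and the requested URL is the field just after it.
    """
    fields = line.split()
    if "CONNECT" not in fields:
        return None
    i = fields.index("CONNECT")
    if i == 0 or i + 1 >= len(fields):
        return None  # no byte field before it or no URL after it
    url = fields[i + 1]
    # Drop any path ("/index.php"), then any port (":443")
    host = url.split("/")[0].split(":")[0]
    return host, int(fields[i - 1])


samples = [
    "1569935790.563 1010 192.168.10.3 TCP_TUNNEL/200 1001803 CONNECT www.google.com:443 - HIER_DIRECT/www.google.com - 192.168.100.3",
    "1569935790.563 1010 192.168.10.3 TCP_TUNNEL/200 1006183 CONNECT www.xyz.com/index.php - HIER_DIRECT/www.xyz.com - 192.168.100.3",
    "59375 192.168.10.3 TAG_NONE/503 10 CONNECT www.google.com - HIER_NONE/- - 192.168.100.3",
]

for s in samples:
    print(parse_line(s))
```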

The output should be in the following format:

Domain          Bandwidth (MB)    Hit (Count)
www.xyz.com     11                4
www.google.com  5                 3

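The three-column layout can be produced with fixed-width `str.format` fields. The sketch below keeps the question's conversion unchanged (bytes × 2, integer-divided by 1024 × 1024), since the source does not explain the factor 2, and sorts rows by bandwidth, largest first, which is an assumption based on the row order in the example. The function name `format_report` and the column widths are illustrative choices.

```python
def format_report(bandwidth, hits):
    """Build the report: one header line, then one line per domain."""
    row = "{:<16}{:<18}{}"  # left-aligned columns of width 16 and 18
    lines = [row.format("Domain", "Bandwidth (MB)", "Hit (Count)")]
    # Largest byte total first, matching the order of the expected output
    for domain in sorted(bandwidth, key=bandwidth.get, reverse=True):
        mb = bandwidth[domain] * 2 // (1024 * 1024)  # same conversion as the question's code
        lines.append(row.format(domain, mb, hits[domain]))
    return lines


# Totals for the sample lines, leaving out the TAG_NONE/503 line
bandwidth = {"www.google.com": 3003070, "www.xyz.com": 6280152}
hits = {"www.google.com": 3, "www.xyz.com": 4}

for line in format_report(bandwidth, hits):
    print(line)
```

For www.xyz.com this gives 6280152 × 2 // 1048576 = 11, and for www.google.com 3003070 × 2 // 1048576 = 5, which match the numbers in the expected output.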
import os
import re
from collections import defaultdict, Counter

# Compile the regex pattern once, up front, instead of on every line
domain_pattern = re.compile(r"(CONNECT )(?P<domain>.*?)( -)")
# defaultdict for storing and updating the sum of bandwidths per domain
bandwidths = defaultdict(int)
# Counter for storing and updating the number of hits per domain
counts = Counter()

merged_logs = []
path = ["/var/logs/"]

for i in path:
    for filename in os.listdir(i):
        with open(os.path.join(i, filename), 'r') as filedata:
            merged_logs += filedata.readlines()

for line in merged_logs:
    line_split = line.split(" ")
    # Search for the pattern; skip lines without a "CONNECT ... -" request
    match = domain_pattern.search(line)
    if match is None:
        continue
    # Fetch only the named group 'domain' defined in the pattern
    domain_str = match.group('domain')
    domain = domain_str.split("/")[0]

    if len(line_split) == 10:
        bandwidth = line_split[3]
    elif len(line_split) == 11:
        bandwidth = line_split[4]
    else:
        # Unknown layout: skip it rather than reuse a stale bandwidth value
        continue

    # Add this line's bandwidth to the domain's running total
    bandwidths[domain] += int(bandwidth)
    # Increment the domain's hit count by 1
    counts[domain] += 1

for domain in bandwidths:
    bandwidth = (bandwidths[domain] * 2) // (1024 * 1024)
    hits = counts[domain]
    print(domain, bandwidth, hits)

I ran the code above on the sample log and got the following output:

www.google.com:443  5       3
www.xyz.com         11      4
www.google.com      0       1
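In that output, www.google.com:443 and www.google.com are separate keys because the lazy `.*?` group captures everything up to " -", including the `:443` port, and only a path is stripped by `split("/")`. A possible fix, sketched under the assumption that a host never contains `:`, `/` or whitespace, is to capture only the host characters. The name `summarize` and the three-line sample subset are illustrative. This version also counts the TAG_NONE/503 line as a hit for www.google.com; leaving out failed requests would need an extra check on the status field.

```python
import re
from collections import Counter, defaultdict

# Capture only the host after "CONNECT ": stop at ":" (port), "/" (path) or whitespace
HOST_PATTERN = re.compile(r"CONNECT (?P<host>[^\s:/]+)")


def summarize(lines):
    """Return (bytes per host, hits per host) for the given log lines."""
    bandwidth = defaultdict(int)
    hits = Counter()
    for line in lines:
        match = HOST_PATTERN.search(line)
        if match is None:
            continue  # no CONNECT request on this line
        fields = line.split()
        # Same field positions as the answer's code: 10 fields -> [3], 11 fields -> [4]
        if len(fields) == 10:
            nbytes = fields[3]
        elif len(fields) == 11:
            nbytes = fields[4]
        else:
            continue
        host = match.group("host")
        bandwidth[host] += int(nbytes)
        hits[host] += 1
    return bandwidth, hits


sample_lines = [
    "1569935790.563 1010 192.168.10.3 TCP_TUNNEL/200 1001803 CONNECT www.google.com:443 - HIER_DIRECT/www.google.com - 192.168.100.3",
    "1569935790.563 1010 192.168.10.3 TCP_TUNNEL/200 1006183 CONNECT www.xyz.com/index.php - HIER_DIRECT/www.xyz.com - 192.168.100.3",
    "59375 192.168.10.3 TAG_NONE/503 10 CONNECT www.google.com - HIER_NONE/- - 192.168.100.3",
]

bw, h = summarize(sample_lines)
for host in bw:
    print(host, bw[host], h[host])
```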


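Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.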
