Count how many times a string occurs in a specific column
I am trying to count how many times a string occurs in column 4. More specifically, how many times each port number occurs in some Netflow data. There are thousands of ports, so I'm not looking for any particular one, only how often each one recurs.
I have already parsed the column down to the number after the colon, and I want the code to check how many times that number occurs, so the final output should print each number with how many times it occurred, like so:
[OUTPUT]
Port: 80 found: 3 times.
Port: 53 found: 2 times.
Port: 21 found: 1 times.
[CODE]
import re
frequency = {}
file = open('/Users/rojeliomaestas/Desktop/nettest2.txt', 'r')
with open('/Users/rojeliomaestas/Desktop/nettest2.txt', 'r') as infile:
    next(infile)
    for line in infile:
        data = line.split()[4].split(":")[1]
        text_string = file.read().lower()
        match_pattern = re.findall(data, text_string)
for word in match_pattern:
    count = frequency.get(word,0)
    frequency[word] = count + 1
frequency_list = frequency.keys()
for words in frequency_list:
    print ("port:", words,"found:", frequency[words], "times.")
[FILE]
Date first seen Duration Proto Src IP Addr:Port Dst IP Addr:Port Packets Bytes Flows
2017-04-02 12:07:32.079 9.298 UDP 8.8.8.8:80 -> 205.166.231.250:8080 1 345 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:53 -> 205.166.231.250:80 1 75 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:80 -> 205.166.231.250:69 1 875 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:53 -> 205.166.231.250:443 1 275 1
2017-04-02 12:08:32.079 9.298 UDP 8.8.8.8:80 -> 205.166.231.250:23 1 842 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:21 -> 205.166.231.250:25 1 146 1
Use Counter from the Python standard library. It returns a dictionary-like object that maps each value to its count, which is exactly what you are looking for.
from collections import Counter
counts = Counter(column)  # column: an iterable of the parsed port strings
counts.most_common(n)  # returns the n most common values with their counts
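As a fuller sketch of the Counter approach, the snippet below applies it to the sample rows from the question. The SAMPLE string and the count_source_ports helper are illustrative names, not part of the original answer, and the parsing assumes the same whitespace-separated layout with IPv4 address:port in field 4.

```python
from collections import Counter

# Sample rows copied from the question's file, header line included.
SAMPLE = """\
Date first seen Duration Proto Src IP Addr:Port Dst IP Addr:Port Packets Bytes Flows
2017-04-02 12:07:32.079 9.298 UDP 8.8.8.8:80 -> 205.166.231.250:8080 1 345 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:53 -> 205.166.231.250:80 1 75 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:80 -> 205.166.231.250:69 1 875 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:53 -> 205.166.231.250:443 1 275 1
2017-04-02 12:08:32.079 9.298 UDP 8.8.8.8:80 -> 205.166.231.250:23 1 842 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:21 -> 205.166.231.250:25 1 146 1
"""


def count_source_ports(lines):
    """Count the source port (field 4, after the colon) of each flow record.

    Hypothetical helper: `lines` is any iterable of records without the header.
    """
    return Counter(
        line.split()[4].split(":")[1]
        for line in lines
        if line.strip()  # ignore blank lines
    )


rows = SAMPLE.splitlines()[1:]  # skip the header line
counts = count_source_ports(rows)

# most_common() with no argument lists every port, highest count first.
for port, n in counts.most_common():
    print("Port:", port, "found:", n, "times.")
```

With a real file, passing the open file object (after skipping the header with next) in place of rows should work the same way, since the helper only needs an iterable of lines.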
You need something like:
frequency = {}
with open('/Users/rojeliomaestas/Desktop/nettest2.txt', 'r') as infile:
    next(infile)
    for line in infile:
        port = line.split()[4].split(":")[1]
        frequency[port] = frequency.get(port,0) + 1
for port, count in frequency.items():
    print("port:", port, "found:", count, "times.")
The heart of this is that you keep a dict mapping each port to its count, and increment that count for every line. dict.get returns the current value, or a default (in this case 0) if the key is not in the dict yet.
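To make the dict.get pattern concrete, here is a small sketch that runs on a hard-coded list instead of the file. The ports list holds the source ports of the six sample rows in file order, and the ranked variable is an illustrative addition for printing the most frequent port first.

```python
# Source ports of the six sample rows, in file order (hard-coded for illustration).
ports = ["80", "53", "80", "53", "80", "21"]

frequency = {}
for port in ports:
    # get() returns 0 the first time a port is seen, so no KeyError is raised.
    frequency[port] = frequency.get(port, 0) + 1

# Sort the (port, count) pairs by count, highest first.
ranked = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

for port, count in ranked:
    print("Port:", port, "found:", count, "times.")
```

Without the default, frequency[port] + 1 would raise a KeyError on the first occurrence of each port, which is why the answer uses get.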