
Count how many times a string occurs in a specific column

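I am trying to see how many times a string occurs in column 4. More specifically, how many times each port number occurs in some Netflow data. There are thousands of ports, so I am not looking for any particular one, only at how often each one recurs. I have already parsed the column down to the number following the colon, and I want the code to check how many times that number occurs, so the final output should print each number along with how many times it occurred.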

[OUTPUT]

Port: 80 found: 3 times.
Port: 53 found: 2 times.
Port: 21 found: 1 times.

[CODE]

import re


frequency = {}

file = open('/Users/rojeliomaestas/Desktop/nettest2.txt', 'r')

with open('/Users/rojeliomaestas/Desktop/nettest2.txt', 'r') as infile:    
    next(infile)
    for line in infile:
        data = line.split()[4].split(":")[1]
        text_string = file.read().lower()
        match_pattern = re.findall(data, text_string)


for word in match_pattern:
    count = frequency.get(word,0)
    frequency[word] = count + 1

frequency_list = frequency.keys()

for words in frequency_list:
    print ("port:", words,"found:", frequency[words], "times.")

[FILE]

Date first seen          Duration Proto      Src IP Addr:Port          Dst IP Addr:Port   Packets    Bytes Flows
2017-04-02 12:07:32.079     9.298 UDP            8.8.8.8:80 ->     205.166.231.250:8080     1      345     1
2017-04-02 12:08:32.079     9.298 TCP            8.8.8.8:53 ->     205.166.231.250:80       1       75     1
2017-04-02 12:08:32.079     9.298 TCP            8.8.8.8:80 ->     205.166.231.250:69       1      875     1
2017-04-02 12:08:32.079     9.298 TCP            8.8.8.8:53 ->     205.166.231.250:443      1      275     1
2017-04-02 12:08:32.079     9.298 UDP            8.8.8.8:80 ->     205.166.231.250:23       1      842     1
2017-04-02 12:08:32.079     9.298 TCP            8.8.8.8:21 ->     205.166.231.250:25       1      146     1

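Use Counter from the Python standard library. It returns a dictionary-like object that maps each value to its count, which is what you are looking for.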

from collections import Counter

# column is assumed to be an iterable of the port strings parsed from column 4
counts = Counter(column)
counts.most_common(n)  # returns the n most common values with their counts
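
As a minimal sketch of how Counter could be applied to this data: the snippet below assumes the column layout of the sample file in the question, and the names SAMPLE_LINES and count_source_ports are made up for illustration rather than taken from the original answer.

```python
from collections import Counter

# Hypothetical sample rows copied from the question's file; in practice these
# would come from iterating over the opened file after skipping the header.
SAMPLE_LINES = [
    "2017-04-02 12:07:32.079     9.298 UDP            8.8.8.8:80 ->     205.166.231.250:8080     1      345     1",
    "2017-04-02 12:08:32.079     9.298 TCP            8.8.8.8:53 ->     205.166.231.250:80       1       75     1",
    "2017-04-02 12:08:32.079     9.298 TCP            8.8.8.8:80 ->     205.166.231.250:69       1      875     1",
    "2017-04-02 12:08:32.079     9.298 TCP            8.8.8.8:53 ->     205.166.231.250:443      1      275     1",
    "2017-04-02 12:08:32.079     9.298 UDP            8.8.8.8:80 ->     205.166.231.250:23       1      842     1",
    "2017-04-02 12:08:32.079     9.298 TCP            8.8.8.8:21 ->     205.166.231.250:25       1      146     1",
]


def count_source_ports(lines):
    """Count source ports: whitespace-separated field 4, text after the last colon."""
    ports = (
        line.split()[4].rsplit(":", 1)[1]
        for line in lines
        if line.strip()  # skip blank lines
    )
    return Counter(ports)


counts = count_source_ports(SAMPLE_LINES)

# most_common() with no argument yields every (port, count) pair, highest count first.
for port, count in counts.most_common():
    print(f"Port: {port} found: {count} times.")
```

Passing the open file object instead of SAMPLE_LINES should work the same way, provided the header line is skipped first with next(infile).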

You need something like:

frequency = {}
with open('/Users/rojeliomaestas/Desktop/nettest2.txt', 'r') as infile:    
    next(infile)
    for line in infile:
        port = line.split()[4].split(":")[1]
        frequency[port] = frequency.get(port,0) + 1

for port, count in frequency.items(): 
    print("port:", port, "found:", count, "times.")

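The core of this is that you keep a running count for each port you see and increment it once per line. dict.get returns the current value for the key, or the default (0 in this case) if the key is not present yet.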
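
To illustrate how dict.get behaves in that tally, here is a small sketch. The port list is hard-coded for illustration and the variable names are hypothetical. It also sorts by count, which a plain loop over the dictionary does not do by itself.

```python
# Illustrative values only; in the real script these come from the parsed file.
ports = ["80", "53", "80", "53", "80", "21"]

frequency = {}
for port in ports:
    # get() returns the stored count, or 0 the first time a port is seen,
    # so a new key never raises KeyError.
    frequency[port] = frequency.get(port, 0) + 1

# Sort by count, highest first, to match the ordering shown in the question.
ranked = sorted(frequency.items(), key=lambda item: item[1], reverse=True)
for port, count in ranked:
    print(f"Port: {port} found: {count} times.")
```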

