How to group Wireshark TCP packets per flow using Python
I captured TCP data in Wireshark and exported it to CSV. Now I am trying to group the TCP packets per flow using Python, but I'm not sure how to do it.
If Source, Src Port, Destination, and Dest Port match across rows in either direction, the rows are considered part of the same flow, i.e. A->B and B->A.
In the example below there are two flows:
Source Src Port Destination Dest Port
10.129.200.119 49298 17.248.144.77 443
10.129.200.119 49299 17.253.37.210 80
No. Time Source Src Port Destination Dest Port Protocol Length Flags
37 12.045906 10.129.200.119 49298 17.248.144.77 443 TCP 54 0x010
38 12.04922 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
39 13.634783 10.129.200.119 49298 17.248.144.77 443 TLSv1.2 112 0x018
40 13.635868 10.129.200.119 49298 17.248.144.77 443 TLSv1.2 97 0x018
41 13.636239 10.129.200.119 49298 17.248.144.77 443 TCP 66 0x011
42 13.640724 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
43 13.640731 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x011
44 13.640732 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
45 13.640852 10.129.200.119 49298 17.248.144.77 443 TCP 66 0x011
47 14.472724 10.129.200.119 49299 17.253.37.210 80 TCP 78 0x0c2
48 14.478233 17.253.37.210 80 10.129.200.119 49299 TCP 74 0x052
50 14.478405 10.129.200.119 49299 17.253.37.210 80 TCP 66 0x010
51 14.479316 10.129.200.119 49299 17.253.37.210 80 HTTP 361 0x018
52 14.483419 17.253.37.210 80 10.129.200.119 49299 TCP 66 0x010
53 14.483425 17.253.37.210 80 10.129.200.119 49299 TCP 1514 0x010
54 14.483427 17.253.37.210 80 10.129.200.119 49299 TCP 1514 0x010
55 14.48343 17.253.37.210 80 10.129.200.119 49299 OCSP 319 0x018
56 14.48355 10.129.200.119 49299 17.253.37.210 80 TCP 66 0x010
57 14.483551 10.129.200.119 49299 17.253.37.210 80 TCP 66 0x010
58 14.486264 10.129.200.119 49299 17.253.37.210 80 TCP 66 0x011
59 14.490827 17.253.37.210 80 10.129.200.119 49299 TCP 66 0x011
60 14.490914 10.129.200.119 49299 17.253.37.210 80 TCP 66 0x010
I would recommend exporting the data from Wireshark in JSON format instead. It offers a better way to group TCP sessions, using information that isn't exported to CSV. To make a JSON file from your pcap, go to File -> Export Packet Dissections -> As JSON...
After you do so, look at the field tcp.stream: it has the same value for every packet in a given TCP stream ("flow").
You can then use this code to go over the packets and search for a specific tcp.stream value:
import json

# Load the packet list exported by Wireshark.
with open('path_to_your_json.json') as json_file:
    packets = json.load(json_file)

count = 0
for packet in packets:
    layers = packet["_source"]['layers']
    # Only TCP packets have a "tcp" layer; tcp.stream is stored as a string.
    if "tcp" in layers and layers["tcp"]["tcp.stream"] == "11":
        count += 1
print(count)
For example, this code finds all the TCP packets in stream number 11 and counts them.
To work efficiently and understand what you are doing, I recommend opening the JSON file in a text editor (such as Sublime) to see what it contains and how it is structured. I would also recommend reading about JSON in Python: w3schools python and json
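Building on the counting loop above, here is a minimal sketch of grouping every packet by its tcp.stream value rather than counting a single stream. It assumes the same `_source` / `layers` / `tcp` / `tcp.stream` hierarchy used above; `group_by_stream` is a hypothetical helper name, and the sample records are hand-written for illustration, not real capture data.

```python
from collections import defaultdict


def group_by_stream(packets):
    """Map each tcp.stream value to the list of packets in that stream."""
    streams = defaultdict(list)
    for packet in packets:
        layers = packet.get("_source", {}).get("layers", {})
        tcp = layers.get("tcp")
        if tcp is None:  # skip packets without a TCP layer
            continue
        streams[tcp["tcp.stream"]].append(packet)
    return dict(streams)


# Tiny hand-written sample in the assumed export shape (not real capture data).
sample_packets = [
    {"_source": {"layers": {"tcp": {"tcp.stream": "0"}}}},
    {"_source": {"layers": {"tcp": {"tcp.stream": "1"}}}},
    {"_source": {"layers": {"tcp": {"tcp.stream": "0"}}}},
    {"_source": {"layers": {"udp": {}}}},
]

for stream_id, stream_packets in group_by_stream(sample_packets).items():
    print(stream_id, len(stream_packets))
```

With a real export, the list returned by json.load can be passed to the function in place of the sample list.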
You can use pandas to do this, if you rename the columns Src Port to Src_Port and Dest Port to Dest_Port.
Assuming that the combination ['Source', 'Src_Port', 'Destination', 'Dest_Port', 'Protocol'] defines a 'flow' (I am by no means a domain expert) and your data is in 'wireshark_dump.csv', you can do the following:
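import pandas as pd

# Columns are whitespace-separated; sep=r'\s+' is equivalent to
# delim_whitespace=True, which is deprecated in recent pandas releases.
df = pd.read_csv('wireshark_dump.csv', sep=r'\s+')
flow_columns = ['Source', 'Src_Port', 'Destination', 'Dest_Port', 'Protocol']
# Each group holds the rows that share the same values in flow_columns.
for flow, flow_data in df.groupby(flow_columns):
    print(flow)
    print(flow_data)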
Note that, depending on what your further processing looks like, you might not want to iterate over the groupby groups, as it is slow.
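One caveat with grouping on those five columns: A->B and B->A land in different groups, and including Protocol splits a single connection into separate TCP and TLSv1.2 groups. The following is a minimal sketch of a direction-independent key, assuming the renamed columns described above. `add_flow_key` and the `Flow` column are hypothetical names, and the inline sample simply copies a few rows from the question.

```python
import io

import pandas as pd

# A few rows copied from the question, with the port columns renamed.
SAMPLE = """\
No. Time Source Src_Port Destination Dest_Port Protocol Length Flags
37 12.045906 10.129.200.119 49298 17.248.144.77 443 TCP 54 0x010
38 12.04922 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
39 13.634783 10.129.200.119 49298 17.248.144.77 443 TLSv1.2 112 0x018
47 14.472724 10.129.200.119 49299 17.253.37.210 80 TCP 78 0x0c2
48 14.478233 17.253.37.210 80 10.129.200.119 49299 TCP 74 0x052
"""


def add_flow_key(df):
    """Return a copy of df with a 'Flow' column that ignores direction."""
    def key(row):
        a = (row["Source"], int(row["Src_Port"]))
        b = (row["Destination"], int(row["Dest_Port"]))
        # Sort the two endpoints so A->B and B->A produce the same key.
        lo, hi = sorted((a, b))
        return f"{lo[0]}:{lo[1]} <-> {hi[0]}:{hi[1]}"

    out = df.copy()
    out["Flow"] = out.apply(key, axis=1)
    return out


df = add_flow_key(pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+"))
flow_sizes = df.groupby("Flow").size()
print(flow_sizes)
```

Because the key is an ordinary column, later steps can aggregate per flow with groupby("Flow") without looping over the groups.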
Maybe you can try pandas. The snippet below groups the rows of data according to the source IP address.
I am not familiar with what you mean by flow. I am assuming it means grouping according to the source and destination IP pairs.
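import pandas as pd

# Split each line of the capture dump into whitespace-separated tokens.
with open('data.txt') as f:
    lines = f.readlines()
data = []
for line in lines:
    tokens = line.split()
    data.append(tokens)

# Columns A..I: No., Time, Source, Src Port, Destination, Dest Port, Protocol, Length, Flags
df = pd.DataFrame(data, columns=list("ABCDEFGHI"))
print(df)

# Column C holds the source IP address.
grouped_df = df.groupby('C', as_index=False)
for key, item in grouped_df:
    print(grouped_df.get_group(key), "\n\n")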
This gives the following output:
[8 rows x 9 columns]
A B C D ... F G H I
0 37 12.045906 10.129.200.119 49298 ... 443 TCP 54 0x010
2 39 13.634783 10.129.200.119 49298 ... 443 TLSv1.2 112 0x018
3 40 13.635868 10.129.200.119 49298 ... 443 TLSv1.2 97 0x018
4 41 13.636239 10.129.200.119 49298 ... 443 TCP 66 0x011
[4 rows x 9 columns]
A B C D E F G H I
1 38 12.04922 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
5 42 13.640724 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
6 43 13.640731 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x011
7 44 13.640732 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
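As the output shows, grouping on the source address alone puts the two directions of one connection into separate groups. Here is a minimal sketch that follows the stated assumption instead (source and destination IP pairs), reusing the same whitespace tokenization. The pair is stored as a frozenset so that its order does not matter. `group_by_ip_pair` is a hypothetical helper name, the rows are copied from the question, and the input is assumed to have no header line.

```python
from collections import defaultdict

# Rows copied from the question, in the assumed data.txt layout (no header).
LINES = [
    "37 12.045906 10.129.200.119 49298 17.248.144.77 443 TCP 54 0x010",
    "38 12.04922 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010",
    "47 14.472724 10.129.200.119 49299 17.253.37.210 80 TCP 78 0x0c2",
    "48 14.478233 17.253.37.210 80 10.129.200.119 49299 TCP 74 0x052",
]


def group_by_ip_pair(lines):
    """Group tokenized rows by the unordered (source IP, destination IP) pair."""
    groups = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if len(tokens) < 6:  # skip blank or truncated lines
            continue
        source, destination = tokens[2], tokens[4]
        # frozenset ignores order, so A->B and B->A share one key.
        groups[frozenset((source, destination))].append(tokens)
    return dict(groups)


groups = group_by_ip_pair(LINES)
for pair, rows in groups.items():
    print(sorted(pair), [row[0] for row in rows])
```

Note that this key ignores ports, so two separate connections between the same pair of hosts would be merged into one group. Adding the port tokens to the key would keep them apart.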