How to group Wireshark TCP packets per flow using Python

Question

I captured TCP data in Wireshark and exported it to CSV, and now I am trying to group the TCP packets per flow using Python, but I'm not sure how to do it.

If Source, Src Port, Destination and Dest Port are the same across rows, in either the forward or the backward direction, the rows are considered part of the same flow, i.e. A->B and B->A.

In the example below there are two flows:

Source          Src Port    Destination     Dest Port
10.129.200.119  49298       17.248.144.77   443 
10.129.200.119  49299       17.253.37.210   80

No. Time    Source  Src Port    Destination Dest Port   Protocol    Length  Flags
37  12.045906   10.129.200.119  49298   17.248.144.77   443 TCP 54  0x010
38  12.04922    17.248.144.77   443 10.129.200.119  49298   TCP 66  0x010
39  13.634783   10.129.200.119  49298   17.248.144.77   443 TLSv1.2 112 0x018
40  13.635868   10.129.200.119  49298   17.248.144.77   443 TLSv1.2 97  0x018
41  13.636239   10.129.200.119  49298   17.248.144.77   443 TCP 66  0x011
42  13.640724   17.248.144.77   443 10.129.200.119  49298   TCP 66  0x010
43  13.640731   17.248.144.77   443 10.129.200.119  49298   TCP 66  0x011
44  13.640732   17.248.144.77   443 10.129.200.119  49298   TCP 66  0x010
45  13.640852   10.129.200.119  49298   17.248.144.77   443 TCP 66  0x011
47  14.472724   10.129.200.119  49299   17.253.37.210   80  TCP 78  0x0c2
48  14.478233   17.253.37.210   80  10.129.200.119  49299   TCP 74  0x052
50  14.478405   10.129.200.119  49299   17.253.37.210   80  TCP 66  0x010
51  14.479316   10.129.200.119  49299   17.253.37.210   80  HTTP    361 0x018
52  14.483419   17.253.37.210   80  10.129.200.119  49299   TCP 66  0x010
53  14.483425   17.253.37.210   80  10.129.200.119  49299   TCP 1514    0x010
54  14.483427   17.253.37.210   80  10.129.200.119  49299   TCP 1514    0x010
55  14.48343    17.253.37.210   80  10.129.200.119  49299   OCSP    319 0x018
56  14.48355    10.129.200.119  49299   17.253.37.210   80  TCP 66  0x010
57  14.483551   10.129.200.119  49299   17.253.37.210   80  TCP 66  0x010
58  14.486264   10.129.200.119  49299   17.253.37.210   80  TCP 66  0x011
59  14.490827   17.253.37.210   80  10.129.200.119  49299   TCP 66  0x011
60  14.490914   10.129.200.119  49299   17.253.37.210   80  TCP 66  0x010

Answer 1

I would recommend exporting the data from Wireshark to JSON format, because there is a better way to group TCP sessions using information that isn't exported to CSV. To make a JSON file from your pcap, go to: File -> Export Packet Dissections -> As JSON...

After you do so, look at the field tcp.stream: it has the same value for every packet in the same TCP stream ("flow").

Then you can use this code to go over the packets and search for a specific tcp.stream value:

import json

with open('path_to_your_json.json') as json_file:
    packets = json.load(json_file)

    count = 0
    for packet in packets:
        layers = packet["_source"]['layers']
        if "tcp" in layers:
            if layers["tcp"]["tcp.stream"]=="11":
                count=count+1
    print(count)

This code, for example, goes through all the TCP packets that are in stream number 11 and counts them.
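To collect every stream at once rather than a single hard-coded one, the same loop can fill a dictionary keyed by tcp.stream. This is only a minimal sketch: it assumes the JSON layout used in the snippet above (`_source` -> `layers` -> `tcp` -> `tcp.stream`), and the names `group_by_stream` and `load_and_group` are made up for illustration.

```python
import json
from collections import defaultdict


def group_by_stream(packets):
    """Map each tcp.stream value to the list of packets that carry it."""
    streams = defaultdict(list)
    for packet in packets:
        layers = packet["_source"]["layers"]
        # Packets without a "tcp" layer (e.g. UDP, ARP) are skipped.
        if "tcp" in layers:
            streams[layers["tcp"]["tcp.stream"]].append(packet)
    return dict(streams)


def load_and_group(path):
    """Load a Wireshark JSON export from `path` and group it by stream."""
    with open(path) as json_file:
        return group_by_stream(json.load(json_file))
```

Calling `load_and_group('path_to_your_json.json')` would return a dictionary whose keys are the stream numbers as strings (for example "11") and whose values are the packets of that stream, so `len()` of each value gives the per-stream packet count.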

To work efficiently and understand what you are doing, I recommend opening the JSON file in a text editor (like Sublime) to see what it contains and how it is structured. In addition, I would recommend reading about JSON in Python: w3schools python and json

Answer 2

You can use pandas to do this, provided you rename the columns Src Port to Src_Port and Dest Port to Dest_Port.

Assuming that the combination ['Source', 'Src_Port', 'Destination', 'Dest_Port', 'Protocol'] defines a 'flow' (I am by no means a domain expert) and your data is in 'wireshark_dump.csv', you can do the following:

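import pandas as pd

# The header must use Src_Port / Dest_Port (no spaces) so that splitting
# on whitespace yields exactly one column per field.
# sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent pandas.
df = pd.read_csv('wireshark_dump.csv', sep=r'\s+')

# Rows that share all of these values end up in the same group.
flow_columns = ['Source', 'Src_Port', 'Destination', 'Dest_Port', 'Protocol']
for flow, flow_data in df.groupby(flow_columns):
    print(flow)
    print(flow_data)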

Note that, depending on what your further processing looks like, you might not want to iterate over the groupby groups, as it is slow.
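The grouping above keeps A->B and B->A in separate groups, and including Protocol splits a connection further, because in the sample data that column changes within one connection (TCP, TLSv1.2, HTTP, OCSP). If the goal is the direction-independent flow described in the question, one possible approach is to sort the two endpoints so both directions produce the same key. The following is only a sketch using plain Python; `flow_key` and `group_rows_by_flow` are hypothetical names, and the rows are assumed to be dictionaries with the renamed column names (for example from `csv.DictReader`).

```python
from collections import defaultdict


def flow_key(src, src_port, dst, dst_port):
    """Return the same key for A->B and B->A by sorting the two endpoints."""
    return tuple(sorted([(src, int(src_port)), (dst, int(dst_port))]))


def group_rows_by_flow(rows):
    """Group row dictionaries by their direction-independent flow key."""
    flows = defaultdict(list)
    for row in rows:
        key = flow_key(row["Source"], row["Src_Port"],
                       row["Destination"], row["Dest_Port"])
        flows[key].append(row)
    return dict(flows)
```

With the sample rows from the question, packets 37 and 38 would both land under the key for 10.129.200.119:49298 and 17.248.144.77:443. The same `flow_key` helper could also be applied row by row to a pandas DataFrame to build a grouping column.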

Answer 3

Maybe you can try pandas. The snippet below groups the rows of data by source IP address.

I am not familiar with what you mean by flow. I am assuming it means grouping by source and destination IP pairs.

import pandas as pd

with open('data.txt') as f:
    lines = f.readlines()
    data = []
    for line in lines:
        tokens = line.split()
        data.append(tokens)
    df = pd.DataFrame(data, columns=list("ABCDEFGHI"))
    print(df)
    grouped_df = df.groupby('C', as_index=False)
    for key, item in grouped_df:
        print(grouped_df.get_group(key), "\n\n")

This gives the following output:

[8 rows x 9 columns]
    A          B               C      D  ...    F        G    H      I
0  37  12.045906  10.129.200.119  49298  ...  443      TCP   54  0x010
2  39  13.634783  10.129.200.119  49298  ...  443  TLSv1.2  112  0x018
3  40  13.635868  10.129.200.119  49298  ...  443  TLSv1.2   97  0x018
4  41  13.636239  10.129.200.119  49298  ...  443      TCP   66  0x011

[4 rows x 9 columns] 


    A          B              C    D               E      F    G   H      I
1  38   12.04922  17.248.144.77  443  10.129.200.119  49298  TCP  66  0x010
5  42  13.640724  17.248.144.77  443  10.129.200.119  49298  TCP  66  0x010
6  43  13.640731  17.248.144.77  443  10.129.200.119  49298  TCP  66  0x011
7  44  13.640732  17.248.144.77  443  10.129.200.119  49298  TCP  66  0x010

Answer 1
1, accepted, 2019-11-23 16:24:18

Answer 2
0, 2019-11-23 16:29:53

Answer 3
0, 2019-11-23 16:30:33
