Can reading sessions from a pcap file with Scapy be made more memory efficient
At the moment I'm trying to write a quick Python program that reads in a .pcap file and writes out data about the various sessions that are stored in there. The information I write out includes srcip, dstip, srcport and dstport etc.

However, even for a fairly small pcap this takes a lot of memory and ends up running for a very long time. We're talking 8GB+ of memory used for a pcap of a size of 212MB.

As usual I guess there might be a more efficient way of doing this that I'm just unaware of.

Here is a quick skeleton of my code - no vital parts missing.
    import socket
    from scapy.all import *

    edges_file = "edges.csv"
    pcap_file = "tcpdump.pcap"

    try:
        print '[+] Reading and parsing pcap file: %s' % pcap_file
        a = rdpcap(pcap_file)
    except Exception as e:
        print 'Something went wrong while opening/reading the pcap file.' \
              '\n\nThe error message is: %s' % e
        exit(0)

    sessions = a.sessions()

    print '[+] Writing to edges.csv'
    f1 = open(edges_file, 'w')
    f1.write('source,target,protocol,sourceport,destinationport,'
             'num_of_packets\n')

    for k, v in sessions.iteritems():
        tot_packets = len(v)
        if "UDP" in k:
            proto, source, flurp, target = k.split()
            srcip, srcport = source.split(":")
            dstip, dstport = target.split(":")
            f1.write('%s,%s,%s,%s,%s,%s\n' % (srcip, dstip, proto, srcport,
                                              dstport, tot_packets))
            continue
        elif "TCP" in k:
            proto, source, flurp, target = k.split()
            srcip, srcport = source.split(":")
            dstip, dstport = target.split(":")
            f1.write('%s,%s,%s,%s,%s,%s\n' % (srcip, dstip, proto, srcport,
                                              dstport, tot_packets))
            continue
        elif "ICMP" in k:
            continue  # Not bothered about ICMP right now
        else:
            continue  # Or any other 'weird' packages for that matter ;)

    print '[+] Closing the edges file'
    f1.close()
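As an aside, the `k.split()` parsing above assumes exactly four whitespace-separated tokens and will raise on anything else. A slightly more defensive variant is sketched below; it assumes the "PROTO src:sport > dst:dport" key format that sessions() produces for plain IPv4 TCP/UDP traffic (IPv6 addresses contain colons and would need a different pattern), so treat it as illustrative rather than guaranteed across Scapy versions.

```python
import re

# Matches Scapy session keys of the form "TCP 10.0.0.1:1234 > 10.0.0.2:80".
# The format is an assumption based on PacketList.sessions() output for
# IPv4 TCP/UDP sessions; adjust if your Scapy version formats keys differently.
_KEY_RE = re.compile(
    r"^(?P<proto>TCP|UDP)\s+"
    r"(?P<srcip>[^:\s]+):(?P<srcport>\d+)\s*>\s*"
    r"(?P<dstip>[^:\s]+):(?P<dstport>\d+)$"
)

def parse_session_key(key):
    """Return a dict with proto/srcip/srcport/dstip/dstport, or None if the
    key is not a plain IPv4 TCP/UDP session (e.g. ICMP or 'weird' traffic)."""
    m = _KEY_RE.match(key.strip())
    return m.groupdict() if m else None
```

With this helper, the if/elif ladder collapses to a single `parse_session_key(k)` call whose None result covers the ICMP/other branches.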
As always - grateful for any assistance.

I know I'm late to the party, but hopefully this will be useful to future visitors.
rdpcap() dissects the entire pcap file and retains an in-memory representation of each and every packet, which explains why it eats up a lot of memory.
As far as I am aware (I am a novice Scapy user myself), the only two ways you can invoke Scapy's session reassembly are:

1. scapy.plist.PacketList.sessions(). This is what you're currently doing (rdpcap(pcap_file) returns a scapy.plist.PacketList).
2. sniff() in offline mode while also providing the function with a session decoder implementation. For example, for TCP reassembly you'd do sniff(offline='stackoverflow.pcap', session=TCPSession). (This was added in Scapy 2.4.3.)

Option 1 is obviously a dead end (as it requires that we keep all packets of all sessions in memory at one time), so let's explore option 2...
Let's launch Scapy in interactive mode to access the documentation for sniff():
$ scapy
>>> help(sniff)
Help on function sniff in module scapy.sendrecv:
sniff(*args, **kwargs)
Sniff packets and return a list of packets.
Args:
count: number of packets to capture. 0 means infinity.
store: whether to store sniffed packets or discard them
prn: function to apply to each packet. If something is returned, it
is displayed.
--Ex: prn = lambda x: x.summary()
session: a session = a flow decoder used to handle stream of packets.
e.g: IPSession (to defragment on-the-flow) or NetflowSession
filter: BPF filter to apply.
lfilter: Python function applied to each packet to determine if
further action may be done.
--Ex: lfilter = lambda x: x.haslayer(Padding)
offline: PCAP file (or list of PCAP files) to read packets from,
instead of sniffing them
timeout: stop sniffing after a given time (default: None).
L2socket: use the provided L2socket (default: use conf.L2listen).
opened_socket: provide an object (or a list of objects) ready to use
.recv() on.
stop_filter: Python function applied to each packet to determine if
we have to stop the capture after this packet.
--Ex: stop_filter = lambda x: x.haslayer(TCP)
iface: interface or list of interfaces (default: None for sniffing
on all interfaces).
monitor: use monitor mode. May not be available on all OS
started_callback: called as soon as the sniffer starts sniffing
(default: None).
The iface, offline and opened_socket parameters can be either an
element, a list of elements, or a dict object mapping an element to a
label (see examples below).
Notice the store parameter. We can set this to False to make sniff() operate in a streamed fashion (read a single packet, process it, then release it from memory):
sniff(offline='stackoverflow.pcap', session=TCPSession, store=False)
I just tested this with a 193 MB pcap. For store=True (the default value), this eats up about 1.7 GB of memory on my system (macOS), but only approximately 47 MB when store=False.
Processing the reassembled TCP sessions (open question)
So we managed to reduce our memory footprint - great! But how do we process the (supposedly) reassembled TCP sessions? The usage instructions indicate that we should use the prn parameter of sniff() to specify a callback function that will then be invoked with the reassembled TCP session (emphasis mine):
sniff() also provides Sessions, that allows to dissect a flow of packets seamlessly. For instance, you may want your sniff(prn=...) function to automatically defragment IP packets, before executing the prn.
The example is in the context of IP fragmentation, but I'd expect the TCP analog to be to group all packets of a session and then invoke prn once for each session. Unfortunately, that's not how it works: I tried this on my example pcap, and the callback is invoked once for every packet - exactly as indicated in sniff()'s documentation shown above.
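One way to observe this yourself is to wrap the callback in a small invocation counter and compare the count against the number of packets in the capture (a sketch; `counting` is a name I made up):

```python
def counting(prn):
    """Wraps a sniff() prn callback and records how many times it fires,
    making it easy to check whether you get one call per packet or one
    call per session."""
    def wrapper(pkt):
        wrapper.calls += 1
        return prn(pkt)
    wrapper.calls = 0
    return wrapper
```

Passing such a wrapper as prn (e.g. sniff(offline='stackoverflow.pcap', session=TCPSession, store=False, prn=probe)) and then inspecting probe.calls showed, in my test, a count equal to the number of packets, not the number of sessions.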
The usage instructions linked above also state the following about using session=TCPSession in sniff():
TCPSession -> defragment certain TCP protocols*. Only HTTP 1.0 currently uses this functionality.
With the output of the experiment above in mind, I now interpret this to mean that whenever Scapy finds an HTTP (1.0) request/response that spans multiple TCP segments, it'll create a single packet whose payload is the merged payload of those TCP segments (which in total is the full HTTP request/response). I'd appreciate it if anyone can help clarify the meaning of the above quote on TCPSession - or even better: clarify whether TCP reassembly is indeed possible this way and I'm just misunderstanding the API.