Can reading sessions from a pcap file with Scapy be made more memory efficient
At the moment I'm trying to write a quick Python program that reads in a .pcap file and writes out data about the various sessions that are stored in there. The information I write out includes srcip, dstip, srcport and dstport etc.

However, even for a fairly small pcap this takes a lot of memory and ends up running for a very long time. We're talking 8GB+ of memory used for a pcap of a size of 212MB.

As usual I guess there might be a more efficient way of doing this that I'm just unaware of.

Here is a quick skeleton of my code - no vital parts missing.
    import socket
    from scapy.all import *

    edges_file = "edges.csv"
    pcap_file = "tcpdump.pcap"

    try:
        print '[+] Reading and parsing pcap file: %s' % pcap_file
        a = rdpcap(pcap_file)
    except Exception as e:
        print 'Something went wrong while opening/reading the pcap file.' \
              '\n\nThe error message is: %s' % e
        exit(0)

    sessions = a.sessions()

    print '[+] Writing to edges.csv'
    f1 = open(edges_file, 'w')
    f1.write('source,target,protocol,sourceport,destinationport,'
             'num_of_packets\n')

    for k, v in sessions.iteritems():
        tot_packets = len(v)
        if "UDP" in k:
            proto, source, flurp, target = k.split()
            srcip, srcport = source.split(":")
            dstip, dstport = target.split(":")
            f1.write('%s,%s,%s,%s,%s,%s\n' % (srcip, dstip, proto, srcport,
                                              dstport, tot_packets))
            continue
        elif "TCP" in k:
            proto, source, flurp, target = k.split()
            srcip, srcport = source.split(":")
            dstip, dstport = target.split(":")
            f1.write('%s,%s,%s,%s,%s,%s\n' % (srcip, dstip, proto, srcport,
                                              dstport, tot_packets))
            continue
        elif "ICMP" in k:
            continue  # Not bothered about ICMP right now
        else:
            continue  # Or any other 'weird' packages for that matter ;)

    print '[+] Closing the edges file'
    f1.close()
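As an aside, the `k.split()` parsing above assumes exactly four whitespace-separated tokens and will raise on anything else. A slightly more defensive variant is sketched below; it assumes the "PROTO src:sport > dst:dport" key format that sessions() produces for plain IPv4 TCP/UDP traffic (IPv6 addresses contain colons and would need a different pattern), so treat it as illustrative rather than guaranteed across Scapy versions.

```python
import re

# Matches Scapy session keys of the form "TCP 10.0.0.1:1234 > 10.0.0.2:80".
# The format is an assumption based on PacketList.sessions() output for
# IPv4 TCP/UDP sessions; adjust if your Scapy version formats keys differently.
_KEY_RE = re.compile(
    r"^(?P<proto>TCP|UDP)\s+"
    r"(?P<srcip>[^:\s]+):(?P<srcport>\d+)\s*>\s*"
    r"(?P<dstip>[^:\s]+):(?P<dstport>\d+)$"
)

def parse_session_key(key):
    """Return a dict with proto/srcip/srcport/dstip/dstport, or None if the
    key is not a plain IPv4 TCP/UDP session (e.g. ICMP or 'weird' traffic)."""
    m = _KEY_RE.match(key.strip())
    return m.groupdict() if m else None
```

With this helper, the if/elif ladder collapses to a single `parse_session_key(k)` call whose None result covers the ICMP/other branches.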
As always - grateful for any assistance.

I know I'm late to the party, but hopefully this will be useful to future visitors.
rdpcap() dissects the entire pcap file and retains an in-memory representation of each and every packet, which explains why it eats up a lot of memory.
As far as I am aware (I am a novice Scapy user myself), the only two ways you can invoke Scapy's session reassembly are:

1. scapy.plist.PacketList.sessions(). This is what you're currently doing (rdpcap(pcap_file) returns a scapy.plist.PacketList).
2. sniff() in offline mode while also providing the function with a session decoder implementation. For example, for TCP reassembly you'd do sniff(offline='stackoverflow.pcap', session=TCPSession). (This was added in Scapy 2.4.3.)

Option 1 is obviously a dead end (as it requires that we keep all packets of all sessions in memory at one time), so let's explore option 2...
Let's launch Scapy in interactive mode to access the documentation for sniff():
$ scapy
>>> help(sniff)
Help on function sniff in module scapy.sendrecv:
sniff(*args, **kwargs)
Sniff packets and return a list of packets.
Args:
count: number of packets to capture. 0 means infinity.
store: whether to store sniffed packets or discard them
prn: function to apply to each packet. If something is returned, it
is displayed.
--Ex: prn = lambda x: x.summary()
session: a session = a flow decoder used to handle stream of packets.
e.g: IPSession (to defragment on-the-flow) or NetflowSession
filter: BPF filter to apply.
lfilter: Python function applied to each packet to determine if
further action may be done.
--Ex: lfilter = lambda x: x.haslayer(Padding)
offline: PCAP file (or list of PCAP files) to read packets from,
instead of sniffing them
timeout: stop sniffing after a given time (default: None).
L2socket: use the provided L2socket (default: use conf.L2listen).
opened_socket: provide an object (or a list of objects) ready to use
.recv() on.
stop_filter: Python function applied to each packet to determine if
we have to stop the capture after this packet.
--Ex: stop_filter = lambda x: x.haslayer(TCP)
iface: interface or list of interfaces (default: None for sniffing
on all interfaces).
monitor: use monitor mode. May not be available on all OS
started_callback: called as soon as the sniffer starts sniffing
(default: None).
The iface, offline and opened_socket parameters can be either an
element, a list of elements, or a dict object mapping an element to a
label (see examples below).
Notice the store parameter. We can set this to False to make sniff() operate in a streamed fashion (read a single packet, process it, then release it from memory):
sniff(offline='stackoverflow.pcap', session=TCPSession, store=False)
I just tested this with a 193 MB pcap. For store=True (the default value), this eats up about 1.7 GB of memory on my system (macOS), but only approximately 47 MB when store=False.
Processing the reassembled TCP sessions (open question)
So we managed to reduce our memory footprint - great! But how do we process the (supposedly) reassembled TCP sessions? The usage instructions indicate that we should use the prn parameter of sniff() to specify a callback function that will then be invoked with the reassembled TCP session (emphasis mine):
sniff() also provides Sessions, that allows to dissect a flow of packets seamlessly. For instance, you may want your sniff(prn=...) function to automatically defragment IP packets, before executing the prn.
The example is in the context of IP fragmentation, but I'd expect the TCP analog to be to group all packets of a session and then invoke prn once for each session. Unfortunately, that's not how it works: I tried this on my example pcap, and the callback is invoked once for every packet - exactly as indicated in sniff()'s documentation shown above.
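One way to observe this yourself is to wrap the callback in a small invocation counter and compare the count against the number of packets in the capture (a sketch; `counting` is a name I made up):

```python
def counting(prn):
    """Wraps a sniff() prn callback and records how many times it fires,
    making it easy to check whether you get one call per packet or one
    call per session."""
    def wrapper(pkt):
        wrapper.calls += 1
        return prn(pkt)
    wrapper.calls = 0
    return wrapper
```

Passing such a wrapper as prn (e.g. sniff(offline='stackoverflow.pcap', session=TCPSession, store=False, prn=probe)) and then inspecting probe.calls showed, in my test, a count equal to the number of packets, not the number of sessions.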
The usage instructions linked above also state the following about using session=TCPSession in sniff():
TCPSession -> defragment certain TCP protocols*. Only HTTP 1.0 currently uses this functionality.
With the output of the experiment above in mind, I now interpret this to mean that whenever Scapy finds an HTTP (1.0) request/response that spans multiple TCP segments, it'll create a single packet whose payload is the merged payload of those TCP segments (which in total is the full HTTP request/response). I'd appreciate it if anyone can help clarify the meaning of the above quote on TCPSession - or even better: clarify whether TCP reassembly is indeed possible this way and I'm just misunderstanding the API.