
Can reading sessions from a pcap file with Scapy be made more memory efficient?

At the moment I'm trying to write a quick Python program that reads in a .pcap file and writes out data about the various sessions that are stored in there.

The information I write out includes srcip, dstip, srcport and dstport etc.

However, even for a fairly small pcap this takes a lot of memory and runs for a very long time. We're talking 8 GB+ of memory used for a 212 MB pcap.

As usual I guess there might be a more efficient way of doing this that I'm just unaware of.

Here is a quick skeleton of my code - no vital parts missing.

import socket
from scapy.all import *


edges_file = "edges.csv"
pcap_file = "tcpdump.pcap"

try:
    print '[+] Reading and parsing pcap file: %s' % pcap_file
    a = rdpcap(pcap_file)

except Exception as e:
    print 'Something went wrong while opening/reading the pcap file.' \
          '\n\nThe error message is: %s' % e
    exit(0)

sessions = a.sessions()

print '[+] Writing to edges.csv'
f1 = open(edges_file, 'w')
f1.write('source,target,protocol,sourceport,destinationport,'
         'num_of_packets\n')
for k, v in sessions.iteritems():

    tot_packets = len(v)

    if "UDP" in k:
        proto, source, flurp, target = k.split()
        srcip, srcport = source.split(":")
        dstip, dstport = target.split(":")
        f1.write('%s,%s,%s,%s,%s,%s\n' % (srcip, dstip, proto, srcport,
                                          dstport, tot_packets))
        continue

    elif "TCP" in k:
        proto, source, flurp, target = k.split()
        srcip, srcport = source.split(":")
        dstip, dstport = target.split(":")
        f1.write('%s,%s,%s,%s,%s,%s\n' % (srcip, dstip, proto, srcport,
                                          dstport, tot_packets))
        continue

    elif "ICMP" in k:
        continue  # Not bothered about ICMP right now

    else:
        continue  # Or any other 'weird' packets for that matter ;)

print '[+] Closing the edges file'
f1.close()

As always - grateful for any assistance.

I know I'm late to the party, but hopefully this will be useful to future visitors.

rdpcap() dissects the entire pcap file and retains an in-memory representation of each and every packet, which explains why it eats up a lot of memory.

As far as I am aware (I am a novice Scapy user myself), the only two ways you can invoke Scapy's session reassembly are:

  1. By calling scapy.plist.PacketList.sessions(). This is what you're currently doing (rdpcap(pcap_file) returns a scapy.plist.PacketList).
  2. By reading the pcap using sniff() in offline mode while also providing the function with a session decoder implementation. For example, for TCP reassembly you'd do sniff(offline='stackoverflow.pcap', session=TCPSession). (This was added in Scapy 2.4.3.)

Option 1 is obviously a dead end (as it requires that we keep all packets of all sessions in memory at one time), so let's explore option 2...

Let's launch Scapy in interactive mode to access the documentation for sniff():

$ scapy
>>> help(sniff)

Help on function sniff in module scapy.sendrecv:

sniff(*args, **kwargs)
    Sniff packets and return a list of packets.
    
    Args:
        count: number of packets to capture. 0 means infinity.
        store: whether to store sniffed packets or discard them
        prn: function to apply to each packet. If something is returned, it
             is displayed.
             --Ex: prn = lambda x: x.summary()
        session: a session = a flow decoder used to handle stream of packets.
                 e.g: IPSession (to defragment on-the-flow) or NetflowSession
        filter: BPF filter to apply.
        lfilter: Python function applied to each packet to determine if
                 further action may be done.
                 --Ex: lfilter = lambda x: x.haslayer(Padding)
        offline: PCAP file (or list of PCAP files) to read packets from,
                 instead of sniffing them
        timeout: stop sniffing after a given time (default: None).
        L2socket: use the provided L2socket (default: use conf.L2listen).
        opened_socket: provide an object (or a list of objects) ready to use
                      .recv() on.
        stop_filter: Python function applied to each packet to determine if
                     we have to stop the capture after this packet.
                     --Ex: stop_filter = lambda x: x.haslayer(TCP)
        iface: interface or list of interfaces (default: None for sniffing
               on all interfaces).
        monitor: use monitor mode. May not be available on all OS
        started_callback: called as soon as the sniffer starts sniffing
                          (default: None).
    
    The iface, offline and opened_socket parameters can be either an
    element, a list of elements, or a dict object mapping an element to a
    label (see examples below).

Notice the store parameter. We can set this to False to make sniff() operate in a streaming fashion (read a single packet, process it, then release it from memory):

sniff(offline='stackoverflow.pcap', session=TCPSession, store=False)

I just tested this with a 193 MB pcap. With store=True (the default), this eats up about 1.7 GB of memory on my system (macOS), but only approximately 47 MB with store=False.
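How the figures above were measured is not stated. As one possible way to reproduce such a comparison, here is a minimal sketch that reads the process's peak resident set size through the standard resource module (Unix only). The helper names peak_rss_mb and measure are hypothetical, and the Scapy imports are done lazily inside measure so that the measuring helper works on its own.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure(pcap_file, store):
    """Run sniff() over pcap_file and report the peak memory afterwards."""
    # Imported here so peak_rss_mb() stays usable without Scapy installed.
    from scapy.all import sniff
    from scapy.sessions import TCPSession

    sniff(offline=pcap_file, session=TCPSession, store=store)
    return peak_rss_mb()
```

Because the value is a peak over the lifetime of the process, each setting should be measured in a fresh interpreter, for example by calling measure('stackoverflow.pcap', store=False) in one run and measure('stackoverflow.pcap', store=True) in another.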

Processing the reassembled TCP sessions (open question)

So we managed to reduce our memory footprint - great! But how do we process the (supposedly) reassembled TCP sessions? The usage instructions indicate that we should use the prn parameter of sniff() to specify a callback function that will then be invoked with the reassembled TCP session (emphasis mine):

sniff() also provides Sessions, that allows to dissect a flow of packets seamlessly. For instance, you may want your sniff(prn=...) function to automatically defragment IP packets, before executing the prn.

The example is in the context of IP fragmentation, but I'd expect the TCP analog to be to group all packets of a session and then invoke prn once for each session. Unfortunately, that's not how it works: I tried this on my example pcap, and the callback is invoked once for every packet, exactly as indicated in the documentation for sniff() shown above.
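Since prn is called per packet, the grouping the question asks for (packet counts per source/destination pair) has to be done in the callback itself. Here is a minimal sketch, assuming Python 3, IPv4 traffic only, and directional flows like the keys parsed in the question's code. The helper names flow_key, make_counter, write_edges and main are hypothetical, and Scapy is imported lazily so the pure-Python helpers can be used without it.

```python
import csv
from collections import Counter


def flow_key(pkt):
    """Return (proto, srcip, srcport, dstip, dstport) for TCP/UDP over IPv4.

    Returns None for anything else (ICMP, ARP, IPv6, ...).
    """
    # Imported here so the other helpers work without Scapy installed.
    from scapy.layers.inet import IP, TCP, UDP

    if IP not in pkt:
        return None
    for layer, name in ((TCP, "TCP"), (UDP, "UDP")):
        if layer in pkt:
            return (name, pkt[IP].src, pkt[layer].sport,
                    pkt[IP].dst, pkt[layer].dport)
    return None


def make_counter(key_func=flow_key):
    """Build a prn callback that keeps one integer per flow, not the packets."""
    counts = Counter()

    def on_packet(pkt):
        key = key_func(pkt)
        if key is not None:
            counts[key] += 1
        # Implicitly returns None, so sniff() prints nothing per packet.

    return on_packet, counts


def write_edges(counts, fileobj):
    """Write the per-flow counts in the column order used in the question."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["source", "target", "protocol", "sourceport",
                     "destinationport", "num_of_packets"])
    for (proto, src, sport, dst, dport), n in counts.items():
        writer.writerow([src, dst, proto, sport, dport, n])


def main(pcap_file="tcpdump.pcap", edges_file="edges.csv"):
    from scapy.all import sniff

    on_packet, counts = make_counter()
    # store=False lets each packet be released once the callback returns.
    sniff(offline=pcap_file, prn=on_packet, store=False)
    with open(edges_file, "w", newline="") as f:
        write_edges(counts, f)
```

Calling main() would read tcpdump.pcap and write edges.csv. Memory then grows with the number of distinct flows rather than with the number of packets, because only the counters are retained.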

The usage instructions linked above also state the following about using session=TCPSession in sniff():

TCPSession -> defragment certain TCP protocols*. Only HTTP 1.0 currently uses this functionality.

With the output of the experiment above in mind, I now interpret this to mean that whenever Scapy finds an HTTP (1.0) request/response that spans multiple TCP segments, it creates a single packet whose payload is the merged payload of those TCP segments (which together form the full HTTP request/response). I'd appreciate it if anyone could clarify the meaning of the above quote on TCPSession or, even better, clarify whether TCP reassembly is indeed possible this way and I'm just misunderstanding the API.
