
What is the best way to process a large number of network packets in Python?

I'm doing research on malware detection systems. To build a model of infected system behavior, I need to process a large number of packets from a pcap file, group them into flows (packets with the same source and destination IPs and ports), and then extract some features from those flows.

I'm using dpkt to parse and read the info from the packets. My question is about the most efficient way to do the grouping. I started with a PostgreSQL database, querying whether a flow matching the packet's info already exists, and then either adding the packet to that flow or creating a new one. But I think this method is very inefficient, so I'm asking about alternatives such as in-memory structures, improving the database, or anything else.

If the data fits into memory, then Python's dict data structure seems very efficient, especially speed-wise.

One way to solve your problem could be to use the Counter class, which is a subclass of dict:

from collections import Counter

grouped = Counter()

with open('packets.txt') as f:
    for line in f:
        # extract the addresses and ports from the line
        src_ip, src_port, dst_ip, dst_port = ...
        # use the source/destination 4-tuple as the flow key
        key = "{}--{}--{}--{}".format(src_ip, src_port, dst_ip, dst_port)
        grouped[key] += 1  # count packets per flow

most_common_combinations = grouped.most_common()
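
Since the question mentions reading the capture with dpkt, here is a minimal sketch of the same in-memory grouping applied directly to a pcap file, using a defaultdict that keeps the packets of each flow so features can be extracted later. The file name 'capture.pcap' and the restriction to TCP over IPv4 are assumptions for illustration, not part of the original answer:

import socket
from collections import defaultdict

import dpkt

# flow key (src ip, src port, dst ip, dst port) -> list of (timestamp, frame length)
flows = defaultdict(list)

with open('capture.pcap', 'rb') as f:          # 'capture.pcap' is a placeholder name
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data
        if not isinstance(ip, dpkt.ip.IP):
            continue  # skip non-IPv4 frames
        tcp = ip.data
        if not isinstance(tcp, dpkt.tcp.TCP):
            continue  # skip non-TCP packets (assumption: only TCP flows are of interest)
        key = (socket.inet_ntoa(ip.src), tcp.sport,
               socket.inet_ntoa(ip.dst), tcp.dport)
        flows[key].append((ts, len(buf)))

# each entry of flows now holds every packet of one flow, ready for feature extraction

Using a tuple as the key avoids building a string per packet; if you want bidirectional flows, you could sort the two (ip, port) pairs before building the key so both directions map to the same flow.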
