
Python combine overlapping time ranges in csv file

I am trying to use Python to iterate over a CSV file, find overlapping time ranges, and sum the corresponding bandwidth (bps) values in the last column. The resulting CSV file should show how much bandwidth (bps) is consumed during each time period.

The source file has the following format:

start time, end time, Proto, SrcIP, DstIP, bps
00:06:01,00:06:02,TCP,10.33.239.176,172.16.168.7,699619
00:06:01,00:06:02,ICMP,10.33.236.247,172.16.171.254,0
00:06:01,00:06:02,UDP,10.33.238.55,172.16.175.253,12473
03:10:02,03:10:02,UDP,10.33.238.55,172.16.160.2,25
03:10:02,03:10:02,TCP,10.33.236.59,172.16.168.9,5

The resulting CSV file should have the following format:

start time, end time, bps
00:06:01,00:06:02, 712092
03:10:02,03:10:02, 30

I am a Python novice and have tried using dictionaries to remove duplicates. I am sure there is a better way to do this.

Here is my non-working code:

import csv

src_file = open('c:/test/format1.csv', 'rb')
dst_file = open('c:/test/format2.csv', 'wb')
reader = csv.reader(src_file)
writer = csv.writer(dst_file,delimiter=',')

dict1 = {}
dict2 = {}
dkey = 1

# read csv values into dict1
for row in reader:
    start = row[0]
    end = row[1]
    bps = int(row[7])
    dkey += 1
    dict1[dkey] = [start, end, bps]

# read dict1 results into a new dict2 removing duplicates and summing the bps column
for k, v in dict2.items():
    if v[0] and v[1] in v:
        dict2[k] = [v[0], v[1]]
        dict2[k] += [v[2]]
    else:
        dict2[k] = [v]

print dict2

The code returns: {}

Thanks.

It looks like you are making this a little more complicated than it needs to be. If by overlapping timestamps you mean exactly identical ones (which is what your code assumes), you can simply build the dict using a tuple of the two timestamps as the key and sum the bps values (row[5]) under that key. Using a defaultdict(int) is a convenience that automatically sets the default value for a new key to 0:

from collections import defaultdict

# `reader` is the csv.reader object created in the question's code
dict1 = defaultdict(int)
# read csv values into dict1, keyed by the (start, end) pair
for row in reader:
    dict1[(row[0], row[1])] += int(row[5])

print(dict(dict1))

Output:

{('00:06:01', '00:06:02'): 712092, ('03:10:02', '03:10:02'): 30}
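
To get from that dict to the CSV layout the question asks for, something like the following might work. It is only a sketch, written for Python 3 and assuming the file has no header row and that bps is always in column 5. The helper names `sum_bps` and `write_totals` are made up for illustration, and the sample rows are copied from the question into an in-memory string so the snippet runs without any files.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question (assumption: no header row).
SAMPLE = """\
00:06:01,00:06:02,TCP,10.33.239.176,172.16.168.7,699619
00:06:01,00:06:02,ICMP,10.33.236.247,172.16.171.254,0
00:06:01,00:06:02,UDP,10.33.238.55,172.16.175.253,12473
03:10:02,03:10:02,UDP,10.33.238.55,172.16.160.2,25
03:10:02,03:10:02,TCP,10.33.236.59,172.16.168.9,5
"""


def sum_bps(src):
    """Sum the bps column (index 5) per exact (start, end) pair."""
    totals = defaultdict(int)
    for row in csv.reader(src):
        if not row:
            continue  # skip blank lines
        totals[(row[0], row[1])] += int(row[5])
    return totals


def write_totals(totals, dst):
    """Write one 'start,end,bps' row per time range, sorted by range."""
    writer = csv.writer(dst, lineterminator="\n")
    # Zero-padded HH:MM:SS strings sort correctly as plain text.
    for (start, end), bps in sorted(totals.items()):
        writer.writerow([start, end, bps])


totals = sum_bps(io.StringIO(SAMPLE))
out = io.StringIO()
write_totals(totals, out)
print(out.getvalue(), end="")
```

With real files, the two `io.StringIO` objects would be replaced by `open(path, newline='')` for reading and `open(path, 'w', newline='')` for writing, which is how the `csv` module expects files to be opened in Python 3. Note that this still only groups rows whose start and end times match exactly; ranges that partially overlap would need a different approach.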
