Python combine overlapping time ranges in csv file

Question

I am attempting to use Python to iterate over a csv file, find overlapping time ranges, and then sum the corresponding bandwidth (bps) values in the last column. The resulting csv file should show how much bandwidth (bps) is consumed during each time period.

The source file has the following format:

start time, end time, Proto, SrcIP, DstIP, bps
00:06:01,00:06:02,TCP,10.33.239.176,172.16.168.7,699619
00:06:01,00:06:02,ICMP,10.33.236.247,172.16.171.254,0
00:06:01,00:06:02,UDP,10.33.238.55,172.16.175.253,12473
03:10:02,03:10:02,UDP,10.33.238.55,172.16.160.2,25
03:10:02,03:10:02,TCP,10.33.236.59,172.16.168.9,5

The resulting csv file should have the following format:

start time, end time, bps
00:06:01,00:06:02, 712092
03:10:02,03:10:02, 30

I am a Python novice and have tried using dictionaries to remove duplicates. I am sure there is a better way to do this...

Here is my non-working code:

import csv

src_file = open('c:/test/format1.csv', 'rb')
dst_file = open('c:/test/format2.csv', 'wb')
reader = csv.reader(src_file)
writer = csv.writer(dst_file,delimiter=',')

dict1 = {}
dict2 = {}
dkey = 1

# read csv values into dict1
for row in reader:
    start = row[0]
    end = row[1]
    bps = int(row[7])
    dkey += 1
    dict1[dkey] = [start, end, bps]

# read dict1 results into a new dict2 removing duplicates and summing the bps column
for k, v in dict2.items():
    if v[0] and v[1] in v:
        dict2[k] = [v[0], v[1]]
        dict2[k] += [v[2]]
    else:
        dict2[k] = [v]

print dict2

The code returns: {}

Thanks.

Answer 1

It looks like you are perhaps making this a little more complicated than it needs to be. If by overlapping time stamps you mean exactly the same start and end times (which is what your code assumes), then you can simply build the dict with a tuple of the two timestamps as the key and sum the bps under it. Note that bps is the sixth column, so it is row[5], not row[7]. Using a defaultdict(int) is a convenience: a missing key automatically starts at 0:

from collections import defaultdict

dict1 = defaultdict(int)
# read csv values into dict1
for row in reader:
    dict1[(row[0], row[1])] += int(row[5])

print(dict(dict1))

Output:

{('00:06:01', '00:06:02'): 712092, ('03:10:02', '03:10:02'): 30}
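The question also asks for the totals to be written back out as a csv file, which the snippet above does not show. Here is a minimal end-to-end sketch of how that might look, assuming Python 3, that bps is column index 5, and that any row without a plain integer in that column (such as a header line) should be skipped. The name summarize_bps is made up for illustration, and the sample rows are the ones from the question.

```python
import csv
import io
from collections import defaultdict


def summarize_bps(src, dst):
    """Sum bps per identical (start, end) pair and write start,end,bps rows.

    `src` and `dst` are open text-mode file objects. This helper is a
    hypothetical illustration, not part of the original answer.
    """
    totals = defaultdict(int)
    for row in csv.reader(src):
        # Skip blank lines and rows whose bps field is not a plain integer
        # (assumption: this is how a header row would show up).
        if len(row) < 6 or not row[5].strip().isdigit():
            continue
        totals[(row[0].strip(), row[1].strip())] += int(row[5])

    writer = csv.writer(dst, lineterminator="\n")
    # Sorting the raw strings works because HH:MM:SS is zero-padded.
    for (start, end), bps in sorted(totals.items()):
        writer.writerow([start, end, bps])
    return dict(totals)


sample = """00:06:01,00:06:02,TCP,10.33.239.176,172.16.168.7,699619
00:06:01,00:06:02,ICMP,10.33.236.247,172.16.171.254,0
00:06:01,00:06:02,UDP,10.33.238.55,172.16.175.253,12473
03:10:02,03:10:02,UDP,10.33.238.55,172.16.160.2,25
03:10:02,03:10:02,TCP,10.33.236.59,172.16.168.9,5
"""
out = io.StringIO()
summarize_bps(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, the same function can be given handles opened with open(path, newline='') for both the source and the destination, which is the mode the csv module documentation recommends in Python 3.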

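If the time ranges can genuinely overlap rather than match exactly, grouping by an identical (start, end) key is not enough. One possible approach, sketched here under assumptions the answer above does not make, is to sort the records by start time and fold each range into the previous one whenever it begins before that one ends. The function name merge_overlapping and the sample records are invented for illustration, and the summed figure is a total over the merged range, not a per-second rate.

```python
def merge_overlapping(records):
    """Merge (start, end, bps) records whose time ranges overlap.

    Times are zero-padded "HH:MM:SS" strings, so plain string comparison
    orders them correctly. Returns a list of [start, end, total_bps].
    """
    merged = []
    for start, end, bps in sorted(records):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the current range: extend it and add bps.
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2] += bps
        else:
            # No overlap with the current range: start a new one.
            merged.append([start, end, bps])
    return merged


# Made-up records, only to exercise the overlap logic.
records = [
    ("00:06:01", "00:06:03", 100),
    ("00:06:02", "00:06:05", 50),
    ("00:06:05", "00:06:06", 7),
    ("03:10:02", "03:10:02", 30),
]
print(merge_overlapping(records))
```

Whether ranges that merely touch (one ends at the same second the next begins) should count as overlapping is a design choice; changing `<=` to `<` keeps them separate.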
solution1
0 ACCEPTED 2015-04-07 03:24:18
