Python: merging overlapping time ranges in a csv file

Question

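I'm trying to use Python to iterate over a csv file, find overlapping time ranges, and then sum the corresponding bits-per-second (bps) values in the last column. The resulting csv file should show how much bandwidth (bps) was consumed in each time period.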
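To make "overlapping" concrete, here is a minimal sketch of an overlap test. It assumes times are zero-padded HH:MM:SS strings within a single day and treats each range as closed (both endpoints included), because the sample contains zero-length ranges such as 03:10:02 to 03:10:02. `to_seconds` and `overlaps` are hypothetical helper names, not part of any library.

```python
def to_seconds(hms):
    """Convert a zero-padded 'HH:MM:SS' string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def overlaps(a, b):
    """Return True if closed ranges a and b, each a (start, end) pair of
    'HH:MM:SS' strings, share at least one instant."""
    a_start, a_end = map(to_seconds, a)
    b_start, b_end = map(to_seconds, b)
    # Two closed ranges overlap when each one starts no later than the other ends.
    return a_start <= b_end and b_start <= a_end


print(overlaps(("00:06:01", "00:06:02"), ("00:06:02", "00:06:05")))  # True
print(overlaps(("00:06:01", "00:06:02"), ("03:10:02", "03:10:02")))  # False
```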

The source file has the following format:

Start Time,End Time,Proto,SrcIP,DstIP,bps
00:06:01,00:06:02,TCP,10.33.239.176,172.16.168.7,699619
00:06:01,00:06:02,ICMP,10.33.236.247,172.16.171.254,0
00:06:01,00:06:02,UDP,10.33.238.55,172.16.175.253,12473
03:10:02,03:10:02,UDP,10.33.238.55,172.16.160.2,25
03:10:02,03:10:02,TCP,10.33.236.59,172.16.168.9,5
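csv.reader splits each line into a list of strings, so a row in this layout has six fields and the bps value sits at index 5 (Python indices start at 0); there is no index 7. A small check on one sample row, read from an in-memory string instead of the real file:

```python
import csv
import io

# One data row from the sample above, held in memory instead of a file.
sample = "00:06:01,00:06:02,TCP,10.33.239.176,172.16.168.7,699619\n"

row = next(csv.reader(io.StringIO(sample)))
print(len(row))     # 6 fields
print(row[5])       # '699619', the bps column (still a string)
print(int(row[5]))  # 699619 as an int, ready to be summed
```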

The resulting csv file should have this format:

Start Time,End Time,bps
00:06:01,00:06:02,712092
03:10:02,03:10:02,30
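A minimal sketch of writing per-range totals in that output format with csv.writer. The `totals` dictionary is hard-coded with the expected values purely for illustration, and an io.StringIO buffer stands in for the destination file so the result can be printed.

```python
import csv
import io

# Hypothetical per-range totals keyed by (start, end); values match the sample.
totals = {("00:06:01", "00:06:02"): 712092, ("03:10:02", "03:10:02"): 30}

out = io.StringIO()  # stands in for the destination file
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Start Time", "End Time", "bps"])  # header row
for (start, end), bps in sorted(totals.items()):
    writer.writerow([start, end, bps])  # ints are written as their str() form

print(out.getvalue())
```

When writing to a real file on Python 3, opening it with `open(path, 'w', newline='')` and keeping the csv module's default line terminator is the usual approach; `lineterminator="\n"` is set here only so the printed text is easy to read.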

I'm new to Python and tried to use a dictionary to remove the duplicates. I'm sure there is a better way to do this...
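The dictionary idea is sound: because keys are unique, keying on the (start, end) pair makes every row with the same range land in a single entry. A minimal sketch with a plain dict and `dict.get`, with the sample values hard-coded; note that it only merges ranges that are identical, not ones that partly overlap.

```python
# Sample (start, end, bps) triples, as they would come out of the CSV rows.
rows = [
    ("00:06:01", "00:06:02", 699619),
    ("00:06:01", "00:06:02", 0),
    ("00:06:01", "00:06:02", 12473),
    ("03:10:02", "03:10:02", 25),
    ("03:10:02", "03:10:02", 5),
]

totals = {}
for start, end, bps in rows:
    key = (start, end)
    # Repeated keys hit the same entry, so duplicates are summed, not stored twice.
    totals[key] = totals.get(key, 0) + bps

print(totals)
# {('00:06:01', '00:06:02'): 712092, ('03:10:02', '03:10:02'): 30}
```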

Here is my code, which doesn't work:

import csv

src_file = open('c:/test/format1.csv', 'rb')
dst_file = open('c:/test/format2.csv', 'wb')
reader = csv.reader(src_file)
writer = csv.writer(dst_file,delimiter=',')

dict1 = {}
dict2 = {}
dkey = 1

# read csv values into dict1
for row in reader:
    start = row[0]
    end = row[1]
    bps = int(row[7])
    dkey += 1
    dict1[dkey] = [start, end, bps]

# read dict1 results into a new dict2 removing duplicates and summing the bps column
for k, v in dict2.items():
    if v[0] and v[1] in v:
        dict2[k] = [v[0], v[1]]
        dict2[k] += [v[2]]
    else:
        dict2[k] = [v]

print dict2

The code returns: {}

Thanks.

Answer 1

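It looks like you may be making this a bit more complicated than it needs to be. If overlapping timestamps means they are exactly the same (which is what your code assumes), you can simply use the timestamps as the dictionary key and add up the bps (row[5]). defaultdict(int) is convenient here because it automatically gives a new key the default value 0: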
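The defaultdict(int) behavior can be seen in isolation. Calling `int()` with no arguments returns 0, so any missing key starts at 0; note that merely reading a missing key also inserts it into the dictionary.

```python
from collections import defaultdict

counts = defaultdict(int)
print(counts["missing"])  # 0: int() is called to create the default value
counts["a"] += 5          # no KeyError; starts from 0, then adds 5
counts["a"] += 7
print(dict(counts))       # {'missing': 0, 'a': 12}
```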

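import csv
from collections import defaultdict

dict1 = defaultdict(int)
# read csv values into dict1, summing bps (row[5]) per (start, end) pair;
# assumes the file has no header row, as in the question's code
with open('c:/test/format1.csv', newline='') as src_file:
    reader = csv.reader(src_file)
    for row in reader:
        dict1[(row[0], row[1])] += int(row[5])

print(dict(dict1))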

Output:

{('00:06:01', '00:06:02'): 712092, ('03:10:02', '03:10:02'): 30}
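The approach above merges only ranges whose start and end times match exactly. If "overlapping" should also cover ranges that partly intersect, a common technique is to sort by start time and sweep once, extending the current merged range whenever the next one starts before it ends. A minimal sketch, assuming zero-padded HH:MM:SS times within one day and treating ranges that touch at an endpoint as overlapping; `to_seconds` and `merge_ranges` are hypothetical names, and the last input row is invented to show a partial overlap.

```python
def to_seconds(hms):
    """Convert a zero-padded 'HH:MM:SS' string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def merge_ranges(rows):
    """Merge overlapping (start, end, bps) ranges and sum their bps.

    Returns a list of [start, end, total_bps] lists, ordered by start time.
    Ranges that touch at an endpoint are treated as overlapping.
    """
    merged = []
    for start, end, bps in sorted(rows, key=lambda r: to_seconds(r[0])):
        if merged and to_seconds(start) <= to_seconds(merged[-1][1]):
            # Overlaps the current merged range: stretch its end if needed, add bps.
            if to_seconds(end) > to_seconds(merged[-1][1]):
                merged[-1][1] = end
            merged[-1][2] += bps
        else:
            # No overlap: start a new merged range.
            merged.append([start, end, bps])
    return merged


rows = [
    ("00:06:01", "00:06:02", 699619),
    ("00:06:01", "00:06:02", 0),
    ("00:06:01", "00:06:02", 12473),
    ("03:10:02", "03:10:02", 25),
    ("03:10:02", "03:10:02", 5),
    ("00:06:02", "00:06:05", 100),  # hypothetical row that only partly overlaps
]

print(merge_ranges(rows))
# [['00:06:01', '00:06:05', 712192], ['03:10:02', '03:10:02', 30]]
```

With only the five sample rows, the result matches the dictionary version above. Summing bps over a merged range follows the question's rule literally; since bps is a rate, a sum over flows that are active at different moments is not the same thing as peak bandwidth.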


Solution 1
Score 0, accepted, 2015-04-07 03:24:18
