Split csv files based on time intervals

I have exported Wireshark pcap files to CSV. I need to split these CSV files based on time intervals. In the CSV file there is a 'Time' column. I would like to split these files into 1-second time intervals, so the packets that arrived in the first second get written to one file, the packets that arrived in the next second into another file, and so on. If the input file name is AAA.csv, the split files should get the same name with a number appended to the end: AAA1.csv, ... AAA5.csv and so on. I am new to programming, so I am not quite sure how to go forward from this point. Please help. Thanks https://fil.email/8wSH9ohq

import os
startdir = '.'
suffix = '.csv'
# walk the directory tree and collect every csv file
for root, dirs, files in os.walk(startdir):
    for name in files:
        if name.endswith(suffix):
            filename = os.path.join(root, name)

Here is an extract of a csv file with rows from 2 consecutive seconds:

"No.","Time","Time delta from previous displayed frame","Length","Source","Destination","Protocol","Info"
"100","23:39:52.634388","0.000502000","28","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","802.11 Block Ack, Flags=........"
"101","23:39:52.634393","0.000005000","102","Htc_9b:92:24","HuaweiTe_3a:d0:16","802.11","QoS Data, SN=45, FN=0, Flags=.p.....T"
"102","23:39:52.695277","0.060884000","28","Microsof_d2:8b:4f (30:59:b7:d2:8b:4f) (TA)","Sagemcom_28:38:64 (d0:6e:de:28:38:64) (RA)","802.11","802.11 Block Ack, Flags=........"
"103","23:39:52.695278","0.000001000","10","","Sagemcom_28:38:64 (d0:6e:de:28:38:64) (RA)","802.11","Clear-to-send, Flags=........"
"104","23:39:52.717845","0.022567000","16","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Request-to-send, Flags=........"
"105","23:39:52.717845","0.000000000","406","HuaweiTe_3a:d0:16","Htc_9b:92:24","802.11","QoS Data, SN=3446, FN=0, Flags=.p....F."
"106","23:39:52.717852","0.000007000","28","Htc_9b:92:24 (ac:37:43:9b:92:24) (TA)","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (RA)","802.11","802.11 Block Ack, Flags=........"
"107","23:39:52.717853","0.000001000","10","","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (RA)","802.11","Clear-to-send, Flags=........"
"108","23:39:52.719380","0.001527000","28","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","802.11 Block Ack, Flags=........"
"109","23:39:52.719384","0.000004000","102","Htc_9b:92:24","HuaweiTe_3a:d0:16","802.11","QoS Data, SN=46, FN=0, Flags=.p.....T"
"110","23:39:52.719389","0.000005000","10","","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Clear-to-send, Flags=........"
"111","23:39:53.109091","0.389702000","24","Htc_9b:92:24","HuaweiTe_3a:d0:1a","802.11","Null function (No data), SN=4069, FN=0, Flags=...P...T"
"112","23:39:53.109586","0.000495000","10","","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Acknowledgement, Flags=........"
"113","23:39:53.149481","0.039895000","28","Sagemcom_28:38:64 (d0:6e:de:28:38:64) (TA)","Microsof_a0:a4:2c (58:82:a8:a0:a4:2c) (RA)","802.11","802.11 Block Ack, Flags=........"
"114","23:39:53.157218","0.007737000","24","Htc_9b:92:24","HuaweiTe_3a:d0:1a","802.11","Null function (No data), SN=4070, FN=0, Flags=.......T"
"115","23:39:53.159251","0.002033000","10","","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Acknowledgement, Flags=........"
"116","23:39:53.159252","0.000001000","16","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Request-to-send, Flags=........"
"117","23:39:53.159267","0.000015000","10","","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (RA)","802.11","Clear-to-send, Flags=........"
"118","23:39:53.160276","0.001009000","16","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Request-to-send, Flags=........"
"119","23:39:53.160277","0.000001000","1500","HuaweiTe_3a:d0:16","Htc_9b:92:24","802.11","QoS Data, SN=3447, FN=0, Flags=.p....F."
"120","23:39:53.160290","0.000013000","28","Htc_9b:92:24 (ac:37:43:9b:92:24) (TA)","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (RA)","802.11","802.11 Block Ack, Flags=........"

The csv module is enough here. You just have to read every file one line at a time. If the first 8 characters of the Time field (the second column) are the same as in the previous row, copy the row to the same output file; otherwise create a new output file.
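
For example, with the sample rows above, the first 8 characters of the Time field are just the timestamp truncated to whole seconds, so rows sharing that prefix belong to the same 1-second interval:

print("23:39:52.634388"[:8])   # -> 23:39:52
print("23:39:53.109091"[:8])   # -> 23:39:53  (a new second, so a new output file)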

It can be coded as:

import os
import csv

startdir = '.'
suffix = '.csv'
for root, dirs, files in os.walk(startdir):
    for name in files:
        if name.endswith(suffix):
            filename = os.path.join(root, name)
            with open(filename) as fd:        # open the csv file
                rd = csv.reader(fd)           #  as a csv input file
                old = None                    # no previous line yet
                i = 0                         # output files will be numbered from 1
                fdout = None                  # no output file open yet
                header = next(rd)             # store the header line
                for row in rd:
                    if row[1][:8] != old:     # a different second (or the first row)
                        if fdout is not None: # close the previous output file, if any
                            fdout.close()
                        old = row[1][:8]      # remember the current second for the next rows
                        i += 1                # increase the output file number
                        fdout = open(filename[:-4] + str(i) + filename[-4:],
                                     'w', newline='')                  # open a new output file
                        wr = csv.writer(fdout, quoting=csv.QUOTE_ALL)  # with the expected csv params
                        wr.writerow(header)   # write the header
                    wr.writerow(row)          # copy the row to the current output file
                if fdout is not None:
                    fdout.close()
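
The expression filename[:-4] + str(i) + filename[-4:] simply inserts the file number before the ".csv" suffix, which gives the naming asked for in the question:

filename = './AAA.csv'
i = 1
print(filename[:-4] + str(i) + filename[-4:])   # -> ./AAA1.csv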

The code above relies on the fact that the second can be read directly from the Time string, without any parsing. If you need a variable duration, possibly smaller than a second, you have to parse the time string, convert it to a decimal (more exactly, floating-point) number of seconds, and divide it by the chosen duration in seconds:

import datetime      # needed for strptime
...
sec_duration = 0.5   # for half a second
                ...
                for row in rd:
                    # convert the Time field to a total number of seconds in the day,
                    #  as a float
                    cur = datetime.datetime.strptime(row[1], "%H:%M:%S.%f")
                    cur -= cur.replace(hour=0, minute=0, second=0, microsecond=0)
                    # turn it into a number of periods of sec_duration
                    cur = int(cur.total_seconds() / sec_duration)
                    if cur != old:            # a different period (or the first row)
                        if old is not None:   # close the previous output file, if any
                            fdout.close()
                        old = cur             # remember the current period for the next rows
                        i += 1                # increase the output file number
                ...
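
As a quick check of the arithmetic, here is a small self-contained sketch of that conversion (the helper name period_index is only for illustration), applied to timestamps from the sample above:

import datetime

def period_index(time_str, sec_duration=0.5):
    # parse "HH:MM:SS.ffffff", convert to seconds since midnight,
    # then bucket into periods of sec_duration seconds
    t = datetime.datetime.strptime(time_str, "%H:%M:%S.%f")
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return int((t - midnight).total_seconds() / sec_duration)

print(period_index("23:39:52.634388"))   # 170385
print(period_index("23:39:52.695277"))   # 170385 -> same half-second bucket
print(period_index("23:39:53.109091"))   # 170386 -> next bucket, new output file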

This should get you started. It will split your sample csv into 11 different files. I suggest creating a test directory and testing with the code below to check that it does what you want.

import os
# pandas to read / write csv and process the data
import pandas as pd

startdir = '.'
suffix = '.csv'
for root, dirs, files in os.walk(startdir):
    for name in files:
        if name.endswith(suffix):
            filename = os.path.join(root, name)
            df = pd.read_csv(filename)
            # extract the time column for grouping
            col_time = pd.to_datetime(df['Time'])
            # group the rows by minute and second (minute might not be needed)
            df2 = df.groupby([col_time.dt.minute, col_time.dt.second])
            # split the data frame according to the groups and put them in a list
            list_of_df = [df2.get_group(x) for x in df2.groups]
            # write each group to its own numbered file
            for i in range(len(list_of_df)):
                list_of_df[i].to_csv(filename[:-4] + str(i + 1) + ".csv", index=False)
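
A note on the grouping: grouping on the minute and second components can mix rows from different hours that happen to share the same values, and the group order is not guaranteed to be chronological. A minimal alternative sketch, assuming the Time column always has the HH:MM:SS.ffffff format shown above, is to floor the parsed timestamp to whole seconds and group on that:

import pandas as pd

filename = './AAA.csv'                      # hypothetical input file
df = pd.read_csv(filename)
col_time = pd.to_datetime(df['Time'], format='%H:%M:%S.%f')
# one group per whole second, iterated in chronological order
for i, (_, group) in enumerate(df.groupby(col_time.dt.floor('s')), start=1):
    group.to_csv(filename[:-4] + str(i) + '.csv', index=False)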
