
Split csv files based on time intervals

I have exported Wireshark pcap files to csv. I need to split these csv files based on time intervals. The csv file has a 'Time' column. I would like to split these files into 1-second intervals: the packets that arrived in the first second get written to one file, the packets from the next second to another file, and so on. If the input file name is AAA.csv, the split files should get the same name with a number appended: AAA1.csv, ... AAA5.csv and so on. I am new to programming so I am not quite sure how to go forward from this point. Please help. Thanks. https://fil.email/8wSH9ohq

import os
startdir='.'
suffix='.csv'
for root, dirs, files in os.walk(startdir):
  for name in files:
    if name.endswith(suffix):
      filename=os.path.join(root,name)

Here is an extract of a csv file with rows from 2 consecutive seconds:

"No.","Time","Time delta from previous displayed frame","Length","Source","Destination","Protocol","Info"
"100","23:39:52.634388","0.000502000","28","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","802.11 Block Ack, Flags=........"
"101","23:39:52.634393","0.000005000","102","Htc_9b:92:24","HuaweiTe_3a:d0:16","802.11","QoS Data, SN=45, FN=0, Flags=.p.....T"
"102","23:39:52.695277","0.060884000","28","Microsof_d2:8b:4f (30:59:b7:d2:8b:4f) (TA)","Sagemcom_28:38:64 (d0:6e:de:28:38:64) (RA)","802.11","802.11 Block Ack, Flags=........"
"103","23:39:52.695278","0.000001000","10","","Sagemcom_28:38:64 (d0:6e:de:28:38:64) (RA)","802.11","Clear-to-send, Flags=........"
"104","23:39:52.717845","0.022567000","16","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Request-to-send, Flags=........"
"105","23:39:52.717845","0.000000000","406","HuaweiTe_3a:d0:16","Htc_9b:92:24","802.11","QoS Data, SN=3446, FN=0, Flags=.p....F."
"106","23:39:52.717852","0.000007000","28","Htc_9b:92:24 (ac:37:43:9b:92:24) (TA)","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (RA)","802.11","802.11 Block Ack, Flags=........"
"107","23:39:52.717853","0.000001000","10","","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (RA)","802.11","Clear-to-send, Flags=........"
"108","23:39:52.719380","0.001527000","28","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","802.11 Block Ack, Flags=........"
"109","23:39:52.719384","0.000004000","102","Htc_9b:92:24","HuaweiTe_3a:d0:16","802.11","QoS Data, SN=46, FN=0, Flags=.p.....T"
"110","23:39:52.719389","0.000005000","10","","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Clear-to-send, Flags=........"
"111","23:39:53.109091","0.389702000","24","Htc_9b:92:24","HuaweiTe_3a:d0:1a","802.11","Null function (No data), SN=4069, FN=0, Flags=...P...T"
"112","23:39:53.109586","0.000495000","10","","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Acknowledgement, Flags=........"
"113","23:39:53.149481","0.039895000","28","Sagemcom_28:38:64 (d0:6e:de:28:38:64) (TA)","Microsof_a0:a4:2c (58:82:a8:a0:a4:2c) (RA)","802.11","802.11 Block Ack, Flags=........"
"114","23:39:53.157218","0.007737000","24","Htc_9b:92:24","HuaweiTe_3a:d0:1a","802.11","Null function (No data), SN=4070, FN=0, Flags=.......T"
"115","23:39:53.159251","0.002033000","10","","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Acknowledgement, Flags=........"
"116","23:39:53.159252","0.000001000","16","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Request-to-send, Flags=........"
"117","23:39:53.159267","0.000015000","10","","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (RA)","802.11","Clear-to-send, Flags=........"
"118","23:39:53.160276","0.001009000","16","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Request-to-send, Flags=........"
"119","23:39:53.160277","0.000001000","1500","HuaweiTe_3a:d0:16","Htc_9b:92:24","802.11","QoS Data, SN=3447, FN=0, Flags=.p....F."
"120","23:39:53.160290","0.000013000","28","Htc_9b:92:24 (ac:37:43:9b:92:24) (TA)","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (RA)","802.11","802.11 Block Ack, Flags=........"

The csv module is enough here. You just have to read each file one line at a time: if the first 8 characters of the Time field (the second column) are the same as those of the previous row, copy the row to the same output file; otherwise create a new output file.
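For example, slicing the Time string keeps only the whole-second part (the value below is taken from the sample rows above):

```python
t = "23:39:52.634388"   # a value from the Time column in the sample
second_key = t[:8]      # keep only HH:MM:SS
print(second_key)       # -> 23:39:52
```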

It can be coded as:

import os
import csv
startdir='.'
suffix='.csv'
for root, dirs, files in os.walk(startdir):
    for name in files:
        if name.endswith(suffix):
            filename=os.path.join(root,name)
            with open(filename) as fd:        # open the csv file
                rd = csv.reader(fd)           #  as a csv input file
                old = None                    # second of the previous row (None before the first row)
                i = 0                         # output files will be numbered starting at 1
                header = next(rd)             # store the header line
                for row in rd:
                    if row[1][:8] != old:     # a different second (or the very first row)
                        if old is not None:   # close the previous output file, if any
                            fdout.close()
                        old = row[1][:8]      # store the current second for the next rows
                        i += 1                # increase the output file number
                        fdout = open(filename[:-4] + str(i) + filename[-4:],
                                 'w', newline='')     # open a new output file
                        wr = csv.writer(fdout, quoting=csv.QUOTE_ALL)  # with expected csv params
                        _ = wr.writerow(header)   # write the header
                    _ = wr.writerow(row)      # copy the row to the current output file
                fdout.close()

The code above uses the fact that the second can be read directly from the Time string without any parsing. If you need a variable duration, possibly smaller than one second, you have to parse the time string, convert it to a decimal (or more exactly floating-point) number of seconds, and divide it by the chosen duration in seconds:

...
import datetime
sec_duration = 0.5   # for half a second
                ...
                for row in rd:
                    # convert the Time field to a total number of seconds in the day,
                    #  as a float
                    cur = datetime.datetime.strptime(row[1], "%H:%M:%S.%f")
                    cur -= cur.replace(hour=0, minute=0, second=0, microsecond=0)
                    # make it a number of periods of sec_duration
                    cur = int(cur.total_seconds() / sec_duration)
                    if cur != old:            # a different period (or the very first row)
                        if old is not None:   # close the previous output file, if any
                            fdout.close()
                        old = cur             # store the current period for the next rows
                        i += 1                # increase the output file number
                ...
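Putting the pieces together, a complete sketch of the variable-duration version might look like this (the function name is made up for the example; the file-naming convention follows the question, AAA.csv -> AAA1.csv, AAA2.csv, ...):

```python
import csv
import datetime

def split_csv_by_duration(filename, sec_duration=0.5):
    """Split one Wireshark-exported csv into period-numbered output files."""
    with open(filename, newline='') as fd:
        rd = csv.reader(fd)
        header = next(rd)          # keep the header for every output file
        old = None                 # period of the previous row
        i = 0                      # output file counter
        fdout = None
        wr = None
        for row in rd:
            # convert the Time field (HH:MM:SS.ffffff) to seconds since midnight
            t = datetime.datetime.strptime(row[1], "%H:%M:%S.%f")
            t -= t.replace(hour=0, minute=0, second=0, microsecond=0)
            cur = int(t.total_seconds() / sec_duration)   # period number
            if cur != old:                 # a different period (or the first row)
                if fdout is not None:
                    fdout.close()          # close the previous output file
                old = cur
                i += 1
                fdout = open(filename[:-4] + str(i) + filename[-4:],
                             'w', newline='')
                wr = csv.writer(fdout, quoting=csv.QUOTE_ALL)
                wr.writerow(header)
            wr.writerow(row)
        if fdout is not None:
            fdout.close()
    return i   # number of files written
```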

This should get you started. This will split your sample csv into 11 different files. I suggest creating a test directory and testing whether the code below does what you want it to do.

import os
# pandas to read / write csv and process the data
import pandas as pd
startdir='.'
suffix='.csv'
for root, dirs, files in os.walk(startdir):
  for name in files:
    if name.endswith(suffix):
      filename=os.path.join(root,name)
      #print(filename)
      df = pd.read_csv(filename)
      # Parse the time and floor it to the whole second for grouping
      col_time = pd.to_datetime(df['Time']).dt.floor('s')
      # Group the rows by second
      df2 = df.groupby(col_time)
      # now split the data frame according to group and put the parts in a list
      list_of_df = [df2.get_group(x) for x in df2.groups]
      # get the data frames from the list and write them out, numbered from 1
      for i in range(len(list_of_df)):
        list_of_df[i].to_csv(filename[:-4]+str(i+1)+".csv", index=False)
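The grouping step can be checked in isolation on a tiny synthetic frame (the column names below mimic the Wireshark export; the data is made up from the sample rows):

```python
import pandas as pd

# synthetic sample spanning two consecutive seconds
df = pd.DataFrame({
    "Time": ["23:39:52.634388", "23:39:52.719389", "23:39:53.109091"],
    "Length": [28, 10, 24],
})
# floor each timestamp to the whole second to get the grouping key
col_time = pd.to_datetime(df["Time"]).dt.floor("s")
# one sub-frame per distinct second, in chronological order
groups = [g for _, g in df.groupby(col_time)]
print(len(groups))   # -> 2
```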
