txt to csv using Python

I have an SQL dump in txt format. It looks like this:

"Date:","8/21/2015","","Time:","16:18:38","","Name:","NC.S.RHU10.BRD"
"System Name:","NC.S.RHU10.BRD"
"Operator:","SYSTEM"
"Action:","Trend data loss"
"Comment:"," trend definition data loss occurred at 10:21:05 AM on 8/21/2015"
"Revision:","6"
"Location:",""
"Seq Number:","1278738"
" ********************************************************************************"
"Date:","8/21/2015","","Time:","16:17:17","","Name:","SC.L.SIDESHOWBOB.MBC009"
"System Name:","SC.L.SIDESHOWBOB.MBC009"
"Operator:","SYSTEM"
"Action:","FLN device return from failure"
"Comment:","Z8 RETURN from failure in Cabinet 9, Lan 3, Drop 1."
"Revision:","81"
"Location:","SC.L.SIDESHOWBOB.MBC009"
"Seq Number:","1278737"
" ********************************************************************************"
"Date:","8/21/2015","","Time:","16:17:17","","Name:","NC.S.EHU07.EAT"
"System Name:","NC.S.EHU07.EAT"
"Operator:","ITWVSIEMP01\InsightSCH"
"Action:","Trend data collection The target object could not be found on the Field"
"Panel."
"Comment:","Trend COV (0.000)  Failed - The target object could not be found on the"
"Field Panel"
"Revision:","1318"
"Location:","ITWVSIEMP01"
"Seq Number:","1278735"
" ********************************************************************************"
"Date:","8/21/2015","","Time:","16:17:15","","Name:","NC.S.EHU03.TCFM"
"System Name:","NC.S.EHU03.TCFM"
"Operator:","ITWVSIEMP01\InsightSCH"
"Action:","Trend data collection"
"Comment:","COV                Data Loss Detected"
"Revision:","1481"
"Location:","ITWVSIEMP01"
"Seq Number:","1278734"
" ********************************************************************************

I want to convert it into columns using Python, with the following fields:

"Date","Time","Name","System Name","Operator","Action","Comment","Type","Revision","Location","Seq Number"

Is there a ready-made function in Python that does this?

import csv

c = csv.writer(open('out.csv', 'w'), delimiter=',')

file = open('myfile.txt')
for col in file:
  data = col.split('\t')
 # find index "Date=0","Time=1","Name=2","System Name=3","Operator=4","Action=5","Comment=6","Type=7","Revision=8","Location=9","Seq Number=10"
  c.writerow(data[0],data[1],data[2],data[3],data[4],data[5],data[6],data[7],data[8],data[9],data[10])
f.close()
import csv
import operator

FIELDS = ["Date", "Time", "Name", "System Name", "Operator", "Action",
          "Comment", "Revision", "Location", "Seq Number"]

# newline='' is the Python 3 way to open files for the csv module
with open('path/to/input', newline='') as infile, \
        open('path/to/output', 'w', newline='') as outfile:
    data = {}
    writer = csv.writer(outfile, delimiter=',')
    writer.writerow(FIELDS)
    fields = operator.itemgetter(*FIELDS)
    # csv.reader strips the quotes and keeps commas inside quoted values intact
    for parts in csv.reader(infile):
        if not parts:
            continue
        if parts[0].startswith(' *'):
            # separator line: the current record is complete
            try:
                writer.writerow(fields(data))
            except KeyError:
                # itemgetter raises KeyError when a field is missing
                print('malformed input')
                raise
            data = {}
            continue

        if parts[0] == 'Date:':
            data['Date'] = parts[1]
            data['Time'] = parts[4]
            data['Name'] = parts[-1]
            continue

        if len(parts) < 2:
            # wrapped continuation line such as "Panel."; skipped here
            continue

        name = parts[0].rstrip(':')
        data[name] = parts[1]
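
A note on parsing: values such as the Comment in the second record contain commas, so splitting a line on ',' by hand cuts them apart, while the csv module honours the quotes. A minimal sketch using one line from the question (the variable names are only illustrative):

```python
import csv

line = '"Comment:","Z8 RETURN from failure in Cabinet 9, Lan 3, Drop 1."\n'

# Naive split: breaks on the commas inside the quoted value and keeps the quotes
naive = line.split(',')

# csv.reader accepts any iterable of lines and understands the quoting
parsed = next(csv.reader([line]))

print(len(naive))   # 4 pieces
print(parsed)       # ['Comment:', 'Z8 RETURN from failure in Cabinet 9, Lan 3, Drop 1.']
```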

I've just written a little utility here. Maybe it can help you.

I think the last line of your input file is missing a closing ". Please add it at the end so that the delimiter line is uniform.
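
If editing the file by hand is not practical, the missing quote could also be patched in code before parsing. The following is only a sketch: `close_last_quote` is a hypothetical helper, and it assumes that an odd number of double quotes on the last non-empty line means the closing quote was lost.

```python
def close_last_quote(lines):
    """Return a copy of lines in which the last non-empty line gets a
    closing quote if it contains an odd number of double quotes."""
    fixed = list(lines)
    for i in range(len(fixed) - 1, -1, -1):
        stripped = fixed[i].rstrip('\r\n')
        if stripped:
            if stripped.count('"') % 2 == 1:
                ending = fixed[i][len(stripped):]   # keep the original line ending
                fixed[i] = stripped + '"' + ending
            break
    return fixed


sample = ['"Seq Number:","1278734"\n', '" ****\n']
print(close_last_quote(sample))
```

The result can be passed straight to csv.reader, which accepts any iterable of lines.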

The following script should work. It generates your header fields automatically and preserves their order in the CSV file, so it should still work if the format changes a bit:

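import csv

with open("sqldump.txt", "r") as f_input, open("output.csv", "wb") as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    # First pass: build the header from the labels (cells ending in ':')
    # of the first record, stopping at the first single-column line.
    headers = []
    for cols in csv_input:
        if len(cols) > 1:
            headers.extend([header.strip(":") for header in cols if header.endswith(':')])
        else:
            break

    csv_output.writerow(headers)
    f_input.seek(0)    # rewind the input for the second pass

    # Second pass: collect the values of each record and write one row
    # whenever the line of asterisks is reached.
    entry = []
    for cols in csv_input:
        if cols[0] == 'Date:':
            entry.extend([cols[1], cols[4], cols[-1]])
        elif len(cols) > 1:
            entry.append(cols[1])
        elif cols[0].startswith(' *'):
            csv_output.writerow(entry)
            entry = []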

This would give you an output CSV file looking like:

Date,Time,Name,System Name,Operator,Action,Comment,Revision,Location,Seq Number
8/21/2015,16:18:38,NC.S.RHU10.BRD,NC.S.RHU10.BRD,SYSTEM,Trend data loss, trend definition data loss occurred at 10:21:05 AM on 8/21/2015,6,,1278738
8/21/2015,16:17:17,SC.L.SIDESHOWBOB.MBC009,SC.L.SIDESHOWBOB.MBC009,SYSTEM,FLN device return from failure,"Z8 RETURN from failure in Cabinet 9, Lan 3, Drop 1.",81,SC.L.SIDESHOWBOB.MBC009,1278737
8/21/2015,16:17:17,NC.S.EHU07.EAT,NC.S.EHU07.EAT,ITWVSIEMP01\InsightSCH,Trend data collection The target object could not be found on the Field,Trend COV (0.000)  Failed - The target object could not be found on the,1318,ITWVSIEMP01,1278735
8/21/2015,16:17:15,NC.S.EHU03.TCFM,NC.S.EHU03.TCFM,ITWVSIEMP01\InsightSCH,Trend data collection,COV                Data Loss Detected,1481,ITWVSIEMP01,1278734

Tested using Python 2.7. If you are using Python 3, open the output file with open("output.csv", "w", newline="") instead.
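
For Python 3, the same idea can be sketched as below. This is a hedged variant rather than the script above: `parse_records` is a hypothetical helper, and it additionally assumes that a single-column line which is not a separator (such as "Panel.") is the wrapped tail of the previous value, so it joins it back on with a space instead of dropping it.

```python
import csv


def parse_records(lines):
    """Yield one list of values per record from an iterable of text lines."""
    entry = []
    for cols in csv.reader(lines):
        if not cols:
            continue                       # skip blank lines
        if cols[0] == 'Date:':
            entry.extend([cols[1], cols[4], cols[-1]])
        elif len(cols) > 1:
            entry.append(cols[1])
        elif cols[0].startswith(' *'):
            if entry:
                yield entry                # separator: record is complete
            entry = []
        elif entry:
            # assumption: a lone value continues the previous field
            entry[-1] = entry[-1] + ' ' + cols[0]


sample = [
    '"Date:","8/21/2015","","Time:","16:17:17","","Name:","NC.S.EHU07.EAT"\n',
    '"Action:","Trend data collection The target object could not be found on the Field"\n',
    '"Panel."\n',
    '"Revision:","1318"\n',
    '" ****"\n',
]
rows = list(parse_records(sample))
print(rows[0][3])
```

To write a file, the rows can be handed to csv.writer(f_output).writerows(parse_records(f_input)), with both files opened using newline="". The header row is not generated by this sketch and would need to be written separately.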

Note that there is no 'Type' field in your example data.
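
If the output should still carry the 'Type' column from the requested header, one option is csv.DictWriter, whose restval argument supplies a value for fields missing from a record. A minimal Python 3 sketch with one hand-built record (the record dict is illustrative, and an in-memory buffer stands in for the output file):

```python
import csv
import io

FIELDS = ["Date", "Time", "Name", "System Name", "Operator", "Action",
          "Comment", "Type", "Revision", "Location", "Seq Number"]

# One record as a dict; note that it has no 'Type' key
record = {
    "Date": "8/21/2015", "Time": "16:18:38", "Name": "NC.S.RHU10.BRD",
    "System Name": "NC.S.RHU10.BRD", "Operator": "SYSTEM",
    "Action": "Trend data loss",
    "Comment": " trend definition data loss occurred at 10:21:05 AM on 8/21/2015",
    "Revision": "6", "Location": "", "Seq Number": "1278738",
}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="")
writer.writeheader()
writer.writerow(record)    # 'Type' is absent, so restval ("") is written
output = buffer.getvalue()
print(output)
```

The 'Type' column then appears in the header and stays empty in every row until the data actually provides it.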
