
Converting a structured text file to CSV (unable to turn rows into columns):

I am trying to convert a text file into CSV in Python. The input text file is as follows:

Employee Name: Dr.john doe
Designation: Professor
Email: johndoe@google.com
ContactNo: 1234567, 9999999
Qualification: M.Tech., Ph.D.
Area of Interest / Specialisation: network security
Employee Name: Dr. john doe2 
Designation: Professor2
Email: johndoe2@google.com
ContactNo: 222222222
Qualification: B.Tech., Ph.D.
Area of Interest / Specialisation: network security2
Employee Name: Dr. john doe3 
Designation: Associate Professor3
Email: johndoe3@google.com
ContactNo: 333333,4444444
Qualification: Ph.D.
Area of Interest / Specialisation: network security3
Designation: Associate Professor4
Email: johndoe4@google.com
ContactNo: 44444444 ,Intercom No.44444
Qualification: : M.Sc. 
Designation: Programmer
Email: johndoe5@google.com
ContactNo: 5555555555 ,Intercom No.5555
Qualification: Ph.D |Computer Science
Designation: Computer Operator
Email: johndoe6@google.com
ContactNo: 666666666
Qualification: D.C.Sc. & E.,
Designation: Computer Operator
Email: johndoe7@google.com
ContactNo: 777777777 ,Intercom No.77777
Qualification: D.E & TC.,
Designation: Instructor4
Email: johndoe8@google.com
ContactNo: 8888888888 ,Intercom No.8888
Qualification: D.C.Sc. & E.,

I need the CSV in the following format:

Employee name,designation,email,contact,Qualification,Specialisation       
Dr. john doe,Professor,johndoe@google.com,1234567,B.E.,network security     
Dr. john doe2,Professor,johndoe2@google.com,222222222,M.S.,network security2    
Dr. john doe3,Associate,Professor3,johndoe3@gmail.com,333333,M.Tech.,network security3

I've tried this:

import csv

with open('test.txt', 'r') as records:
    stripped = (line.strip() for line in records)
    # Each non-blank line is split on ':' and written as its own CSV row
    lines = (line.split(":") for line in stripped if line)
    with open('log.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerows(lines)

My code above produces the following output, which has only two columns (I don't know how to make six columns and put each record's values in one row):

Employee Name, Dr.john doe
Designation, Professor
Email, johndoe@google.com
ContactNo, 1234567, 9999999
Qualification, M.Tech., Ph.D.
Area of Interest / Specialisation, network security
Employee Name, Dr. john doe2 
Designation, Professor2
Email, johndoe2@google.com
ContactNo, 222222222
Qualification, B.Tech., Ph.D.
Area of Interest / Specialisation, network security2
Employee Name, Dr. john doe3 
Designation, Associate Professor3
Email, johndoe3@google.com
ContactNo, 333333,4444444
Qualification, Ph.D.
Area of Interest / Specialisation, network security3

In short: I can separate each attribute name from its value, but I don't know how to put the values into the right columns.
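At its core this is a transpose: each record arrives as six (name, value) rows, and the CSV needs them turned into one header row and one value row. A minimal sketch for a single record, assuming the lines have already been split on the first ':' (the helper `record_to_row` is hypothetical):

```python
import csv
import io


def record_to_row(pairs):
    """Turn one record's (name, value) pairs into a header tuple and a value tuple."""
    # zip(*pairs) transposes the pairs: six (name, value) rows become
    # one tuple of names and one tuple of values
    names, values = zip(*pairs)
    return names, values


# The first record from the question, assumed already split on the first ':'
pairs = [
    ("Employee Name", "Dr.john doe"),
    ("Designation", "Professor"),
    ("Email", "johndoe@google.com"),
    ("ContactNo", "1234567, 9999999"),
    ("Qualification", "M.Tech., Ph.D."),
    ("Area of Interest / Specialisation", "network security"),
]
header, row = record_to_row(pairs)

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)  # one row of column names
writer.writerow(row)     # one row of values; fields containing commas get quoted
print(buf.getvalue())
```

With many records the same transpose applies per record; the remaining work is deciding where each record starts.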

If you are familiar with pandas, you can simply use this code:

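import numpy as np  # pd.np was removed in pandas 2.0, so import NumPy directly
import pandas as pd

with open('test.txt', 'r') as records:
    # Keep the text after the first ':' of each non-blank line
    lines = [line.split(':', 1)[1].strip() for line in records if line.strip()]
col_titles = ('Employee name', 'designation', 'email', 'contact', 'Qualification', 'Specialisation')
# Assumes every record has exactly six lines, always in the same order
data = np.array(lines).reshape((len(lines) // 6, 6))
pd.DataFrame(data, columns=col_titles).to_csv("output.csv", index=False)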
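The `reshape` step assumes every record has exactly six lines in the same order. In the question's full sample input several records lack the `Employee Name` and `Area of Interest / Specialisation` lines, so its 38 lines are not a multiple of six and the reshape raises an error. A dependency-free sketch of the same fixed-size chunking that checks this assumption explicitly (the names `chunk_values` and `write_rows` are hypothetical, not part of the answer):

```python
import csv

COLUMNS = ('Employee name', 'designation', 'email', 'contact', 'Qualification', 'Specialisation')


def chunk_values(lines, size=6):
    """Group 'Key: value' lines into rows of `size` values.

    Assumes every non-blank line contains ':' and every record has exactly
    `size` lines in the same order; raises ValueError when the count does not
    divide evenly, rather than silently misaligning columns.
    """
    # Keep the text after the first ':' of each non-blank line
    values = [line.split(':', 1)[1].strip() for line in lines if line.strip()]
    if len(values) % size:
        raise ValueError(f"{len(values)} values is not a multiple of {size}")
    # Slice the flat list into consecutive runs of `size` values
    return [values[i:i + size] for i in range(0, len(values), size)]


def write_rows(path, rows):
    """Write the header and the chunked rows to `path` as CSV."""
    with open(path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
```

Called on the question's full 38-line sample, `chunk_values` raises `ValueError` instead of producing misaligned rows.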

I think this works:

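import csv
import collections

with open('test.txt', 'r') as record_fields, open('log.csv', 'w', newline='') as out_file:
    # records: finished rows; fieldnames: column names in first-seen order; record: row being built
    records, fieldnames, record = [], collections.OrderedDict(), {}
    for field in record_fields:
        if not field.strip():
            continue  # skip blank lines so they don't add an empty column
        name, _, value = field.strip().partition(": ")
        # An "Employee Name" line starts a new record
        if name == "Employee Name" and record:
            records.append(record)
            record = {}
        # Keep only the first value seen for each field within a record
        if name not in record:
            record[name] = value
        fieldnames[name] = None
    records.append(record)

    writer = csv.DictWriter(out_file, fieldnames=list(fieldnames))
    writer.writeheader()
    writer.writerows(records)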

It gives me:

Employee Name,Designation,Email,ContactNo,Qualification,Area of Interest / Specialisation
Dr.john doe,Professor,johndoe@google.com,"1234567, 9999999","M.Tech., Ph.D.",network security
Dr. john doe2,Professor2,johndoe2@google.com,222222222,"B.Tech., Ph.D.",network security2
Dr. john doe3,Associate Professor3,johndoe3@google.com,"333333,4444444",Ph.D.,network security3
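This version starts a new record only at an `Employee Name` line, so in the question's full input the later records that have no `Employee Name` line are folded into the third employee, and their values are dropped by the `if name not in record` check. One alternative, sketched here under the assumption that no field name appears twice within a single record, is to start a new record whenever a field name repeats; `restval=''` in `csv.DictWriter` then leaves missing fields empty. The function `parse_records` is hypothetical and `sample` is a shortened excerpt of the question's input:

```python
import csv
import io


def parse_records(lines):
    """Split 'Name: value' lines into dicts, starting a new dict when a name repeats.

    Assumes no field name appears twice within a single record.
    """
    records, fieldnames, record = [], {}, {}
    for line in lines:
        name, sep, value = line.strip().partition(':')
        if not sep:
            continue  # skip blank lines and lines without ':'
        name, value = name.strip(), value.strip()
        # A name already present in this record means the next employee has begun
        if name in record:
            records.append(record)
            record = {}
        record[name] = value
        fieldnames[name] = None  # dicts keep insertion order (Python 3.7+)
    if record:
        records.append(record)
    return list(fieldnames), records


sample = """Employee Name: Dr.john doe
Designation: Professor
Email: johndoe@google.com
Designation: Programmer
Email: johndoe5@google.com
ContactNo: 5555555555 ,Intercom No.5555
"""
fields, rows = parse_records(sample.splitlines())

buf = io.StringIO()
# restval='' writes an empty cell for any field a record lacks
writer = csv.DictWriter(buf, fieldnames=fields, restval='')
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

The trade-off: this needs no fixed "first field", but it cannot tell two records apart if a record happens to omit every field that the previous record already contained.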

You can use itertools.groupby to find the different information blocks for each employee:

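import csv
import itertools

with open('university_employees.txt') as src:
    data = [i.strip('\n').split(': ') for i in src]
# Alternating groups: [True, [Employee Name line]], [False, [the other fields]], ...
new_data = [[a, list(b)] for a, b in itertools.groupby(data, key=lambda x: x[0] == 'Employee Name')]
# Column names come from the first employee's two groups
header = [c for b in new_data[:2] for c, _ in b[-1]]
# One row per employee: the name value followed by the values of the fields after it
a, b, *d = [[new_data[i][-1][-1][-1], *[' '.join(c) for _, *c in new_data[i+1][-1]]] for i in range(0, len(new_data), 2)]
with open('professors.csv', 'w', newline='') as f:
    write = csv.writer(f)
    # Hard-coded to three employees; the third row is truncated to six columns
    write.writerows([header, a, b, d[0][:6]])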

Output:

Employee Name,Designation,Email,ContactNo,Qualification,Area of Interest / Specialisation
Dr.john doe,Professor,johndoe@google.com,"1234567, 9999999","M.Tech., Ph.D.",network security
Dr. john doe2 ,Professor2,johndoe2@google.com,222222222,"B.Tech., Ph.D.",network security2
Dr. john doe3 ,Associate Professor3,johndoe3@google.com,"333333,4444444",Ph.D.,network security3
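The snippet above is hard-coded to three employees (`a, b, *d`) and truncates the third row. `itertools.groupby` only groups consecutive items with equal keys, so one more general way to use it is to give every line of a record the same key: a running count of `Employee Name` lines seen so far. A sketch under the assumption that every record begins with an `Employee Name` line (the later records in the question's full input do not satisfy this) and every data line contains ':'; the names `group_records` and `start_key` are hypothetical:

```python
import csv
import io
import itertools


def group_records(lines, start_key='Employee Name'):
    """Yield one dict per record, grouping lines by a running record index.

    Assumes every record begins with a `start_key` line and every data line contains ':'.
    """
    # Each data line becomes [name, value], split only on the first ':'
    pairs = [[part.strip() for part in line.split(':', 1)] for line in lines if ':' in line]
    # Running count of start_key lines seen so far: constant across one record's lines
    record_ids = itertools.accumulate(int(name == start_key) for name, _ in pairs)
    for _, group in itertools.groupby(zip(record_ids, pairs), key=lambda item: item[0]):
        yield {name: value for _, (name, value) in group}


sample = """Employee Name: Dr.john doe
Designation: Professor
Email: johndoe@google.com
Employee Name: Dr. john doe2
Designation: Professor2
Email: johndoe2@google.com
"""
rows = list(group_records(sample.splitlines()))
# Union of field names across all records, in first-seen order
fieldnames = list(dict.fromkeys(name for row in rows for name in row))

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Unlike the indexing into `new_data`, this writes however many employees the file contains, and stripping each part also removes the trailing spaces visible in the output above.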
