Converting a structured text file to CSV (unable to change rows into columns):

Question

I am trying to convert a text file into CSV in Python. The input text file is as follows:

Employee Name: Dr.john doe
Designation: Professor
Email: johndoe@google.com
ContactNo: 1234567, 9999999
Qualification: M.Tech., Ph.D.
Area of Interest / Specialisation: network security
Employee Name: Dr. john doe2 
Designation: Professor2
Email: johndoe2@google.com
ContactNo: 222222222
Qualification: B.Tech., Ph.D.
Area of Interest / Specialisation: network security2
Employee Name: Dr. john doe3 
Designation: Associate Professor3
Email: johndoe3@google.com
ContactNo: 333333,4444444
Qualification: Ph.D.
Area of Interest / Specialisation: network security3
Designation: Associate Professor4
Email: johndoe4@google.com
ContactNo: 44444444 ,Intercom No.44444
Qualification: : M.Sc. 
Designation: Programmer
Email: johndoe5@google.com
ContactNo: 5555555555 ,Intercom No.5555
Qualification: Ph.D |Computer Science
Designation: Computer Operator
Email: johndoe6@google.com
ContactNo: 666666666
Qualification: D.C.Sc. & E.,
Designation: Computer Operator
Email: johndoe7@google.com
ContactNo: 777777777 ,Intercom No.77777
Qualification: D.E & TC.,
Designation: Instructor4
Email: johndoe8@google.com
ContactNo: 8888888888 ,Intercom No.8888
Qualification: D.C.Sc. & E.,

I need it in CSV in the following format:

Employee name,designation,email,contact,Qualification,Specialisation
Dr. john doe,Professor,johndoe@google.com,1234567,B.E.,network security
Dr. john doe2,Professor,johndoe2@google.com,222222222,M.S.,network security2
Dr. john doe3,Associate Professor3,johndoe3@gmail.com,333333,M.Tech.,network security3

I've tried this:

import csv

with open('test.txt', 'r') as records:
    stripped = (line.strip() for line in records)
    # Split each non-empty line at every colon: [field name, value, ...]
    lines = (line.split(":") for line in stripped if line)
    with open('log.csv', 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerows(lines)

My code above gives the following output, which has only two columns (I don't know how to make six columns and add the values as rows):

Employee Name, Dr.john doe
Designation, Professor
Email, johndoe@google.com
ContactNo, 1234567, 9999999
Qualification, M.Tech., Ph.D.
Area of Interest / Specialisation, network security
Employee Name, Dr. john doe2 
Designation, Professor2
Email, johndoe2@google.com
ContactNo, 222222222
Qualification, B.Tech., Ph.D.
Area of Interest / Specialisation, network security2
Employee Name, Dr. john doe3 
Designation, Associate Professor3
Email, johndoe3@google.com
ContactNo, 333333,4444444
Qualification, Ph.D.
Area of Interest / Specialisation, network security3

In short: I am able to separate the attribute name from its value, but I do not know how to populate the values into specific fields.

Answer 1

If you are familiar with pandas, you can simply use this code:

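import numpy as np
import pandas as pd

with open('test.txt', 'r') as records:
    # Keep the value after the first colon; assumes every record has all six fields
    lines = [line.partition(':')[2].strip() for line in records if line.strip()]
    col_titles = ('Employee name', 'designation', 'email', 'contact', 'Qualification', 'Specialisation')
    # pd.np is no longer available in recent pandas versions, so use NumPy directly
    data = np.array(lines).reshape((len(lines) // 6, 6))
    pd.DataFrame(data, columns=col_titles).to_csv("output.csv", index=False)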
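
The reshape step assumes that every record contributes exactly six lines. As a rough sketch of the same idea using only the standard library, the values can be sliced into rows of six. The helper name `rows_of_six` and the constant `FIELDS_PER_RECORD` are hypothetical, chosen for illustration, and the sample is the first record from the question. A file with incomplete records (like the later entries in the question) would need a different approach.

```python
import csv
import io

FIELDS_PER_RECORD = 6  # assumption: every record has all six fields


def rows_of_six(lines):
    """Return the values after the first colon, grouped six per row."""
    values = [line.partition(":")[2].strip() for line in lines if line.strip()]
    if len(values) % FIELDS_PER_RECORD:
        raise ValueError("input does not contain a whole number of records")
    return [values[i:i + FIELDS_PER_RECORD]
            for i in range(0, len(values), FIELDS_PER_RECORD)]


sample = [
    "Employee Name: Dr.john doe",
    "Designation: Professor",
    "Email: johndoe@google.com",
    "ContactNo: 1234567, 9999999",
    "Qualification: M.Tech., Ph.D.",
    "Area of Interest / Specialisation: network security",
]
rows = rows_of_six(sample)

out = io.StringIO()  # stands in for open('output.csv', 'w', newline='')
csv.writer(out).writerows(rows)
print(out.getvalue())
```

Raising an error on a partial record is a deliberate choice here: silently truncating, as integer division in a reshape would, hides malformed input.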

Answer 2

I think this works:

import csv, collections

# newline='' lets the csv module control line endings
with open('test.txt', 'r') as record_fields, open('log.csv', 'w', newline='') as out_file:
    records, fieldnames, record = [], collections.OrderedDict(), {}
    for field in record_fields:
        name, _, value = field.strip().partition(": ")
        # An "Employee Name" line marks the start of the next record
        if name == "Employee Name" and record:
            records.append(record)
            record = {}
        # Keep only the first value seen for each field in a record
        if name not in record:
            record[name] = value
        fieldnames[name] = None  # remember the column order
    records.append(record)

    writer = csv.DictWriter(out_file, fieldnames=fieldnames.keys())
    writer.writeheader()
    writer.writerows(records)

It gives me:

Employee Name,Designation,Email,ContactNo,Qualification,Area of Interest / Specialisation
Dr.john doe,Professor,johndoe@google.com,"1234567, 9999999","M.Tech., Ph.D.",network security
Dr. john doe2,Professor2,johndoe2@google.com,222222222,"B.Tech., Ph.D.",network security2
Dr. john doe3,Associate Professor3,johndoe3@google.com,"333333,4444444",Ph.D.,network security3
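
Note that the output has only three rows: the later entries in the input have no Employee Name line, so the code above folds them into the third record and drops their values. A possible variation, sketched under the assumption that a repeated field name marks the start of a new record, might look like this. `parse_records` is a hypothetical helper, and the sample data is trimmed from the question.

```python
import csv
import io


def parse_records(lines):
    """Group 'Name: value' lines into dicts, one per record."""
    records, record = [], {}
    for line in lines:
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue  # skip blank lines and lines without a colon
        name = name.strip()
        if name in record:  # assumption: a repeated field starts a new record
            records.append(record)
            record = {}
        record[name] = value.strip()
    if record:
        records.append(record)
    return records


sample = """Employee Name: Dr. john doe3
Designation: Associate Professor3
Email: johndoe3@google.com
Designation: Associate Professor4
Email: johndoe4@google.com
""".splitlines()

records = parse_records(sample)
# Collect column names in first-seen order across all records
fieldnames = list(dict.fromkeys(name for rec in records for name in rec))

out = io.StringIO()  # stands in for open('log.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```

The `restval=""` argument leaves missing fields empty rather than raising an error. The repeated-field rule is only a heuristic: two adjacent records that share no field names would still be merged.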

Answer 3

You can use itertools.groupby to find the different information blocks for each employee:

import itertools, csv
data = [i.strip('\n').split(': ') for i in open('university_employees.txt')]
new_data = [[a, list(b)] for a, b in itertools.groupby(data, key=lambda x:x[0] == 'Employee Name')]
header = [c for b in new_data[:2] for c, _ in b[-1]]
a, b, *d = [[new_data[i][-1][-1][-1], *[' '.join(c) for _, *c in new_data[i+1][-1]]] for i in range(0, len(new_data), 2)]
with open('professors.csv', 'w') as f:
  write = csv.writer(f)
  write.writerows([header, a, b, d[0][:6]])

Output:

Employee Name,Designation,Email,ContactNo,Qualification,Area of Interest / Specialisation
Dr.john doe,Professor,johndoe@google.com,"1234567, 9999999","M.Tech., Ph.D.",network security
Dr. john doe2 ,Professor2,johndoe2@google.com,222222222,"B.Tech., Ph.D.",network security2
Dr. john doe3 ,Associate Professor3,johndoe3@google.com,"333333,4444444",Ph.D.,network security3
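
To see how the grouping works, here is a small sketch of the `itertools.groupby` step on made-up pairs. The key function flips between True and False, so the groups alternate between the name line and the remaining fields of each record. The variable names and sample data are illustrative only, and the approach assumes every record starts with an Employee Name line.

```python
import itertools

# Made-up (field name, value) pairs for two records
pairs = [
    ("Employee Name", "A"),
    ("Designation", "Professor"),
    ("Email", "a@example.com"),
    ("Employee Name", "B"),
    ("Designation", "Lecturer"),
    ("Email", "b@example.com"),
]

# groupby starts a new group every time the key value changes
groups = [
    (is_name, [value for _, value in group])
    for is_name, group in itertools.groupby(
        pairs, key=lambda pair: pair[0] == "Employee Name"
    )
]

# Groups come in (name, other fields) pairs; join each pair into one row
rows = [groups[i][1] + groups[i + 1][1] for i in range(0, len(groups), 2)]
print(rows)
```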

3 answers

solution1
2 ACCEPTED 2019-04-19 14:37:38

solution2
1 2019-04-19 14:19:48

solution3
1 2019-04-19 14:32:53
