Reading CSV file with multiple rows containing header

I have CSV files generated by an instrument. Each file contains multiple datasets, and each dataset is introduced by a 'condition' line followed by the header and the data. I want to read the file and turn the 'condition' into a column for the corresponding dataset. The output can be either one file or one file per dataset. The condition, the headers, and the data are all separated by tabs in the CSV file.

I can't figure out how to even begin this. I have a screenshot of the example inputs and outputs. Any insights or directions to take this would be appreciated. Thank you!

[Image: example input and desired output]

Here is one possible solution:


# Open the input file and split its content on the line breaks
with open('file.csv', 'r') as mfile:
    lines = mfile.read().split("\n")

# CAUTION: if your CSV file uses ";" instead of ",", change the delimiter in the code!

condition = ''
new_rows = []
for line in lines:
    if not line.strip():
        # Skip blank lines (e.g. the empty string left after the last line break)
        continue
    if len(line.split(',')) == 1:
        # A line with a single field is a condition line
        condition = line
    elif line == 'header1,header2':
        # Change 'header1,header2' to your own header; repeated headers are skipped
        pass
    else:
        # Data line: append the current condition as an extra column
        new_rows.append(line + "," + condition)

# The header row is written as-is; append ',condition' to it if you want
# the added column to have a name
with open('outfile.csv', 'w') as mfile:
    mfile.write('header1,header2\n')
    for row in new_rows:
        mfile.write(row + '\n')

I used this as the content of file.csv (input):

condidtion1
header1,header2
2,3
2,3
2,3
2,3
condidtion2
header1,header2
3,4
3,4
3,4
3,4
3,4
3,4

After running the code, outfile.csv looks like this (output):

header1,header2
2,3,condidtion1
2,3,condidtion1
2,3,condidtion1
2,3,condidtion1
3,4,condidtion2
3,4,condidtion2
3,4,condidtion2
3,4,condidtion2
3,4,condidtion2
3,4,condidtion2
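
Since the question mentions tab-separated files, the same idea could be wrapped in a function that takes the delimiter as a parameter. This is only a minimal sketch: `tag_rows` is a hypothetical name, and it assumes that a condition line has a single field (ignoring trailing delimiters), that data rows do not end with an empty field, and that the header repeats verbatim before each dataset.

```python
def tag_rows(lines, header, delimiter=","):
    """Return the data rows with the current condition appended as a last field.

    Assumptions (not from the original answer): a condition is any non-blank
    line with exactly one field, and `header` repeats verbatim per dataset.
    """
    condition = ""
    tagged = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        # Drop trailing delimiters so that "Condition 1\t" counts as one field
        fields = line.rstrip(delimiter).split(delimiter)
        if len(fields) == 1:
            condition = fields[0]  # remember the condition for the rows below
        elif line == header:
            continue  # repeated header row
        else:
            tagged.append(line + delimiter + condition)
    return tagged


sample = ["condidtion1", "header1,header2", "2,3", "2,3",
          "condidtion2", "header1,header2", "3,4"]
result = tag_rows(sample, "header1,header2")
print(result)
```

For a tab-separated file, the call would presumably look like `tag_rows(lines, "Header 1\tHeader 2", "\t")`.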

This should solve your issue:

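import csv

# Read all lines of the tab-separated input file
with open('test.tsv', 'r') as file:
    lines = file.readlines()
# lines = ['Condition 1\t\n', 'Header 1\tHeader 2\n', '2\t3\n', '2\t3\n', '2\t3\n', 'Condition 2\t\n', 'Header 1\tHeader 2\n', '2\t3\n', '2\t3\n', '2\t3\n']
current_condition = ''
final_output = [['Header 1', 'Header 2', 'condition']]
for line in lines:
    row = line.rstrip().split('\t')
    if row == ['']:
        # Skip blank lines
        continue
    if len(row) == 1:
        # A single field means this line names the condition for the rows that follow
        current_condition = row[0]
    elif row[:2] != ['Header 1', 'Header 2']:
        # Skip the repeated header rows; tag each data row with its condition
        final_output.append([
            row[0],
            row[1],
            current_condition
        ])

# newline='' keeps the csv module from writing extra blank lines on Windows
with open('output.csv', 'w', newline='') as fout:
    writer = csv.writer(fout)
    writer.writerows(final_output)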
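
The question also allows one output file per dataset. A rough sketch of that variant follows; `split_datasets` and `write_datasets` are hypothetical helper names. It assumes that condition names are unique, that a condition line has exactly one non-empty field, and that the condition text is safe to use as a file name, so adjust it if your instrument output differs.

```python
import csv
import os
import tempfile


def split_datasets(lines, delimiter="\t"):
    """Group rows by condition: {condition: [header_row, data_row, ...]}.

    Assumes a condition line has exactly one non-empty field and that
    condition names are unique within the file.
    """
    datasets = {}
    current = None
    for line in lines:
        raw = line.rstrip("\r\n").split(delimiter)
        non_empty = [f for f in raw if f != ""]
        if not non_empty:
            continue  # blank line
        if len(non_empty) == 1:
            current = non_empty[0]  # start a new dataset
            datasets[current] = []
        elif current is not None:
            datasets[current].append(raw)  # header or data row
    return datasets


def write_datasets(datasets, out_dir):
    """Write one CSV per condition, named after the condition text."""
    paths = []
    for condition, rows in datasets.items():
        path = os.path.join(out_dir, condition.replace(" ", "_") + ".csv")
        with open(path, "w", newline="") as fout:
            csv.writer(fout).writerows(rows)
        paths.append(path)
    return paths


lines = ["Condition 1\t\n", "Header 1\tHeader 2\n", "2\t3\n",
         "Condition 2\t\n", "Header 1\tHeader 2\n", "3\t4\n", "3\t4\n"]
datasets = split_datasets(lines)
# A temporary directory is used here only to keep the example self-contained
with tempfile.TemporaryDirectory() as tmp:
    written = [os.path.basename(p) for p in write_datasets(datasets, tmp)]
print(written)
```

In real use, `lines` would come from `file.readlines()` as in the answer above, and `out_dir` would be an existing directory of your choice.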

