
Use DictReader in Python to determine table header in CSV file

I am using DictReader to convert a CSV table into a dict. However, the CSV file contains some lines above the data I need. I could use next(), but that's not an intelligent solution, as the number of "junk" lines may vary. For example, the file is as follows:

#stuff not needed
#more stuff which is not needed

label,path,value
a,/path,1
b,/path,2 

So can I automatically extract the table and the header in this case?

If the fields are identifiable, you could do something along these lines:

import csv

st='''\
stuff, not, needed
#more stuff which is not needed
# even more stuff not needed
label,path,value
a,/path,1
b,/path,2''' 

data=[]
tgt='label,path,value'
start=False
for line in csv.reader(st.splitlines()):
    if start:
        data.append(line) 
    elif ','.join(e.strip() for e in line)==tgt:
        start=True
        data.append(line)              

print(data)
# [['label', 'path', 'value'], ['a', '/path', '1'], ['b', '/path', '2']]
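Since the question asks for dicts, the list of rows collected above could be turned into one dict per row by zipping the header with each data row. This is only a minimal sketch, assuming `data` has the shape printed above; the name `records` is made up for illustration:

```python
# Assumes `data` has the shape printed above: header row first, then data rows.
data = [['label', 'path', 'value'], ['a', '/path', '1'], ['b', '/path', '2']]

header, rows = data[0], data[1:]
# Pair each header field with the value in the same column.
records = [dict(zip(header, row)) for row in rows]

print(records)
```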

Or, if you have a file that looks like that, you can do something along these lines:

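import csv

with open('/tmp/test.csv', 'r', newline='') as csvin:
    tgt='label,path,value'
    header=None
    for line in csv.reader(csvin):
        if ','.join(e.strip() for e in line)==tgt:
            header=[e.strip() for e in line]   # keep the column order
            break

    if header is None:
        raise ValueError('header %r not found' % tgt)

    data={k:[] for k in header}
    # pass the ordered list, not data.keys(): dict order is arbitrary before Python 3.7
    for row in csv.DictReader(csvin, header):
        for k,v in row.items():
            data[k].append(v)

print(data)
# {'label': ['a', 'b'], 'path': ['/path', '/path'], 'value': ['1', '2']}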

Both of these solutions rely on the header being known in advance. If you do NOT know the header, you need some other way to identify the uninteresting lines that precede it, for example that they are comments starting with #.

If you have no idea what the header elements are but you know that all the lines leading up to the header are either blank or start with #, then this works:

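import csv

with open('/tmp/test.csv', 'r', newline='') as csvin:
    header=None
    for line in csv.reader(csvin):
        # skip blank lines and lines whose first field starts with '#'
        if not ''.join(x.strip() for x in line) or line[0].lstrip().startswith('#'):
            continue
        header=[x.strip() for x in line]   # first remaining line is the header
        break
    if header is None:
        raise ValueError('no header line found')
    data={k:[] for k in header}
    for row in csv.DictReader(csvin, header):
        for k,v in row.items():
            data[k].append(v)

print(data)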
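If one dict per row is what you are after, another option is to filter out the junk lines first and let DictReader take the header from the first line that remains. The following is a rough sketch rather than a definitive solution: the helper name `read_table` and the in-memory sample are made up for illustration, and it assumes no real data line is blank or starts with #, since such lines are dropped everywhere in the input and not only before the header.

```python
import csv
import io

def read_table(lines):
    """Return a list of row dicts, skipping blank and '#' lines before parsing."""
    # Drop blank lines and lines whose first non-space character is '#'.
    kept = (ln for ln in lines if ln.strip() and not ln.lstrip().startswith('#'))
    # With no fieldnames given, DictReader uses the first remaining line as header.
    return list(csv.DictReader(kept))

# In-memory stand-in for the file shown in the question.
sample = io.StringIO(
    "#stuff not needed\n"
    "#more stuff which is not needed\n"
    "\n"
    "label,path,value\n"
    "a,/path,1\n"
    "b,/path,2\n"
)
rows = read_table(sample)
print(rows)
```

With a real file, an open file object can be passed in place of `sample`, since `read_table` only needs an iterable of lines.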

