I am using DictReader to convert a CSV table into a dict. However, the CSV file contains some lines above the data I need. I could use next(), but that's not an intelligent solution, as the number of "junk" lines may vary. For example, the file is as follows:
#stuff not needed
#more stuff which is not needed
label,path,value
a,/path,1
b,/path,2
So can I automatically extract the table and the header in this case?
If the fields are identifiable, you could do something along these lines:
import csv

st = '''\
stuff, not, needed
#more stuff which is not needed
# even more stuff not needed
label,path,value
a,/path,1
b,/path,2'''

data = []
tgt = 'label,path,value'
start = False
for line in csv.reader(st.splitlines()):
    if start:
        data.append(line)
    elif ','.join(e.strip() for e in line) == tgt:
        # header row found: keep it and start collecting the rows that follow
        start = True
        data.append(line)

print(data)
# [['label', 'path', 'value'], ['a', '/path', '1'], ['b', '/path', '2']]
Or, if you have a file that looks like that, you can do something along these lines:
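import csv

with open('/tmp/test.csv', 'r') as csvin:
    tgt = 'label,path,value'
    # consume lines up to and including the header row
    for line in csv.reader(csvin):
        if ','.join(e.strip() for e in line) == tgt:
            break
    header = [e.strip() for e in line]
    data = {k: [] for k in header}
    # pass the header list itself so the column order is preserved
    # (dict key order is arbitrary on Python older than 3.7)
    for row in csv.DictReader(csvin, header):
        for k, v in row.items():
            data[k].append(v)

print(data)
# {'label': ['a', 'b'], 'path': ['/path', '/path'], 'value': ['1', '2']}
# (key order of the printed dict can differ on Python older than 3.7)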
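If you would rather get one dict per row (which is what DictReader normally gives you) than a dict of columns, the same idea can be wrapped in a small generator. This is only a sketch, assuming Python 3; the name rows_after_header and the in-memory sample are made up for illustration:

```python
import csv
import io


def rows_after_header(lines, header):
    """Skip everything up to and including the header row,
    then yield one dict per data row. (Hypothetical helper.)"""
    reader = csv.reader(lines)
    for row in reader:
        if [cell.strip() for cell in row] == header:
            break
    else:
        # the loop ended without ever seeing the header
        raise ValueError("header row not found")
    for row in reader:
        if row:  # csv.reader yields [] for blank lines; ignore them
            yield dict(zip(header, row))


# In-memory stand-in for the file, for demonstration only.
sample = io.StringIO(
    "stuff, not, needed\n"
    "#more stuff which is not needed\n"
    "label,path,value\n"
    "a,/path,1\n"
    "b,/path,2\n"
)
rows = list(rows_after_header(sample, ["label", "path", "value"]))
print(rows)
```

With a real file you would pass the open file object instead of the StringIO. Raising an error when the header never shows up is a design choice here; the loops above would otherwise silently treat the last line as the header.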
Both of these solutions rely on the header being known in advance. If you do NOT know the header, you need some other way to identify the uninteresting lines that come before it, such as a leading # (for example, "# this is a comment").
If you have no idea what the header elements are, but you know that every line leading up to the header is either blank or starts with #, then you can do this:
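import csv

with open('/tmp/test.csv', 'r') as csvin:
    # skip blank lines and lines whose first field starts with '#'
    for line in csv.reader(csvin):
        # startswith() avoids an IndexError when the first field is empty
        if not ''.join(x.strip() for x in line) or line[0].strip().startswith('#'):
            continue
        else:
            break
    header = line  # the first remaining line is taken to be the header
    data = {k: [] for k in header}
    # pass the header list itself so the column order is preserved
    for row in csv.DictReader(csvin, header):
        for k, v in row.items():
            data[k].append(v)

print(data)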
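Under the same assumption (junk lines are blank or start with #), another option is to filter the lines before DictReader ever sees them, so that DictReader picks up the header by itself. A minimal sketch, assuming Python 3; skip_junk is a made-up helper name and the sample is an in-memory stand-in for the file:

```python
import csv
import io


def skip_junk(lines, comment="#"):
    """Yield only lines that are neither blank nor start with the comment marker."""
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(comment):
            yield line


# In-memory stand-in for the file, for demonstration only.
sample = io.StringIO(
    "#stuff not needed\n"
    "\n"
    "#more stuff which is not needed\n"
    "label,path,value\n"
    "a,/path,1\n"
    "b,/path,2\n"
)
# DictReader treats the first line it receives as the header row.
rows = [dict(row) for row in csv.DictReader(skip_junk(sample))]
print(rows)
```

Note that this filter also drops blank or # lines that appear after the header, and it works on raw lines, so it could misfire on a quoted field that spans several lines. If either matters for your data, stick with the loop-and-break version.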