Use DictReader in Python to determine table header in CSV file

I am using DictReader to convert a CSV table into a dict. However, the CSV file contains some lines above the data I need. I could use next(), but that is not an intelligent solution, as the number of "junk" lines may vary. For example, the file is as follows:

#stuff not needed
#more stuff which is not needed

label,path,value
a,/path,1
b,/path,2 

So can I automatically extract the table and the header in this case?

If the fields are identifiable, you could do something along these lines:

import csv

st = '''\
stuff, not, needed
#more stuff which is not needed
# even more stuff not needed
label,path,value
a,/path,1
b,/path,2'''

data = []
tgt = 'label,path,value'
start = False
for line in csv.reader(st.splitlines()):
    if start:
        # Past the header: keep every remaining row.
        data.append(line)
    elif ','.join(e.strip() for e in line) == tgt:
        # Header found: keep it and start collecting rows.
        start = True
        data.append(line)

print(data)
# [['label', 'path', 'value'], ['a', '/path', '1'], ['b', '/path', '2']]

Or, if you have a file that looks like that, you can do something along these lines:

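import csv

with open('/tmp/test.csv', 'r') as csvin:
    tgt = 'label,path,value'
    # Consume lines up to and including the header row.
    for line in csv.reader(csvin):
        if ','.join(e.strip() for e in line) == tgt:
            break

    # Keep the header as a list so the column order is preserved
    # when it is handed to DictReader.
    header = [e.strip() for e in line]
    data = {k: [] for k in header}
    for row in csv.DictReader(csvin, header):
        for k, v in row.items():
            data[k].append(v)

print(data)
# {'path': ['/path', '/path'], 'value': ['1', '2'], 'label': ['a', 'b']}
# (key order may differ depending on the Python version)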
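Since the question asks for the rows as dicts, the same idea can be wrapped in a small helper that returns one dict per row. This is only a sketch: the name rows_after_header is made up for illustration, it assumes Python 3 and a header you already know, and io.StringIO stands in for the real file.

```python
import csv
import io


def rows_after_header(lines, header):
    """Hypothetical helper: skip everything before `header`, return row dicts.

    `lines` is any iterable of text lines (an open file works);
    `header` is the list of expected column names.
    """
    lines = iter(lines)
    # csv.reader pulls one line at a time, so after the break the
    # iterator is positioned on the first data row.
    for fields in csv.reader(lines):
        if [f.strip() for f in fields] == header:
            break
    else:
        raise ValueError("header row not found")
    return list(csv.DictReader(lines, fieldnames=header))


# Stand-in for the file from the question.
sample = io.StringIO(
    "#stuff not needed\n"
    "#more stuff which is not needed\n"
    "\n"
    "label,path,value\n"
    "a,/path,1\n"
    "b,/path,2\n"
)
rows = rows_after_header(sample, ["label", "path", "value"])
print(rows)
```

With a real file you would pass the open file object instead of sample. Raising ValueError when the header never shows up is a design choice here; without it, the last junk line would silently be treated as the header.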

Both of these solutions rely on the header being known in advance. If you do NOT know the header, you will need some other way to identify the lines before it that are not of interest, such as a leading # marking a comment (# this is a comment).

If you have no idea what the header elements are, but you know that all the lines leading up to the header are either blank or begin with #, then you can do this:

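import csv

with open('/tmp/test.csv', 'r') as csvin:
    # Skip blank lines and lines whose first field starts with '#'.
    # startswith() avoids an IndexError when the first field is empty.
    for line in csv.reader(csvin):
        if not ''.join(x.strip() for x in line) or line[0].strip().startswith('#'):
            continue
        break

    # The first remaining line is taken to be the header.
    header = [e.strip() for e in line]
    data = {k: [] for k in header}
    for row in csv.DictReader(csvin, header):
        for k, v in row.items():
            data[k].append(v)

print(data)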
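A slightly different way to get at the same thing, under the same assumption that junk lines are blank or start with #, is to filter the raw lines before DictReader ever sees them and let DictReader pick up the header by itself. This is a sketch rather than a drop-in solution: is_data_line and read_dicts are made-up names, it assumes Python 3, and io.StringIO stands in for the real file.

```python
import csv
import io


def is_data_line(line):
    """Hypothetical predicate: True for lines that are neither blank nor '#' comments."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def read_dicts(lines):
    """Hypothetical helper: one dict per row, header taken from the first kept line."""
    # With no fieldnames given, DictReader uses the first row it reads as the header.
    return list(csv.DictReader(filter(is_data_line, lines)))


# Stand-in for the file from the question.
sample = io.StringIO(
    "#stuff not needed\n"
    "#more stuff which is not needed\n"
    "\n"
    "label,path,value\n"
    "a,/path,1\n"
    "b,/path,2\n"
)
records = read_dicts(sample)
print(records)
```

Note that this drops comment and blank lines anywhere in the file, not just before the header, and it would break a quoted field that spans several lines if one of those lines is blank or starts with #.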


 