Read csv starting with leading spaces

I have a CSV file (from a third party) in which each line starts and ends with a space, the fields are separated by semicolons and enclosed in double quotes, and the file ends with a line containing only a space.

 "first_name";"last_name" 
 "John";"Doe" 
 "Anita";"Doe"  

I am trying to read it with the following code:

import csv
import json

def read_csv(filename):
    result = []
    with open(filename, 'r', encoding='utf-8') as f:
        csv_reader = csv.reader(f, delimiter=';', quotechar='"')
        for line_index, line in enumerate(csv_reader):
            if line_index == 0:
                header = line
                continue
            result.append(dict(zip(header, line)))
    return result

if __name__ == '__main__':
    contents = read_csv('test.txt')
    print(json.dumps(contents, indent=4, sort_keys=True))

This is my expected result:

[
    {
        "first_name": "John",
        "last_name ": "Doe "
    },
    {
        "first_name": "Anita",
        "last_name ": "Doe "
    }
]

However, because of the leading spaces, it always treats the double quotes as part of the first column, and it also picks up the last line as a row. This is the result I get:

[
    {
        " \"first_name\"": " \"John\"",
        "last_name ": "Doe "
    },
    {
        " \"first_name\"": " \"Anita\"",
        "last_name ": "Doe "
    },
    {
        " \"first_name\"": " "
    }
]

How can I get rid of these leading and trailing spaces before the csv is parsed? The answer here shows how to strip spaces from fields after they have been read, but that does not help here, because it is not the contents of the fields that I want to change, but the fields themselves.

By the way: I am using Python 3.5.

EDIT

I am now skipping empty lines using the following code:

# Skip empty lines
line = [column.strip() for column in line]
if not any(line):
    continue

You can use skipinitialspace=True together with csv.DictReader (which assumes the first row is a header and builds a name->value dict for each row, so you don't have to do it manually), e.g.:

with open(filename) as fin:
    csvin = csv.DictReader(fin, delimiter=';', skipinitialspace=True)
    result = list(csvin)

Alternatively, if only rows with at least one value should be kept (i.e., the last row with no values, or any interim blank rows, should be filtered out), you can use:

result = [row for row in csvin if any(row.values())]

Which will give you:

[{'first_name': 'John', 'last_name ': 'Doe '},
 {'first_name': 'Anita', 'last_name ': 'Doe '}]
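
For reference, the two snippets above can be combined into one self-contained sketch. It reads the sample from an in-memory string rather than from test.txt, and the names SAMPLE and read_rows are illustrative only; they do not come from the original answer. Note that the trailing space after each closing quote is kept, which is why the key is 'last_name ' and the value is 'Doe '.

```python
import csv
import io

# Sample data from the question: every line starts and ends with a space,
# and the final line contains only a space.
SAMPLE = (
    ' "first_name";"last_name" \n'
    ' "John";"Doe" \n'
    ' "Anita";"Doe" \n'
    ' \n'
)


def read_rows(stream):
    """Parse semicolon-delimited rows, skipping the space before each field."""
    reader = csv.DictReader(stream, delimiter=';', skipinitialspace=True)
    # Keep only rows in which at least one field has a non-empty value.
    return [dict(row) for row in reader if any(row.values())]


rows = read_rows(io.StringIO(SAMPLE))
print(rows)
```

With a real file, the same function could be called as read_rows(fin) inside the with open(...) block.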

And the result of json.dumps(result, indent=4, sort_keys=True) is:

[
    {
        "first_name": "John",
        "last_name ": "Doe "
    },
    {
        "first_name": "Anita",
        "last_name ": "Doe "
    }
]
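
If the trailing space in 'last_name ' and 'Doe ' is also unwanted, one possible variation (not part of the answer above) is to strip each line before it reaches the csv parser, since csv.reader and csv.DictReader accept any iterable of strings. The sketch below assumes that no quoted field contains an embedded newline, because stripping line by line would alter such fields; the helper names stripped_lines and read_csv_stripped are hypothetical.

```python
import csv
import io

SAMPLE = (
    ' "first_name";"last_name" \n'
    ' "John";"Doe" \n'
    ' "Anita";"Doe" \n'
    ' \n'
)


def stripped_lines(stream):
    """Yield each line without surrounding whitespace, dropping blank lines."""
    for line in stream:
        line = line.strip()
        if line:
            yield line


def read_csv_stripped(stream):
    """Parse rows after the leading and trailing spaces have been removed."""
    reader = csv.DictReader(stripped_lines(stream), delimiter=';', quotechar='"')
    return [dict(row) for row in reader]


clean_rows = read_csv_stripped(io.StringIO(SAMPLE))
print(clean_rows)
```

Because the whitespace-only last line is dropped by the generator, no extra filtering of empty rows is needed in this variant.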
