Reading a CSV file in Python with a different line terminator

I have a file in CSV format where the delimiter is the ASCII unit separator ^_ (0x1F) and the line terminator is the ASCII record separator ^^ (0x1E). Since these are nonprinting characters, I've used caret notation, one of the standard ways of writing them. I've written plenty of code that reads and writes CSV files, so my issue isn't with Python's csv module per se. The problem is that the csv module doesn't support reading (though it does support writing) line terminators other than a carriage return or line feed, at least as of Python 2.6, where I just tested it. The documentation says this is because the reader is hard-coded, which I take to mean it's done in the C code that underlies the module, since I didn't see anything in the csv.py file that I could change.

Does anyone know a way around this limitation (a patch, another CSV module, etc.)? I really need to read a file in which carriage returns and newlines can't serve as the line terminator, because those characters appear in some of the fields. I'd like to avoid writing my own custom reader code if possible, even though that would be rather simple for my needs.

Why not supply a custom iterable to the csv.reader function? Here is a naive implementation that reads the entire contents of the CSV file into memory at once (which may or may not be desirable, depending on the size of the file):

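import csv

def records(path):
    # newline='' (Python 3) leaves any \r or \n inside fields untouched
    with open(path, newline='') as f:
        contents = f.read()
    # '\x1e' is the ASCII record separator (written ^^ above);
    # skip the empty piece left after a trailing separator
    return (record for record in contents.split('\x1e') if record)

# '\x1f' is the ASCII unit separator (written ^_ above)
reader = csv.reader(records('input.csv'), delimiter='\x1f')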

I think that should work.
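If the file is too large to read in one go, or if fields contain bare newlines (which, as far as I know, csv.reader still treats as the end of a record when the field is unquoted), a small hand-written reader may be the simpler route. The following is only a sketch, assuming Python 3 and a file that uses no quoting at all, so that splitting on the two separators is enough. The names iter_rows, RECORD_SEP, UNIT_SEP and chunk_size are made up for this illustration.

```python
import io

RECORD_SEP = '\x1e'  # ASCII record separator (^^)
UNIT_SEP = '\x1f'    # ASCII unit separator (^_)

def iter_rows(stream, chunk_size=64 * 1024):
    """Yield each record of `stream` as a list of fields, reading in chunks."""
    buffer = ''
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last separator is complete; keep the tail.
        *complete, buffer = buffer.split(RECORD_SEP)
        for record in complete:
            yield record.split(UNIT_SEP)
    if buffer:
        # Final record that was not followed by a separator.
        yield buffer.split(UNIT_SEP)

# Small in-memory example; the second field of the first record has a newline.
sample = io.StringIO('a\x1fmulti\nline\x1eb\x1fc\x1e')
rows = list(iter_rows(sample, chunk_size=4))
```

With a real file, the stream would be something like open('input.csv', newline=''), so that Python does not translate the line endings embedded in the fields.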
