Python csv.DictReader - how to reverse output?
I'm trying to reverse the order in which a file is read. I am using DictReader because I want the contents in a dictionary. I'd like to read the first line of the file and use it for the keys, then parse the rest of the file in reverse (bottom to top), much like the Linux "tac" command. Is there an easy way to do this? Below is my code to read the file into a dictionary and write it to a file:
reader = csv.DictReader(open(file_to_parse, 'r'), delimiter=',', quotechar='"')
for line in reader:
    # ...
This code processes the file in normal order, but I need it to read the file from the end.
In other words, I'd like it to read the file:
fruit, vegetables, cars
orange, carrot, ford
apple, celery, chevy
grape, corn, chrysler
and have it return:
{' cars': ' chrysler', ' vegetables': ' corn', 'fruit': 'grape'}
{' cars': ' chevy', ' vegetables': ' celery', 'fruit': 'apple'}
{' cars': ' ford', ' vegetables': ' carrot', 'fruit': 'orange'}
instead of:
{' cars': ' ford', ' vegetables': ' carrot', 'fruit': 'orange'}
{' cars': ' chevy', ' vegetables': ' celery', 'fruit': 'apple'}
{' cars': ' chrysler', ' vegetables': ' corn', 'fruit': 'grape'}
You'll have to read the whole CSV file into memory; you can do so by calling list() on the reader object:
with open(file_to_parse, 'rb') as inf:
    reader = csv.DictReader(inf, skipinitialspace=True)
    rows = list(reader)

for row in reversed(rows):
    # ...
Note that I used the file as a context manager here to ensure that the file is closed. You also want to open the file in binary mode (leave newline handling to the csv module); that applies to Python 2, while on Python 3 you would open it in text mode with newline='' instead. The rest of the configuration you passed to DictReader() matches the defaults, so I omitted it.
I set skipinitialspace to True because, judging from your sample input and output, you have spaces after your delimiters; the option removes them.
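To see what the option changes, here is a minimal sketch (written for Python 3, which is my assumption; the two-line list is just a cut-down copy of the sample data from the question) that parses the same lines with and without skipinitialspace:

```python
import csv

# Two lines in the same shape as the sample data in the question.
lines = ['fruit, vegetables, cars', 'orange, carrot, ford']

# Default behaviour: the space after each comma stays in keys and values.
plain = dict(next(csv.DictReader(lines)))

# skipinitialspace=True drops whitespace immediately following a delimiter.
stripped = dict(next(csv.DictReader(lines, skipinitialspace=True)))

print(plain)
print(stripped)
```

Without the option, the keys come out as ' vegetables' and ' cars', which matches the output shown in the question.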
The csv.DictReader() object takes care of reading that first line as the keys.
Demo:
>>> import csv
>>> sample = '''\
... fruit, vegetables, cars
... orange, carrot, ford
... apple, celery, chevy
... grape, corn, chrysler
... '''.splitlines()
>>> reader = csv.DictReader(sample, skipinitialspace=True)
>>> rows = list(reader)
>>> for row in reversed(rows):
...     print row
...
{'cars': 'chrysler', 'vegetables': 'corn', 'fruit': 'grape'}
{'cars': 'chevy', 'vegetables': 'celery', 'fruit': 'apple'}
{'cars': 'ford', 'vegetables': 'carrot', 'fruit': 'orange'}
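The demo above is a Python 2 session. On Python 3 the same idea might look roughly like the sketch below; read_rows_reversed is a hypothetical helper name of mine, not something from the csv module, and it still loads every row into memory.

```python
import csv

def read_rows_reversed(path, **fmtparams):
    """Return the data rows of a CSV file as dicts, last row first."""
    # newline='' leaves newline handling to the csv module (Python 3).
    with open(path, newline='') as inf:
        rows = list(csv.DictReader(inf, **fmtparams))
    # The header was already consumed as the keys, so only data rows flip.
    rows.reverse()
    return rows
```

Calling read_rows_reversed(file_to_parse, skipinitialspace=True) and looping over the result should then give the rows from bottom to top.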
Read to a list and reverse:
lines = [x for x in reader]
for line in lines[::-1]:
    print line
{' cars': ' chrysler', ' vegetables': ' corn', 'fruit': 'grape'}
{' cars': ' chevy', ' vegetables': ' celery', 'fruit': 'apple'}
{' cars': ' ford', ' vegetables': ' carrot', 'fruit': 'orange'}
Or as Martijn Pieters suggested:
for line in reversed(list(reader)):
    print line
You don't actually have to read the whole file into memory.
A csv.DictReader doesn't actually require a file, just an iterable of strings.*
And you can read a text file in reverse order in average linear time and constant space, without too much overhead. It's not trivial, but it's not that hard:
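import io

def reverse_lines(*args, **kwargs):
    with open(*args, **kwargs) as f:
        buf = ''
        f.seek(0, io.SEEK_END)
        while f.tell():
            try:
                # Step back one chunk from the current position.
                f.seek(-1024, io.SEEK_CUR)
            except OSError:
                # The seek failed (fewer than 1024 bytes remain before the
                # current position, or the file object rejects relative
                # seeks): read everything from the start of the file up to here.
                bufsize = f.tell()
                f.seek(0, io.SEEK_SET)
                newbuf = f.read(bufsize)
                f.seek(0, io.SEEK_SET)
            else:
                newbuf = f.read(1024)
                f.seek(-1024, io.SEEK_CUR)
            buf = newbuf + buf
            lines = buf.split('\n')
            # The first piece may be a partial line; keep it for the next chunk.
            buf = lines.pop(0)
            yield from reversed(lines)
        yield buf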
This isn't rigorously tested, and it strips off the newlines (which is fine for csv.DictReader, but not fine in general), and it's not optimized for unusual but possible edge cases (e.g., for really long lines, it will be quadratic), and it requires Python 3.3, and the file doesn't go away until you close/release the iterator (it probably should be a context manager so you can deal with that). But if you really want this, I'm willing to bet you can find a recipe on ActiveState or a distribution on PyPI with none of those problems.
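One more caveat worth knowing: Python 3 text-mode files reject nonzero seeks relative to the current position, so a chunked backward read needs a file opened in binary mode. Below is a rough sketch of such a variant. The name reverse_lines_binary and the encoding and chunk_size parameters are my own, and it assumes '\n' line endings and an ASCII-compatible encoding such as UTF-8 (where the newline byte never occurs inside a multi-byte character). Like the generator above, it is not rigorously tested.

```python
import io

def reverse_lines_binary(path, encoding='utf-8', chunk_size=1024):
    """Yield the lines of a file last-to-first, without their newlines."""
    with open(path, 'rb') as f:
        f.seek(0, io.SEEK_END)
        pos = f.tell()          # number of bytes not yet read
        buf = b''
        while pos > 0:
            # Read the chunk that ends where the previous one began.
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
            lines = buf.split(b'\n')
            # The first piece may be a partial line; keep it for the next chunk.
            buf = lines.pop(0)
            for line in reversed(lines):
                # Only complete lines are decoded, so characters are never split.
                yield line.decode(encoding)
        # Whatever is left is the first line of the file.
        yield buf.decode(encoding)
```

If the file ends with a newline, the first item yielded is an empty string; csv.DictReader skips empty rows, so that should be harmless here.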
Anyway, for a medium-sized file, I suspect it'd actually be faster, on almost any real-life filesystem, to read the whole thing into memory in forward order and then iterate the list in reverse. But for a very large file (especially one you can't even fit into memory), this solution is obviously much better.
From a quick test (see http://pastebin.com/Nst6WFwV for code), on my computer, the basic breakdown is:
Of course the details will depend on a lot of facts about your computer. It's probably no coincidence that 500M 72-char lines of ASCII take up close to half the physical RAM on my machine. But with a hard drive instead of an SSD you'd probably see more of a penalty for reverse_lines (since random seeks would be much slower compared to contiguous reads, and disk in general would be more important). And your platform's malloc and VM behavior, and even locality issues (parsing a line almost immediately after reading it instead of after it's been swapped out and back in…) might make a difference. And so on.
Anyway, the lesson is, if you're not expecting at least tens of millions of lines (or maybe a bit less on a very resource-constrained machine), don't even think about this; just keep it simple.
* As Martijn Pieters points out in the comments, if you're not using explicit fieldnames, DictReader requires an iterable of strings where the first line is the header. But you can fix that by reading the first line separately with a csv.reader and passing it as the fieldnames, or even by itertools.chain-ing the first line from a forward read before all but the last line of the backward read.
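The itertools.chain idea from the footnote could be sketched like this. Both all_but_last and dict_rows_reversed are hypothetical helper names of mine, and reversed(sample) only stands in for a lazy backward line reader, so treat it as an illustration and not a tested recipe.

```python
import csv
import itertools

def all_but_last(iterable):
    """Yield every item except the final one, holding one item back."""
    it = iter(iterable)
    try:
        prev = next(it)
    except StopIteration:
        return
    for item in it:
        yield prev
        prev = item

def dict_rows_reversed(header_line, reversed_lines, **fmtparams):
    # reversed_lines runs bottom to top, so its final item is the header
    # line again; drop it and put the header from the forward read first.
    body = all_but_last(reversed_lines)
    return csv.DictReader(itertools.chain([header_line], body), **fmtparams)

sample = ['fruit, vegetables, cars',
          'orange, carrot, ford',
          'apple, celery, chevy',
          'grape, corn, chrysler']
backward = reversed(sample)  # stand-in for a lazy reverse-line generator
rows = list(dict_rows_reversed(sample[0], backward, skipinitialspace=True))
```

Because all_but_last holds back only one line at a time, the rows are still produced lazily if the backward reader is lazy.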