Don't understand Python's csv.reader object

I've come across a behavior in Python's built-in csv module that I had never noticed before. Typically, when I read in a CSV file, I follow the docs pretty much verbatim: open the file using 'with', then loop over the reader object with a 'for' loop. However, I recently tried iterating over the csv.reader object twice in a row, only to find that the second 'for' loop did nothing.

import csv

with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')

    for line in readit:
        print line

    for line in readit:
        print 'foo'

Console output:

Austins-iMac:Desktop austin$ python -i amy.py 
['Amy', 'James', 'Nathan', 'Sara', 'Kayley', 'Alexis']
['James', 'Nathan', 'Tristan', 'Miles', 'Amy', 'Dave']
['Nathan', 'Amy', 'James', 'Tristan', 'Will', 'Zoey']
['Kayley', 'Amy', 'Alexis', 'Mikey', 'Sara', 'Baxter']
>>>
>>> readit
<_csv.reader object at 0x1023fa3d0>
>>> 

So the second 'for' loop basically does nothing. One thought I had was that the csv.reader object is released from memory after being read once. That isn't the case, though, since it still retains its memory address. I found a post that mentions a similar problem. The reason given there is that once the object has been read, the pointer stays at the end of the memory address, ready to write data to the object. Is this correct? Could someone go into greater detail about what is going on here? Is there a way to push the pointer back to the beginning of the memory address to reread it? I know it's bad coding practice to do that, but I'm mainly just curious and want to learn more about what goes on under Python's hood.

Thanks!

I'll try to answer your other questions about what the reader is doing and why reset() or seek(0) might help. In its most basic form, the csv reader might look something like this:

def csv_reader(it):
    for line in it:
        yield line.strip().split(',')

That is, it takes any iterator that produces strings and gives you a generator. All it does is take an item from your iterator, process it, and yield the result. When 'it' is consumed, csv_reader quits. The reader has no idea where the iterator came from or how to properly make a fresh one, so it doesn't even try to reset itself. That is left to the programmer.
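To make the exhaustion concrete, here is a small Python 3 sketch that runs the toy csv_reader above over a list of strings. The sample rows and the variable names are made up for illustration, and the toy function is only a stand-in for the real csv.reader, not its actual implementation.

```python
def csv_reader(it):
    """Toy stand-in for csv.reader: split each line on commas."""
    for line in it:
        yield line.strip().split(',')


lines = ['Amy,James', 'Nathan,Sara']   # made-up sample rows

reader = csv_reader(iter(lines))
first_pass = list(reader)    # consumes the generator completely
second_pass = list(reader)   # the generator is exhausted, nothing is left

# A second pass needs a fresh generator over a fresh iterator.
fresh_pass = list(csv_reader(iter(lines)))

print(first_pass)
print(second_pass)
print(fresh_pass)
```

The second pass comes back empty because the generator has already run to completion, which mirrors the silent second 'for' loop in the question.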

We can either modify the iterator in place without the reader knowing, or just make a new reader. Here are some examples to demonstrate my point.

import csv

data = open('data.csv', 'r')
reader = csv.reader(data)

print(next(reader))               # Parse the first line
[next(data) for _ in range(5)]    # Skip the next 5 lines on the underlying iterator
print(next(reader))               # This will be the 7th line in data
print(reader.line_num)            # reader thinks this is the 2nd line
data.seek(0)                      # Go back to the beginning of the file
print(next(reader))               # gives first line again

data = ['1,2,3', '4,5,6', '7,8,9']
reader = csv.reader(data)         # works fine on lists of strings too
print(next(reader))               # ['1', '2', '3']

In general, if you need a second pass, it's best to close and reopen your file and use a new csv reader. It's clean and ensures nice bookkeeping.
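The reopen-and-new-reader approach might look like the following Python 3 sketch. The temporary file, its contents, and the variable names are assumptions made so the example runs on its own; with real data you would open your own path instead.

```python
import csv
import os
import tempfile

# Throwaway sample file (made-up rows) so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 newline='') as tmp:
    tmp.write('Amy,James\nNathan,Sara\n')
    sample_path = tmp.name

# First pass: open the file and build a reader.
first_names = []
with open(sample_path, newline='') as csvfile:
    for row in csv.reader(csvfile):
        first_names.append(row[0])

# Second pass: reopen the file and build a new reader from scratch.
row_lengths = []
with open(sample_path, newline='') as csvfile:
    for row in csv.reader(csvfile):
        row_lengths.append(len(row))

os.remove(sample_path)   # clean up the sample file

print(first_names)
print(row_lengths)
```

Each pass gets its own file object and its own reader, so neither depends on where the other left off.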

Iterating over a csv reader simply wraps iterating over the lines in the underlying file object. On each iteration, the reader gets the next line from the file, converts it, and returns it.

So iterating over a csv reader follows the same conventions as iterating over files. That is, once the file has reached its end, you have to seek back to the start before iterating a second time.

The code below should do it, though I haven't tested it:

import csv

with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')

    for line in readit:
        print line

    # go back to the start of the file
    csvfile.seek(0)

    for line in readit:
        print 'foo'
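For a version that can be checked without a file on disk, here is a Python 3 sketch of the same rewind. It assumes io.StringIO as a stand-in for the open file, and the rows are made up.

```python
import csv
import io

# io.StringIO stands in for an open text file; the rows are made up.
csvfile = io.StringIO('Amy,James\nNathan,Sara\n')
readit = csv.reader(csvfile, delimiter=',')

first_pass = list(readit)    # reads every row; the "file" is now at its end
exhausted = list(readit)     # no rewind yet, so nothing comes back

csvfile.seek(0)              # go back to the start of the underlying file
second_pass = list(readit)   # the same reader yields the rows again

print(first_pass)
print(exhausted)
print(second_pass)
```

The reader itself is never reset here. Only the position of the underlying file object changes, which is why calling seek(0) on the file is enough.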

If it's not too much data, you can always read it into a list:

import csv

with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')
    csvdata = list(readit)

    for line in csvdata:
        print line

    for line in csvdata:
        print 'foo'
