Iterating over CSV reader object in Python

I have two CSV files, one of which is likely to contain a few more records than the other. I am writing a function to iterate over each and determine which records are in dump but not in libr.

My code is as follows:

def update_lib(x, y):
    dump = open(x, newline='')
    libr = open(y, newline='')
    dump_reader = csv.reader(dump)
    for dump_row in dump_reader:
        libr_reader = csv.reader(libr)
        for libr_row in libr_reader:
            if dump_row[0] == libr_row[0]:
                break

I am expecting this to take the first row in dump (dump_row) and iterate over each row in libr (libr_row) to see if the first elements match. If they do, I want to move to the next row in dump; if not, I will eventually do something else.

My issue is that libr_reader appears to "remember" where it is, and I can't get it to go back to the first row in libr, even after the break has been reached, at which point I would expect libr_reader to be re-initialised. I have even tried del libr_row and del libr_reader, but this doesn't appear to make a difference. I suspect I am misunderstanding iterators; any help gratefully received.

As pasted in your question, the code creates a new libr_reader object every time it iterates over a row in dump_reader.

dump_reader = csv.reader(dump)
for dump_row in dump_reader:
    libr_reader = csv.reader(libr)

dump_reader here is created once. Assuming dump_reader yields 10 rows, you will create 10 libr_reader instances, all from the same file handle.

Per our discussion in the comments, you're aware of that. What you're missing is that every new reader object works on the same file handle, so it picks up at that handle's current cursor position rather than at the start of the file.

Consider this example:

>>> import io
>>> my_file = io.StringIO("""Line 1
... Another Line
... Finally, a third line.""")

This creates a simulated file object. Now I'll create a "LineReader" class.

>>> class LineReader:
...     def __init__(self, file):
...         self.file = file
...     def show_me_a_line(self):
...         print(self.file.readline())
... 

If I use three line readers on the same file, the file still remembers its place:

>>> line_reader = LineReader(my_file)
>>> line_reader.show_me_a_line()
Line 1

>>> second_line_reader = LineReader(my_file)
>>> second_line_reader.show_me_a_line()
Another Line

>>> third_line_reader = LineReader(my_file)
>>> third_line_reader.show_me_a_line()
Finally, a third line.

To the my_file object, there's no material difference between what I just did and calling readline() on it directly. First, I'll "reset" the file to the beginning by calling seek(0):

>>> my_file.seek(0)
0
>>> my_file.readline()
'Line 1\n'
>>> my_file.readline()
'Another Line\n'
>>> my_file.readline()
'Finally, a third line.'

There you have it.
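The same thing can be seen with csv.reader itself. This is only a small sketch: the in-memory io.StringIO object and its contents are made up to stand in for your libr file.

```python
import csv
import io

# Stand-in for an open CSV file; the contents are invented for illustration.
libr = io.StringIO("a,1\nb,2\nc,3\n")

first_reader = csv.reader(libr)
first_row = next(first_reader)     # consumes the first line of the handle

# A brand-new reader on the same handle does not start over:
# it continues from wherever the handle's cursor currently is.
second_reader = csv.reader(libr)
second_row = next(second_reader)

print(first_row)    # ['a', '1']
print(second_row)   # ['b', '2']
```

Creating or deleting reader objects never moves the cursor; only reading from, or seeking on, the underlying handle does.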

So TL;DR: files have cursors and remember where they are. Think of the file handle as a thing that remembers where the file is, yes, but also remembers where in the file your program is.
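Applied to the function in the question, one possible fix is to rewind libr with seek(0) before each inner scan. The following is a sketch rather than a drop-in replacement: find_missing is a hypothetical name, it takes already-open file objects instead of paths, and the sample data is made up.

```python
import csv
import io

def find_missing(dump, libr):
    """Return the rows of dump whose first field never appears in libr.

    Both arguments are open text file objects. The name find_missing is
    hypothetical and not part of the original question.
    """
    missing = []
    for dump_row in csv.reader(dump):
        libr.seek(0)                       # rewind libr before every scan
        for libr_row in csv.reader(libr):
            if dump_row[0] == libr_row[0]:
                break                      # match found, go to next dump row
        else:
            missing.append(dump_row)       # inner loop finished without a break
    return missing

# Invented sample data standing in for the two CSV files.
dump = io.StringIO("a,1\nb,2\nc,3\n")
libr = io.StringIO("a,1\nc,3\n")
result = find_missing(dump, libr)
print(result)   # [['b', '2']]
```

This rescans libr once per row of dump. If the files grow, reading the first column of libr into a set once and testing membership against it would avoid the repeated passes.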
