Iterating from a specific row of a CSV file in Python

Question

I have a CSV file with many millions of rows. I want to start iterating from row 10,000,000. At the moment I have this code:

    with open(csv_file, encoding='UTF-8') as f: 
        r = csv.reader(f)
        for row_number, row in enumerate(r):    
            if row_number < 10000000:
                continue
            else:
                process_row(row)

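This works, but it takes several seconds before it reaches the rows of interest. Presumably all the unwanted rows are loaded into Python unnecessarily, which slows things down. Is there a way to start the iteration at a given row, i.e. without reading the preceding data first?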

Answer 1

You can use islice:

    import csv
    from itertools import islice

    with open(csv_file, encoding='UTF-8') as f:
        r = csv.reader(f)
        # Skip the first 10,000,000 rows, then process the rest.
        for row in islice(r, 10000000, None):
            process_row(row)

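This still iterates over every row, but it does so more efficiently, because the skipping happens inside islice rather than in a Python-level loop.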
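As a small illustration of the skipping behaviour, here is a minimal sketch that runs islice over an in-memory CSV. The sample data and the helper name rows_from are assumptions made for the demo and are not part of the original question.

```python
import csv
import io
from itertools import islice


def rows_from(f, start):
    """Return an iterator over parsed CSV rows of f, skipping the first `start` rows."""
    # Every skipped row is still parsed by csv.reader; islice only
    # removes the Python-level loop and comparison around it.
    return islice(csv.reader(f), start, None)


# Hypothetical sample data standing in for the real file.
sample = io.StringIO("a,1\nb,2\nc,3\nd,4\n", newline="")
remaining = list(rows_from(sample, 2))
print(remaining)
```

With a real file, the same helper would be called on the object returned by open, with the start value from the question.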

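You can also use the consume recipe from the itertools documentation, which relies on functions that consume iterators at C speed. Call it on the file object before passing the file to csv.reader, so that you also avoid needlessly parsing the skipped lines with the reader: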

    import collections
    import csv
    from itertools import islice

    def consume(iterator, n):
        "Advance the iterator n-steps ahead. If n is None, consume entirely."
        # Use functions that consume iterators at C speed.
        if n is None:
            # feed the entire iterator into a zero-length deque
            collections.deque(iterator, maxlen=0)
        else:
            # advance to the empty slice starting at position n
            next(islice(iterator, n, n), None)


    with open(csv_file, encoding='UTF-8') as f:
        # Skip the first 10,000,000 physical lines, matching the islice version.
        consume(f, 10000000)
        r = csv.reader(f)
        for row in r:
            process_row(row)

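As Shadowranger commented, if the file could contain embedded newlines then you would have to consume the reader instead and pass newline="" to open. If that is not the case, do consume the file object, because the performance difference will be considerable, especially if you have a lot of columns.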
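The embedded-newline caveat can be sketched as follows. This is a minimal illustration, assuming made-up sample data in which one quoted field contains a line break; skip_rows is a hypothetical helper that applies the same islice trick as consume, and io.StringIO with newline="" stands in for open(csv_file, newline="").

```python
import csv
import io
from itertools import islice


def skip_rows(iterator, n):
    """Advance any iterator by n items (the non-None branch of consume)."""
    next(islice(iterator, n, n), None)


# Hypothetical data: record 1 has a quoted field containing a line break,
# so it spans two physical lines but is a single CSV row.
data = 'id,note\r\n1,"first\r\nsecond"\r\n2,plain\r\n3,last\r\n'

# Skipping physical lines on the file object stops in the middle of record 1.
f = io.StringIO(data, newline="")
skip_rows(f, 2)
by_lines = list(csv.reader(f))

# Skipping parsed rows on the reader keeps every record intact.
f = io.StringIO(data, newline="")
r = csv.reader(f)
skip_rows(r, 2)
by_rows = list(r)

print(by_lines)  # starts with a fragment of record 1
print(by_rows)   # starts cleanly at record 2
```

Consuming the reader costs more because every skipped row is parsed, which is why the file-object version is preferable whenever the data is known to have no embedded newlines.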

Solution 1
4 · Accepted · 2016-06-27 22:48:18
