
Iterating through dictionary, 5 rows at a time

I am trying to open a csv file with csv.DictReader, read in just the first 5 rows of data, perform the primary process of my script, then read in the next 5 rows and do the same for them. Rinse and repeat.

I believe I have a method that works, however I am having issues with the last lines of the data not being processed. I know I need to modify my if statement so that it also checks whether I'm at the end of the file, but I am having trouble finding a way to do that. I've found methods online, but they involve reading in the whole file to get a row count, and doing so would defeat the purpose of this script, as I'm dealing with memory issues.

Here is what I have so far:

import csv
count = 0
data = []
with open('test.csv') as file:
    reader = csv.DictReader(file)
    
    for row in reader:
        count +=1
        data.append(row)

        if count % 5 == 0 or #something to check for the end of the file:
            #do stuff
            data = []
        

Thank you for the help!

You can use the chunksize argument when reading in the csv with pandas. The reader then yields the file as DataFrames of at most that many rows, one chunk at a time:

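import pandas as pd

# With chunksize set, read_csv returns an iterator of DataFrames
reader = pd.read_csv('test.csv', chunksize=5)
for df in reader:
    # do stuff with up to 5 rows; the last chunk may be shorter
    print(df)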

You can handle the remaining lines after the for loop body. You can also use the more Pythonic enumerate.

import csv

data = []
with open('test.csv') as file:
    reader = csv.DictReader(file)
    for count, row in enumerate(reader, 1):
        data.append(row)
        if count % 5 == 0:
            # do stuff
            data = []

    print('handling remaining lines at end of file')
    print(data)

Considering the file

a,b
1,1
2,2
3,3
4,4
5,5
6,6
7,7

outputs (on Python 3.8 and later the rows print as plain dicts rather than OrderedDict)

handling remaining lines at end of file
[OrderedDict([('a', '6'), ('b', '6')]), OrderedDict([('a', '7'), ('b', '7')])]
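The same idea can be wrapped in a generator so that full batches and the leftover rows go through one code path. This is only a sketch: the helper name iter_batches is made up for illustration, and an in-memory string stands in for test.csv.

```python
import csv
import io


def iter_batches(rows, size=5):
    """Yield lists of up to `size` rows; the last list may be shorter."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover rows at the end of the input
        yield batch


# In-memory stand-in for test.csv (assumption: same 7-row sample as above).
sample = io.StringIO("a,b\n1,1\n2,2\n3,3\n4,4\n5,5\n6,6\n7,7\n")
batch_sizes = [len(batch) for batch in iter_batches(csv.DictReader(sample))]
print(batch_sizes)  # [5, 2]
```

Only one batch is held in memory at a time, so the whole file is never loaded.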

This is one approach using the iterator

Ex:

import csv

with open('test.csv') as file:
    reader = csv.DictReader(file)
    
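    value = True
    while value:
        data = []
        for _ in range(5):             # Get up to 5 rows
            value = next(reader, False)
            if not value:              # Reader exhausted
                break
            data.append(value)
        if data:                       # Skip the empty batch when the row count is a multiple of 5
            print(data)                # List of up to 5 rows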
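A possible variation on the iterator approach uses itertools.islice from the standard library to pull up to 5 rows per step. The function name read_in_chunks is hypothetical, and the in-memory sample stands in for test.csv.

```python
import csv
import io
from itertools import islice


def read_in_chunks(rows, size=5):
    """Yield successive lists of up to `size` rows."""
    it = iter(rows)  # make sure we advance one shared iterator
    while True:
        chunk = list(islice(it, size))
        if not chunk:  # iterator exhausted
            return
        yield chunk


# In-memory stand-in for test.csv (assumption: 7 data rows).
sample = io.StringIO("a,b\n1,1\n2,2\n3,3\n4,4\n5,5\n6,6\n7,7\n")
chunks = list(read_in_chunks(csv.DictReader(sample)))
chunk_values = [[row["a"] for row in chunk] for chunk in chunks]
print(chunk_values)  # [['1', '2', '3', '4', '5'], ['6', '7']]
```

Because islice stops early when the reader runs out, the final short chunk needs no special handling.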

Staying along the lines of what you wrote and not including any other imports:

import csv
data = []
with open('test.csv') as file:
    reader = csv.DictReader(file)

    for row in reader:
        data.append(row)
        if len(data) > 5:
            del data[0]
        if len(data) == 5:
            # Do something with the 5 elements
            print(data)

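The if statements let the list fill up to 5 elements before processing begins. Note that this keeps a sliding window: once 5 rows are loaded, each new row pushes out the oldest one, so consecutive batches overlap by 4 rows.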
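The loop above keeps the last 5 rows and drops the oldest as each new row arrives. If an extra standard-library import is acceptable, collections.deque with maxlen does that trimming automatically. This is a sketch on an in-memory sample that stands in for test.csv.

```python
import csv
import io
from collections import deque

# In-memory stand-in for test.csv (assumption: 7 data rows).
sample = io.StringIO("a,b\n1,1\n2,2\n3,3\n4,4\n5,5\n6,6\n7,7\n")

window = deque(maxlen=5)  # appending to a full deque discards the oldest row
windows = []
for row in csv.DictReader(sample):
    window.append(row)
    if len(window) == 5:
        # Record column "a" of the current 5-row window
        windows.append([r["a"] for r in window])

print(windows)
# [['1', '2', '3', '4', '5'], ['2', '3', '4', '5', '6'], ['3', '4', '5', '6', '7']]
```

With 7 rows this produces three overlapping windows, not two separate batches.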

class ZeroItterNumberException(Exception):
    pass
class ItterN:
    def __init__(self, itterator, n):
        if n<1:
            raise ZeroItterNumberException("{} is not a valid number of rows.".format(n))
        self.itterator = itterator
        self.n = n
        self.cache = []

    def __iter__(self):
        return self

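    def __next__(self):
        # StopIteration from the wrapped iterator propagates and ends this one too.
        self.cache.append(next(self.itterator))
        # Keep reading until the first window of n items is full.
        while len(self.cache) < self.n:
            self.cache.append(next(self.itterator))
        if len(self.cache) > self.n:
            del self.cache[0]
        # Compare against n (not a hard-coded 5) and return a copy of the window.
        return list(self.cache)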


 