Iterating through dictionary, 5 rows at a time
I am trying to open a CSV file with csv.DictReader, read in just the first 5 rows of data, perform the primary process of my script, then read in the next 5 rows and do the same for them. Rinse and repeat.
I believe I have a method that works, but the last lines of the data are not being processed. I know I need to modify my if statement so that it also checks whether I'm at the end of the file, but I am having trouble finding a way to do that. I've found methods online, but they involve reading in the whole file to get a row count, and doing so would defeat the purpose of this script because I'm dealing with memory issues.
Here is what I have so far:
import csv
count = 0
data = []
with open('test.csv') as file:
    reader = csv.DictReader(file)
    for row in reader:
        count += 1
        data.append(row)
        if count % 5 == 0 or #something to check for the end of the file:
            #do stuff
            data = []
Thank you for the help!
You can use the chunksize argument of pandas.read_csv when reading in the CSV. The file is then read step by step, that number of lines at a time:
import pandas as pd
reader = pd.read_csv('test.csv', chunksize=5)
for df in reader:
    # do stuff; each df is a DataFrame holding up to 5 rows
    print(df)
You can handle the remaining lines after the for loop body: whatever is still in data when the loop ends is the final, incomplete group. You can also use the more Pythonic enumerate instead of a manual counter.
import csv
data = []
with open('test.csv') as file:
    reader = csv.DictReader(file)
    for count, row in enumerate(reader, 1):
        data.append(row)
        if count % 5 == 0:
            # do stuff
            data = []
    print('handling remaining lines at end of file')
    print(data)
Considering the file
a,b
1,1
2,2
3,3
4,4
5,5
6,6
7,7
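outputs (rows print as OrderedDict on Python 3.6 and 3.7; from Python 3.8 on, DictReader returns plain dicts)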
handling remaining lines at end of file
[OrderedDict([('a', '6'), ('b', '6')]), OrderedDict([('a', '7'), ('b', '7')])]
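One drawback of handling the leftovers after the loop is that the "do stuff" code has to appear twice. A possible way around that, sketched here under a few assumptions, is to move the batching into a small generator. The name iter_batches is hypothetical (it is not part of the csv module), and an in-memory io.StringIO copy of the sample file stands in for test.csv so the snippet runs on its own.

```python
import csv
import io


def iter_batches(rows, size):
    """Yield lists of up to `size` rows; only the last list may be shorter."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover rows once the input is exhausted
        yield batch


# In-memory stand-in for test.csv (same content as the sample file above)
sample = io.StringIO("a,b\n1,1\n2,2\n3,3\n4,4\n5,5\n6,6\n7,7\n")

batch_sizes = []
for batch in iter_batches(csv.DictReader(sample), 5):
    # "do stuff" goes here, once, for full and partial batches alike
    batch_sizes.append(len(batch))

print(batch_sizes)  # [5, 2]
```

Only one batch is held in memory at a time, so no row count is needed up front.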
This is one approach that uses the reader as an iterator directly.
Ex:
import csv
with open('test.csv') as file:
    reader = csv.DictReader(file)
    value = True
    while value:
        data = []
        for _ in range(5):  # Get up to 5 rows
            value = next(reader, False)
            if not value:
                break  # End of file reached
            data.append(value)
        if data:  # Skip the empty batch when the row count is a multiple of 5
            print(data)  # List of up to 5 rows
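The same iterator idea can probably be written more compactly with itertools.islice and the two-argument form of iter. This is only a sketch, not the answerer's code: it reads from an in-memory io.StringIO copy of the sample data instead of test.csv, and a_values is a hypothetical name used to record what each batch contained.

```python
import csv
import io
from itertools import islice

# In-memory stand-in for test.csv
sample = io.StringIO("a,b\n1,1\n2,2\n3,3\n4,4\n5,5\n6,6\n7,7\n")
reader = csv.DictReader(sample)

a_values = []
# iter(callable, sentinel) keeps calling the lambda until it returns [],
# which happens on the first call after the reader is exhausted.
for data in iter(lambda: list(islice(reader, 5)), []):
    # data is a list of up to 5 rows; process it here
    a_values.append([row['a'] for row in data])

print(a_values)  # [['1', '2', '3', '4', '5'], ['6', '7']]
```

The final short batch is delivered like any other, and no empty batch appears when the row count is an exact multiple of 5.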
Staying along the lines of what you wrote and not including any other imports:
import csv
data = []
with open('test.csv') as file:
    reader = csv.DictReader(file)
    for row in reader:
        data.append(row)
        if len(data) > 5:
            del data[0]
        if len(data) == 5:
            # Do something with the 5 elements
            print(data)
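The if statements let the list fill up with 5 elements before processing begins. Note that this processes a sliding window: once the first 5 rows are loaded, every new row produces another group made of the latest 5 rows, rather than disjoint batches of 5.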
class ZeroItterNumberException(Exception):
    pass


class ItterN:
    # Wraps an iterator and returns a sliding window of its last n items
    def __init__(self, itterator, n):
        if n < 1:
            raise ZeroItterNumberException("{} is not a valid number of rows.".format(n))
        self.itterator = itterator
        self.n = n
        self.cache = []

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the wrapped iterator ends this iterator too
        self.cache.append(next(self.itterator))
        if len(self.cache) < self.n:
            return self.__next__()  # keep reading until n items are cached
        if len(self.cache) > self.n:
            del self.cache[0]
        if len(self.cache) == self.n:  # compare against n, not a hard-coded 5
            return self.cache
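The two snippets above keep a moving window of the most recent rows rather than disjoint batches. If that is the behaviour you want, collections.deque with maxlen can do the trimming for you. The following is a minimal sketch under that assumption; sliding_windows is a hypothetical helper name, and plain integers stand in for CSV rows.

```python
from collections import deque


def sliding_windows(iterable, n):
    """Yield a tuple of the most recent n items once n items have been seen."""
    if n < 1:
        raise ValueError("n must be at least 1")
    window = deque(maxlen=n)  # the oldest item is dropped automatically
    for item in iterable:
        window.append(item)
        if len(window) == n:
            yield tuple(window)  # a copy, so later appends do not alter it


windows = list(sliding_windows(range(1, 8), 5))
print(windows)
# [(1, 2, 3, 4, 5), (2, 3, 4, 5, 6), (3, 4, 5, 6, 7)]
```

Passing a csv.DictReader instead of range(1, 8) would give windows of rows. Yielding a tuple copy avoids handing out the same mutable list on every step.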