How can I open a CSV file in Python and read one line at a time, without loading the whole file into memory?
I have a CSV file that is too large to fit in my machine's memory, so I want to open it and then read its rows one at a time. I basically want to make a Python generator that yields single rows from the CSV.
Thanks in advance! :)
with open(filename, "r") as file:
    for line in file:  # one line is read per iteration
        doanything()  # placeholder for your per-line processing
Python is lazy whenever possible. File objects are iterators (not generators, strictly speaking, but they behave the same way in a for loop): they do not load the entire file, only one line at a time, plus a small internal read buffer.
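A minimal sketch of that laziness, assuming a hypothetical helper named `first_lines` and a throwaway temp file standing in for the large CSV. `itertools.islice` stops the iteration after `n` lines, so later lines are never turned into Python strings.

```python
import itertools
import os
import tempfile


def first_lines(path, n):
    """Return the first n lines of a text file; later lines are never iterated."""
    with open(path, "r") as f:
        # The file object is a lazy iterator; islice stops pulling after n lines.
        return [line.rstrip("\n") for line in itertools.islice(f, n)]


# Demo data (hypothetical content): a small temp file standing in for a large CSV.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,name\n1,Ada\n2,Linus\n3,Grace\n")
    demo_path = tmp.name

head = first_lines(demo_path, 2)
print(head)
os.remove(demo_path)
```

Note that this works on physical lines only; it does not parse CSV quoting, which is what the `csv` module is for.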
My personal preference for doing this is csv.DictReader.
You set it up as an object with the parameters you need, and then, to access the file one row at a time, you just iterate over it or call next on it. Each row comes back as a dictionary mapping the field names in your CSV file to that row's values.
E.g.:
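import csv

# newline='' is what the csv docs recommend when opening CSV files
with open('names.csv', newline='') as csvfile:
    my_reader = csv.DictReader(csvfile)
    first_row = next(my_reader)  # fetch a single row on demand
    for row in my_reader:  # continues with the remaining rows
        print([(k, v) for k, v in row.items()])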
See the csv.DictReader docs for parameter usage etc. It's fairly straightforward.
Solution:
You can use the chunksize parameter of the pandas read_csv function.
import pandas as pd

chunksize = 10 ** 6  # number of rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):
    print(type(chunk))  # each chunk is a DataFrame
    # CODE HERE
Set chunksize to 1 and it should take care of your problem statement: each chunk is then a DataFrame containing a single row.
"python generator that yields single rows from the csv."
This sounds like you want csv.reader from the built-in csv module. You will get one list for each row in the file.
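The generator asked for in the question could be sketched as follows, assuming a hypothetical function name `iter_csv_rows` and a small temp file as demo data. It simply wraps `csv.reader`, which also copes with quoted fields containing commas or line breaks that plain line-by-line iteration would split.

```python
import csv
import os
import tempfile


def iter_csv_rows(path, encoding="utf-8"):
    """Yield one row (a list of strings) at a time from a CSV file."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(path, "r", newline="", encoding=encoding) as f:
        for row in csv.reader(f):
            yield row


# Demo data (hypothetical): note the quoted field containing a line break.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="utf-8") as tmp:
    tmp.write('id,comment\n1,"hello, world"\n2,"two\nlines"\n')
    demo_path = tmp.name

# list() is only for the demo; in real use, loop over the generator directly.
rows = list(iter_csv_rows(demo_path))
for row in rows:
    print(row)
os.remove(demo_path)
```

Because the file is opened inside the generator, it stays open only while rows are being consumed and is closed once the loop finishes.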