What is the fastest way to load a file using Python and NumPy?
I want to train a model and I have a big dataset for training. Its size is more than 20 GB, but when I try to read it, it takes a very long time. I mean loading it into memory.
import csv
import numpy as np
from itertools import islice

with open(file_path, newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for i, row in enumerate(islice(reader, 0, 1)):
        train_data = np.array(makefloat(row))[None, :]
    for i, row in enumerate(reader):
        train_data = np.vstack((train_data, np.array(makefloat(row))[None, :]))
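(The repeated `np.vstack` is the main cost here: it copies the entire accumulated array on every row, so the loop is quadratic in the number of rows. A minimal sketch of the usual fix, keeping the same `csv.reader` loop but collecting rows in a Python list and converting once at the end; `makefloat` below is a hypothetical stand-in for the helper in the question:)

```python
import csv
import io
import numpy as np

def makefloat(row):
    # hypothetical stand-in for the question's helper: parse each field as a float
    return [float(x) for x in row]

# demo with an in-memory file; in practice use open(file_path, newline='', encoding='utf-8')
f = io.StringIO("1.0,2.0\n3.0,4.0\n")
rows = [makefloat(row) for row in csv.reader(f)]
train_data = np.array(rows)  # one allocation at the end instead of a copy per row
print(train_data.shape)
```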
Each line has 43 floats.
It is very slow: loading just 100,000 lines took 20 minutes.
I think I'm doing something wrong. How can I make it faster?
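(For a file of this size, a single vectorized reader is usually far faster than parsing rows one by one in Python. A sketch assuming a plain comma-separated file of floats with no header, using `np.loadtxt`; the in-memory `StringIO` stands in for the real file:)

```python
import io
import numpy as np

# in practice: train_data = np.loadtxt(file_path, delimiter=',')
csv_text = "1.0,2.0,3.0\n4.0,5.0,6.0\n"
train_data = np.loadtxt(io.StringIO(csv_text), delimiter=',')
print(train_data.shape)
```

For datasets that do not fit comfortably in memory, converting the CSV once to a binary format (e.g. `np.save` / `np.load`, optionally with `mmap_mode`) makes subsequent loads far faster than re-parsing text.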