What is the fastest way to load a file using Python and NumPy?
I want to train a model and I have a big dataset for training. Its size is more than 20 GB, but when I try to read it, it takes a very long time. I mean loading it into memory.
import csv
import numpy as np
from itertools import islice

with open(file_path, newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for i, row in enumerate(islice(reader, 0, 1)):
        train_data = np.array(makefloat(row))[None, :]
    for i, row in enumerate(reader):
        train_data = np.vstack((train_data, np.array(makefloat(row))[None, :]))
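(The repeated `np.vstack` is the main cost here: it copies the entire accumulated array on every row, so the loop is quadratic in the number of rows. A minimal sketch of the usual fix, keeping the same `csv.reader` loop but collecting rows in a Python list and converting once at the end; `makefloat` below is a hypothetical stand-in for the helper in the question:)

```python
import csv
import io
import numpy as np

def makefloat(row):
    # hypothetical stand-in for the question's helper: parse each field as a float
    return [float(x) for x in row]

# demo with an in-memory file; in practice use open(file_path, newline='', encoding='utf-8')
f = io.StringIO("1.0,2.0\n3.0,4.0\n")
rows = [makefloat(row) for row in csv.reader(f)]
train_data = np.array(rows)  # one allocation at the end instead of a copy per row
print(train_data.shape)
```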
Each line has 43 floats.
It is very slow: loading just 100,000 lines took 20 minutes.
I think I'm doing something wrong. How can I make it faster?
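(For a file of this size, a single vectorized reader is usually far faster than parsing rows one by one in Python. A sketch assuming a plain comma-separated file of floats with no header, using `np.loadtxt`; the in-memory `StringIO` stands in for the real file:)

```python
import io
import numpy as np

# in practice: train_data = np.loadtxt(file_path, delimiter=',')
csv_text = "1.0,2.0,3.0\n4.0,5.0,6.0\n"
train_data = np.loadtxt(io.StringIO(csv_text), delimiter=',')
print(train_data.shape)
```

For datasets that do not fit comfortably in memory, converting the CSV once to a binary format (e.g. `np.save` / `np.load`, optionally with `mmap_mode`) makes subsequent loads far faster than re-parsing text.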