
How to store HUGE python list as a file and then read the file as a list in python?

I'm working on a machine learning project whose dataset consists of thousands of X-ray images. Every time I work on the project I have to reload and pre-process the images, which is very time-consuming. I would like to read the images once, write the resulting list of thousands of 224x224x3 matrices to a file, and load that file whenever I need it.

I've already found some functions that let me write and read lists, but they seem to write only part of each matrix rather than the whole thing:

This is the code I used to write the file:

with open(obj_dir +"train_data_p", "w") as file:
  file.write(str(train_data_p))

This is what I get when I open the training dataset file in Notepad. As the "...," parts show, it contains only snippets of the matrices:

[array([[[0.26666668, 0.26666668, 0.26666668],
        [0.32156864, 0.32156864, 0.32156864],
        [0.33333334, 0.33333334, 0.33333334],
        ...,
        [0.75686276, 0.75686276, 0.75686276],
        [0.77254903, 0.77254903, 0.77254903],
        [0.7764706 , 0.7764706 , 0.7764706 ]],
   [[0.27058825, 0.27058825, 0.27058825],
    [0.28627452, 0.28627452, 0.28627452],
    [0.31764707, 0.31764707, 0.31764707],
    ...,
    [0.7607843 , 0.7607843 , 0.7607843 ],
    [0.7647059 , 0.7647059 , 0.7647059 ],
    [0.8039216 , 0.8039216 , 0.8039216 ]],

   [[0.3019608 , 0.3019608 , 0.3019608 ],
    [0.34901962, 0.34901962, 0.34901962],
    [0.27058825, 0.27058825, 0.27058825],
    ...,
    [0.78431374, 0.78431374, 0.78431374],
    [0.7764706 , 0.7764706 , 0.7764706 ],
    [0.78431374, 0.78431374, 0.78431374]],

   ...,

   [[0.1254902 , 0.1254902 , 0.1254902 ],
    [0.1254902 , 0.1254902 , 0.1254902 ],
    [0.12156863, 0.12156863, 0.12156863],

How can I store the whole dataset so that I don't have to read and process the images every time? Help me please!

You can do this with numpy.save() and numpy.load():

import numpy as np
# np.save appends ".npy" when the file name does not already end with it
np.save('/tmp/123', np.array([[1, 2, 3], [4, 5, 6]]))
np.load('/tmp/123.npy')
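Applied to the question's data, the same two calls might look like the sketch below. It assumes every image has the same 224x224x3 shape so that the list can be stacked into a single array. The random arrays and the temporary file path are placeholders for illustration, not part of the original post.

```python
import os
import tempfile

import numpy as np

# Stand-in for the preprocessed images (placeholder data, small count for the demo).
train_data_p = [np.random.rand(224, 224, 3).astype(np.float32) for _ in range(4)]

# Placeholder output location; replace with your own directory.
path = os.path.join(tempfile.mkdtemp(), "train_data_p.npy")

# Stack the list into one array of shape (n_images, 224, 224, 3) and save it in binary form.
np.save(path, np.stack(train_data_p))

# Load it back; list() splits the first axis into a list of 224x224x3 arrays again.
restored = list(np.load(path))
```

If the images do not all share one shape, np.stack raises an error, and a different container (for example one file per image) would be needed.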

You see ellipses in the file because you are writing str(train_data_p), the printable text form of the arrays, rather than the train_data_p object itself. NumPy abbreviates the text form of large arrays with "...".
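A quick way to see this for yourself is to check the string form of one image-sized array. The snippet is only an illustration; the all-zeros array is a placeholder for a real image.

```python
import numpy as np

# Placeholder array with the same shape as one preprocessed image.
image = np.zeros((224, 224, 3), dtype=np.float32)

# str() produces the summarized, human-readable form, not the full data.
text = str(image)

is_truncated = "..." in text           # the summary marker appears in the text
is_much_shorter = len(text) < image.size  # far fewer characters than array elements

print(is_truncated, is_much_shorter)
```

Because the text written to the file is only this summary, the missing values cannot be recovered from it, which is why a binary format is the safer choice.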

As other answers point out, there are numerous packages that help with storing large data. If you are using numpy, this answer may help you too.

You can easily serialize your data using built-in modules.

There are several options:

Or use any other third-party serialization package available via pip.
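The specific options this answer listed are not preserved here. As one example of a built-in module, and as an assumption about what the answer may have had in mind, pickle from the standard library can write a list of arrays and read it back unchanged. The data and file path below are placeholders.

```python
import os
import pickle
import tempfile

import numpy as np

# Placeholder data standing in for the list of preprocessed images.
train_data_p = [np.random.rand(224, 224, 3).astype(np.float32) for _ in range(3)]

# Placeholder output location; replace with your own directory.
path = os.path.join(tempfile.mkdtemp(), "train_data_p.pkl")

# Binary mode ("wb"/"rb") is required: pickle writes bytes, not text.
with open(path, "wb") as f:
    pickle.dump(train_data_p, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:
    loaded = pickle.load(f)
```

Unlike the np.save approach, this keeps the Python list as a list, so the arrays do not have to share one shape. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.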

More about serialization: https://en.wikipedia.org/wiki/Serialization
