
Fast iterative file reading in Python

I asked a question here about how to read a very large file into Python, and the answer I got was based on zip_longest.

The problem is that this solution is extremely slow: Keras' model.predict took more than 2 hours to process 200,000 lines of a file that it handles in under 3 minutes when the file is loaded directly into memory, and I want to be able to process files five times this size.

I've since found the chunking functions in pandas, but I don't understand how to load a chunk of a file, reshape the data, and then feed it to the model with these methods. I also don't know whether this is the fastest way to read and use the data in a very large file.
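For concreteness, this is roughly the kind of chunked pipeline I am picturing (just a sketch; the path, chunk size, column count, and the stand-in model below are placeholders I made up, not my real setup):

```python
import numpy as np
import pandas as pd
from tensorflow import keras

CSV_PATH = "big_file.csv"   # placeholder: assumed plain CSV, one sample per row
CHUNK_SIZE = 10_000         # rows held in memory at a time
N_FEATURES = 100            # placeholder number of feature columns

# stand-in for the real trained model
model = keras.Sequential([keras.Input(shape=(N_FEATURES,)),
                          keras.layers.Dense(1)])

predictions = []
# chunksize makes read_csv return an iterator of DataFrames instead of
# loading the whole file at once
for chunk in pd.read_csv(CSV_PATH, header=None, chunksize=CHUNK_SIZE):
    batch = chunk.to_numpy(dtype=np.float32)      # shape: (rows, N_FEATURES)
    predictions.append(model.predict(batch, verbose=0))

all_predictions = np.concatenate(predictions)
```

I don't know whether something like this is the right way to do it, or the fastest.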

Any fast solutions to this problem are welcome.

If you are looking for fast iterative Python functions, you should check out the itertools package and its documentation. I'm pretty sure it doesn't get much faster than that.
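As one illustration of that suggestion, here is a minimal sketch that uses itertools.islice to pull lines from the file lazily in fixed-size batches instead of loading everything at once (the file name and batch size are placeholders):

```python
from itertools import islice

FILE_PATH = "big_file.txt"   # placeholder path
BATCH_SIZE = 1000            # lines handed out per batch (placeholder)

def batched_lines(path, batch_size):
    """Yield lists of stripped lines, batch_size at a time,
    without reading the whole file into memory."""
    with open(path) as handle:
        while True:
            # islice consumes at most batch_size lines from the file iterator
            batch = list(islice(handle, batch_size))
            if not batch:
                break
            yield [line.rstrip("\n") for line in batch]

for batch in batched_lines(FILE_PATH, BATCH_SIZE):
    # preprocess / reshape the batch here before handing it to the model
    pass
```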

But be aware that, even setting aside any preprocessing or reshaping, you will hit a hard performance limit when dealing with large datasets. Just imagine each of your 2e5 lines held only 1 KB of data; that is still 200 MB to read, which, if I understand you correctly, is roughly the lower bound for your case. So you will have to accept long read and processing times once that grows to 3 or 4 GB of data in one go.
