
Fast iterative file reading in Python

I asked a question here about how to read a very large file into Python, and the answer I got was based on zip_longest.

The problem is that this solution is extremely slow: Keras' model.predict took more than 2 hours to process 200,000 lines of a file that it handles in under 3 minutes when the file is loaded directly into memory, and I want to be able to process files five times this size.

I've since found the chunking functions in pandas, but I don't understand how to load a chunk of a file, reshape the data, and then feed it to the model with these methods. I also don't know whether this is the fastest way to read and use the data in a very large file.
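For concreteness, this is roughly the kind of chunked pipeline I am picturing (just a sketch; the path, chunk size, column count, and the stand-in model below are placeholders I made up, not my real setup):

```python
import numpy as np
import pandas as pd
from tensorflow import keras

CSV_PATH = "big_file.csv"   # placeholder: assumed plain CSV, one sample per row
CHUNK_SIZE = 10_000         # rows held in memory at a time
N_FEATURES = 100            # placeholder number of feature columns

# stand-in for the real trained model
model = keras.Sequential([keras.Input(shape=(N_FEATURES,)),
                          keras.layers.Dense(1)])

predictions = []
# chunksize makes read_csv return an iterator of DataFrames instead of
# loading the whole file at once
for chunk in pd.read_csv(CSV_PATH, header=None, chunksize=CHUNK_SIZE):
    batch = chunk.to_numpy(dtype=np.float32)      # shape: (rows, N_FEATURES)
    predictions.append(model.predict(batch, verbose=0))

all_predictions = np.concatenate(predictions)
```

I don't know whether something like this is the right way to do it, or the fastest.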

Any fast solutions to this problem are welcome.

If you are looking for fast iterative Python functions, you should check out the itertools package and its documentation. I'm pretty sure it doesn't get much faster than that.
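As one illustration of that suggestion, here is a minimal sketch that uses itertools.islice to pull lines from the file lazily in fixed-size batches instead of loading everything at once (the file name and batch size are placeholders):

```python
from itertools import islice

FILE_PATH = "big_file.txt"   # placeholder path
BATCH_SIZE = 1000            # lines handed out per batch (placeholder)

def batched_lines(path, batch_size):
    """Yield lists of stripped lines, batch_size at a time,
    without reading the whole file into memory."""
    with open(path) as handle:
        while True:
            # islice consumes at most batch_size lines from the file iterator
            batch = list(islice(handle, batch_size))
            if not batch:
                break
            yield [line.rstrip("\n") for line in batch]

for batch in batched_lines(FILE_PATH, BATCH_SIZE):
    # preprocess / reshape the batch here before handing it to the model
    pass
```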

But be aware that, even setting aside any preprocessing or reshaping, you will hit a hard performance limit when dealing with large datasets. Just imagine each of your 2e5 lines held only 1 KB of data; that is still 200 MB to read, which, if I understand you correctly, is roughly the lower bound for your case. So you will have to accept long read and processing times once that grows to 3 or 4 GB of data in one go.
