
Improve speed when reading very large files in Python

So I'm running multiple functions, and each function takes a section of a million-line .txt file. Each function has a for loop that runs through every line in its section of the million-line file.

Each function takes info from those lines and checks whether it matches info in 2 other files, one about 50,000-100,000 lines long, the other about 500-1,000 lines long. I check whether the lines match by running for loops through the other 2 files. Once the info matches, I write the output to a new file; all functions write to the same file. The program produces about 2,500 lines a minute, but slows down the longer it runs. Also, when I run just one of the functions on its own, it does about 500 lines a minute, but when I run it alongside 23 other processes, the whole thing only manages about 2,500 a minute. Why is that?
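
If I read the description right, the per-section work looks roughly like this (the function name, the way the info is pulled out of each line, and the matching test are all hypothetical placeholders, not the asker's actual code):

def process_section(section_lines, ref_lines_a, ref_lines_b, out):
    # every line of this section of the million-line file
    for line in section_lines:
        info = line.split()[0]              # hypothetical "info" taken from the line
        # for loop through the 50,000-100,000 line file
        for ref_a in ref_lines_a:
            if info in ref_a:
                # for loop through the 500-1,000 line file
                for ref_b in ref_lines_b:
                    if info in ref_b:
                        out.write(line)     # all functions write to the same output file
                        break
                break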

Does anyone know why that would happen? Is there anything I could import to make the program run / read through the files faster? I am already using the "with open(...) as file1:" approach.

Can the multiprocessing be redone to run faster?

Your workers can only use the resources your machine has: 4 cores means 4 workers with full resources each. There are a few cases where having more workers than cores can improve performance, but this is not one of them. So keep the worker count at the number of cores you have.
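
A minimal sketch of that advice, assuming the per-section work is wrapped in a function (process_section is a hypothetical name standing in for the real work):

import multiprocessing as mp

def process_section(section):
    # stand-in for the real per-section matching; returns the matched lines
    return [line for line in section if "needle" in line]

if __name__ == "__main__":
    # hypothetical: the million-line file already split into chunks of lines
    sections = [["needle one", "hay two"], ["needle three"]]
    # size the pool to the number of cores instead of starting 24 workers
    with mp.Pool(processes=mp.cpu_count()) as pool:
        results = pool.map(process_section, sections)
    print(results)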

Also, because you have concurrent access to a single output file, you need a lock on that file, which will slow things down a bit.
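
One way around that lock (a sketch, assuming the workers currently write to the shared file themselves) is to have each worker return its matches and let only the parent process write:

import multiprocessing as mp

def find_matches(section):
    # hypothetical matching logic standing in for the real comparison
    return [line for line in section if line.startswith("hit")]

if __name__ == "__main__":
    sections = [["hit 1", "miss 2"], ["hit 3"]]   # hypothetical chunks
    # only the parent process touches the output file, so no lock is needed
    with mp.Pool(processes=mp.cpu_count()) as pool, open("matches.txt", "w") as out:
        for matches in pool.imap_unordered(find_matches, sections):
            out.writelines(m + "\n" for m in matches)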

What could be improved, however, is the code you use to compare the strings, but that is another question.
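
For what it's worth, the usual fix for that part (a sketch, not the asker's actual code) is to load the two reference files into sets once, so each lookup is a hash check instead of a for loop over 50,000-100,000 lines:

def load_keys(path):
    # read a reference file once and keep its lines in a set for O(1) lookups
    with open(path) as f:
        return {line.strip() for line in f}

def process_section(section_lines, keys_a, keys_b, out):
    for line in section_lines:
        info = line.split()[0]          # hypothetical "info" taken from the line
        # set membership tests replace the two inner for loops
        if info in keys_a and info in keys_b:
            out.write(line)

With the lookups reduced to constant time, a single process may already come close to, or beat, the 2,500 lines a minute the 24 processes produce now.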
