
Improve speed when reading very large files in Python

So I'm running multiple functions, and each function takes a section of a million-line .txt file. Each function has a for loop that runs through every line in its section of the million-line file.

Each function takes info from those lines and checks whether it matches info in 2 other files, one about 50,000-100,000 lines long, the other about 500-1,000 lines long. I check whether the lines match by running for loops through the other 2 files. Once the info matches, I write the output to a new file; all functions write to the same file. The program produces about 2,500 lines a minute, but slows down the longer it runs. Also, when I run just one of the functions on its own, it does about 500 lines a minute, but when I run it alongside 23 other processes, the whole thing only manages about 2,500 a minute. Why is that?
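
If I read the description right, the per-section work looks roughly like this (the function name, the way the info is pulled out of each line, and the matching test are all hypothetical placeholders, not the asker's actual code):

def process_section(section_lines, ref_lines_a, ref_lines_b, out):
    # every line of this section of the million-line file
    for line in section_lines:
        info = line.split()[0]              # hypothetical "info" taken from the line
        # for loop through the 50,000-100,000 line file
        for ref_a in ref_lines_a:
            if info in ref_a:
                # for loop through the 500-1,000 line file
                for ref_b in ref_lines_b:
                    if info in ref_b:
                        out.write(line)     # all functions write to the same output file
                        break
                break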

Does anyone know why that would happen? Is there anything I could import to make the program run / read through the files faster? I am already using the "with open(...) as file1:" approach.

Can the multiprocessing be redone to run faster?

Your workers can only use the resources your machine has: 4 cores means 4 workers with full resources each. There are a few cases where having more workers than cores can improve performance, but this is not one of them. So keep the worker count at the number of cores you have.
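
A minimal sketch of that advice, assuming the per-section work is wrapped in a function (process_section is a hypothetical name standing in for the real work):

import multiprocessing as mp

def process_section(section):
    # stand-in for the real per-section matching; returns the matched lines
    return [line for line in section if "needle" in line]

if __name__ == "__main__":
    # hypothetical: the million-line file already split into chunks of lines
    sections = [["needle one", "hay two"], ["needle three"]]
    # size the pool to the number of cores instead of starting 24 workers
    with mp.Pool(processes=mp.cpu_count()) as pool:
        results = pool.map(process_section, sections)
    print(results)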

Also, because you have concurrent access to a single output file, you need a lock on that file, which will slow things down a bit.
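
One way around that lock (a sketch, assuming the workers currently write to the shared file themselves) is to have each worker return its matches and let only the parent process write:

import multiprocessing as mp

def find_matches(section):
    # hypothetical matching logic standing in for the real comparison
    return [line for line in section if line.startswith("hit")]

if __name__ == "__main__":
    sections = [["hit 1", "miss 2"], ["hit 3"]]   # hypothetical chunks
    # only the parent process touches the output file, so no lock is needed
    with mp.Pool(processes=mp.cpu_count()) as pool, open("matches.txt", "w") as out:
        for matches in pool.imap_unordered(find_matches, sections):
            out.writelines(m + "\n" for m in matches)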

What could be improved, however, is the code you use to compare the strings, but that is another question.
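
For what it's worth, the usual fix for that part (a sketch, not the asker's actual code) is to load the two reference files into sets once, so each lookup is a hash check instead of a for loop over 50,000-100,000 lines:

def load_keys(path):
    # read a reference file once and keep its lines in a set for O(1) lookups
    with open(path) as f:
        return {line.strip() for line in f}

def process_section(section_lines, keys_a, keys_b, out):
    for line in section_lines:
        info = line.split()[0]          # hypothetical "info" taken from the line
        # set membership tests replace the two inner for loops
        if info in keys_a and info in keys_b:
            out.write(line)

With the lookups reduced to constant time, a single process may already come close to, or beat, the 2,500 lines a minute the 24 processes produce now.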
