Improve speed when reading very large files in Python

I'm running multiple functions, each of which takes a section of a million-line .txt file. Each function has a for loop that runs through every line in its section of the file.

It takes info from those lines to see whether it matches info in two other files, one about 50,000-100,000 lines long, the other about 500-1,000 lines long. I check whether the lines match by running for loops through the other two files. Once the info matches, I write the output to a new file; all functions write to the same file. The program produces about 2,500 lines a minute, but it slows down the longer it runs. Also, when I run one of the functions on its own, it does about 500 lines a minute, but when I run it together with 23 other processes, the program only makes 2,500 a minute. Why is that?

Does anyone know why that would happen? Is there anything I could import to make the program run or read through the files faster? I am already using the "with ... as file1:" method.

Can the multiprocessing be reworked to run faster?

Your threads can only use the resources your machine has: 4 cores = 4 threads running at full capacity. There are a few cases where having more threads than cores can improve performance, but yours is not one of them. So keep the thread count at the number of cores you have.
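A minimal sketch of that advice, assuming the work is done with worker processes from the standard multiprocessing module: size the pool from os.cpu_count() and give each worker one contiguous range of lines. The names chunk_ranges and process_chunk are hypothetical, and process_chunk is only a placeholder for the real per-section matching.

```python
import os
from multiprocessing import Pool


def chunk_ranges(total_lines, n_chunks):
    """Split range(total_lines) into n_chunks contiguous (start, stop) pairs."""
    n_chunks = max(1, min(n_chunks, total_lines))
    base, extra = divmod(total_lines, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional line each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def process_chunk(bounds):
    """Hypothetical worker: stands in for the matching done on one section."""
    start, stop = bounds
    return stop - start  # placeholder result: number of lines handled


if __name__ == "__main__":
    # One worker per core instead of a fixed 24 processes.
    workers = os.cpu_count() or 1
    with Pool(processes=workers) as pool:
        handled = pool.map(process_chunk, chunk_ranges(1_000_000, workers))
    print(sum(handled))
```

With 24 processes on a machine that has fewer cores, the extra processes mostly wait for CPU time, which would be consistent with throughput growing far less than 24-fold.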

Also, because your processes access the same output file concurrently, you need a lock on that file, which will slow things down a bit.
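One possible way to sidestep the lock, which is a different technique from locking the file: have each worker return its matches and let only the parent process write. This is a sketch under that assumption; find_matches and write_results are hypothetical names, and the "match" test is a placeholder for the real comparison.

```python
import os
import tempfile
from multiprocessing import Pool


def find_matches(lines):
    """Hypothetical worker: return matching lines instead of writing them."""
    return [line for line in lines if "match" in line]  # placeholder test


def write_results(batches, out_path):
    """Write every batch to one file; only the parent process calls this."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for batch in batches:
            for line in batch:
                out.write(line + "\n")
                written += 1
    return written


if __name__ == "__main__":
    sections = [["a match", "no"], ["match b"], ["nothing"]]
    with Pool(processes=2) as pool:
        batches = pool.map(find_matches, sections)
    with tempfile.TemporaryDirectory() as tmp:
        print(write_results(batches, os.path.join(tmp, "output.txt")))
```

Since a single process owns the output file, no lock is involved, and pool.map returns the batches in section order.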

What could be improved, however, is your code for comparing the strings, but that is another question.
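As a rough illustration of what such an improvement might look like: load the two smaller files into sets once, then test each line of the big file with a set lookup instead of looping over both files. The question does not show the actual comparison, so this sketch assumes a match means the first whitespace-separated field of a line appears in both smaller files; load_keys and matching_lines are hypothetical names.

```python
def load_keys(lines):
    """Collect the first field of every non-blank line into a set."""
    return {line.split()[0] for line in lines if line.strip()}


def matching_lines(big_lines, keys_a, keys_b):
    """Yield lines whose first field is in both key sets (assumed rule)."""
    for line in big_lines:
        fields = line.split()
        # Set membership is a hash lookup, not a scan of the other file.
        if fields and fields[0] in keys_a and fields[0] in keys_b:
            yield line


if __name__ == "__main__":
    keys_a = load_keys(["id1 x", "id2 y", ""])
    keys_b = load_keys(["id2 z"])
    for line in matching_lines(["id1 foo", "id3 bar", "id2 baz"], keys_a, keys_b):
        print(line)
```

With nested for loops, each line of the big file is compared against up to 100,000 lines of the second file; with sets, each check costs roughly the same no matter how large the other files are.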
