
How to troubleshoot very slow Python code

I have a Python script that processes a large number of files.

For each file, the script goes line by line, searching for specific regex patterns. If a pattern is found, the line is copied into the log file.

As sample input, I'm passing it a folder with 42 small files and 3 large files (~1500 lines each).

The script processes the first two large files very fast: it needs only a few seconds for them. But when it reaches the third large file, it slows down, and it keeps getting slower.

In the middle of the third large file, it needs a whole second per line, and it keeps slowing down. If I don't stop it, the whole run takes an hour!

I added debugging code that prints out the line numbers. That's how I noticed that it keeps running, just slower and slower, rather than getting stuck somewhere.

I have 20 years of experience with C and many other languages, but I'm a Python beginner. What steps can I take to troubleshoot this script?

If your code is a script, you can run cProfile as shown in this answer:

python -m cProfile myscript.py

I do not know if this gives you the granularity you want; if not, have a look at The Python Profilers.
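If whole-script output is too coarse, one option is to profile a single function from inside the script with the standard `cProfile` and `pstats` modules. The following is only a sketch: `scan_lines`, `profile_call`, and the `ERROR \d+` pattern are hypothetical stand-ins, not taken from the script in the question.

```python
import cProfile
import io
import pstats
import re

# Hypothetical pattern; replace it with the ones the real script uses.
PATTERN = re.compile(r"ERROR \d+")


def scan_lines(lines):
    """Return the lines that match PATTERN (stand-in for the real per-file work)."""
    return [line for line in lines if PATTERN.search(line)]


def profile_call(func, *args):
    """Run func(*args) under cProfile.

    Returns (result, report), where report is the text of the ten most
    expensive entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    sample = ["ok", "ERROR 42 disk full", "still ok"] * 1000
    matches, report = profile_call(scan_lines, sample)
    print(len(matches))
    print(report)
```

From the command line, `python -m cProfile -s cumulative myscript.py` sorts the whole-script report the same way, which usually makes the dominant function easier to spot.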

As for the actual reason your code runs slowly, I suspect either catastrophic backtracking in one of the regular expressions, or that you open and append to your log file every time a pattern matches (a "Shlemiel the Painter" algorithm).
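Both suspicions can be checked with a small rewrite of the main loop. The sketch below assumes the script looks roughly like the description in the question; `copy_matching_lines`, its parameters, and the 0.5 second threshold are hypothetical. It opens the log file once for the whole run, compiles each pattern once, and times the regex search on every line so that pathological lines show up by file name and line number.

```python
import re
import time


def copy_matching_lines(input_paths, log_path, patterns, slow_threshold=0.5):
    """Copy every line matching any pattern into log_path.

    Returns a list of (path, line_number, seconds) for lines whose regex
    search took longer than slow_threshold seconds.
    """
    compiled = [re.compile(p) for p in patterns]  # compile once, not per line
    slow_lines = []
    # The log is opened a single time for the whole run.
    with open(log_path, "a", encoding="utf-8") as log:
        for path in input_paths:
            with open(path, encoding="utf-8", errors="replace") as source:
                for line_no, line in enumerate(source, start=1):
                    start = time.perf_counter()
                    matched = any(rx.search(line) for rx in compiled)
                    elapsed = time.perf_counter() - start
                    if elapsed > slow_threshold:
                        slow_lines.append((path, line_no, elapsed))
                    if matched:
                        log.write(line)
    return slow_lines
```

If the returned list is non-empty, the regex is the likely culprit: test each pattern against the reported lines individually. Nested quantifiers such as `(a+)+$` are the classic source of catastrophic backtracking. If the list is empty and the run is still slow, the time is going somewhere other than matching, and the profiler output is the next place to look.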


 