
Cleanup huge Perl Codebase

I am currently working on a web application that is roughly 15 years old.

It consists mainly of CGI Perl scripts with HTML::Template templates.

It has over 12,000 files and roughly 260 MB of code in total. I estimate that no more than 1,500 Perl scripts are actually needed, and I want to get rid of all the unused code.

There are practically no tests written for the code.

My questions are:

  • Are you aware of any CPAN module that can help me get a list of only the modules that are actually used and required?
  • What would be your approach if you wanted to get rid of all the extra code?

I was thinking of the following approaches:

  • try to override the use and require Perl builtins with versions that write the name of each loaded file to a specific location
  • override the import function of the warnings and/or strict modules and write the file name to that location
  • study the Devel::Cover Perl module, take the same approach, and analyze the code during manual testing instead of automated tests
  • replace the perl executable with a custom one that logs the name of each file it reads (I don't know how to do that yet)
  • some creative use of lsof (?!?)
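A possible variation on the first approach: rather than overriding the builtins themselves, a code reference placed at the front of @INC is called for every file that use or require has not loaded yet. The following is only a minimal sketch under that assumption. The @LOADED collector and the choice of Text::Wrap as a demo module are hypothetical, and a real deployment would write to a log file instead.

```perl
use strict;
use warnings;

# Hypothetical collector for this sketch; a real deployment would append
# to a log file instead of keeping the names in memory.
our @LOADED;

BEGIN {
    # Installed in BEGIN so that later `use` statements (which run at
    # compile time) also pass through the hook.
    unshift @INC, sub {
        my ($self, $filename) = @_;   # e.g. "Text/Wrap.pm"
        push @LOADED, $filename;
        return;                       # empty list: perl keeps searching @INC
    };
}

use Text::Wrap ();   # any module not yet loaded triggers the hook

print "$_\n" for @LOADED;
```

Modules loaded before the hook is installed (strict and warnings in this sketch) are not recorded, so the hook would have to be installed as early as possible, for example from a small module loaded ahead of everything else.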

Devel::Modlist may give you what you need, but I have never used it.

The few times I have needed to do something like this, I have opted for the more brute-force approach of inspecting %INC at the end of the program.

END {
    open my $log_fh, ...;
    print $log_fh "$_\n" for sort keys %INC;
}
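A slightly fuller sketch of the same idea follows, for illustration only. The helper name log_loaded_modules and the MODULE_LOG environment variable are assumptions made for this example, not part of the original answer. It appends rather than truncates and takes a lock, since several CGI requests may finish at the same time.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: append one "script<TAB>module<TAB>path" line per
# entry in %INC to the file at $log_path.
sub log_loaded_modules {
    my ($log_path) = @_;
    # Fail quietly: logging problems should not break the page.
    open my $log_fh, '>>', $log_path or return;
    flock $log_fh, LOCK_EX;           # concurrent requests share the log
    for my $module (sort keys %INC) {
        my $path = $INC{$module} // '(unknown)';
        print {$log_fh} "$0\t$module\t$path\n";
    }
    close $log_fh;
    return 1;
}

# MODULE_LOG is a hypothetical environment variable naming the log file;
# nothing is written unless it is set.
END { log_loaded_modules($ENV{MODULE_LOG}) if $ENV{MODULE_LOG} }
```

Recording $0 next to each module makes it possible to tell later which scripts were ever run, as well as which modules they pulled in.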

As a first approximation, I would simply run

egrep -r '\<(use|require)\>' /path/to/source/*

Then spend a couple of days cleaning up the output from that. The result is a list of all of the modules that are used or required anywhere in the source.
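Some of that cleanup could perhaps be automated. The sketch below does the same static scan in Perl and counts how many files mention each module. The function name scan_modules and the .pl/.pm/.cgi extension filter are assumptions for illustration. It only sees statements of the form `use Foo::Bar` or `require Foo::Bar` at the start of a line, so quoted file names such as `require "foo.pl"`, module names built at run time, and text inside POD or here-documents are either missed or reported wrongly.

```perl
use strict;
use warnings;
use File::Find;

# Return a hash ref: module name => number of files that reference it.
sub scan_modules {
    my (@roots) = @_;
    my %seen;
    find({ no_chdir => 1, wanted => sub {
        # With no_chdir, $_ holds the full path of the current entry.
        return unless -f $_ && /\.(?:pl|pm|cgi)\z/;
        open my $fh, '<', $_ or return;
        my %in_file;
        while (my $line = <$fh>) {
            next if $line =~ /^\s*#/;     # skip whole-line comments
            $in_file{$1} = 1
                if $line =~ /^\s*(?:use|require)\s+([A-Za-z_]\w*(?:::\w+)*)/;
        }
        close $fh;
        $seen{$_}++ for keys %in_file;    # count each file once per module
    }}, @roots);
    return \%seen;
}

# Usage: perl scan.pl /path/to/source
if (@ARGV) {
    my $modules = scan_modules(@ARGV);
    printf "%5d  %s\n", $modules->{$_}, $_ for sort keys %$modules;
}
```

Pragmas such as strict and warnings show up in the output as well and can simply be ignored.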

You might also be able to play around with @INC to exclude certain library paths.

If you're trying to determine the execution path, you might be able to run the code through the debugger with tracing turned on (the 't' command in the debugger), then redirect the output to a text file for further analysis. I know that this is difficult when running CGI...

Assuming access timestamps are enabled on the filesystem, you could check the access times of the various script files. That should let you rule out any top-level script files that aren't being used.
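A rough sketch of such a check is shown below. The function name stale_scripts, the .pl/.cgi extension filter, and the 365-day threshold in the usage line are assumptions for illustration. The result is only as reliable as the recorded access times: mount options such as noatime or relatime, backups, and recursive greps over the source tree can all affect them.

```perl
use strict;
use warnings;
use File::Find;

# Return the script files under @roots whose last access time is more
# than $days days in the past.
sub stale_scripts {
    my ($days, @roots) = @_;
    my $cutoff = time - $days * 24 * 60 * 60;
    my @stale;
    find({ no_chdir => 1, wanted => sub {
        # With no_chdir, $_ holds the full path of the current entry.
        return unless -f $_ && /\.(?:pl|cgi)\z/;
        my $atime = (stat $_)[8];         # field 8 of stat is the atime
        push @stale, $_ if defined $atime && $atime < $cutoff;
    }}, @roots);
    return @stale;
}

# Usage: perl stale.pl /path/to/source
if (@ARGV) {
    print "$_\n" for sort(stale_scripts(365, @ARGV));
}
```

Files reported here are candidates for removal, not proof of disuse; a script that runs once a year would look stale under a shorter threshold.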

It might be worth adding some instrumentation to CGI.pm to log the current script name ($0) to see what's happening.

