
How can I index a bunch of files in Perl?

I'm trying to clean up a database by first finding unreferenced objects. I have extracted all the database objects into a list and all the DDL code into files, and I also have all the Java source code for the project.

What I want to do, preferably in Perl since it's the scripting language I'm most familiar with, is index the contents of all the extracted DDL and Java files to speed up searching. I would then step through the database object list, use the index to check whether each object is referenced anywhere in those files, and create a report.

If you could point me toward something that indexes all those files so that I can search them (preferably in Perl), I would greatly appreciate it. The key is to be able to do this programmatically, not manually with something like Google Desktop Search.

Break the task down into its steps and start at the beginning. First, what does a record look like, and what information in it connects it to another record? Parse that record, and store its unique identifier and a list of the things it references.
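As a rough illustration of that parsing step, here is a minimal sketch. It assumes a "record" is simply a file, identified by its name, and that "the things it references" are the identifier-like words in its text. The helper `extract_identifiers` and the sample data are hypothetical, and real DDL or Java may need smarter parsing (comments, quoted names, schema prefixes).

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct identifier-like tokens in a
# chunk of text, upper-cased so that later matching is case-insensitive.
sub extract_identifiers {
    my ($text) = @_;
    my %seen;
    $seen{ uc $1 } = 1 while $text =~ /\b([A-Za-z_]\w*)\b/g;
    return sort keys %seen;
}

# Hypothetical sample records: a name plus the text that belongs to it.
# In practice the text would be read from the DDL and Java files on disk.
my %sources = (
    'OrderDao.java'  => 'String sql = "SELECT * FROM orders WHERE id = ?";',
    'order_view.sql' => 'CREATE VIEW order_view AS SELECT id FROM orders;',
);

# For each record, store its identifier and the list of things it mentions.
my %references;
for my $name (sort keys %sources) {
    $references{$name} = [ extract_identifiers($sources{$name}) ];
}

for my $name (sort keys %references) {
    print "$name: @{ $references{$name} }\n";
}
```

Storing every token is wasteful but simple; filtering the tokens against the known object list at this stage would keep the structure much smaller.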

Once you have that list, invert it: for each object, build a list of the records that reference it. Count the referrers by object identifier. The objects whose count is zero are the unreferenced ones.
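The inversion and counting might look something like the sketch below. The `%references` map and the `@objects` list are hypothetical stand-ins for the output of the parsing step and for the extracted database object list, and `unreferenced` is an illustrative helper name.

```perl
use strict;
use warnings;

# Hypothetical input: for each file, the identifiers found in it
# (already upper-cased), as a parsing pass might produce.
my %references = (
    'OrderDao.java'  => [qw(SELECT FROM ORDERS)],
    'order_view.sql' => [qw(CREATE VIEW ORDER_VIEW SELECT FROM ORDERS)],
);

# Hypothetical list of objects extracted from the database.
my @objects = qw(ORDERS ORDER_VIEW OLD_AUDIT_LOG);

# Invert the map: identifier => { file => 1, ... }.
my %referenced_by;
for my $file (keys %references) {
    $referenced_by{$_}{$file} = 1 for @{ $references{$file} };
}

# Return the names that no file mentions (referrer count of zero).
sub unreferenced {
    my ($index, @names) = @_;
    return grep { scalar( keys %{ $index->{$_} || {} } ) == 0 } @names;
}

# Report the number of referencing files per object.
for my $object (@objects) {
    my $count = keys %{ $referenced_by{$object} || {} };
    print "$object: $count\n";
}
print "Unreferenced: @{[ unreferenced(\%referenced_by, @objects) ]}\n";
```

One caveat with this approach: the DDL file that defines an object also mentions its name, so in this sample `ORDER_VIEW` is counted as referenced only by its own definition. A real report would probably want to exclude the defining file from each object's count.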

That's a very general answer, but you asked a very general question. If you are having trouble, narrow it down to just one of those steps and ask a more specific question, supplying sample data and the code you've tried so far.

Good luck.

An interesting module you might use for this is KinoSearch, which provides the kind of indexing you said you are looking for. Once the files are indexed, you can go through the object identifiers and check whether there are any references to each one.
