
How to index a folder using Lucene.NET

Asked 2010-12-15 13:09:07 · Tags: c#/ asp.net/ vb.net/ lucene/ lucene.net

I am trying to develop a search engine in ASP.NET using Lucene.NET. I have gone through many tutorials and pages but could not get the results I need. I have a folder with some files (doc, ppt, pdf, excel, etc.) and I want to search the contents of that folder only; if no results are found there, the user should be asked to search the web instead.

For example, I have a folder with thousands of files at C:\test. If a user searches for "miller", every document should be searched. If results are found, they should be displayed like this:

Searched text    File                     No. of occurrences
miller           C:\test\1\file.doc       5
miller           C:\test\1\11\new.doc     2

Please help me; I am not getting appropriate results.

1 Answer

Lucene / Lucene.NET is just an indexing engine; you still have to extract the text yourself from the file types you want to support. On Windows you can use the IFilter interface for many file types, and if you have Acrobat Reader 7+ installed there should be built-in IFilter support for PDF files. As for the indexing part itself, there are many, many samples out there.

Also see this thread: What's a good method for extracting text from a PDF using C# or classic ASP (VBScript)?
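To make the split between text extraction and indexing concrete, here is a rough sketch of what the indexing side could look like. It assumes the Lucene.NET 3.0.3 API (older 2.9 releases may expose some of these members with different casing), and `ExtractText`, the field names `path` and `content`, and the index location `C:\test-index` are all hypothetical names chosen for illustration. `ExtractText` only reads plain text here; for doc, ppt, pdf or excel files you would swap in IFilter or a format-specific parser.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
// Aliases avoid clashes with System.IO.Directory and System.Version.
using LuceneDirectory = Lucene.Net.Store.Directory;
using LuceneVersion = Lucene.Net.Util.Version;

static class FolderIndexDemo
{
    // Hypothetical placeholder: handles plain text only.
    // Replace with IFilter or a real parser for doc/ppt/pdf/xls files.
    static string ExtractText(string path)
    {
        return File.ReadAllText(path);
    }

    // Walks the folder recursively and adds one Lucene document per file.
    static void BuildIndex(string sourceFolder, LuceneDirectory indexDir)
    {
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_30);
        using (var writer = new IndexWriter(indexDir, analyzer, true,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (var file in System.IO.Directory.EnumerateFiles(
                         sourceFolder, "*", SearchOption.AllDirectories))
            {
                var doc = new Document();
                // Stored as-is so the path can be shown in the results.
                doc.Add(new Field("path", file,
                                  Field.Store.YES, Field.Index.NOT_ANALYZED));
                // Tokenized for searching; the text itself is not stored.
                doc.Add(new Field("content", ExtractText(file),
                                  Field.Store.NO, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }
        }
    }

    // Prints "word <tab> file path <tab> occurrences" for each matching file.
    static void PrintOccurrences(LuceneDirectory indexDir, string word)
    {
        using (var reader = IndexReader.Open(indexDir, true))
        {
            // StandardAnalyzer lowercases tokens, so look up the lowercase term.
            var termDocs = reader.TermDocs(new Term("content", word.ToLowerInvariant()));
            while (termDocs.Next())
            {
                string path = reader.Document(termDocs.Doc).Get("path");
                Console.WriteLine("{0}\t{1}\t{2}", word, path, termDocs.Freq);
            }
        }
    }

    static void Main()
    {
        // The index lives outside C:\test so it is not indexed itself.
        using (var indexDir = FSDirectory.Open(new DirectoryInfo(@"C:\test-index")))
        {
            BuildIndex(@"C:\test", indexDir);
            PrintOccurrences(indexDir, "miller");
        }
    }
}
```

The per-file count comes from the term frequency Lucene records at index time, so it only works for single words as they appear after analysis. If nothing is printed for a search word, that is the point where you could fall back to offering a web search.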