

How to index a folder using Lucene.NET

I am trying to develop a search engine in ASP.NET using Lucene.NET. I have gone through many tutorials and pages, but I haven't managed to get the results I need. I have a folder containing files of various types (.doc, .ppt, .pdf, Excel, etc.), and I want to search the contents of that folder only. If no results are found in that folder, the user should be asked to search the web instead.

For example, I have a folder at C:\test containing thousands of files. If a user searches for "miller", every document in it should be searched. If results are found, they should be displayed like this:

    Searched text   File                    No. of occurrences
    miller          C:\test\1\file.doc      5
    miller          C:\test\1\11\new.doc    2
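The desired behaviour (scan every file under a root folder, count how often the term occurs in each, and fall back to a web-search prompt when nothing matches) can be pinned down without Lucene as a brute-force scan. This is a minimal Python sketch, not Lucene.NET code: it assumes the documents are already plain text (`.txt`), because .doc/.ppt/.pdf/Excel files would first need a text extractor, and the names `count_occurrences` and `report` are hypothetical. Matching is case-insensitive and whole-word.

```python
import os
import re
import tempfile


def count_occurrences(root, term, extensions=(".txt",)):
    """Return [(path, count), ...] for files under root that contain term.

    Assumption: only plain-text files are read; other formats would need
    a text extractor before this could count anything in them.
    """
    # Whole-word, case-insensitive match: "Miller" counts, "millers" does not.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):  # recurse into subfolders
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                n = len(pattern.findall(f.read()))
            if n:
                results.append((path, n))
    # Most occurrences first, as in the example table.
    results.sort(key=lambda r: r[1], reverse=True)
    return results


def report(root, term):
    """Format hits as 'term<TAB>file<TAB>count', or suggest a web search."""
    hits = count_occurrences(root, term)
    if not hits:
        return f'No matches for "{term}" in {root}; offer a web search instead.'
    lines = ["Searched text\tFile\tOccurrences"]
    lines += [f"{term}\t{path}\t{n}" for path, n in hits]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical demo folder mirroring the C:\test layout from the question.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "1", "11"))
        with open(os.path.join(root, "1", "file.txt"), "w", encoding="utf-8") as f:
            f.write("miller " * 5)
        with open(os.path.join(root, "1", "11", "new.txt"), "w", encoding="utf-8") as f:
            f.write("Miller met miller.")
        print(report(root, "miller"))
```

Rescanning thousands of files on every query is exactly the cost an index such as Lucene avoids; the scan is only meant to make the expected output concrete.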

Please help; I am not getting the right results.

Lucene/Lucene.NET is just an indexing engine: you still have to extract the text yourself from every file type you want to support. On Windows you can use the IFilter interface for many file types, and if you have Acrobat Reader 7+ installed, IFilter support for PDF files should already be built in. As for the indexing part itself, there are many, many samples out there.
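The split described above (extract text first, index it second) can be illustrated with a toy in-memory inverted index. This Python sketch is a stand-in for the concept, not the Lucene.NET API: `EXTRACTORS`, `extract_plain_text` and `InvertedIndex` are hypothetical names, only `.txt` files have an extractor, and any other type (a PDF, say) is skipped until you register an extractor for it, for example one built on IFilter. Keeping a per-document count for each term is what lets a search report the number of occurrences the question asks for.

```python
import os
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")  # naive tokenizer; real analyzers do much more


def extract_plain_text(path):
    with open(path, encoding="utf-8", errors="ignore") as f:
        return f.read()


# Hypothetical registry: file extension -> text extractor.
# .doc/.ppt/.pdf/.xls need extractors of your own (e.g. IFilter-based); none exist here.
EXTRACTORS = {".txt": extract_plain_text}


class InvertedIndex:
    """Toy index: term -> {path: number of occurrences in that file}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add_document(self, path, text):
        for token in TOKEN.findall(text.lower()):
            docs = self.postings[token]
            docs[path] = docs.get(path, 0) + 1

    def index_folder(self, root):
        """Index every file with a registered extractor; return skipped paths."""
        skipped = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                extractor = EXTRACTORS.get(os.path.splitext(name)[1].lower())
                if extractor is None:
                    # No extractor: this file's content never reaches the index.
                    skipped.append(path)
                    continue
                self.add_document(path, extractor(path))
        return skipped

    def search(self, term):
        """Return [(path, count), ...], most occurrences first; [] if no hits."""
        docs = self.postings.get(term.lower(), {})  # .get does not create an entry
        return sorted(docs.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as f:
            f.write("Miller called. Ask miller about the report.")
        index = InvertedIndex()
        print("skipped:", index.index_folder(root))
        for path, count in index.search("miller"):
            print("miller", path, count)
```

The index is built once and queried many times; an empty result from `search` is the point at which the application would offer a web search.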

Also see this thread: What's a good method for extracting text from a PDF using C# or classic ASP (VBScript)?

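Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.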
