
How to index a folder using Lucene.NET

I am trying to develop a search engine in ASP.NET using Lucene.NET. I have gone through many tutorials and pages but couldn't get the appropriate results. I have a folder with some files (doc, ppt, pdf, excel, etc.) and I want to search only the contents of the files in that folder; if no results are found there, the user should be asked to search the web instead.

For example, I have a folder with thousands of files at C:\\test. If the user searches for "miller", every document should be searched. If results are found, they should be displayed like this:

Searched text    File                        No. of occurrences
miller           C:\\test\\1\\file.doc       5
miller           C:\\test\\1\\11\\new.doc    2

Please help; I am not getting the appropriate results.

Lucene / Lucene.NET is just an indexing engine; you still have to extract the text yourself from each file type you want to support. On Windows you can use the IFilter interface for many file types, and if you have Acrobat Reader 7+ installed there should be built-in IFilter support for PDF files. As for the indexing part itself, there are many samples out there.

Also see this thread: What's a good method for extracting text from a PDF using C# or classic ASP (VBScript)?
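
To make the split between "extract the text" and "index it" concrete, here is a rough sketch in Python. It is not Lucene.NET code: a small in-memory inverted index stands in for the Lucene index, and `extract_text`, `build_index` and `search` are hypothetical names made up for this illustration. Only plain-text files are read; a real extractor (IFilter or similar) would have to be plugged in for doc, ppt, pdf and excel files.

```python
import os
import re
from collections import Counter, defaultdict


def extract_text(path):
    """Hypothetical extraction step (an assumption, not a Lucene API).

    Only .txt files are handled here. Formats such as .doc or .pdf need a
    real extractor, so they are skipped instead of being read as raw bytes.
    """
    if path.lower().endswith(".txt"):
        with open(path, encoding="utf-8", errors="ignore") as handle:
            return handle.read()
    return None


def build_index(root):
    """Walk root recursively and map each lowercased term to {path: count}."""
    index = defaultdict(dict)
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            text = extract_text(path)
            if text is None:
                continue  # unsupported file type
            counts = Counter(re.findall(r"\w+", text.lower()))
            for term, occurrences in counts.items():
                index[term][path] = occurrences
    return index


def search(index, term):
    """Return (path, count) pairs for term, most occurrences first."""
    hits = index.get(term.lower(), {})
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))
```

Calling `search(build_index(root), "miller")` gives the file path and occurrence count for each matching file, which is the shape of the result list asked for in the question. An empty list is the point where the application could offer the web search instead.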


 