
Searching through hundreds of HTML files

I am not sure how to start solving this problem so any suggestions will be of help.

My client has several hundred static HTML pages. These undergo updates every now and then and are overwritten on the website. We list the pages on the website via a simple left-hand-side explorer that mimics the folder structure in which the files are given to us.

We now want to let users search these files and see matching results. A brute-force search through that many files on every query is going to be very time-consuming. Matching related words (for example plurals and misspellings) is also desirable, and showing results in order of popularity would be a useful feature. I am not sure how to get started on this. Should we pre-process the HTML files after every update, for instance? Are there any recommended indexing libraries available in .NET? What little programming has been done on the website has been done in C#.

Thanks MS

Lucene.net may be of interest.
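
To illustrate the general idea behind an indexing library of this kind, here is a minimal sketch of an inverted index, which is built once after each update and then queried without re-reading the files. It is written in Python purely for brevity and is not Lucene.net's API; the names `tokenize`, `build_index`, `search_index` and the `pages` dictionary are hypothetical, and it does no stemming, spelling correction, or ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of page names that contain it.

    `pages` is a hypothetical dict of {page name: plain text}.
    """
    index = defaultdict(set)
    for name, text in pages.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def search_index(index, query):
    """Return the sorted names of pages containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Intersect the page sets of all tokens; unknown tokens match nothing.
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)
```

Rebuilding the index whenever the files are overwritten would keep lookups fast, since each query only touches the entries for its own words.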

I'd first write a simple program to transfer the contents of all those files to a database. Then you could implement your search properly without having to read all the files every time.
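
A rough sketch of that transfer step, assuming SQLite as the database and Python's standard library for HTML parsing (the same approach would carry over to C# with a different parser and database driver). The table name `pages` and the helpers `html_to_text`, `load_pages` and `search_pages` are made up for illustration, and the search is a plain substring match rather than anything handling plurals or misspellings.

```python
import sqlite3
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the text of an HTML document, skipping script and style."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Strip the markup and collapse runs of whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def load_pages(conn, root):
    """Store the text of every .html file under `root`, keyed by relative path."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (path TEXT PRIMARY KEY, body TEXT)"
    )
    for path in Path(root).rglob("*.html"):
        text = html_to_text(path.read_text(encoding="utf-8", errors="replace"))
        # INSERT OR REPLACE lets the loader be re-run after each update.
        conn.execute(
            "INSERT OR REPLACE INTO pages (path, body) VALUES (?, ?)",
            (str(path.relative_to(root)), text),
        )
    conn.commit()


def search_pages(conn, term):
    """Return the paths whose text contains `term` (LIKE substring match)."""
    rows = conn.execute(
        "SELECT path FROM pages WHERE body LIKE ? ORDER BY path",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

Running `load_pages` after each batch of updated files arrives would refresh the table, so queries never have to open the HTML files themselves.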

I am not sure if it's within your budget, but Google can do it for you, as user1161318 pointed out.

Try Google Site Search - http://www.google.co.uk/enterprise/search/products_gss.html
