
Searching through hundreds of HTML files

I am not sure how to start solving this problem, so any suggestions would help.

My client has a number of static HTML pages, running into hundreds of files. These undergo updates every now and then and are overwritten on the website. We list these pages on the website via a simple left-hand-side explorer that mimics the folder structure in which the files are given to us.

We now want to add the ability to search these files and display matching results. Doing a brute-force search through such a large number of files is going to be very time consuming. Matching related words (for example plurals, misspellings, etc.) is also desirable. Showing results in order of popularity would be a useful feature. I am not sure how to get started on this. Should we pre-process the HTML files after every update, for instance? Are there any recommended indexing libraries available in .NET? What little programming has been done on the website has been done in C#.

Thanks, MS

Lucene.net might be of interest.
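To give a feel for what an indexing library of this kind does, here is a minimal sketch of an inverted index built once from the HTML files and then queried without re-reading them. This is not Lucene.net's API: it is written in Python for brevity, and `html_to_text`, `normalize`, `build_index` and `search` are hypothetical names. The plural handling (dropping a trailing "s") and the ranking by term count are crude stand-ins for real stemming and popularity ranking.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def normalize(word):
    # Assumption: very crude plural folding, strip one trailing "s".
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word


def tokenize(text):
    return [normalize(w) for w in re.findall(r"[A-Za-z0-9]+", text)]


def build_index(pages):
    """pages maps path -> HTML source. Returns term -> {path: count}."""
    index = defaultdict(dict)
    for path, html in pages.items():
        for term in tokenize(html_to_text(html)):
            index[term][path] = index[term].get(path, 0) + 1
    return index


def search(index, query):
    """Paths containing every query term, highest total term count first."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    common = set(postings[0]).intersection(*postings[1:])
    scores = {p: sum(post[p] for post in postings) for p in common}
    return sorted(common, key=lambda p: (-scores[p], p))
```

The index would be rebuilt (or updated per file) whenever the pages are overwritten, which is the pre-processing step the question asks about. A real library adds proper stemming, fuzzy matching for misspellings, and an on-disk index format.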

I'd first write a simple program to transfer the contents of all those files to a database. Then you could implement your search properly without having to read all the files every time.
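A rough sketch of that loader, assuming SQLite as the database and Python as the scripting language (the answer names neither). The table layout and the helper names `open_db`, `sync_folder` and `search` are hypothetical. The file modification time is stored so that a re-run after an update only reloads files that changed, and tag stripping is done with simple regular expressions rather than a full HTML parser.

```python
import re
import sqlite3
from pathlib import Path


def strip_tags(html):
    # Rough text extraction (assumption): drop script/style blocks, then all tags.
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()


def open_db(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "path TEXT PRIMARY KEY, mtime REAL NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def sync_folder(conn, root):
    """Load new or modified .html files under root; return how many were loaded."""
    changed = 0
    for file in sorted(Path(root).rglob("*.html")):
        rel = file.relative_to(root).as_posix()
        mtime = file.stat().st_mtime
        row = conn.execute(
            "SELECT mtime FROM pages WHERE path = ?", (rel,)
        ).fetchone()
        if row is not None and row[0] == mtime:
            continue  # unchanged since the last run
        body = strip_tags(file.read_text(encoding="utf-8", errors="replace"))
        conn.execute(
            "INSERT OR REPLACE INTO pages (path, mtime, body) VALUES (?, ?, ?)",
            (rel, mtime, body),
        )
        changed += 1
    conn.commit()
    return changed


def search(conn, phrase):
    # Escape LIKE wildcards so the phrase is matched literally.
    escaped = phrase.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT path FROM pages WHERE body LIKE ? ESCAPE '\\' ORDER BY path",
        (f"%{escaped}%",),
    )
    return [r[0] for r in rows]
```

This sketch does not remove rows for files that were deleted from the folder, and a `LIKE` scan is only a starting point: it avoids re-reading the files, but matching plurals or misspellings would need a full-text index on top.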

I am not sure if it's within your budget, but Google can do it for you, as user1161318 pointed out.

Try Google Site Search - http://www.google.co.uk/enterprise/search/products_gss.html
