Caching Lucene.net search results

I've used Lucene.net to implement search functionality (for both database content and uploaded documents) on several small websites with no problem. Now I've got a site where I'm indexing 5000+ documents (mainly PDFs) and the querying is becoming a bit slow.

I'm assuming the best way to speed it up would be to implement caching of some kind. Can anyone give me any pointers or examples on where to start? If you've got any other suggestions aside from caching (e.g., should I be using multiple indexes?), I'd like to hear those too.

Edit:

A dumb user error was responsible for the slow querying: I was creating highlights for the entire result set at once, instead of just for the 'page' I was displaying. Oops.
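Since the fix was to highlight only the displayed page, here is a minimal sketch of that idea in plain C#. The `HighlightPage` helper and the `highlight` delegate are hypothetical stand-ins for whatever highlighter call you actually use; the point is that `Skip`/`Take` run before the expensive step, so it runs at most once per displayed hit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Run the (expensive) highlighter only for the hits on the page being displayed.
// `highlight` is a stand-in for whatever highlighting call you actually use.
List<string> HighlightPage(IReadOnlyList<string> hits, int pageIndex, int pageSize,
                           Func<string, string> highlight)
{
    return hits
        .Skip(pageIndex * pageSize) // jump to the first hit on this page
        .Take(pageSize)             // keep at most one page of hits
        .Select(highlight)          // highlight only those hits
        .ToList();
}

// Usage: 5000 hits, zero-based page 2 with 10 results per page
var allHits = Enumerable.Range(1, 5000).Select(i => $"doc {i}").ToList();
int highlightCalls = 0;
var page = HighlightPage(allHits, 2, 10, text => { highlightCalls++; return $"<b>{text}</b>"; });
Console.WriteLine($"{page.Count} results, {highlightCalls} highlight calls, first: {page[0]}");
// prints "10 results, 10 highlight calls, first: <b>doc 21</b>"
```

With 5000 hits, the highlighter now runs 10 times per request instead of 5000.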

I'm going to make a big assumption here and assume you're not holding on to your index searchers between queries.

If that's true, then you should definitely share one index searcher across all queries to your index. As the index grows (and it doesn't have to get very large for this to become a factor), rebuilding the index searcher for every query becomes more and more of an overhead. To make this work correctly, you'll need to synchronise access to the query parser class (it isn't thread safe).
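A minimal sketch of sharing one searcher, assuming the Lucene.Net 3.0.3 NuGet package. The class name `SharedSearch`, the `Init` method and the `"content"` field name are hypothetical; the searcher is created once and reused, while the non-thread-safe `QueryParser` is guarded by a lock.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// One IndexSearcher shared by every request instead of one per query.
// SharedSearch, Init and the "content" field name are hypothetical.
public static class SharedSearch
{
    private static readonly object ParserLock = new object();
    private static IndexSearcher _searcher;
    private static QueryParser _parser;

    // Call once at application start-up, before any searches run
    public static void Init(string indexPath)
    {
        var version = Lucene.Net.Util.Version.LUCENE_30;
        // readOnly: true opens a read-only reader, which suits a search-only process
        _searcher = new IndexSearcher(FSDirectory.Open(new DirectoryInfo(indexPath)), true);
        _parser = new QueryParser(version, "content", new StandardAnalyzer(version));
    }

    public static TopDocs Search(string userQuery, int maxHits)
    {
        Query query;
        lock (ParserLock) // QueryParser is not thread safe
        {
            query = _parser.Parse(userQuery);
        }
        // IndexSearcher itself can be used from many threads at once
        return _searcher.Search(query, maxHits);
    }
}
```

If you rebuild or update the index, you will need to open a new searcher to see the changes (and dispose of the old one once in-flight queries finish); that bookkeeping is left out here. Creating a new `QueryParser` per query is an alternative to the lock.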

BTW, I've found the Java docs to be just as applicable to the .NET version.

For more info on your problem, see here: http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

Be sure to optimize your indexes.
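Optimizing merges the index's segments into one, which usually makes searching faster. A minimal sketch, assuming Lucene.Net 3.0.3 (where `IndexWriter.Optimize()` is available); the `IndexMaintenance` class is hypothetical, and you would run it after a batch of updates rather than after every document.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using LuceneDirectory = Lucene.Net.Store.Directory;

// Hypothetical helper: merge all segments of an existing index into one.
public static class IndexMaintenance
{
    public static void Optimize(LuceneDirectory dir)
    {
        var version = Lucene.Net.Util.Version.LUCENE_30;
        // create: false opens the existing index instead of wiping it
        using (var writer = new IndexWriter(dir, new StandardAnalyzer(version), false,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            writer.Optimize(); // can take a while on a large index
        } // disposing the writer commits the merged index
    }
}
```

Searchers opened before the optimize keep using the old segments until they are reopened.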

Also, this is a quick/easy/effective way to implement caching: HttpRuntime.Cache.Add(...);

You can use the ASP.NET cache from any type of project/library.
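Where `System.Web` isn't available, the same idea (results cached per query with an absolute expiry) can be sketched in plain C#. This is a swapped-in technique, not the `HttpRuntime.Cache` API: the `QueryResultCache` class and its members are hypothetical, and `runQuery` stands for your actual Lucene search.

```csharp
using System;
using System.Collections.Concurrent;

// Time-based cache of query results, keyed by the (trimmed) query string.
// Plain-C# stand-in for HttpRuntime.Cache; QueryResultCache is a hypothetical name.
public sealed class QueryResultCache<TResult>
{
    private readonly ConcurrentDictionary<string, (TResult Value, DateTime ExpiresUtc)> _entries =
        new ConcurrentDictionary<string, (TResult Value, DateTime ExpiresUtc)>();
    private readonly TimeSpan _timeToLive;

    public QueryResultCache(TimeSpan timeToLive) => _timeToLive = timeToLive;

    public TResult GetOrAdd(string query, Func<string, TResult> runQuery)
    {
        string key = query.Trim();
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresUtc > DateTime.UtcNow)
            return entry.Value; // fresh hit: Lucene is not touched

        // Miss or expired: run the real search and remember the result.
        // Two threads missing at the same time may both run the query; that is harmless here.
        TResult value = runQuery(key);
        _entries[key] = (value, DateTime.UtcNow + _timeToLive);
        return value;
    }

    // Call after re-indexing so stale hits (and document numbers) are not served
    public void Clear() => _entries.Clear();
}
```

Unlike `HttpRuntime.Cache`, this sketch never evicts entries under memory pressure; an expired entry is only replaced when the same query comes in again.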

Lucene uses its own internal "caching" mechanism to make index retrieval a fast operation. I don't think caching is your issue here, though.

An index of 5000 documents sounds trivial in size, but this largely depends on how you're constructing your index, what you're indexing and storing, how you're querying (operationally), document size, etc.

Please fill in the blanks with as much information as you can about your index.

First, Lucene itself provides an in-memory Directory implementation:

Lucene.Net.Store.RAMDirectory

You can use it like this:

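using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

RAMDirectory idx = new RAMDirectory();

// Create a writer that builds a new index (create: true) in memory.
// (directory, analyzer, create) is the older Lucene.Net 2.x signature.
IndexWriter writer =
    new IndexWriter(idx, new StandardAnalyzer(), true);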
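For a fuller picture, here is a minimal end-to-end sketch that indexes one document into a `RAMDirectory` and searches it. It assumes Lucene.Net 3.0.3, where `StandardAnalyzer` takes a `Version` and `IndexWriter` takes an `IndexWriter.MaxFieldLength`; the field names and text are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

var version = Lucene.Net.Util.Version.LUCENE_30;
var analyzer = new StandardAnalyzer(version);
var idx = new RAMDirectory();

// Build the index in memory (create: true starts a new, empty index)
using (var writer = new IndexWriter(idx, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
{
    var doc = new Document();
    doc.Add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("content", "caching lucene search results",
                      Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);
} // disposing the writer commits, so a searcher opened next sees the document

// Query the same RAMDirectory and collect the stored ids of the hits
var foundIds = new List<string>();
using (var searcher = new IndexSearcher(idx, true))
{
    Query query = new QueryParser(version, "content", analyzer).Parse("lucene");
    foreach (ScoreDoc hit in searcher.Search(query, 10).ScoreDocs)
        foundIds.Add(searcher.Doc(hit.Doc).Get("id"));
}
Console.WriteLine(string.Join(",", foundIds));
```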

If this works for you but uses too much RAM, write a wrapper and expose it as an interface or web service. Or, if you simply want to cache query results so that you control when entries drop out of the cache, you can write a wrapper around Lucene that caches the most common results, keyed on the query keywords.
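One way to keep "the most common results" is a size-bounded cache that evicts the least-requested query when it is full (a simple least-frequently-used policy). A minimal sketch in plain C#; the `FrequentQueryCache` class, its capacity and `runQuery` (your real Lucene search) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Keeps results for the most frequently requested queries (simple LFU eviction).
public sealed class FrequentQueryCache<TResult>
{
    private readonly Dictionary<string, (TResult Value, int Hits)> _entries =
        new Dictionary<string, (TResult Value, int Hits)>();
    private readonly object _lock = new object();
    private readonly int _capacity;

    public FrequentQueryCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public TResult GetOrAdd(string keywords, Func<string, TResult> runQuery)
    {
        string key = keywords.Trim();
        lock (_lock)
        {
            if (_entries.TryGetValue(key, out var entry))
            {
                _entries[key] = (entry.Value, entry.Hits + 1); // count the request
                return entry.Value;
            }
        }

        TResult value = runQuery(key); // run the search outside the lock

        lock (_lock)
        {
            // Another thread may have cached the same query in the meantime
            if (_entries.TryGetValue(key, out var existing)) return existing.Value;

            if (_entries.Count >= _capacity)
            {
                // Evict the least-requested query to make room
                string coldest = _entries.OrderBy(e => e.Value.Hits).First().Key;
                _entries.Remove(coldest);
            }
            _entries[key] = (value, 1);
        }
        return value;
    }
}
```

Eviction scans every entry, which is fine for a few hundred cached queries; a new query starts at one hit, so it has to be requested again before it can outlast established ones.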

I prefer the former: create a web service or service project that wraps the Lucene store using RAMDirectory. That way, if the index is huge, you can offload the web service onto another server with lots of RAM and get near-instant results.
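A minimal sketch of the service-side setup, assuming Lucene.Net 3.0.3: the existing on-disk index is copied into a `RAMDirectory` once at start-up via the `RAMDirectory(Directory)` constructor, and every query then runs against memory. The class name and index path are hypothetical, and the web-service layer itself is left out.

```csharp
using System.IO;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical service core: load the on-disk index into memory once, then search it.
public sealed class InMemorySearchService
{
    private readonly IndexSearcher _searcher;

    public InMemorySearchService(string indexPath)
    {
        var onDisk = FSDirectory.Open(new DirectoryInfo(indexPath));
        var inMemory = new RAMDirectory(onDisk); // copies every index file into RAM
        _searcher = new IndexSearcher(inMemory, true); // shared, read-only searcher
    }

    public TopDocs Search(Query query, int maxHits) => _searcher.Search(query, maxHits);
}
```

The in-memory copy is a snapshot: documents added to the on-disk index later are not visible until the service reloads it, and the process needs roughly as much free RAM as the index takes on disk.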
