
Lucene blocks while searching and indexing at the same time

I have a Java application that uses Lucene (latest version, 5.2.1 as of this writing) in "near realtime" mode. It has one network connection to receive requests to index documents, and another connection for search requests.

I'm testing with a corpus of pretty large documents (several megabytes of plain text) and several versions of each field with different analyzers. Because one of them is a phonetic analyzer with the Beider-Morse filter, indexing some documents can take quite a bit of time (over a minute in some cases). Most of this time is spent in the call to IndexWriter.addDocument(doc).

My problem is that while a document is being indexed, searches get blocked, and they aren't processed until the indexing operation finishes. Having searches blocked for more than a couple of seconds is unacceptable.

Before each search, I do the following:

DirectoryReader newReader = DirectoryReader.openIfChanged(reader, writer, false);

if (newReader != null)
{
    reader = newReader;
    searcher = new IndexSearcher(reader);
}

I guess this is what causes the problem. However, it is the only way to get the most recent changes when I do a search. I'd like to maintain this behaviour in general, but if the search would block I wouldn't mind using a slightly older version of the index.

Is there any way to fix this?

Among other options, consider always keeping an IndexWriter open and committing to it as needed.

Then you should ask the writer (not the directory) for index readers and refresh them as needed. Or simply use a SearcherManager, which will not only refresh searchers for you but also maintain a pool of readers and manage references to them, in order to avoid reopening if the index contents haven't changed.

See more here.
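To make the "use a slightly old version if the refresh would block" idea concrete, here is a minimal sketch in plain Java, with no Lucene dependency. The class `NonBlockingRefresher` and its `demo()` method are hypothetical names invented for this illustration. It only models the general pattern (readers always get the last published snapshot, and a refresh attempt gives up immediately if another refresh is already running), which is roughly how `SearcherManager.maybeRefresh()` is documented to behave. It is not Lucene's implementation and it does no reference counting.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical helper (not a Lucene class): publishes snapshots without blocking readers. */
final class NonBlockingRefresher<T> {
    private final AtomicReference<T> current;
    private final Supplier<T> loader; // potentially slow, e.g. reopening a reader
    private final ReentrantLock refreshLock = new ReentrantLock();

    NonBlockingRefresher(T initial, Supplier<T> loader) {
        this.current = new AtomicReference<>(initial);
        this.loader = loader;
    }

    /** Never blocks: returns the most recently published snapshot, which may be stale. */
    T get() {
        return current.get();
    }

    /** Refreshes if no other thread is refreshing; otherwise returns false immediately. */
    boolean maybeRefresh() {
        if (!refreshLock.tryLock()) {
            return false; // someone else is refreshing, keep using the old snapshot
        }
        try {
            current.set(loader.get());
            return true;
        } finally {
            refreshLock.unlock();
        }
    }

    /** Small demonstration: a search thread is not held up by a slow refresh. */
    static String demo() throws InterruptedException {
        CountDownLatch refreshStarted = new CountDownLatch(1);
        CountDownLatch letRefreshFinish = new CountDownLatch(1);

        NonBlockingRefresher<String> refresher = new NonBlockingRefresher<>("v1", () -> {
            refreshStarted.countDown();
            try {
                letRefreshFinish.await(); // simulate a slow reopen
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "v2";
        });

        Thread background = new Thread(refresher::maybeRefresh);
        background.start();
        refreshStarted.await(); // the slow refresh is now in progress

        String duringRefresh = refresher.get();       // stale snapshot, returned at once
        boolean secondAttempt = refresher.maybeRefresh(); // does not wait for the lock

        letRefreshFinish.countDown();
        background.join();
        String afterRefresh = refresher.get();

        return duringRefresh + "," + secondAttempt + "," + afterRefresh;
    }
}
```

In a real application the snapshot would be an `IndexSearcher`, the refresh would typically run on a background thread or timer rather than before every search, and searchers obtained from a `SearcherManager` would be taken with `acquire()` and returned with `release()` in a `finally` block.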

