
Lucene: very slow NRT performance

I'm testing the performance of NRT (near-real-time) searching in my application and I'm getting really odd results. I'm using this query as a sample to get all elements. The test set is very small, so getting all elements shouldn't be an issue: only 250 files are being indexed (text files only), and the total index size is 1.5 MB. The real set I need to support is hundreds of thousands of files in a multi-GB index.

Here's the sample query that worries me:

    public static List<IndexableItem> GetAllElements()
    {
        var qp = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "ProviderPath", analyzer);
        qp.AllowLeadingWildcard = true;
        var query = qp.Parse("*");

        var searcher = new Lucene.Net.Search.IndexSearcher(reader);
        List<IndexableItem> docs = new List<IndexableItem>();
        searcher.Search(query, new SimpleHitCollector(docId =>
        {
            docs.Add(reader.Document(docId).ToIndexable());
        }));
        return docs;
    }

As you can see, it's pretty simple. The run time of this query is around 0.1 seconds while indexing isn't running, but if indexing is running at the same time it goes up to 45 seconds or more!

The `reader` variable is a property defined as follows:

    public static IndexReader reader 
    {
        get 
        {
            return writer.GetReader();
        }
    }

And the writer:

    static SearchIndexManager()
    {
        writer = new IndexWriter(FSDirectory.Open(@"C:\MyFolder"), analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
    }

The performance issue is definitely within Lucene: it's inside the hit collector, where up to 1 second passes between successive `docs.Add` calls. `ToIndexable` cannot be the issue either; it's a trivial method and doesn't depend on anything the indexer could be using (disk I/O, etc.).

I'm fairly sure something is wrong, since the goal of NRT obviously isn't a 450x slowdown. Any advice on where I should look for hints?

Some more information: I'm not calling Optimize during the slowdowns, and "once in a while" I get a quick answer even while indexing, but when that happens seems pretty random. I do call Commit once in a while (every 100 insertions).

As I understand it, near-real-time search is intended for an index that has changed but whose changes have not yet been committed, and where no further changes occur while searching. That is the optimal usage of NRT; I don't mean that searching is impossible while indexing, only that searching or reading is suboptimal when it happens at the same time as indexing.

Consider the method `IndexReader.Reopen`. Its purpose is to obtain a fresh reader if the index is changing, or has changed, since the old `IndexReader` instance was obtained. Therefore, if you keep using the old instance you might miss documents that you should find, and you are reading from a "moving target", hence the slow performance.
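A minimal sketch of the "keep one reader and refresh it with `Reopen`" idea, assuming Lucene.Net 3.0.3 (which matches the `AllowLeadingWildcard` property used in the question). `NrtReaderHolder` and `Refresh` are hypothetical names. The sketch is single-threaded: it disposes the old reader right away, so an application searching from several threads would need reference counting (for example `IncRef`/`DecRef`) before doing the same.

```csharp
using Lucene.Net.Index;

// Hypothetical helper (not part of Lucene.Net): holds a single NRT reader
// and refreshes it on demand instead of calling writer.GetReader() on every access.
public sealed class NrtReaderHolder
{
    private readonly object sync = new object();
    private IndexReader current;

    public NrtReaderHolder(IndexWriter writer)
    {
        // Opening the initial NRT reader flushes buffered documents once.
        current = writer.GetReader();
    }

    // Returns a reader that reflects changes made since the previous call.
    public IndexReader Refresh()
    {
        lock (sync)
        {
            // Reopen() may return the same instance when nothing has changed,
            // so compare references before swapping.
            IndexReader newer = current.Reopen();
            if (!ReferenceEquals(newer, current))
            {
                // Single-threaded assumption: nobody else is still searching `current`.
                current.Dispose();
                current = newer;
            }
            return current;
        }
    }
}
```

Calling `Refresh()` once per search (or on a timer) bounds how often the writer is asked for a new reader, rather than tying it to every property access.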

You wrote:你写了:

> the index stays in one file and keeps growing (this is expected behavior), but as soon as I launch a search, it splits as if a commit were being forced

When you get an `IndexReader` from the `IndexWriter`, it flushes any buffered changes; note that this is not a commit. The flush writes the buffered documents out as a new segment, which is why the index appears to split as soon as a search runs.
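Note also that the `reader` property in the question calls `writer.GetReader()` on every access, so `GetAllElements` opens a new NRT reader (flushing whatever is buffered) for the searcher and then again for every `reader.Document(docId)` call in the collector, without closing any of them. Below is a sketch under the same Lucene.Net 3.0.3 assumption that takes one snapshot per search and loads the stored documents from that same reader. `SearchSnapshot` is a hypothetical name; the caller passes in whatever `Query` it already builds.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class NrtSearchSketch
{
    // Hypothetical helper: runs `query` against a single NRT snapshot.
    public static List<Document> SearchSnapshot(IndexWriter writer, Query query)
    {
        var docs = new List<Document>();

        // One GetReader() call per search: it flushes buffered documents
        // (not a commit) and gives a point-in-time view of the index.
        using (IndexReader reader = writer.GetReader())
        using (var searcher = new IndexSearcher(reader))
        {
            // Request at most as many hits as there are live documents (at least 1).
            int n = Math.Max(1, reader.NumDocs());
            TopDocs hits = searcher.Search(query, n);

            foreach (ScoreDoc sd in hits.ScoreDocs)
            {
                // Load stored fields from the same reader the search used,
                // rather than from a property that opens a fresh reader each time.
                docs.Add(reader.Document(sd.Doc));
            }
        }
        return docs;
    }
}
```

Disposing the searcher does not close a reader passed to its constructor, which is why the reader has its own `using` block.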
