Lucene IndexWriter thread safety

Lucene encourages the reuse of an IndexWriter from multiple threads.

Given that two threads might hold a reference to the same IndexWriter, if thread A calls close on the writer, thread B would be left with a useless writer. But to my understanding, Lucene somehow knows that another thread is using the same writer and defers its closure.

Is this indeed the case? How does Lucene track that another thread is using the writer?

EDIT: Judging from the answers, it is not correct to close the IndexWriter. But this poses a new issue: keeping an IndexWriter open essentially blocks access to the index from another JVM (e.g. in a cluster, or when an index is shared between many applications).

If one thread closes the IndexWriter while other threads are still using it, you'll get unpredictable results. We try to have the other threads hit AlreadyClosedException, but this is only best effort (not guaranteed). For example, you can easily hit a NullPointerException too. So you must synchronize externally to make sure you don't do this.
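One way to do that external synchronization is a read-write lock: threads that use the writer hold the read lock, and the thread that closes it takes the write lock. The following is only a minimal sketch using the JDK alone; `GuardedResource`, `Action`, and `use` are hypothetical names, not part of Lucene, and the wrapped `Closeable` stands in for the shared IndexWriter.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical wrapper (not a Lucene class) guarding a shared Closeable such as an IndexWriter. */
final class GuardedResource<T extends Closeable> implements Closeable {

    /** An operation on the resource that may throw IOException. */
    interface Action<T> {
        void apply(T resource) throws IOException;
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final T resource;
    private boolean closed = false; // only read or written while holding the lock

    GuardedResource(T resource) {
        this.resource = resource;
    }

    /** Runs the action under the read lock, so many threads can use the resource at once. */
    void use(Action<T> action) throws IOException {
        lock.readLock().lock();
        try {
            if (closed) {
                throw new IllegalStateException("resource already closed");
            }
            action.apply(resource);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Takes the write lock, so close() waits until no thread is inside use(). */
    @Override
    public void close() throws IOException {
        lock.writeLock().lock();
        try {
            if (!closed) {
                closed = true;
                resource.close();
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this shape, a late caller gets a predictable IllegalStateException instead of whatever the closed writer happens to throw. Note that calling `close()` from inside an action passed to `use` would deadlock, because a read lock cannot be upgraded to a write lock.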

Recently (only in Lucene's trunk right now, eventually to be 4.0) a big thread bottleneck inside IndexWriter was fixed, allowing segment flushes to run concurrently (previously they were single-threaded). For apps running many indexing threads on concurrent hardware, this can give a big boost in indexing throughput. See http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html for details.

The thread safety and reuse of IndexWriter mean you can have multiple threads all using that one instance to create, update, and delete documents. If you close the IndexWriter in one thread, though, it will indeed muck everyone else up.

Are you referring to the waitForMerges flag on the IndexWriter.close() method?

Closes the index with or without waiting for currently running merges to finish. This is only meaningful when using a MergeScheduler that runs merges in background threads.

Lucene generally uses background threads to consolidate fragmented writes that have occurred across multiple threads: the writes themselves happen immediately, but the consolidation (merging) happens asynchronously.

When closing the writer, you should allow it to finish the consolidation process, otherwise:

it is dangerous to always call close(false), especially when IndexWriter is not open for very long, because this can result in "merge starvation" whereby long merges will never have a chance to finish. This will cause too many segments in your index over time.

So the writer doesn't "know" about your threads, in the sense that you meant.
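If an application does want the behavior the question describes, where the close is deferred until the last user is done, it has to track the users itself. A minimal sketch with a reference count follows; `RefCounted`, `acquire`, and `release` are hypothetical names, not a Lucene API, and the `Closeable` again stands in for the shared IndexWriter.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not part of Lucene): closes the resource when the last holder releases it. */
final class RefCounted<T extends Closeable> {

    private final T resource;
    // Starts at 1: the creator holds the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCounted(T resource) {
        this.resource = resource;
    }

    /** Takes a reference; fails if the count has already dropped to zero. */
    T acquire() {
        while (true) {
            int current = refs.get();
            if (current <= 0) {
                throw new IllegalStateException("resource already closed");
            }
            // Retry if another thread changed the count in the meantime.
            if (refs.compareAndSet(current, current + 1)) {
                return resource;
            }
        }
    }

    /** Drops a reference; the thread that drops the last one closes the resource. */
    void release() throws IOException {
        int remaining = refs.decrementAndGet();
        if (remaining == 0) {
            resource.close();
        } else if (remaining < 0) {
            throw new IllegalStateException("release() called more often than acquire()");
        }
    }
}
```

Each thread would call `acquire()` before using the writer and `release()` in a `finally` block afterwards; the owner calls `release()` once, without a matching `acquire()`, when it wants the writer closed.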

