
Implementation of Apache Lucene

I have been reading the Apache Lucene source code for the past few weeks, trying to find the method or class that is primarily responsible for writing the postings lists (the index) to disk. I have read a lot about indexing and tried to find the point in the process where some method is called to write the index to disk, but without success. I know that the index, or the postings lists, are written out periodically when some internal buffer is full. If anyone has already read the code or knows where this is done, please tell me. Thanks.
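To make the behaviour described in the question concrete, here is a minimal toy sketch of "buffer postings in memory, write them out when the buffer is full". It is not Lucene code and does not reflect Lucene's actual classes or file formats: the class `ToyPostingsBuffer`, its methods, and the `segment_N.txt` file naming are all hypothetical names invented for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, NOT Lucene's implementation.
public class ToyPostingsBuffer {
    private final Path dir;
    private final int maxBufferedPostings;
    // term -> list of document ids, kept sorted by term
    private final Map<String, List<Integer>> buffer = new TreeMap<>();
    private int buffered = 0;
    private int segmentCount = 0;

    public ToyPostingsBuffer(Path dir, int maxBufferedPostings) {
        this.dir = dir;
        this.maxBufferedPostings = maxBufferedPostings;
    }

    // Record that `term` occurs in `docId`; flush once the buffer is full.
    public void add(String term, int docId) throws IOException {
        buffer.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
        buffered++;
        if (buffered >= maxBufferedPostings) {
            flush();
        }
    }

    // Write the buffered postings to a new "segment" file and clear the buffer.
    public void flush() throws IOException {
        if (buffer.isEmpty()) {
            return;
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : buffer.entrySet()) {
            lines.add(entry.getKey() + " " + entry.getValue());
        }
        Path segment = dir.resolve("segment_" + segmentCount + ".txt");
        Files.write(segment, lines, StandardCharsets.UTF_8);
        segmentCount++;
        buffer.clear();
        buffered = 0;
    }

    // In this toy, a commit just forces out whatever is still buffered.
    public void commit() throws IOException {
        flush();
    }

    public int segmentCount() {
        return segmentCount;
    }
}
```

With a threshold of 2, adding two postings triggers one automatic flush, and a later `commit()` writes any remainder as a second file. The real library separates these concerns far more carefully; the sketch only shows the general shape of the idea.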

I don't know the answer, but I am very curious about this as well. After a short search, I came across this on the IndexWriter page:

public final void commit() throws IOException

Commits all pending changes (added & deleted documents, segment merges, added indexes, etc.) to the index, and syncs all referenced index files, such that a reader will see the changes and the index updates will survive an OS or machine crash or power loss. Note that this does not wait for any running background merges to finish. This may be a costly operation, so you should test the cost in your application and do it only when really necessary.

Note that this operation calls Directory.sync on the index files. That call should not return until the file contents & metadata are on stable storage. For FSDirectory, this calls the OS's fsync. But, beware: some hardware devices may in fact cache writes even during fsync, and return before the bits are actually on stable storage, to give the appearance of faster performance. If you have such a device, and it does not have a battery backup (for example) then on power loss it may still lose data. Lucene cannot guarantee consistency on such devices.

NOTE: if this method hits an OutOfMemoryError you should immediately close the writer. See above for details.

Specified by: commit in interface TwoPhaseCommit
Throws: IOException
See Also: prepareCommit(), commit(Map)
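The fsync note in the quoted documentation can be illustrated with plain JDK classes. The sketch below does not use Lucene and is not how FSDirectory is implemented. It only shows what "do not return until the bytes are on stable storage" looks like from Java, using `FileChannel.force(true)`, which asks the OS to flush both file content and metadata. The class name `DurableWrite` and the method `writeDurably` are made up for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Lucene.
public class DurableWrite {

    // Write `data` to `file` and ask the OS to push it to the storage device.
    public static void writeDurably(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            // A single write() may be partial, so loop until the buffer is drained.
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // true = also flush file metadata, not only the content.
            channel.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment-sketch", ".bin");
        writeDurably(file, "postings".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.size(file) + " bytes written and forced");
        Files.delete(file);
    }
}
```

As the quoted warning says, a successful `force` call is only as trustworthy as the hardware underneath: a device with a volatile write cache can still lose data on power loss.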

IndexWriter

