
What is the Solr/Lucene process to purge deleted documents from the index?

What is the process for purging the index when it contains deleted documents (after a delete by query)?

I'm asking because I'm working on a Solr-based project, I've noticed some strange behavior, and I would like some information about it.

My system has these characteristics:

  • My documents are indexed continuously (1000 docs per second)

  • A purge is done every couple of seconds with this query:

     <delete><query>timestamp_utc:[ * TO NOW-10MINUTES ]</query></delete> 

So I have about 600000 documents visible in my index at any time: 10 minutes * 60 = 600 seconds, at 1000 docs/s, so 600 * 1000 = 600000.

But the size of my index increases over time. I know that when you do a delete by query, the documents are only marked with a "delete" label or something like that in the index.

I've seen and tried the attribute "expungeDeletes=true", but I didn't notice a considerable change in my index size.
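For reference, expungeDeletes is a parameter of the commit rather than of the delete itself. Below is a minimal sketch of how such a commit request could be built, assuming a core named `mycore` on `localhost:8983` (both hypothetical) and the standard `/update` handler; the helper name `expunge_commit_url` is made up for illustration.

```python
from urllib.parse import urlencode


def expunge_commit_url(base_url: str, core: str) -> str:
    """Build an update URL that commits and asks Solr to expunge deletes.

    base_url and core are placeholders; adjust them to the real deployment.
    """
    params = {"commit": "true", "expungeDeletes": "true"}
    return f"{base_url.rstrip('/')}/{core}/update?{urlencode(params)}"


url = expunge_commit_url("http://localhost:8983/solr", "mycore")
print(url)
```

Whether this shrinks the index noticeably probably depends on the merge policy: as far as I know, only segments whose share of deleted documents exceeds a configured threshold get rewritten, which might explain why the size barely changes.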

Any information about the index purge process would be appreciated.

Thanks.

Edit

I know that an optimize can do this job, but it's a long operation and I want to avoid it.

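You could create a new collection/core every 10 minutes, switch to it (plus the previous one), and then delete the oldest collection/core (the one older than 10 minutes).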
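The rotation described above could be sketched roughly as follows. This only computes which collection/core to write to, which ones to search, and which one to drop at a given time; the naming scheme (`docs_<n>`) and the helpers `bucket_name` and `rotation_plan` are assumptions for illustration, and the actual create/alias/delete calls against Solr are left out.

```python
WINDOW_SECONDS = 600  # 10 minutes, matching the retention in the question


def bucket_name(index: int, prefix: str = "docs") -> str:
    """Hypothetical naming scheme: one collection/core per 10-minute bucket."""
    return f"{prefix}_{index}"


def rotation_plan(now: int, prefix: str = "docs") -> dict:
    """Return which buckets to write to, search, and drop at epoch time `now`."""
    current = now // WINDOW_SECONDS
    return {
        # New documents go into the bucket covering the current 10 minutes.
        "write": bucket_name(current, prefix),
        # Searches cover the current bucket plus the previous one.
        "search": [bucket_name(current - 1, prefix), bucket_name(current, prefix)],
        # Anything older than the previous bucket can be dropped entirely.
        "drop": bucket_name(current - 2, prefix),
    }


plan = rotation_plan(now=6000)
print(plan)
```

Dropping a whole collection/core removes its files at once, so no merge or optimize is needed to reclaim the space. Since the two searched buckets together span between 10 and 20 minutes, a filter on `timestamp_utc` would presumably still be needed if the window has to be exactly 10 minutes.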


 