
How do I remove logically deleted documents from a Solr index?

I am implementing Solr for a free text search for a project where the records available to be searched will need to be added and deleted on a large scale every day.

Because of the scale I need to make sure that the size of the index is appropriate.

On my test installation of Solr, I index a set of 10 documents. Then I make a change to one of the documents and want to replace the document with the same ID in the index. This works correctly and behaves as expected when I search.

I am using this code to update the document:

getSolrServer().deleteById(document.getIndexId());
getSolrServer().add(document.getSolrInputDocument());
getSolrServer().commit();

What I noticed, though, is that the figures on the stats page for the Solr server are not what I expect.

After the initial index, numDocs and maxDoc both equal 10, as expected. When I update the document, however, numDocs is still 10 (expected) but maxDoc is 11 (unexpected).

When reading the documentation I see that

maxDoc may be larger as the maxDoc count includes logically deleted documents that have not yet been removed from the index.

So the question is, how do I remove logically deleted documents from the index?

If these documents remain in the index, do I risk a performance penalty when this runs with a very large volume of documents?

Thanks :)

You have to optimize your index.

Note that an optimize is expensive; you probably should not run it more than once a day.
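To make the numbers from the question concrete, here is a minimal toy model of logical deletion. It is only a sketch of the idea, not how Solr or Lucene actually store data: the ToyIndex class and its methods are hypothetical names invented for this illustration. It shows why a delete followed by an add leaves numDocs at 10 while maxDoc grows to 11, and why an optimize-style rewrite brings maxDoc back down.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of logical deletion. NOT Solr's or Lucene's real implementation. */
class ToyIndex {
    /** One stored slot: a document id plus a "deleted" flag. */
    private static final class Slot {
        final String id;
        boolean deleted;
        Slot(String id) { this.id = id; }
    }

    private List<Slot> slots = new ArrayList<>();

    /** Appends a new slot; existing slots are never overwritten in place. */
    void add(String id) {
        slots.add(new Slot(id));
    }

    /** Marks matching slots as deleted but leaves them in storage. */
    void deleteById(String id) {
        for (Slot s : slots) {
            if (s.id.equals(id)) {
                s.deleted = true;
            }
        }
    }

    /** Rewrites storage keeping only live slots (the role optimize plays here). */
    void optimize() {
        List<Slot> live = new ArrayList<>();
        for (Slot s : slots) {
            if (!s.deleted) {
                live.add(s);
            }
        }
        slots = live;
    }

    /** Counts live (searchable) documents only. */
    int numDocs() {
        int n = 0;
        for (Slot s : slots) {
            if (!s.deleted) {
                n++;
            }
        }
        return n;
    }

    /** Counts every slot, including logically deleted ones. */
    int maxDoc() {
        return slots.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        for (int i = 0; i < 10; i++) {
            index.add("doc" + i);
        }
        // Same sequence as in the question: delete, then re-add the same id.
        index.deleteById("doc3");
        index.add("doc3");
        System.out.println(index.numDocs() + " / " + index.maxDoc()); // 10 / 11
        index.optimize();
        System.out.println(index.numDocs() + " / " + index.maxDoc()); // 10 / 10
    }
}
```

Against a real server, assuming getSolrServer() from the question returns a SolrJ SolrServer, the corresponding call would be getSolrServer().optimize().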

Here is some more info on optimize:

http://www.lucidimagination.com/search/document/CDRG_ch06_6.3.1.3

http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations
