
Should I keep the Lucene IndexWriter open for the entire indexing run or close it after each document addition?

Does closing the Lucene IndexWriter after each document addition slow down my indexing process?

I imagine that repeatedly closing and reopening the index writer will slow down indexing, or is that not true for Lucene?

Basically, I have a Lucene indexer step in a Spring Batch job, and I create the indices in an ItemProcessor. The indexer step is a partitioned step; I create the IndexWriter when the ItemProcessor is created and keep it open until the step completes.

    @Bean
    @StepScope
    public ItemProcessor<InputVO,OutputVO> luceneIndexProcessor(@Value("#{stepExecutionContext[field1]}") String str) throws Exception{
        boolean exists = IndexUtils.checkIndexDir(str);
        String indexDir = IndexUtils.createAndGetIndexPath(str, exists);
        IndexWriterUtils indexWriterUtils = new IndexWriterUtils(indexDir, exists);
        IndexWriter indexWriter = indexWriterUtils.createIndexWriter();
        return new LuceneIndexProcessor(indexWriter);
    }

Is there a way to close this IndexWriter after step completion?

Also, I ran into issues because I also search in this step to find duplicate documents, but I fixed that by calling writer.commit(); before opening the reader and searching.

Please advise whether I need to close and reopen the writer after each document addition or whether I can keep it open throughout, and also how to close it in StepExecutionListenerSupport's afterStep.

Initially, I was closing and reopening the writer for each document, but indexing was very slow, so I suspected that might be the reason.

In development the index directory is small, so we may not see much gain, but for large index directories we should avoid unnecessarily creating and closing the IndexWriter as well as the IndexReader.

In Spring Batch, I accomplished it with these steps:

1. As pointed out in my other question, first we need to address the serialization problem in order to put the object in the ExecutionContext.

2. We create an instance of a composite serializable object and put it in the ExecutionContext in the partitioner.

3. Pass the value from the ExecutionContext to your step's reader, processor or writer in the configuration:

    @Bean
    @StepScope
    public ItemProcessor<InputVO,OutputVO> luceneIndexProcessor(@Value("#{stepExecutionContext[field1]}") String field1,@Value("#{stepExecutionContext[luceneObjects]}") SerializableLuceneObjects luceneObjects) throws Exception{
        LuceneIndexProcessor indexProcessor =new LuceneIndexProcessor(luceneObjects);
        return indexProcessor;
    }

4. Use the instance passed to the processor wherever you need it, and use a getter method to obtain the index reader or writer, e.g. public IndexWriter getLuceneIndexWriter() {return luceneIndexWriter;}

5. Finally, in StepExecutionListenerSupport's afterStep(StepExecution stepExecution), close the writer or reader by getting it from the ExecutionContext:

ExecutionContext executionContext = stepExecution.getExecutionContext();
SerializableLuceneObjects slObjects = (SerializableLuceneObjects)executionContext.get("luceneObjects");
IndexWriter luceneIndexWriter = slObjects.getLuceneIndexWriter();
IndexReader luceneIndexReader = slObjects.getLuceneIndexReader();
if(luceneIndexWriter !=null ) luceneIndexWriter.close();
if(luceneIndexReader != null) luceneIndexReader.close();
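
For illustration only, here is a minimal sketch of what the composite holder from steps 1, 2 and 5 might look like. The class name LuceneHandles, the use of transient fields and the closeAll() method are my assumptions, not the actual SerializableLuceneObjects class from the answer. The fields are typed as java.io.Closeable (which both IndexWriter and IndexReader implement) so the sketch compiles without Lucene on the classpath.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Serializable;

/**
 * Hypothetical stand-in for the answer's composite serializable object.
 * Assumption: the Lucene handles are kept in transient fields, because
 * IndexWriter and IndexReader themselves are not Serializable.
 */
final class LuceneHandles implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient fields are skipped by Java serialization
    private final transient Closeable writer;
    private final transient Closeable reader;

    LuceneHandles(Closeable writer, Closeable reader) {
        this.writer = writer;
        this.reader = reader;
    }

    Closeable getWriter() { return writer; }
    Closeable getReader() { return reader; }

    /**
     * Closes the writer first, then the reader. A failure on the writer
     * does not prevent the reader from being closed; the first exception
     * is rethrown with any later one attached as suppressed.
     */
    void closeAll() throws IOException {
        IOException first = null;
        for (Closeable handle : new Closeable[] {writer, reader}) {
            if (handle == null) {
                continue; // nothing was opened for this slot
            }
            try {
                handle.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

In afterStep, the two null-checked close() calls could then be replaced by a single closeAll() call. One caveat with this assumed design: if the holder is ever really serialized and read back, the transient handles come back as null, so this only works while the same in-memory instance is being passed around.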
