Document and Field instance reuse in Lucene Indexing

Question

I am trying to reuse the Document and Field instances to improve indexing performance. Without reusing the instances, indexing a file with 1 million rows took 20 seconds.

But when I reuse them, indexing takes far longer and never seems to finish.

Has anyone faced the same problem before?

This is the existing code, before I tried to reuse the instances. For each line in the file I create a new Document and new Fields.

// try-with-resources closes the reader even if indexing throws
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {

    String line;
    while ((line = br.readLine()) != null) {

        // field1Value and field2Value are taken from lineTokens (extraction not shown)
        String[] lineTokens = line.split("\\|");
        // a new Document and new Fields are created for every line
        Document doc = new Document();
        Field field1 = new TextField("field1", field1Value, Field.Store.YES);
        doc.add(field1);
        Field field2 = new StringField("field2", field2Value, Field.Store.YES);
        doc.add(field2);
        writer.addDocument(doc);
    }
} catch (FileNotFoundException fnfe) {
    // the input file does not exist, so nothing is indexed
}

After changing it to reuse the instances:

// try-with-resources closes the reader even if indexing throws
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {

    String line;
    // the Document and the Fields are created once, outside the loop
    Document doc = new Document();
    Field field1 = new TextField("field1", field1Value, Field.Store.YES);
    Field field2 = new StringField("field2", field2Value, Field.Store.YES);
    while ((line = br.readLine()) != null) {

        //String[] lineTokens = line.split("\\|");

        field1.setStringValue("field1Value");
        doc.add(field1);

        field2.setStringValue("field2Value");
        doc.add(field2);
        writer.addDocument(doc);
    }
} catch (FileNotFoundException fnfe) {
    // the input file does not exist, so nothing is indexed
}

Answer 1

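You don't need to add the fields to the doc on every iteration. Document.add appends a field rather than replacing an existing one, so adding the fields inside the loop makes the reused document grow by two fields per line, and every addDocument call has more to index than the previous one. After you have added the fields once, all you need to do is change the field values and then write the altered document to the index, like this: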

Document doc = new Document();
// Create each field once with a placeholder value and add it to the document once.
Field field1 = new TextField("field1", "", Field.Store.YES);
doc.add(field1);
Field field2 = new StringField("field2", "", Field.Store.YES);
doc.add(field2);
while ((line = br.readLine()) != null) {
    // Only the values change; field1Value and field2Value come from the current line.
    field1.setStringValue(field1Value);
    field2.setStringValue(field2Value);

    writer.addDocument(doc);
}
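
To see why the loop in the question keeps running, here is a small model of the effect in plain Java. It does not use Lucene: ModelDocument is a hypothetical stand-in that only mimics the one relevant behaviour, namely that add() appends to the field list and never replaces an entry. The method names are made up for illustration, and the totals count how many field entries a writer would have to walk across all addDocument calls.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the field list inside a reused document (not Lucene code).
class FieldReuseModel {

    // Hypothetical stand-in for a document: add() appends, it never replaces.
    static final class ModelDocument {
        private final List<String[]> fields = new ArrayList<>();

        void add(String[] field) {
            fields.add(field);
        }

        int size() {
            return fields.size();
        }
    }

    // Mirrors the loop from the question: the same two fields are added again for every row.
    static long fieldsProcessedAddingInsideLoop(int rows) {
        ModelDocument doc = new ModelDocument();
        String[] field1 = {"field1", ""};
        String[] field2 = {"field2", ""};
        long processed = 0;
        for (int i = 0; i < rows; i++) {
            field1[1] = "a" + i;
            field2[1] = "b" + i;
            doc.add(field1);
            doc.add(field2);
            processed += doc.size(); // stands in for writer.addDocument(doc)
        }
        return processed;
    }

    // Mirrors the loop from the answer: fields are added once, only the values change.
    static long fieldsProcessedAddingOnce(int rows) {
        ModelDocument doc = new ModelDocument();
        String[] field1 = {"field1", ""};
        String[] field2 = {"field2", ""};
        doc.add(field1);
        doc.add(field2);
        long processed = 0;
        for (int i = 0; i < rows; i++) {
            field1[1] = "a" + i;
            field2[1] = "b" + i;
            processed += doc.size(); // stands in for writer.addDocument(doc)
        }
        return processed;
    }

    public static void main(String[] args) {
        int rows = 1000;
        System.out.println("added inside loop: " + fieldsProcessedAddingInsideLoop(rows));
        System.out.println("added once:        " + fieldsProcessedAddingOnce(rows));
    }
}
```

In this model the first variant processes rows * (rows + 1) field entries in total and the second processes 2 * rows. For the 1 million rows mentioned in the question that is on the order of 10^12 entries against 2 million, which fits the observation that the run never seems to finish.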

solution1 6 ACCEPTED 2015-01-08 18:55:16
