Apache Solr : Data import handler exception - how to skip zero byte files

Question

While going through the Solr logs, I found data import errors for certain documents. Here is one of them:

Exception while processing: file document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 7866
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
        at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:122)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)

How can I ignore ZeroByteFileException? Is there a setting I can define in dataimport.config?

Thanks!

Answer 1

There are attributes you can configure for this case.

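You can add ignoreTikaException=true. Note that the Solr documentation lists ignoreTikaException as a parameter of the ExtractingRequestHandler (Solr Cell); for the data import handler's TikaEntityProcessor, the relevant setting is the onError attribute described below.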

ignoreTikaException

If true, exceptions found during processing will be skipped. Any metadata available, however, will be indexed.

Example: ignoreTikaException=true
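A minimal sketch of where this parameter would go if you index through Solr Cell rather than the data import handler: set it in the defaults of the extracting request handler in solrconfig.xml. The handler name /update/extract is the common default and is an assumption here; as far as I know, this parameter is read by the extracting handler, so it is not expected to have any effect inside a DIH data-config file.

```xml
<!-- solrconfig.xml (sketch): Solr Cell handler; the name "/update/extract" is assumed -->
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- skip Tika exceptions; any metadata that is available is still indexed -->
    <str name="ignoreTikaException">true</str>
  </lst>
</requestHandler>
```

The same parameter can also be passed per request (for example, ignoreTikaException=true in the request's query string) instead of being set as a handler default.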

For more details, please refer to the Solr documentation.

onError

By default, the TikaEntityProcessor will stop processing documents if it finds one that generates an error. If you set onError to "skip", the TikaEntityProcessor will instead skip documents that fail processing and log a message that the document was skipped.
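A minimal sketch of a DIH data-config file with onError="skip" set on the Tika entity, so that a file Tika cannot read (such as a zero-byte file) is skipped instead of aborting the import. It follows the usual FileListEntityProcessor plus TikaEntityProcessor layout; the entity names (files, tika), the baseDir path, the fileName pattern and the Solr field names (id, content) are hypothetical placeholders to replace with your own.

```xml
<dataConfig>
  <!-- Tika reads the raw bytes of each file -->
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- hypothetical outer entity: lists the files to index; rootEntity="false" makes each nested row a document -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/docs" fileName=".*\.(pdf|docx?)"
            recursive="true" rootEntity="false" dataSource="null">
      <field column="fileAbsolutePath" name="id"/>
      <!-- onError="skip": log and skip files that fail to parse instead of stopping the import -->
      <entity name="tika" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text" onError="skip">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With this in place, a file that triggers ZeroByteFileException should appear in the log as a skipped document, and the full import should continue with the remaining files.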

Answer 2

I identified and removed the corrupted files and the zero-byte (0 KB) files. After that, the issue was resolved and Solr processed the remaining files.

Regards, Ravi kumar

solution1
0 2020-04-23 09:25:34

solution2
0 2020-04-24 15:25:51