简体   繁体   中英

Apache Solr : Data import handler exception - how to skip zero byte files

While going through Solr logs, I found data import error for certain documents. Here it is:

Exception while processing: file document :
null:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to read content Processing Document # 7866
        at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
        at
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
        at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
        at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
        at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
        at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
        at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
        at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
        at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
        at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
        at
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.tika.exception.ZeroByteFileException: InputStream must
have > 0 bytes
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:122)
        at
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)

How do I ignore ZeroByteFileException? Can I define any setting in dataimport.config ?

Thanks!

There is an attribute which can be configured in your case.

You can add the ignoreTikaException=true

ignoreTikaException

If true, exceptions found during processing will be skipped. Any metadata available, however, will be indexed.

Example: ignoreTikaException=true

For more details please refer the solr documentation. Solr Documentation

onError

By default, the TikaEntityProcessor will stop processing documents if it finds one that generates an error. If you define onError to "skip" , the TikaEntityProcessor will instead skip documents that fail processing and log a message that the document was skipped.

I identified and removed corrupted files (or) zero kb files. After that issue got resolved and Solr started processing remaining files.

Regards, Ravi kumar

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM