Apache Solr: Data Import Handler exception - how to skip zero-byte files

Question

While going through the Solr logs, I found data import errors for certain documents. Here is one:

Exception while processing: file document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 7866
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
        at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:122)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)

How do I ignore ZeroByteFileException? Is there a setting I can define in dataimport.config for this?

Thanks!

Answer 1

There is an attribute you can configure for this case.

You can add ignoreTikaException=true.

ignoreTikaException

If true, exceptions found during processing will be skipped. Any metadata available, however, will be indexed.

Example: ignoreTikaException=true

For more details, please refer to the Solr documentation.

onError

By default, the TikaEntityProcessor will stop processing documents if it finds one that generates an error. If you set onError to "skip", the TikaEntityProcessor will instead skip documents that fail processing and log a message that the document was skipped.
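
As a rough illustration of where onError goes, here is a minimal sketch of a Data Import Handler configuration that walks a directory and hands each file to Tika. It follows the general shape of the FileListEntityProcessor plus TikaEntityProcessor setup; the baseDir path, the fileName pattern, the inner entity name "tika", and the field names are all assumptions for illustration and need to be replaced with the values from your own configuration and schema.

```xml
<dataConfig>
  <!-- Binary data source so Tika receives the raw file stream -->
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: lists files on disk. baseDir and fileName are hypothetical. -->
    <entity name="file" processor="FileListEntityProcessor" dataSource="null"
            baseDir="/path/to/documents" fileName=".*\.pdf"
            rootEntity="false">
      <field column="file" name="id"/>
      <!-- Inner entity: parses each file with Tika.
           onError="skip" asks the processor to skip a document that fails
           (for example a zero-byte file) and log it, instead of aborting. -->
      <entity name="tika" processor="TikaEntityProcessor"
              url="${file.fileAbsolutePath}" format="text"
              onError="skip">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

After changing the configuration, reload it and rerun the import, then check the log for the skipped documents to confirm that only the files you expect are being dropped.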

Answer 2

I identified and removed the corrupted and zero-byte files. After that, the issue was resolved and Solr started processing the remaining files.

Regards, Ravi kumar

Answer 1, 2020-04-23 09:25:34

Answer 2, 2020-04-24 15:25:51