Apache Solr: Data import handler exception - how to skip zero byte files
While going through the Solr logs, I found a data import error for certain documents. Here it is:
Exception while processing: file document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 7866
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
    at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:122)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
How do I ignore ZeroByteFileException? Can I define any setting in dataimport.config?
Thanks!
There is an attribute that can be configured for your case. You can add ignoreTikaException=true.
ignoreTikaException
If true, exceptions found during processing will be skipped. Any metadata available, however, will be indexed.
Example: ignoreTikaException=true
For more details, please refer to the Solr documentation.
onError
By default, the TikaEntityProcessor will stop processing documents if it finds one that generates an error. If you define onError to "skip", the TikaEntityProcessor will instead skip documents that fail processing and log a message that the document was skipped.
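As a rough sketch of where onError would go, here is what a DIH data config might look like. The entity names, the baseDir path, the fileName pattern and the target field names (id, content) are placeholders I made up for illustration, so adjust them to your own setup. Note also that, as far as I know, ignoreTikaException is documented as a parameter of the ExtractingRequestHandler (Solr Cell), so for a TikaEntityProcessor inside the Data Import Handler the onError attribute is the one shown here.

```xml
<dataConfig>
  <!-- Binary data source that feeds file content to Tika -->
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: walks the directory; baseDir and fileName are hypothetical -->
    <entity name="files"
            processor="FileListEntityProcessor"
            baseDir="/path/to/docs"
            fileName=".*"
            recursive="true"
            rootEntity="false"
            dataSource="null">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity: onError="skip" makes a failing file (for example a
           zero-byte one) get skipped and logged instead of aborting the import -->
      <entity name="file"
              processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}"
              format="text"
              onError="skip">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With this in place, a file that throws ZeroByteFileException should show up in the log as a skipped document while the rest of the import continues.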
I identified and removed the corrupted or zero KB files. After that, the issue was resolved and Solr started processing the remaining files.
Regards, Ravi kumar