Solr RuntimeException during indexing: how to write the document id to the log?
We are indexing millions of documents. We use Solr 3.1 and Jetty. I enabled logging in Jetty as described here: http://wiki.apache.org/solr/LoggingInDefaultJettySetup
For some full texts we get exceptions, and therefore log records like this one:
<record>
<date>2012-09-04T15:55:16</date>
<millis>1346766916578</millis>
<sequence>0</sequence>
<logger>org.apache.solr.core.SolrCore</logger>
<level>SEVERE</level>
<class>org.apache.solr.common.SolrException</class>
<method>log</method>
<thread>10</thread>
<message>java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 character 0xd835(a surrogate character) at char #1144, byte #127)
at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:287)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:146)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
</message>
</record>
It would be great to also log the ID of the document that was sent. How can we do this?
Thank you!
Are you asking how to get Jetty to log the ID? It is unlikely that you will be able to log it through Jetty, because the XML in the request cannot be parsed far enough to reach the ID value. Notice that the stack trace shows the XMLLoader.readDoc() method never getting past line 287. Here's the code for that class (for your version): http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_1/solr/src/java/org/apache/solr/handler/XMLLoader.java?revision=1086927&view=markup
The relevant section:
SolrInputDocument readDoc(XMLStreamReader parser) throws XMLStreamException {
264 SolrInputDocument doc = new SolrInputDocument();
265
266 String attrName = "";
267 for (int i = 0; i < parser.getAttributeCount(); i++) {
268 attrName = parser.getAttributeLocalName(i);
269 if ("boost".equals(attrName)) {
270 doc.setDocumentBoost(Float.parseFloat(parser.getAttributeValue(i)));
271 } else {
272 XmlUpdateRequestHandler.log.warn("Unknown attribute doc/@" + attrName);
273 }
274 }
275
276 StringBuilder text = new StringBuilder();
277 String name = null;
278 float boost = 1.0f;
279 boolean isNull = false;
280 while (true) {
281 int event = parser.next();
282 switch (event) {
283 // Add everything to the text
284 case XMLStreamConstants.SPACE:
285 case XMLStreamConstants.CDATA:
286 case XMLStreamConstants.CHARACTERS:
287 text.append(parser.getText());
The exception is thrown while the parser is still reading field text, so the Solr document has not been built yet and there's no real way to get to the record's ID field.
The workaround is to have your indexer script check the status codes of the Solr responses and write the record ID to a log if the status is not 0 (success). Likewise, if you are using Java, PHP, or another language that can trap exceptions, you can catch those too and write the ID out to a log.
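Here is a rough sketch of that workaround in Java, in case it helps. It assumes each document is sent in its own request, so that a failure maps to exactly one ID. The names IndexFailureLogger, DocumentSender and indexAll are made up for illustration and are not part of Solr or SolrJ. The sender is a placeholder for whatever your indexer really does (an HTTP POST to the update handler, a SolrJ call, and so on) and is expected to return the status from the Solr response header.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical client-side helper; none of these names come from Solr itself.
class IndexFailureLogger {

    /** Placeholder hook: posts one document and returns the Solr response status (0 = success). */
    public interface DocumentSender {
        int send(String id, String xml) throws Exception;
    }

    private static final Logger LOG = Logger.getLogger(IndexFailureLogger.class.getName());

    /**
     * Sends every document in its own request and logs the ID of each one that fails,
     * either because the status is non-zero or because the sender threw an exception.
     * Returns the failed IDs in submission order.
     */
    public static List<String> indexAll(Map<String, String> xmlById, DocumentSender sender) {
        List<String> failedIds = new ArrayList<>();
        for (Map.Entry<String, String> entry : xmlById.entrySet()) {
            String id = entry.getKey();
            try {
                int status = sender.send(id, entry.getValue());
                if (status != 0) {
                    LOG.warning("Indexing failed for document id=" + id + " (status " + status + ")");
                    failedIds.add(id);
                }
            } catch (Exception e) {
                // The client still knows the ID even though the server never parsed it.
                LOG.log(Level.WARNING, "Indexing failed for document id=" + id, e);
                failedIds.add(id);
            }
        }
        return failedIds;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc-1", "<add><doc><field name=\"id\">doc-1</field></doc></add>");
        docs.put("doc-2", "<add><doc><field name=\"id\">doc-2</field></doc></add>");

        // Stand-in sender that rejects doc-2; swap in your real HTTP or SolrJ call.
        DocumentSender fakeSender = (id, xml) -> {
            if (id.equals("doc-2")) {
                throw new RuntimeException("Invalid UTF-8 character");
            }
            return 0;
        };

        System.out.println("Failed ids: " + indexAll(docs, fakeSender));
    }
}
```

If one request per document is too slow for millions of records, a variation would be to keep sending batches and only fall back to document-by-document sending for a batch that failed.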
Hope this helps, and good luck.