
Solr ERROR #500 IOE

Can anyone tell me what could be causing this problem? I tried to post an XML file with post.jar; I copied the server log below.

118208 [qtp760665089-18] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 middle byte 0x6c (at char #139212, byte #136949)
        at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
        at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
        at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
        at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:397)
        at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)

[...]

Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x6c (at char #139212, byte #136949)
        at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:313)
        at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:204)
        at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
        at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)...

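Your document contains one or more illegal (i.e. non-UTF-8) characters. In UTF-8, every byte after the first byte of a multi-byte sequence must be a continuation byte in the range 0x80 to 0xBF; 0x6c is a plain ASCII "l", so the byte just before it is most likely a stray single-byte character, for example one written in ISO-8859-1 or Windows-1252. See: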

http://www.coderanch.com/t/433718/XML/Invalid-UTF-middle-byte-error
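To find the offending spot, a minimal sketch like the one below may help. It assumes Java 7 or later, and the class name FindBadUtf8 is hypothetical. It decodes the file with a strict JDK UTF-8 decoder (CodingErrorAction.REPORT) and prints the byte offset of the first malformed sequence plus some surrounding text. Woodstox and the JDK decoder are separate implementations, so treat the offset as pointing at or near the "byte #136949" in the Solr error rather than as an exact match.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class FindBadUtf8 {

    /** Returns the byte offset of the first malformed UTF-8 sequence, or -1 if the bytes are valid UTF-8. */
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(8192);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On an error the input position is left at the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // output buffer full: discard decoded text and keep going
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0])); // the XML file given to post.jar
        int offset = firstInvalidOffset(data);
        if (offset < 0) {
            System.out.println("File is valid UTF-8");
            return;
        }
        System.out.printf("First malformed UTF-8 at byte %d (0x%02X)%n", offset, data[offset] & 0xFF);
        // Show nearby bytes as ISO-8859-1, which maps every byte to exactly one char.
        int from = Math.max(0, offset - 30);
        int to = Math.min(data.length, offset + 10);
        System.out.println(new String(data, from, to - from, StandardCharsets.ISO_8859_1));
    }
}
```

On Java 11 or later it can be run directly from source, e.g. java FindBadUtf8.java yourfile.xml (the file name is a placeholder).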

I'd take a close look at the document and consider stripping out (or filtering) everything that isn't valid UTF-8.
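One way to do the stripping in Java is sketched below, assuming Java 7 or later; the class name StripBadUtf8 and the two command-line arguments (input file, cleaned output file) are hypothetical. It decodes the bytes with CodingErrorAction.IGNORE, so malformed sequences are silently dropped, and writes the result back out as UTF-8 for post.jar.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class StripBadUtf8 {

    /** Decodes the bytes as UTF-8, dropping any malformed sequences instead of failing. */
    static String decodeDroppingInvalid(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            return decoder.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with IGNORE, but decode() declares this checked exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = Files.readAllBytes(Paths.get(args[0])); // file that Solr rejected
        String cleaned = decodeDroppingInvalid(original);
        // Write a cleaned copy as UTF-8; post this file instead of the original.
        Files.write(Paths.get(args[1]), cleaned.getBytes(StandardCharsets.UTF_8));
    }
}
```

Dropping bytes is lossy: if the file is really ISO-8859-1 or Windows-1252, accented letters such as é will simply disappear. In that case, reading it with that charset and writing it back as UTF-8 would keep them.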

This previous Stack Overflow answer has a couple of code snippets in Perl and Java for filtering out non-UTF-8 characters:

How to remove bad characters that are not suitable for utf8 encoding in MySQL?
