Invalid UTF8 encoding on processing xml file
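I have Java code that processes an XML file to read some values. I get an error: Invalid UTF8 encoding. When I copy the file's contents into another file in Notepad++, processing works fine, but if I only save the file under another name, I get the same error. Sorry, I can't post the XML file here because it is too large; I will only post the header and trailer. Thanks for any help in resolving this error.
My Java code to process the XML file: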
XPathFactory f=XPathFactory.newInstance();
XPath x=f.newXPath();
InputSource source=new InputSource(new FileInputStream("C:\\Users\\cc\\eclipse-workspace\\data\\file.xml") );
InputSource source2=new InputSource(new FileInputStream("C:\\Users\\cc\\eclipse-workspace\\data\\file.xml") );
XPathExpression trlr=x.compile("pers/trailer/text()");
XPathExpression hdr=x.compile("pers/header/CD/text()");
String s=trlr.evaluate(source);
String s2=hdr.evaluate(source2);
System.out.println("trailer: " + s + " header: " + s2); // s holds the trailer, s2 the header
pers is the root tag of the XML file.
The XML file looks like this:
<?xml version = '1.0' encoding = 'UTF-8'?>
<pers>
<header>555</header>
.
.
.
.
<trailer>666</trailer>
</pers>
Stack trace:
java.io.UTFDataFormatException: Invalid UTF8 encoding.
at oracle.xml.parser.v2.XMLUTF8Reader.checkUTF8Byte(XMLUTF8Reader.java:229)
at oracle.xml.parser.v2.XMLUTF8Reader.readUTF8Char(XMLUTF8Reader.java:274)
at oracle.xml.parser.v2.XMLUTF8Reader.fillBuffer(XMLUTF8Reader.java:189)
at oracle.xml.parser.v2.XMLByteReader.saveBuffer(XMLByteReader.java:452)
at oracle.xml.parser.v2.XMLReader.fillBuffer(XMLReader.java:2776)
at oracle.xml.parser.v2.XMLReader.scanNameChars(XMLReader.java:1352)
at oracle.xml.parser.v2.XMLReader.readQName(XMLReader.java:2149)
at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java:1579)
at oracle.xml.parser.v2.NonValidatingParser.parseRootElement(NonValidatingParser.java:448)
at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:394)
at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:236)
at oracle.xml.jaxp.JXDocumentBuilder.parse(JXDocumentBuilder.java:175)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:302)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:332)
at tasklets.HeaderFooter.execute(HeaderFooter.java:39)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:406)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:330)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:272)
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:81)
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:374)
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:144)
at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:257)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:200)
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148)
at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:64)
at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:67)
at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:169)
at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:144)
at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:134)
at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:306)
at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:135)
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:128)
at main.IncomeResponseFile.main(IncomeResponseFile.java:39)
--------------- linked to ------------------
javax.xml.xpath.XPathExpressionException: java.io.UTFDataFormatException: Invalid UTF8 encoding.
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:305)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:332)
at tasklets.HeaderFooter.execute(HeaderFooter.java:39)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:406)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:330)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:272)
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:81)
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:374)
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:144)
at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:257)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:200)
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148)
at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:64)
at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:67)
at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:169)
at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:144)
at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:134)
at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:306)
at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:135)
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:128)
at main.IncomeResponseFile.main(IncomeResponseFile.java:39)
Caused by: java.io.UTFDataFormatException: Invalid UTF8 encoding.
at oracle.xml.parser.v2.XMLUTF8Reader.checkUTF8Byte(XMLUTF8Reader.java:229)
at oracle.xml.parser.v2.XMLUTF8Reader.readUTF8Char(XMLUTF8Reader.java:274)
at oracle.xml.parser.v2.XMLUTF8Reader.fillBuffer(XMLUTF8Reader.java:189)
at oracle.xml.parser.v2.XMLByteReader.saveBuffer(XMLByteReader.java:452)
at oracle.xml.parser.v2.XMLReader.fillBuffer(XMLReader.java:2776)
at oracle.xml.parser.v2.XMLReader.scanNameChars(XMLReader.java:1352)
at oracle.xml.parser.v2.XMLReader.readQName(XMLReader.java:2149)
at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java:1579)
at oracle.xml.parser.v2.NonValidatingParser.parseRootElement(NonValidatingParser.java:448)
at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:394)
at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:236)
at oracle.xml.jaxp.JXDocumentBuilder.parse(JXDocumentBuilder.java:175)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:302)
... 23 more
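Write a small Java program to detect the problematic lines.
// Needs: java.nio.ByteBuffer, java.nio.charset.CharacterCodingException,
// java.nio.charset.StandardCharsets, java.nio.file.*, java.util.concurrent.atomic.AtomicInteger
AtomicInteger lineno = new AtomicInteger();
Path path = Paths.get("... .xml");
// ISO-8859-1 maps every byte to exactly one char, so reading never fails
// and getBytes() gives back the original bytes of each line.
Files.lines(path, StandardCharsets.ISO_8859_1)
    .forEach(line -> {
        int no = lineno.incrementAndGet();
        byte[] b = line.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // new String(b, UTF_8) never throws (it substitutes U+FFFD),
            // so a strict decoder is used to detect malformed input.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(b));
        } catch (CharacterCodingException e) {
            System.out.printf("[%d] %s%n%s%n", no, line, e.getMessage());
            //throw new IllegalStateException(e);
        }
    });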
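Presumably this is an error in the data.
In general it could also be a faulty buffered read: when a multi-byte sequence is split at a buffer boundary, two invalid half-sequences can result. That is unlikely in standard library code.
To make sure the decoding call is not optimized away by the JVM, its result can be used:
int sowhat = Files.lines(path, StandardCharsets.ISO_8859_1)
    .mapToInt(line -> {
        int no = lineno.incrementAndGet();
        byte[] b = line.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // Strict decoding: throws on malformed UTF-8 instead of substituting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(b)).length();
        } catch (CharacterCodingException e) {
            System.out.printf("[%d] %s%n%s%n", no, line, e.getMessage());
            throw new IllegalStateException(e); // Must throw or return int
        }
    }).sum();
System.out.println("Ignore this: " + sowhat);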
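The buffer-boundary failure mentioned above can be reproduced in a few lines. This is only an illustration, not the Oracle parser's code; the class name SplitSequenceDemo and the sample string are made up. Decoding each chunk on its own turns one two-byte character into two invalid halves, while a single CharsetDecoder fed the same chunks carries the incomplete sequence across the boundary.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SplitSequenceDemo {

    /** Strictly decodes one chunk in isolation; false means it is not valid UTF-8 by itself. */
    static boolean decodesAlone(byte[] chunk) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(chunk));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Decodes two chunks with one decoder, keeping leftover bytes between the calls. */
    static String decodeStreaming(byte[] first, byte[] second) {
        try {
            CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
            ByteBuffer in = ByteBuffer.allocate(first.length + second.length);
            // UTF-8 never yields more chars than it has bytes, so this is large enough.
            CharBuffer out = CharBuffer.allocate(first.length + second.length);

            in.put(first);
            in.flip();
            CoderResult result = decoder.decode(in, out, false); // more input follows
            if (result.isError()) result.throwException();

            in.compact();                                        // keep the incomplete sequence
            in.put(second);
            in.flip();
            result = decoder.decode(in, out, true);              // end of input
            if (result.isError()) result.throwException();
            decoder.flush(out);

            out.flip();
            return out.toString();
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] all = "n\u00e9".getBytes(StandardCharsets.UTF_8); // e-acute is 0xC3 0xA9
        byte[] first = Arrays.copyOfRange(all, 0, 2);            // 'n' plus the lead byte
        byte[] second = Arrays.copyOfRange(all, 2, 3);           // the continuation byte
        System.out.println("first chunk alone valid:  " + decodesAlone(first));
        System.out.println("second chunk alone valid: " + decodesAlone(second));
        System.out.println("streaming result: " + decodeStreaming(first, second));
    }
}
```

If a reader behaved like decodesAlone on each buffer, it would reject valid data, which is why a problem in the data itself is the more likely explanation here.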
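Illegal XML characters (in version 1.0)? [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
(These ranges are the RestrictedChar set of XML 1.1. XML 1.0 forbids the control characters below #x20 other than tab, line feed, and carriage return, and only discourages #x7F-#x84 and #x86-#x9F.)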
int sowhat = Files.lines(path, StandardCharsets.ISO_8859_1)
    .mapToInt(line -> {
        int no = lineno.incrementAndGet();
        byte[] b = line.getBytes(StandardCharsets.ISO_8859_1);
        if (!legal(b)) {
            // No exception exists at this point, so report the line itself.
            System.out.printf("[%d] %s%n", no, line);
            throw new IllegalStateException("Illegal XML character in line " + no);
        }
        return b.length; // The lambda must return an int
    }).sum();
static boolean legal(byte[] bytes) {
String s = new String(bytes, StandardCharsets.UTF_8);
for (char ch : s.toCharArray()) {
int x = ch;
if ((0 <= x && x <= 8) // ASCII control chars
|| (0xB <= x && x <= 0xC)
|| (0xE <= x && x <= 0x1F)
|| (0x7f <= x && x <= 0x84) // DEL + Unicode control chars
|| (0x86 <= x && x <= 0x9F)) {
return false;
}
}
return true;
}
If that does not help either, I have kept you busy long enough. Split the file and validate the parts.
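As an alternative to splitting the file by hand, the byte offset of the first malformed sequence can be reported directly. The following is a minimal sketch, assuming the file fits in memory; the class name Utf8Offset and the method firstInvalidOffset are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class Utf8Offset {

    /** Returns the offset of the first byte that is not valid UTF-8, or -1 if everything decodes. */
    static long firstInvalidOffset(byte[] data) {
        // A fresh decoder reports malformed input instead of replacing it.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(8192);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // the decoder stops at the start of the bad sequence
            }
            if (result.isUnderflow()) {
                return -1;            // all input consumed without an error
            }
            out.clear();              // overflow: discard the decoded chars and continue
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to check.
        long offset = firstInvalidOffset(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(offset < 0
                ? "valid UTF-8"
                : "first invalid byte at offset " + offset);
    }
}
```

The offset can then be looked up in a hex editor to see which bytes the parser is rejecting.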
I converted the file to UTF-8 using the following code:
File source = new File("C:\\Users\\cc\\eclipse-workspace\\data\\file.xml");
String srcEncoding="ISO-8859-1";
File target = new File("C:\\Users\\cc\\eclipse-workspace\\data\\file2.xml");
String tgtEncoding="UTF-8";
try (
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(source), srcEncoding));
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding)); ) {
char[] buffer = new char[16384];
int read;
while ((read = br.read(buffer)) != -1)
bw.write(buffer, 0, read);
}
After that, I used file2. Thanks to: Java: how to convert a file to UTF-8
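The intermediate file can probably be avoided by handing the parser a Reader that already decodes the bytes, because a character stream on an InputSource takes precedence over the encoding named in the XML declaration. Below is a sketch under two assumptions: the file really is ISO-8859-1 (as the successful conversion suggests), and the paths follow the sample XML (the question's code uses pers/header/CD). The class and method names are hypothetical. The document is parsed once and both XPath expressions are evaluated against it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class HeaderTrailerReader {

    /** Parses the stream with an explicit charset and returns {header, trailer}. */
    static String[] readHeaderAndTrailer(InputStream in, Charset charset) {
        try {
            // Decoding happens here, so the declared encoding is not consulted.
            Reader reader = new InputStreamReader(in, charset);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
            XPath xpath = XPathFactory.newInstance().newXPath();
            String header = xpath.evaluate("/pers/header/text()", doc);
            String trailer = xpath.evaluate("/pers/trailer/text()", doc);
            return new String[] {header, trailer};
        } catch (Exception e) {
            throw new IllegalStateException("Could not read header/trailer", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the XML file.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String[] values = readHeaderAndTrailer(in, StandardCharsets.ISO_8859_1);
            System.out.println("header: " + values[0] + " trailer: " + values[1]);
        }
    }
}
```

The cleaner long-term fix is still for the producer of the file to write real UTF-8 or to declare the encoding it actually uses.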