UTF-16LE encoding and Xerces2 Java

Question

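I went through several posts, such as "FileReader reads the file as a character stream" and "can be treated as whitespace if the document is handed over as a stream of characters", where the answers say that the input source is actually a char stream, not a byte stream.

However, the solution suggested in 1 does not seem to work for UTF-16LE, even though I use the following code: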

    // Hand the raw bytes to Xerces so that the parser detects the encoding itself.
    try (final InputStream is = Files.newInputStream(filename.toPath(), StandardOpenOption.READ)) {
      DOMParser parser = new org.apache.xerces.parsers.DOMParser();
      parser.parse(new InputSource(is));
      return parser.getDocument();
    } catch (final SAXParseException saxEx) {
      LOG.debug("Unable to open [{}] as InputSource.", absolutePath, saxEx);
    }

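I still get `org.xml.sax.SAXParseException: Content is not allowed in prolog.`

I looked at Files.newInputStream, and it does use a ChannelInputStream, which hands over bytes rather than chars. I also tried setting the encoding on the InputSource object, but with no luck. I also checked that there are no extra characters (apart from the BOM) before the <?xml part.

I should also mention that this code works fine with UTF-8.

// Edit: I also tried DocumentBuilderFactory.newInstance().newDocumentBuilder().parse() and XmlInputStreamReader.next(), with the same result.

// Edit 2: Tried it with a buffered reader. Same result: Unexpected character '뿯' (code 49135 / 0xbfef) in prolog; expected '<'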
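The check for stray leading characters can be made on the raw bytes rather than in a text editor. The sketch below is only an illustration: `BomCheck`, `describeBom` and `hexPrefix` are hypothetical helper names, not part of Xerces or the JDK. It is relevant here because the character 0xBFEF from Edit 2 is what the byte pair EF BF yields when decoded as UTF-16LE, and EF BF BD is the UTF-8 encoding of U+FFFD (the replacement character). Whether the file really starts with those bytes is an assumption that a hex dump would confirm or rule out.

```java
final class BomCheck {

    /** Returns a label for the marker found at the start of data, or "none". */
    static String describeBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8 BOM";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE BOM";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE BOM";
        // Not a BOM: these bytes usually mean an earlier lossy conversion.
        if (startsWith(data, 0xEF, 0xBF, 0xBD)) return "UTF-8 replacement character (U+FFFD)";
        return "none";
    }

    /** Formats the first n bytes as upper-case hex, separated by spaces. */
    static String hexPrefix(byte[] data, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(n, data.length); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned values, since Java bytes are signed.
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

With the file from the question, this could be called as `BomCheck.describeBom(Files.readAllBytes(filename.toPath()))`. A healthy UTF-16LE file with a BOM should report "UTF-16LE BOM" and a hex prefix beginning with `FF FE 3C 00`.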

Thanks in advance.

Answer 1

To find out a bit more about what is going on, try the following:

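    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.xerces.parsers.DOMParser;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXParseException;

    // Inside the method that loads the document:
    byte[] bytes = Files.readAllBytes(filename.toPath());
    String xml = new String(bytes, StandardCharsets.UTF_16LE);
    if (xml.startsWith("\uFEFF")) {
        LOG.info("Has BOM and is evidently UTF_16LE");
        xml = xml.substring(1); // drop the BOM
    }
    if (!xml.contains("<?xml")) {
        LOG.info("Has no XML declaration");
    }
    // Extract the encoding named in the XML declaration; fall back to UTF-8 if there is none.
    Matcher m = Pattern.compile("<\\?xml[^>]*encoding=[\"']([^\"']+)[\"']").matcher(xml);
    String declaredEncoding = m.find() ? m.group(1) : "UTF-8";
    LOG.info("Declared as " + declaredEncoding);

    // Re-encode the text in the declared encoding so that bytes and declaration agree.
    try (final InputStream is = new ByteArrayInputStream(xml.getBytes(declaredEncoding))) {
      DOMParser parser = new org.apache.xerces.parsers.DOMParser();
      parser.parse(new InputSource(is));
      return parser.getDocument();
    } catch (final SAXParseException saxEx) {
      LOG.debug("Unable to open [{}] as InputSource.", absolutePath, saxEx);
    }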
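If the diagnostics confirm that the bytes really are UTF-16LE, another option is to skip the re-encoding step and hand the parser a character stream. The following is a minimal sketch under assumptions, not a confirmed fix for the file in the question: `Utf16LeXml` is a hypothetical helper name, and it uses the JDK's `DocumentBuilderFactory` instead of the Xerces `DOMParser` so that it runs without extra dependencies.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class Utf16LeXml {

    /** Decodes the bytes as UTF-16LE, drops a leading BOM, and parses the text. */
    static Document parse(byte[] bytes)
            throws ParserConfigurationException, SAXException, IOException {
        String xml = new String(bytes, StandardCharsets.UTF_16LE);
        // The UTF_16LE charset does not strip the BOM; it is decoded as U+FEFF.
        if (xml.startsWith("\uFEFF")) {
            xml = xml.substring(1);
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Character-stream InputSource: the parser does no encoding detection of its own.
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Because the parser receives characters, it never has to guess the encoding. The SAX `InputSource` documentation states that a character stream is read directly and that any encoding declaration in it is disregarded. This only helps if the decoded text is clean; if the file itself contains damaged bytes, the parse will still fail.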
