How do I load an org.w3c.dom.Document from XML in a string?

I have a complete XML document in a string and would like a Document object. Google turns up all sorts of garbage. What is the simplest solution? (In Java 1.5)

Solution: thanks to Matt McMinn, I have settled on this implementation. It has the right level of input flexibility and exception granularity for me. (It's good to know whether the error came from malformed XML, a SAXException, or just bad IO, an IOException.)

public static org.w3c.dom.Document loadXMLFrom(String xml)
    throws org.xml.sax.SAXException, java.io.IOException {
    return loadXMLFrom(new java.io.ByteArrayInputStream(xml.getBytes()));
}

public static org.w3c.dom.Document loadXMLFrom(java.io.InputStream is) 
    throws org.xml.sax.SAXException, java.io.IOException {
    javax.xml.parsers.DocumentBuilderFactory factory =
        javax.xml.parsers.DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
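    javax.xml.parsers.DocumentBuilder builder;
    try {
        builder = factory.newDocumentBuilder();
    }
    catch (javax.xml.parsers.ParserConfigurationException ex) {
        // Rethrow instead of swallowing: otherwise builder stays null
        // and the parse call below fails with a NullPointerException.
        throw new IllegalStateException(ex);
    }
    try {
        return builder.parse(is);
    }
    finally {
        // Close the stream even when parsing fails.
        is.close();
    }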
}

Whoa there!

There's a potentially serious problem with this code: it ignores the character encoding declared in the XML held in the String (UTF-8 when none is declared). When you call String.getBytes(), the platform default encoding is used to encode the Unicode characters to bytes. So the parser may think it's getting UTF-8 data when in fact it's getting EBCDIC or something… not pretty!
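To make the mismatch concrete, here is a small sketch that is not part of the original answer. It encodes the same XML text with two explicit charsets, standing in for two different platform defaults, and then decodes the bytes as UTF-8 the way a parser would when no other encoding is declared. The class name, the roundTrip helper, and the sample XML are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Hypothetical helper: encode the text with one charset,
    // then decode the resulting bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // The text contains one non-ASCII character (e with acute accent).
        String xml = "<name>caf\u00e9</name>";

        // The same characters produce different byte sequences per charset.
        System.out.println(xml.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(xml.getBytes(StandardCharsets.ISO_8859_1).length);

        // Encoding and decoding with the same charset preserves the text.
        System.out.println(xml.equals(
            roundTrip(xml, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));

        // Encoding as ISO-8859-1 but decoding as UTF-8 corrupts it.
        System.out.println(xml.equals(
            roundTrip(xml, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)));
    }
}
```

Pure ASCII documents survive most of these mismatches, which is why the problem tends to stay hidden until the first accented or non-Latin character shows up.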

Instead, use the parse method that takes an InputSource, which can be constructed with a Reader, like this:

import java.io.StringReader;
import org.xml.sax.InputSource;
…
        return builder.parse(new InputSource(new StringReader(xml)));
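For readers who want the fragment above in a complete form, the following is one possible sketch. The class name XmlStringLoader and the sample XML are assumptions for illustration; the parsing calls are the standard JAXP ones already used in this thread. Unlike the asker's version, it simply declares ParserConfigurationException rather than catching it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlStringLoader {

    // Parse XML held in a String. The StringReader hands the parser
    // characters directly, so no byte encoding step is involved.
    public static Document loadXMLFrom(String xml)
            throws SAXException, IOException, ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Sample input with a non-ASCII character in the text content.
        Document doc = loadXMLFrom("<greeting lang=\"fr\">caf\u00e9</greeting>");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Malformed input still surfaces as a SAXException, so the exception granularity the asker wanted is kept.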

It may not seem like a big deal, but ignorance of character encoding issues leads to insidious code rot akin to Y2K.

This works for me in Java 1.5. I stripped out specific exceptions for readability.

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;

public Document loadXMLFromString(String xml) throws Exception
{
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    return builder.parse(new ByteArrayInputStream(xml.getBytes()));
}

I just had a similar problem, except I needed a NodeList and not a Document. Here's what I came up with. It's mostly the same solution as before, augmented to get the root element as a NodeList and using erickson's suggestion of an InputSource to avoid character encoding issues.

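import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

private String DOC_ROOT = "root";

// Inside the calling method; getXmlString() supplies the XML text.
String xml = getXmlString();
Document xmlDoc = loadXMLFrom(xml);
Element template = xmlDoc.getDocumentElement();
NodeList nodes = xmlDoc.getElementsByTagName(DOC_ROOT);

public static Document loadXMLFrom(String xml) throws Exception {
    // A Reader-backed InputSource gives the parser characters, not bytes.
    InputSource is = new InputSource(new StringReader(xml));
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    return builder.parse(is);
}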
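As a hedged illustration of what the NodeList calls above return, the sketch below parses a tiny document and inspects it. The class NodeListDemo, its helper methods, and the sample XML are hypothetical. The point is that getElementsByTagName on the Document also matches the root element itself, while getChildNodes on the root returns its direct children, which can include text nodes as well as elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeListDemo {

    // Same Reader-based parsing approach as in the answers above.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    // Collect the names of the root's direct child elements,
    // skipping text, comment, and other non-element nodes.
    static List<String> childElementNames(Document doc) {
        List<String> names = new ArrayList<String>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><a/><b/></root>");
        // Number of elements named "root" in the whole document.
        System.out.println(doc.getElementsByTagName("root").getLength());
        // Names of the element children of the root.
        System.out.println(childElementNames(doc));
    }
}
```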

To manipulate XML in Java, I always tend to use the Transformer API:

import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public static Document loadXMLFrom(String xml) throws TransformerException {
    Source source = new StreamSource(new StringReader(xml));
    DOMResult result = new DOMResult();
    // newTransformer() with no stylesheet is an identity transform:
    // it copies the parsed source into the DOM result unchanged.
    TransformerFactory.newInstance().newTransformer().transform(source, result);
    return (Document) result.getNode();
}
