
How to get an XML page by URL

Ok, so I got some url link like https://stackoverflow.com/ and I'm trying to parse it in document but getting error.好的,所以我得到了一些像https://stackoverflow.com/这样的 url 链接,我试图在文档中解析它,但出现错误。 Why?为什么? Because this is not xml file, so the question is how can I get data as xml if i got only url?因为这不是 xml 文件,所以问题是如果我只有 url,如何将数据作为 xml 获取? My code:我的代码:

import java.net.URL;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

public class URLReader {
    public static void main(String[] args) throws Exception {

        // or if you prefer DOM:
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        // This is where it fails: the URL returns HTML, not well-formed XML
        Document doc = db.parse(new URL("https://stackoverflow.com/").openStream());
        int nodes = doc.getChildNodes().getLength();
        System.out.println(nodes + " nodes found");
    }
}

To parse HTML you may use jsoup: https://jsoup.org/

This library also provides features to transform HTML into XHTML, which is a form of XML:

// `html` is the page content as a String; Document here is org.jsoup.nodes.Document
Document document = Jsoup.parse(html);
document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
document.outputSettings().escapeMode(org.jsoup.nodes.Entities.EscapeMode.xhtml);
String xhtml = document.html();
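
If the goal is still to end up with a W3C DOM Document, here is a minimal sketch that combines the two steps: it downloads the page with Jsoup.connect, converts the parse tree to XHTML as above, and then feeds the resulting string to the same DocumentBuilder. This assumes the jsoup dependency is on the classpath, and the class and variable names are just for illustration; very messy pages can still produce output the XML parser rejects.

import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Entities;
import org.xml.sax.InputSource;

public class UrlToXml {
    public static void main(String[] args) throws Exception {
        // Download and parse the HTML page with jsoup
        org.jsoup.nodes.Document htmlDoc = Jsoup.connect("https://stackoverflow.com/").get();

        // Convert the parse tree to well-formed XHTML
        htmlDoc.outputSettings().syntax(org.jsoup.nodes.Document.OutputSettings.Syntax.xml);
        htmlDoc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
        String xhtml = htmlDoc.html();

        // Now the string can be parsed as XML with the standard DOM API
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        org.w3c.dom.Document xmlDoc = db.parse(new InputSource(new StringReader(xhtml)));
        System.out.println(xmlDoc.getChildNodes().getLength() + " nodes found");
    }
}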
