
How to read in XML files with multiple root elements in R?

I've been handed a bunch of XML files which are not well formed: they all have multiple root elements. Both xmlParse in the XML package and read_xml in the xml2 package barf when I try to read them in, with Error: 1: Extra content at the end of the document. Is there a package that makes reading multiple root elements easy, or do I need to resort to more brutish methods?

The XML standard does not allow multiple root elements; a well-formed document has exactly one.

I would advise you to read the content as a string, wrap it in a single root element, and pass the result to any of the R XML libraries.
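The wrapping approach could look roughly like the sketch below. It uses xml2; the helper name read_multi_root and the wrapper tag "root" are made up for illustration, and the sample string stands in for the real file contents. Stripping a leading XML declaration is an added assumption, on the grounds that a declaration may not appear inside an element.

```r
library(xml2)

# Hypothetical helper: wrap text with several top-level elements in one
# synthetic root element so that a strict XML parser accepts it.
read_multi_root <- function(txt, root = "root") {
  # Drop a leading XML declaration, if present, before wrapping.
  txt <- sub("^\\s*<\\?xml[^>]*\\?>", "", txt, perl = TRUE)
  read_xml(paste0("<", root, ">", txt, "</", root, ">"))
}

# Sample input with two root elements (stands in for the file contents).
raw <- "<xyz>1</xyz><xyz>2</xyz>"

doc <- read_multi_root(raw)
vals <- xml_text(xml_find_all(doc, "/root/xyz"))
print(vals)
```

For a file on disk, something like paste(readLines(path), collapse = "\n") should produce the string to pass in, where path is whatever file you were given.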

Try read_html in the xml2 package. It can read such input because the HTML parser adds wrapper tags around it. Here is an example:

library(xml2)
s <- "<xyz>1</xyz><xyz>2</xyz>"
doc <- read_html(s)

giving:

> doc
{xml_document}
<html>
[1] <body>\n  <xyz>1</xyz>\n  <xyz>2</xyz>\n</body>

Now we can operate on doc, e.g.

> xml_find_all(doc, "//xyz")
{xml_nodeset (2)}
[1] <xyz>1</xyz>
[2] <xyz>2</xyz>

This also works with the XML package:

library(XML)
doc <- htmlTreeParse(s, asText = TRUE, useInternalNodes = TRUE)
xpathSApply(xmlRoot(doc), "//xyz", xmlValue)

giving:

[1] "1" "2"
