How to read in XML files with multiple root elements in R?
I've been handed a bunch of XML files that are not well formed: they all have multiple root elements. Both xmlParse in the XML package and read_xml in the xml2 package barf when I try to read the files in, with Error: 1: Extra content at the end of the document. Is there a package that makes reading multiple root elements easy, or do I need to resort to more brutish methods?
The XML standard does not support multiple root elements. I would advise you to read the content as a string, wrap it in a single root element, and pass the result to any of the R XML libraries.
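The wrap-in-a-root approach can be sketched as follows. This is only a minimal illustration, not a tested recipe for your files: the temporary file, its contents, and the wrapper element name "root" are all made up for the example, and it assumes any XML declaration appears only once, at the very top of the file.

```r
library(xml2)

# Hypothetical input file: an XML declaration followed by two root elements.
path <- tempfile(fileext = ".xml")
writeLines(c('<?xml version="1.0"?>', "<xyz>1</xyz>", "<xyz>2</xyz>"), path)

# Read the whole file into a single string.
txt <- paste(readLines(path, warn = FALSE), collapse = "\n")

# Drop a leading XML declaration, if present; it is not allowed inside an element.
txt <- sub("^[[:space:]]*<\\?xml[^>]*\\?>", "", txt)

# Wrap everything in one root element ("root" is an arbitrary name) and parse.
doc <- read_xml(paste0("<root>", txt, "</root>"))

# The original top-level elements are now children of the wrapper.
values <- xml_text(xml_find_all(doc, "/root/xyz"))
```

Unlike an HTML parser, this keeps the content as XML, so element names and namespaces are handled by the ordinary XML rules.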
Try read_html in the xml2 package: its HTML parser is lenient and reads the input by adding some enclosing tags. Here is an example:
library(xml2)
s <- "<xyz>1</xyz><xyz>2</xyz>"
doc <- read_html(s)
giving:
> doc
{xml_document}
<html>
[1] <body>\n <xyz>1</xyz>\n <xyz>2</xyz>\n</body>
Now we can operate on doc, e.g.
> xml_find_all(doc, "//xyz")
{xml_nodeset (2)}
[1] <xyz>1</xyz>
[2] <xyz>2</xyz>
This also works with the XML package:
library(XML)
doc <- htmlTreeParse(s, asText = TRUE, useInternalNodes = TRUE)
xpathSApply(xmlRoot(doc), "//xyz", xmlValue)
giving:
[1] "1" "2"