In R, how do I combine two XML documents into one document?

I'm querying data from an XML-based API. The API responses are paginated, so I have to make a bunch of queries to get the full data set.

Using read_xml from the xml2 package, I can easily make each request and save it as an XML document, but I've had trouble figuring out how to use the library to combine them into one document. (I'd like to do this so that I can run the XPath queries I need once instead of 50 times.)

I've tried creating a new blank document and adding the nodes of the others as elements, but neither xml_add_child nor xml_add_sibling will take a second document as an argument, and neither seems to like being passed the result of an xml_find_all query. (They complain about not being able to work with references.)

So, I'm stumped.

(Note: I haven't had any success figuring out how to do this with the original XML package either.)

Consider using the XML package: initialize an empty document with a <root> node, then iteratively append the content of each response to it with addChildren(), taking the nodes from each response's root.

library(XML)

doc = newXMLDoc()
root = newXMLNode("root", doc = doc)

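# LOOP THROUGH 50 REQUESTS
for (i in seq_len(50)) {
    # PARSE ALL CONTENT (the path is a placeholder for the call that fetches page i)
    tmp <- xmlParse("/path/to/API/call")

    # APPEND FROM API XML ROOT ('apixmlroot' stands for the API's actual root element name)
    addChildren(root, kids = getNodeSet(tmp, '/apixmlroot'))
}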

# SAVE TO FILE OR USE doc FOR FURTHER WORK 
saveXML(doc, file="/path/to/output.xml")

I cannot find a counterpart method in xml2, as its xml_add_child requires a character string rather than node(s).
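For comparison, in the xml2 versions I have used, xml_add_child() also accepts an existing xml_node as its .value (copying it by default), which would allow the same wrap-everything-under-<root> layout as the XML answer above. The sketch below assumes such a version (check the documentation of your installed xml2); the two inline pages are made-up stand-ins for API responses, and xml_new_root() creates the empty <root> document.

```r
library(xml2)

# Hypothetical stand-ins for two paginated API responses
pages <- list(
  read_xml("<items><item>1</item><item>2</item></items>"),
  read_xml("<items><item>3</item><item>4</item></items>")
)

# New document whose root element is <root>,
# mirroring newXMLDoc() + newXMLNode("root") in the XML package
doc <- xml_new_root("root")

for (page in pages) {
  # Append each page's root element (<items>) under <root>;
  # the node is copied by default, so the page documents are not modified
  xml_add_child(doc, xml_root(page))
}

# Each page keeps its own <items> wrapper as a separate child of <root>
print(xml_length(xml_root(doc)))
# [1] 2

# A single XPath query now covers every page
print(xml_text(xml_find_all(doc, "//item")))
# [1] "1" "2" "3" "4"
```

This keeps each page's root element as its own child of <root>; appending the children of each root instead (as in the xml2 answer) flattens all pages into one list of items.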

After some trial and error, I've figured out how to do this with the xml2 package.

Let's consider the simple case of two very small XML documents that we'd like to combine.

library(xml2)

doc1 <- read_xml("<items><item>1</item><item>2</item></items>")
doc2 <- read_xml("<items><item>3</item><item>4</item></items>")

(Note: where the documents come from doesn't matter; the argument to read_xml can be anything it can read, such as a literal XML string, a file path, or a URL.)

To combine them together, simply do the following:

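# Children of doc2's root element (its <item> nodes)
doc2children <- xml_children(doc2)

# Append each one to doc1's root; xml_add_child() copies the node by default
for (child in doc2children) {
    xml_add_child(doc1, child)
}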

Now when you look at doc1 you should see this:

> doc1
{xml_document}
<items>
[1] <item>\n  1</item>
[2] <item>\n  2</item>
[3] <item>\n  3</item>
[4] <item>\n  4</item>
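The same loop generalizes to the 50 paginated responses from the question. Below is a minimal sketch, assuming all pages share the same root element; combine_xml_docs() is a hypothetical helper name, and the three inline documents stand in for real API responses. It starts from a copy of the first page (serialized with as.character() and re-parsed with read_xml()), appends the children of every other page, and then runs a single XPath query over the merged result.

```r
library(xml2)

# Hypothetical helper: merge the top-level children of every document in
# `docs` under a copy of the first document's root element
combine_xml_docs <- function(docs) {
  stopifnot(length(docs) >= 1)
  # Re-parse the first document so the caller's copy is not modified in place
  combined <- read_xml(as.character(docs[[1]]))
  for (doc in docs[-1]) {
    for (child in xml_children(doc)) {
      # The node is copied into `combined` (xml_add_child's default)
      xml_add_child(combined, child)
    }
  }
  combined
}

# Stand-ins for the paginated API responses (assumed to share a root element)
pages <- list(
  read_xml("<items><item>1</item><item>2</item></items>"),
  read_xml("<items><item>3</item><item>4</item></items>"),
  read_xml("<items><item>5</item></items>")
)

all_items <- combine_xml_docs(pages)

# One XPath query over the merged document instead of one per page
values <- xml_text(xml_find_all(all_items, "//item"))
print(values)
# [1] "1" "2" "3" "4" "5"
```

Because xml2 documents are references to libxml2 structures, calling xml_add_child(doc1, ...) directly, as in the answer, modifies doc1 in place; starting from a re-parsed copy avoids that when the original pages are still needed.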
