In R, how do I combine two XML documents into one document?

I'm querying data from an XML-based API. The API responses are paginated, so I have to make a bunch of queries to get the full data set.

Using read_xml from the xml2 package, I can easily make each request and save it as an XML document, but I've been having trouble figuring out how to use the library to combine them into one document. (I would like to do this so I can run the XPath queries I need once instead of 50 times.)

I've tried creating a new blank document and adding the nodes of the others as elements, but neither xml_add_child nor xml_add_sibling will take a second document as an argument, and neither seems to like being passed the result of an xml_find_all query. (They complain about not being able to work with references.)

So, I'm stumped.

(Note: I've also not had any success in discovering how to do this with the original XML package.)

Consider using the XML package: initialize an empty document with a <root> node, then iteratively append content from the root of each API response using addChildren().

library(XML)

doc = newXMLDoc()
root = newXMLNode("root", doc = doc)

# Loop through the 50 requests
for (i in seq_len(50)) {
    # Parse the full content of one API response
    tmp <- xmlParse("/path/to/API/call")

    # Append the nodes selected from the API XML root
    addChildren(root, getNodeSet(tmp, '/apixmlroot'))
}

# SAVE TO FILE OR USE doc FOR FURTHER WORK 
saveXML(doc, file="/path/to/output.xml")

I cannot find a counterpart method in xml2, as its xml_add_child requires a character string, not node(s).

After some trial and error, I've figured out how to do this with the xml2 package.

Consider the simple case of two very small XML documents that we'd like to combine.

library(xml2)

doc1 <- read_xml("<items><item>1</item><item>2</item></items>")
doc2 <- read_xml("<items><item>3</item><item>4</item></items>")

(Note: where the documents come from doesn't matter; the argument to read_xml can be anything it is able to read.)

To combine them together, simply do the following:

doc2children <- xml_children(doc2)

for (child in doc2children) {
    xml_add_child(doc1, child)
}

Now when you look at doc1 you should see this:

> doc1
{xml_document}
<items>
[1] <item>\n  1</item>
[2] <item>\n  2</item>
[3] <item>\n  3</item>
[4] <item>\n  4</item>
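
The same loop extends to any number of pages, which is the situation in the question. Here is a minimal sketch, assuming the pages have already been parsed into a list and that every page shares the same root element. The name `pages` and the inline strings are hypothetical stand-ins for real API responses.

```r
library(xml2)

# Hypothetical stand-ins for the parsed, paginated API responses
pages <- list(
  read_xml("<items><item>1</item><item>2</item></items>"),
  read_xml("<items><item>3</item><item>4</item></items>"),
  read_xml("<items><item>5</item></items>")
)

# Use the first page as the combined document and append the rest to it
combined <- pages[[1]]
for (page in pages[-1]) {
  for (child in xml_children(page)) {
    xml_add_child(combined, child)
  }
}

# One XPath query over the combined document instead of one per page
values <- xml_text(xml_find_all(combined, "//item"))
```

Because xml2 documents are modified in place, `combined` and `pages[[1]]` refer to the same document afterward. If the first page needs to stay untouched, it may be safer to start from a freshly parsed copy.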
