Parsing XML in R: returning a data frame object

Question

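I've managed to get the Example 1 XML into R as a data frame object, but I'm running into trouble with Example 2. Does anyone have suggestions for R code that converts the data in mtcars.xml into a data frame?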

Example 1)

library(XML)
# Save the URL of the xml file in a variable

xml.url <- "http://www.w3schools.com/xml/plant_catalog.xml"

# Use the xmlTreeParse function to parse the XML file directly from the web

xmlfile <- xmlTreeParse(xml.url)

# Use the xmlRoot function to access the top node
xmltop <- xmlRoot(xmlfile)
# Have a look at the XML code of the first two child nodes:
xmltop[1:2]


# To extract the XML values from the document, use xmlSApply:

plantcat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
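
As written, xmlSApply() gives a character matrix rather than a data frame when every PLANT has the same fields. The matrix has one row per field and one column per PLANT. Below is a minimal sketch of the final conversion step. It assumes a small hypothetical inline snippet shaped like plant_catalog.xml, so it runs without network access. The names `txt` and `plant_df` are illustrative only.

```r
library(XML)

# Hypothetical two-record snippet with the same layout as plant_catalog.xml
txt <- "<CATALOG>
  <PLANT><COMMON>Bloodroot</COMMON><ZONE>4</ZONE><PRICE>$2.44</PRICE></PLANT>
  <PLANT><COMMON>Columbine</COMMON><ZONE>3</ZONE><PRICE>$9.37</PRICE></PLANT>
</CATALOG>"

# asText = TRUE tells xmlTreeParse that txt is XML content, not a file name
xmltop <- xmlRoot(xmlTreeParse(txt, asText = TRUE))

# Character matrix: rows are field names (COMMON, ZONE, PRICE), columns are PLANTs
plantcat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

# Transpose so each PLANT becomes a row; row.names = NULL drops the repeated "PLANT" labels
plant_df <- data.frame(t(plantcat), row.names = NULL, stringsAsFactors = FALSE)
print(plant_df)
```

The XML package also provides xmlToDataFrame(), which is aimed at this same "one level of child elements per record" layout.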

Example 2)

library(XML)
# Locate the mtcars.xml example file that ships with the XML package

xml.path <- system.file("exampleData", "mtcars.xml", package = "XML")

# Parse the file once; passing an already-parsed document back into
# xmlTreeParse() fails because it expects a file name or text
xmlfile <- xmlTreeParse(xml.path)

# Use the xmlRoot function to access the top node
xmltop <- xmlRoot(xmlfile)
# Have a look at the XML code of the first two child nodes:
xmltop[1:2]


# To extract the XML values from the document, use xmlSApply:

mtcarscat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

Answer 1

Try xpathSApply:

library(XML)

path <- system.file("exampleData", "mtcars.xml", package="XML")
doc <- xmlTreeParse(path, useInternal = TRUE)
root <- xmlRoot(doc)

read.table(text = xpathSApply(root, "//record", xmlValue), 
           col.names = xpathSApply(root, "//variable", xmlValue))

Gives:

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
... etc ...
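
This works because mtcars.xml does not nest one element per field. The column names live in the <variable> nodes. Each <record> holds an entire row as a single whitespace-separated string, and read.table() then splits and type-converts that string. The car name sits in each record's id attribute, and the code above drops it.

Below is a minimal sketch that also restores the car names as row names. It assumes a hypothetical two-record inline excerpt with the same layout, so it runs without the package's data file. The names `txt` and `mtcars_df` are illustrative.

```r
library(XML)

# Hypothetical excerpt mirroring mtcars.xml: names in <variable>,
# one whitespace-separated row per <record>, car name in the id attribute
txt <- '<dataset name="mtcars">
  <variables count="3">
    <variable unit="Miles/gallon">mpg</variable>
    <variable>cyl</variable>
    <variable>disp</variable>
  </variables>
  <record id="Mazda RX4">21.0 6 160.0</record>
  <record id="Datsun 710">22.8 4 108.0</record>
</dataset>'

doc <- xmlParse(txt, asText = TRUE)

# One string per record, e.g. "21.0 6 160.0"
rows <- xpathSApply(doc, "//record", xmlValue)

# read.table splits each string on whitespace and converts columns to numbers
mtcars_df <- read.table(text = rows,
                        col.names = xpathSApply(doc, "//variable", xmlValue))

# Use each record's id attribute as the row name
rownames(mtcars_df) <- xpathSApply(doc, "//record", function(n) xmlGetAttr(n, "id"))
print(mtcars_df)
```

Row names are assigned after reading rather than through read.table(row.names = ...). A length-one character vector passed there would be interpreted as a column name, not as a row label.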

Answer 2

Here's one approach using xml2:

library(xml2)
library(purrr)
library(dplyr)

catalog_url <- "http://www.w3schools.com/xml/plant_catalog.xml"
doc <- read_xml(catalog_url)

# get all the "records"
plants <- xml_find_all(doc, ".//PLANT")

# get all the field names
kids <- xml_name(xml_children(plants[1]))

# make a data frame
# - iterate over each record
# - in each record grab each field
# - turn each row into a data frame
# - bind all the data frames together
# (xml_find_first() and as_tibble() stand in for xml_find_one() and
#  rbind_list(), which are deprecated in current xml2 and dplyr)

map_df(plants, function(plant) {
  as_tibble(as.list(setNames(map_chr(kids, function(kid) {
    xml_text(xml_find_first(plant, sprintf(".//%s", kid)))
  }), kids)))
})

## Source: local data frame [36 x 6]
## 
##                 COMMON              BOTANICAL  ZONE        LIGHT PRICE AVAILABILITY
##                  (chr)                  (chr) (chr)        (chr) (chr)        (chr)
## 1            Bloodroot Sanguinaria canadensis     4 Mostly Shady $2.44       031599
## 2            Columbine   Aquilegia canadensis     3 Mostly Shady $9.37       030699
## 3       Marsh Marigold       Caltha palustris     4 Mostly Sunny $6.81       051799
## 4              Cowslip       Caltha palustris     4 Mostly Shady $9.90       030699
## 5  Dutchman's-Breeches    Dicentra cucullaria     3 Mostly Shady $6.44       012099
## 6         Ginger, Wild       Asarum canadense     3 Mostly Shady $9.03       041899
## 7             Hepatica     Hepatica americana     4 Mostly Shady $4.45       012699
## 8            Liverleaf     Hepatica americana     4 Mostly Shady $3.99       010299
## 9   Jack-In-The-Pulpit    Arisaema triphyllum     4 Mostly Shady $3.23       020199
## 10            Mayapple   Podophyllum peltatum     3 Mostly Shady $2.98       060599
## ..                 ...                    ...   ...          ...   ...          ...

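This could be made more robust by collecting every possible child name, since some "records" may have more or fewer children, but it's sufficient for this example. Looking up each element's value by name ensures the values come back in the correct order, because the order of elements isn't guaranteed.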
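
The more robust variant described above can be sketched as follows. It builds the field list from the union of child names across all records. It then looks each field up by name, so a record that lacks a field gets NA. This relies on xml2 returning a missing node from xml_find_first() when there is no match, and on xml_text() of that node being NA.

This sketch assumes a hypothetical two-record inline snippet in which each record is missing a different field. It swaps purrr/dplyr for base R lapply/vapply so the only dependency is xml2. The names `fields` and `plant_df` are illustrative.

```r
library(xml2)

# Hypothetical snippet: first PLANT has no LIGHT, second has no ZONE
doc <- read_xml("<CATALOG>
  <PLANT><COMMON>Bloodroot</COMMON><ZONE>4</ZONE></PLANT>
  <PLANT><COMMON>Columbine</COMMON><LIGHT>Mostly Shady</LIGHT></PLANT>
</CATALOG>")

plants <- xml_find_all(doc, ".//PLANT")

# Union of child element names across all records, not just the first one
fields <- unique(xml_name(xml_children(plants)))

# For each record, look up every field by name; missing fields become NA
rows <- lapply(plants, function(plant) {
  vapply(fields, function(f) xml_text(xml_find_first(plant, paste0("./", f))),
         character(1))
})

# Stack the named character vectors into a data frame
plant_df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
print(plant_df)
```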

2 answers

Answer 1: score 1, accepted, 2016-01-24 11:33:45

Answer 2: score 1, 2016-01-24 13:32:17
