
Parsing XML in R - return data frame object

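I have successfully turned the Example 1 XML into a data frame object in R, but I am having trouble with Example 2. Does anyone have suggestions for R code to convert the data from mtcars.xml into a data frame?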

Example 1)

library(XML)
# Save the URL of the xml file in a variable

xml.url <- "http://www.w3schools.com/xml/plant_catalog.xml"

# Use the xmlTreeParse function to parse the xml file directly from the web

xmlfile <- xmlTreeParse(xml.url)

# Use the xmlRoot-function to access the top node
xmltop = xmlRoot(xmlfile)
# Have a look at the XML code of the first two subnodes:
print(xmltop[1:2])


# To extract the XML-values from the document, use xmlSApply:

plantcat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

Example 2)

library(XML)
# Parse the mtcars.xml file that ships with the XML package

doc <- xmlTreeParse(system.file("exampleData", "mtcars.xml", package="XML"))


xmlfile <- xmlTreeParse(doc)

# Use the xmlRoot-function to access the top node
xmltop = xmlRoot(xmlfile)
# Have a look at the XML code of the first two subnodes:
print(xmltop[1:2])


# To extract the XML-values from the document, use xmlSApply:

mtcarscat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

Try xpathSApply:

library(XML)

path <- system.file("exampleData", "mtcars.xml", package="XML")
doc <- xmlTreeParse(path, useInternalNodes = TRUE)
root <- xmlRoot(doc)

read.table(text = xpathSApply(root, "//record", xmlValue), 
           col.names = xpathSApply(root, "//variable", xmlValue))

giving:

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
... etc ...
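
To see why this works, here is a minimal self-contained sketch on an inline document. The inline XML is hypothetical: it assumes mtcars.xml holds the column names in `variable` elements and each row as whitespace-separated text inside a `record` element with an `id` attribute. The names `xml_string`, `col_names`, `rows` and `ids` are illustrative only.

```r
library(XML)

# Hypothetical inline document, assumed to mirror the layout of mtcars.xml
xml_string <- '<dataset>
  <variables count="3">
    <variable>mpg</variable>
    <variable>cyl</variable>
    <variable>disp</variable>
  </variables>
  <record id="Mazda RX4">21.0 6 160.0</record>
  <record id="Datsun 710">22.8 4 108.0</record>
</dataset>'

# Parse the string into an internal document that XPath can query
doc <- xmlParse(xml_string, asText = TRUE)

# One string per <variable> (column names) and per <record> (row text)
col_names <- xpathSApply(doc, "//variable", xmlValue)
rows <- xpathSApply(doc, "//record", xmlValue)

# Extra arguments to xpathSApply are passed on to the function,
# so this reads the "id" attribute of every record
ids <- xpathSApply(doc, "//record", xmlGetAttr, "id")

# read.table splits each row's text on whitespace and guesses column types
df <- read.table(text = rows, col.names = col_names)
rownames(df) <- ids
df
```

If the records carry an `id` attribute as assumed here, reusing it for the row names keeps the car names that would otherwise be lost.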

Here is one approach using xml2:

library(xml2)
library(purrr)
library(dplyr)

catalog_url <- "http://www.w3schools.com/xml/plant_catalog.xml"
doc <- read_xml(catalog_url)

# get all the "records"
plants <- xml_find_all(doc, ".//PLANT")

# get all the field names
kids <- xml_name(xml_children(plants[1]))

# make a data frame
# - iterate over each record
# - in each record grab each field
# - turn each row into a data frame
# - bind all the data frames together

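map_df(plants, function(plant) {
  # xml_find_first() replaces the deprecated xml_find_one();
  # as_tibble() stands in for rbind_list(), which was deprecated in dplyr
  as_tibble(as.list(setNames(map_chr(kids, function(kid) {
    xml_text(xml_find_first(plant, sprintf(".//%s", kid)))
  }), kids)))
})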

## Source: local data frame [36 x 6]
## 
##                 COMMON              BOTANICAL  ZONE        LIGHT PRICE AVAILABILITY
##                  (chr)                  (chr) (chr)        (chr) (chr)        (chr)
## 1            Bloodroot Sanguinaria canadensis     4 Mostly Shady $2.44       031599
## 2            Columbine   Aquilegia canadensis     3 Mostly Shady $9.37       030699
## 3       Marsh Marigold       Caltha palustris     4 Mostly Sunny $6.81       051799
## 4              Cowslip       Caltha palustris     4 Mostly Shady $9.90       030699
## 5  Dutchman's-Breeches    Dicentra cucullaria     3 Mostly Shady $6.44       012099
## 6         Ginger, Wild       Asarum canadense     3 Mostly Shady $9.03       041899
## 7             Hepatica     Hepatica americana     4 Mostly Shady $4.45       012699
## 8            Liverleaf     Hepatica americana     4 Mostly Shady $3.99       010299
## 9   Jack-In-The-Pulpit    Arisaema triphyllum     4 Mostly Shady $3.23       020199
## 10            Mayapple   Podophyllum peltatum     3 Mostly Shady $2.98       060599
## ..                 ...                    ...   ...          ...   ...          ...

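This could be made more robust by looking up all possible child names (some "records" may have more or fewer children), but it is sufficient for this example. Doing it this way (fetching the value of each element by name) ensures the values come back in the right order, since the order of the elements is not guaranteed.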
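
A rough sketch of that more robust variant, assuming records may be missing some children: collect the union of child names across all records, then pull each field by name so that absent fields become NA. The inline catalog is made up for illustration and is not the real plant_catalog.xml.

```r
library(xml2)

# Hypothetical catalog: the second PLANT has no ZONE but has a PRICE
doc <- read_xml('<CATALOG>
  <PLANT><COMMON>Bloodroot</COMMON><ZONE>4</ZONE></PLANT>
  <PLANT><COMMON>Columbine</COMMON><PRICE>$9.37</PRICE></PLANT>
</CATALOG>')

plants <- xml_find_all(doc, ".//PLANT")

# Union of child element names over every record, not just the first
kids <- unique(xml_name(xml_children(plants)))

# One column per name; xml_find_first() returns a missing node where a
# record lacks that child, and xml_text() turns a missing node into NA
cols <- lapply(kids, function(kid) {
  xml_text(xml_find_first(plants, sprintf("./%s", kid)))
})

df <- as.data.frame(setNames(cols, kids), stringsAsFactors = FALSE)
df
```

Because `xml_find_first()` is applied to the whole node set at once, this version also avoids building one small data frame per record.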

