XML to data frame in R
I am trying to convert XML content to a data frame. The XML is as follows:
<group>
<data>
<metadata>
<meta content="6 cyl" name="engine"/>
<meta content="55" name="mpg"/>
<meta content="2700" name="weight"/>
</metadata>
</data>
<data>
<metadata>
<meta content="3 cyl" name="engine"/>
<meta content="65" name="mpg"/>
<meta content="2420" name="weight"/>
</metadata>
</data>
</group>
I want the resulting data frame to look like this:
engine mpg weight
6 cyl 55 2700
3 cyl 65 2420
I tried this:
data <- read_xml("myFile.xml")
meta <- data %>% xml_find_all("//meta")
df <- data.frame(name = sapply(meta %>% xml_attr("name"), as.character),
content = sapply(meta %>% xml_attr("content"), as.character))
But it produces this data frame:
name content
engine 6 cyl
mpg 55
weight 2700
engine 3 cyl
mpg 65
weight 2420
Then...
df <- df %>% spread(unique(name), content)
This produces an error:
Error: Duplicate identifiers for rows....
Is my approach correct, or is there another way to achieve this?
spread() requires each row to have a unique identifier. In the long data frame every name occurs once per record and no column marks which record a row belongs to, so spread() cannot tell which values belong on the same output row. There's some good discussion here: https://community.rstudio.com/t/spread-why-errors/2076/3
This should give you what you want:
df %>% group_by(name) %>% mutate(id = row_number()) %>%
spread(name, content) %>% select(-id)
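To try the fix without an XML file, here is a minimal, self-contained sketch. It assumes the dplyr and tidyr packages are installed; the data frame is typed in by hand to match the long output shown in the question, and the ungroup() call is an addition for tidiness rather than part of the answer above.

```r
library(dplyr)
library(tidyr)

# Long data frame as produced in the question (typed in by hand here).
df <- data.frame(
  name    = rep(c("engine", "mpg", "weight"), times = 2),
  content = c("6 cyl", "55", "2700", "3 cyl", "65", "2420"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(name) %>%
  mutate(id = row_number()) %>%  # 1 for the first record, 2 for the second
  ungroup() %>%
  spread(name, content) %>%      # id now identifies each output row
  select(-id)

print(wide)
```

All columns come back as character; pass convert = TRUE to spread() or convert them afterwards if numeric columns are needed.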
XML to Data Frame
To handle the data in large files effectively, we can read the XML file into a data frame and then process the data frame for analysis.
# Load the packages required to read XML files.
library("XML")
library("methods")
# Convert the input xml file to a data frame.
xmldataframe <- xmlToDataFrame("input.xml")
print(xmldataframe)
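When we execute the above code on a file whose records store each value as the text of a child element, it produces a result like the following. Note that xmlToDataFrame() takes its values from element text, not from attributes, so the attribute-based XML in the question needs a different approach.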
engine mpg weight
6 cyl 55 2700
3 cyl 65 2420
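For reference, a small sketch of the kind of input xmlToDataFrame() handles directly. The element-based XML below is a hypothetical rewrite of the question's data, not the original file, and it is parsed from a string rather than from input.xml so the example is self-contained.

```r
library(XML)

# Hypothetical input: the same cars, with values stored as element text.
xml_text <- paste0(
  "<group>",
  "<data><engine>6 cyl</engine><mpg>55</mpg><weight>2700</weight></data>",
  "<data><engine>3 cyl</engine><mpg>65</mpg><weight>2420</weight></data>",
  "</group>"
)

doc  <- xmlParse(xml_text, asText = TRUE)
cars <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Values arrive as character strings; convert the numeric columns.
cars$mpg    <- as.numeric(cars$mpg)
cars$weight <- as.numeric(cars$weight)

print(cars)
```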
Now that the data is available as a data frame, we can use data frame functions to read and manipulate it.
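Because xmlToDataFrame() reads element text, the attribute-based XML from the question may be easier to handle one record at a time. The following is only a sketch, assuming the xml2 package and assuming that every data node carries the same set of meta names. Building one row per data node also avoids the duplicate-identifier problem, since each record is kept together from the start.

```r
library(xml2)

# The XML from the question, inlined so the example is self-contained.
xml_text <- '<group>
  <data><metadata>
    <meta content="6 cyl" name="engine"/>
    <meta content="55" name="mpg"/>
    <meta content="2700" name="weight"/>
  </metadata></data>
  <data><metadata>
    <meta content="3 cyl" name="engine"/>
    <meta content="65" name="mpg"/>
    <meta content="2420" name="weight"/>
  </metadata></data>
</group>'

doc     <- read_xml(xml_text)
records <- xml_find_all(doc, "//data")

# One single-row data frame per <data> node: column names come from the
# "name" attributes, values from the "content" attributes.
rows <- lapply(records, function(rec) {
  meta   <- xml_find_all(rec, ".//meta")
  values <- as.list(xml_attr(meta, "content"))
  names(values) <- xml_attr(meta, "name")
  as.data.frame(values, stringsAsFactors = FALSE)
})

cars <- do.call(rbind, rows)
print(cars)
```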