Convert XML data to a data frame in R

I am trying to convert an XML file to a data frame, but the resulting columns show very little of the information.

library(XML)

# LOADING TRANSFORMED XML INTO R DATA FRAME
doc <- xmlParse("SRR12545290.xml") # https://www.ncbi.nlm.nih.gov/sra/?term=SRR12545290
xmldf <- xmlToDataFrame(doc)
head(xmldf)

This shows only:

 │EXPERIMENT                                                                                                               
1│SRX903458416S amplicon of  Atlantic salmon: distal intestinal digestaSRP279301Illumina 16S metagenomic targeted sequenci…
 │SUBMISSION
1│SRA1118818
 │Organization                                                                                                             
1│Norwegian university of life scienceDepartment of Paraclinical SciencesNorwegian university of life scienceNO-0033OsloNo…
 │STUDY                                                                                                                    
1│SRP279301PRJNA660116ArcticFloraDiet with or without functional feed ingredients were fed to Atlantic salmon through fres…
 │SAMPLE                                                                                                                   
1│SRS7285186SAMN15936598FW-Ref749906gut metagenome['Distal intestinal digesta of Atlantic salmon', 'Distal intestinal dige…
 │Pool                  │RUN_SET                          
1│SRS7285186SAMN15936598│SRR12545290SRS7285186SAMN15936598

Instead, I want to get all the information present in the XML file, such as geographic location, host name, and so on.

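Here is one way to parse the entire XML (using the xml2 package) and get the value of every leaf node along with its path name.
I am not sure this is exactly what you are looking for, but it is a start.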

library(xml2)
library(dplyr)  # provides the %>% pipe used below
doc <- read_xml("SRR12545290.xml")

# find all the element nodes
allnodes <- doc %>% xml_find_all("//*")

# find the leaves: nodes with no child elements
# (xml_length() is applied to allnodes itself so the indices line up with it)
leafs <- which(xml_length(allnodes) == 0)

# get the value in each leaf
value <- (allnodes %>% xml_text())[leafs]

# get the path to each leaf to identify its source
name <- (allnodes %>% xml_path())[leafs]

# clean up naming
name <- gsub("/EXPERIMENT_PACKAGE_SET/EXPERIMENT_PACKAGE/", "", name)

# final result
data.frame(name, value)
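
To try the leaf-node idea without downloading the SRA record, here is a minimal sketch on a tiny inline document. The XML string and its element names (SET, PACKAGE, SAMPLE, ATTR, TAG, VALUE) are made up for illustration and are not the real SRA schema; the sketch tests each node directly with xml_length() to decide whether it is a leaf.

```r
library(xml2)

# Hypothetical miniature document (NOT the real SRA schema), used only
# to show the leaf-extraction idea on something self-contained.
toy <- read_xml('<SET>
  <PACKAGE>
    <SAMPLE alias="s1">
      <TITLE>gut metagenome</TITLE>
      <ATTR><TAG>host</TAG><VALUE>Salmo salar</VALUE></ATTR>
      <ATTR><TAG>geo_loc_name</TAG><VALUE>Norway</VALUE></ATTR>
    </SAMPLE>
  </PACKAGE>
</SET>')

# every element node, in document order
allnodes <- xml_find_all(toy, "//*")

# a leaf is an element with no child elements
leaves <- allnodes[xml_length(allnodes) == 0]

# one row per leaf: its path (common prefix stripped) and its text
result <- data.frame(
  name  = sub("^/SET/PACKAGE/", "", xml_path(leaves)),
  value = xml_text(leaves),
  stringsAsFactors = FALSE
)
print(result)
```

Note that this only collects element text. Values stored as XML attributes (such as alias above) would need xml_attrs() or xml_attr() in addition.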
