Convert XML data to a data frame in R
I am trying to convert an XML file to a data frame, but it shows only a small amount of information in the columns.
library(XML)
# LOADING TRANSFORMED XML INTO R DATA FRAME
doc <- xmlParse("SRR12545290.xml") # https://www.ncbi.nlm.nih.gov/sra/?term=SRR12545290
xmldf <- xmlToDataFrame(doc)
head(xmldf)
This only shows:
│EXPERIMENT
1│SRX903458416S amplicon of Atlantic salmon: distal intestinal digestaSRP279301Illumina 16S metagenomic targeted sequenci…
│SUBMISSION
1│SRA1118818
│Organization
1│Norwegian university of life scienceDepartment of Paraclinical SciencesNorwegian university of life scienceNO-0033OsloNo…
│STUDY
1│SRP279301PRJNA660116ArcticFloraDiet with or without functional feed ingredients were fed to Atlantic salmon through fres…
│SAMPLE
1│SRS7285186SAMN15936598FW-Ref749906gut metagenome['Distal intestinal digesta of Atlantic salmon', 'Distal intestinal dige…
│Pool │RUN_SET
1│SRS7285186SAMN15936598│SRR12545290SRS7285186SAMN15936598
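Instead, I want to get all of the information present in the XML file, such as the geographic location, host name, and so on.
Here is a way to parse the entire XML (using the xml2 package) to get the values of all the leaf nodes along with their path names.
I'm not sure if this is what you are looking for, but it is a start.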
library(xml2)
library(dplyr)
doc <- read_xml("SRR12545290.xml")
# find all the element nodes
allnodes <- doc %>% xml_find_all("//*")
# find the leaves: nodes that have no child elements
# (test allnodes itself so the indices line up with allnodes)
leafs <- which(xml_length(allnodes) == 0)
# get the values in the leaves
value <- xml_text(allnodes)[leafs]
# get the path to each leaf to identify its source
name <- xml_path(allnodes)[leafs]
# clean up naming
name <- gsub("/EXPERIMENT_PACKAGE_SET/EXPERIMENT_PACKAGE/", "", name)
# final result
data.frame(name, value)
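The leaf-node approach can be tried on a small inline document before running it on the real file. The sketch below is only an illustration: the XML content and its values are made up, and it assumes the sample metadata (host, geographic location, and so on) is stored as TAG/VALUE pairs under SAMPLE_ATTRIBUTE elements. That assumption should be checked against the leaf paths listed for the real file. The second half pairs each TAG with its VALUE so that every tag becomes its own column.

```r
library(xml2)

# Hypothetical stand-in for the real file; element names and values are assumptions
xml_txt <- '<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <SAMPLE>
      <SAMPLE_ATTRIBUTES>
        <SAMPLE_ATTRIBUTE><TAG>host</TAG><VALUE>Salmo salar</VALUE></SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE><TAG>geo_loc_name</TAG><VALUE>Norway</VALUE></SAMPLE_ATTRIBUTE>
      </SAMPLE_ATTRIBUTES>
    </SAMPLE>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>'
doc <- read_xml(xml_txt)

# Long format: one row per leaf node (a node with no child elements)
allnodes <- xml_find_all(doc, "//*")
leafs <- which(xml_length(allnodes) == 0)
leaf_df <- data.frame(
  name  = xml_path(allnodes)[leafs],
  value = xml_text(allnodes)[leafs],
  stringsAsFactors = FALSE
)

# Wide format: pair each TAG with its VALUE inside the same SAMPLE_ATTRIBUTE
attrs <- xml_find_all(doc, "//SAMPLE_ATTRIBUTE")
tag   <- xml_text(xml_find_first(attrs, "./TAG"))
value <- xml_text(xml_find_first(attrs, "./VALUE"))

# One column per tag, one row for the sample
sample_df <- as.data.frame(as.list(setNames(value, tag)), stringsAsFactors = FALSE)

print(leaf_df)
print(sample_df)
```

Using xml_find_first() relative to each SAMPLE_ATTRIBUTE keeps tags and values aligned even if one of them is missing, since a missing match yields NA instead of shifting the vector.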